Friday, April 17, 2009

Parallel version of the Gillespie solver

A few months back I blogged about my implementation of the Gillespie SSA algorithm for solving stochastic differential equations. I have now improved the code allowing for parallel execution on SMP systems. If you don't know what this is all about, please refer the the first post for a quick intro to what a stochastic differential equation is and what it means to numerically solve a system of them.

The code of the solver is in this module. There is an example in the package which is also worth running. On a two core machine the simulation runs about twice as fast, a expected. The parallelization was implemented using the multiprocessing module, and ammounts to roughly 4 lines of code. I love multiprocessing. and it's also compatible with psyco which makes the simulations run twice as fast (on top of the parallelization gains). Probably my next shot at improving its performance will be using scipy.weave and migrate the core of the algorithm to C. Simpler optimizations schemes such as cython (my quick port), runs twice as slow as my python code. Cython can be quite tricky to optimize beyond Python performance, and generates horribly verbose C code (if you don't believe me, try running cython on a print "hello world", you will get a 1000+ lines hello.c !!!).

It is worth noting that the code for the solver and examples are being maintained as a subpackage of BIP, my Bayesian inference package. Even though it is a bit unrelated, Since I frequently have to do Bayesian inference on mathematical models (both deterministic and stochastic), I thought it would be a good idea to keep the solver to those models in the same package as the analysis tools.

2 comments:

Francesco said...

Hi, I'm looking at your code, thanks for sharing it.

While gillespie.py runs on my machine (python 2.6 on windows vista, no psyco), I cannot run the example... It freezes the box (quad core) but I still hadn't the time to find precisely where.

By the way I've noticed that you are using some non strictly pythonic approaches in some places, is there a specific reason (did you found them to be faster)? I refer to things like passing the names of the variables and their initial values instead of using a dict, for instance...

I don't know if I'll be able to look further at it during the weekend but I would also like to give a look to the Cython version, it usually worked wonders for me :-)


bye,
Francesco
ps: maybe you could avoid to shadow the builtin names "vars" and "round", it can be a bit confusing...

Unknown said...

Thanks for the comments francesco, I'll definetly replace vars and count!

As for the non-pythonic bits: I agree that the use of dictionary is much more elegant, but would screw up the positional matching required for operating with transition matrix in the GSSA method (lines 97-99 in gillespie.py).

If you can think of a way of replacing the matricial approach for another which uses dictionaries and is not slower, I'll gladly accept the patch.

I have updated the code in trunk as well as in the Pypi package.

If you decide to improve the cython version (I hope you do), please note that is was done based on a previous implementation of gillespie.py. You might want to update it to match the current python implementation.

ccp

Amazon