Tuesday, December 28, 2010

Efficient MCMC in Python -- Errata and some extra info

In my previous post, some readers pointed out that the pure Python version of the code was slower than it should be. I checked and found that the timing was wrong because of a bug in the %time flag in the Sage notebook.

Some other interested readers pointed out that using numpy's RNGs in the pure Python version would surely improve the performance. Again I went back and tested it.

So without further ado, here are the new timings on the lame machine I am writing this on:
  • Pure Python: 107.5 seconds
  • Pure Python + Numpy: 106 seconds
  • Pure Python + Numpy + storing the results in an array: 103.7 seconds
  • Cython + standard library's random and math modules: 102 seconds
  • Cython + Numpy: 93.26 seconds
  • Cython + GSL RNGs: 5.3 seconds
The source code and instructions for compiling the Cython versions can be found in this gist. Please have fun with it and continue to suggest further improvements.


Anonymous said...

The pure python version can be made a little faster by removing the function lookup:

gv = random.gammavariate
g = random.gauss
sqrt = math.sqrt

The results are:
time 105.6 seconds (original)
time 102.4 seconds (less lookup)
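
For context, here is a minimal sketch of how those local aliases might sit inside a sampler loop. The function name gibbs, its parameters, and the conditional distributions used in the loop body are assumptions made for illustration (they are not copied from the gist); only the three aliases come from the comment above.

```python
import math
import random


def gibbs(n=1000, thin=10, seed=None):
    """Hypothetical Gibbs sampler used only to illustrate local aliasing."""
    rng = random.Random(seed)

    # Bind the functions to local names once, so the inner loop avoids
    # repeated attribute lookups on the module/instance.
    gv = rng.gammavariate
    g = rng.gauss
    sqrt = math.sqrt

    x, y = 0.0, 0.0
    samples = []
    for _ in range(n):
        for _ in range(thin):
            # Assumed conditionals, chosen for illustration only.
            x = gv(3.0, 1.0 / (y * y + 4.0))
            y = g(1.0 / (x + 1.0), 1.0 / sqrt(2.0 * x + 2.0))
        samples.append((x, y))
    return samples


if __name__ == "__main__":
    print(gibbs(n=5, thin=10, seed=1))
```

The gain is small, as the timings above suggest, because most of the time is spent inside the random number generators rather than in the name lookups.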

victorg said...

ShedSkin 0.7, with minor modifications to pure Python code (x, y = 0.0, add the main() from Cython example) and creating a specialized extension ("shedskin -b -r -e gibbs.py && make") gives speed near to Cython+GSL. PyPy is 2x to 3x as slow as that.

quoter said...

A minor point, but in the version with the GSL RNGs, shouldn't the sample from the Gaussian distribution be adjusted for the desired mean? I.e.

y = gaussian(r,1.0/Sqrt(x+1)) + 1.0/(x+1)
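
The point here is that GSL's gsl_ran_gaussian draws from a zero-mean Gaussian, so the mean has to be added afterwards. A small Python sketch of the same idea, using the standard library instead of GSL; the helper name shifted_gaussian and the value of x are made up for this example:

```python
import math
import random


def shifted_gaussian(rng, mean, sigma):
    """Draw a zero-mean Gaussian variate and shift it by the desired mean."""
    return rng.gauss(0.0, sigma) + mean


x = 2.0                              # arbitrary value, for illustration only
mean = 1.0 / (x + 1.0)
sigma = 1.0 / math.sqrt(x + 1.0)     # same sigma expression as in the line above

# Two generators with the same seed: one shifts a zero-mean draw,
# the other passes the mean directly to gauss().
a = shifted_gaussian(random.Random(42), mean, sigma)
b = random.Random(42).gauss(mean, sigma)
print(a, b)
```

Both calls consume the same underlying random numbers, so the two values should agree up to floating-point rounding.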

Flavio Coelho said...

You are right, I have fixed it in the gist, and in the sage notebook.

Thanks for the bug report!