Tuesday, December 28, 2010

Efficient MCMC in Python -- Errata and some extra info

In my previous post, some readers pointed out that the pure Python version of the code was slower than it should be. I checked and found that the timing was wrong because of a bug in the %time command of the Sage notebook.

Some other interested readers pointed out that using NumPy's RNGs in the pure Python version would surely improve performance. Again, I went back and tested it.

So, without further ado, here are the new timings on the lame machine I am writing this on:
  • Pure Python: 107.5 seconds
  • Pure Python + NumPy: 106 seconds
  • Pure Python + NumPy + storing the results in an array: 103.7 seconds
  • Cython + standard library's random and math modules: 102 seconds
  • Cython + NumPy: 93.26 seconds
  • Cython + GSL RNGs: 5.3 seconds
The source code and instructions for compiling the Cython versions can be found in this gist. Please have fun with it, and keep suggesting further improvements.
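To make the "Pure Python + NumPy + storing the results in an array" variant concrete, here is a minimal sketch of what such a loop might look like. It is not the code from the gist: the conditional for y (mean 1/(x+1), standard deviation 1/sqrt(x+1)) is taken from the third comment, while the Gamma(shape 3, scale 1/(y*y + 4)) conditional for x and the parameter names n and thin are assumptions for illustration. It also uses NumPy's newer Generator API (np.random.default_rng), which postdates this post.

```python
import numpy as np


def gibbs(n, thin, seed=None):
    """Hypothetical Gibbs sampler sketch.

    n    -- number of (x, y) pairs to store
    thin -- inner iterations between stored pairs
    """
    rng = np.random.default_rng(seed)
    # Preallocate the output array instead of appending to a Python list.
    samples = np.empty((n, 2))
    x, y = 0.0, 0.0
    for i in range(n):
        for _ in range(thin):
            # ASSUMED conditional: x | y ~ Gamma(shape=3, scale=1/(y*y + 4)).
            x = rng.gamma(3.0, 1.0 / (y * y + 4.0))
            # Conditional as written in the comments: mean 1/(x+1), sd 1/sqrt(x+1).
            y = rng.normal(1.0 / (x + 1.0), 1.0 / np.sqrt(x + 1.0))
        samples[i, 0] = x
        samples[i, 1] = y
    return samples


if __name__ == "__main__":
    draws = gibbs(1000, 10, seed=42)
    print(draws.shape, draws.mean(axis=0))
```

Each inner iteration still makes two scalar calls into NumPy from a Python loop, so per-call overhead, not the quality of the generator, likely dominates; that would be consistent with the small gap between the pure Python and pure Python + NumPy timings.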

4 comments:

  1. The pure Python version can be made a little faster by avoiding the repeated attribute lookups, binding the functions to local names:

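    import math
    import random
    # Bind once, before the sampling loop, to skip repeated attribute lookups.
    gv = random.gammavariate
    g = random.gauss
    sqrt = math.sqrt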

    The results are:
    time 105.6 seconds (original)
    time 102.4 seconds (fewer lookups)

  2. ShedSkin 0.7, with minor modifications to the pure Python code (initializing x and y to 0.0, and adding the main() from the Cython example) and building a specialized extension module ("shedskin -b -r -e gibbs.py && make"), gives speed close to Cython+GSL. PyPy is 2x to 3x slower than that.

  3. A minor point, but in the version with the GSL RNGs, shouldn't the sample from the Gaussian distribution be adjusted for the desired mean? I.e.

    y = gaussian(r, 1.0/sqrt(x+1)) + 1.0/(x+1)

  4. @quoter:
    You are right; I have fixed it in the gist and in the Sage notebook.

    Thanks for the bug report!

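The first comment's tip works because resolving a local name is cheaper than resolving a global module name and then an attribute on it. A minimal sketch to measure that effect in isolation, outside the sampler: the two hypothetical functions below make exactly the same draws, one going through random.gammavariate, random.gauss and math.sqrt on every iteration, the other through names bound once before the loop. The arguments (3.0, 0.25) are arbitrary placeholders, not the model from the post.

```python
import functools
import math
import random
import timeit


def draws_with_lookup(n):
    """Resolve the module attributes on every iteration."""
    total = 0.0
    for _ in range(n):
        x = random.gammavariate(3.0, 0.25)
        total += random.gauss(0.0, 1.0 / math.sqrt(x + 1.0))
    return total


def draws_with_locals(n):
    """Bind the functions to local names once, before the loop."""
    gv = random.gammavariate
    g = random.gauss
    sqrt = math.sqrt
    total = 0.0
    for _ in range(n):
        x = gv(3.0, 0.25)
        total += g(0.0, 1.0 / sqrt(x + 1.0))
    return total


if __name__ == "__main__":
    for func in (draws_with_lookup, draws_with_locals):
        seconds = timeit.timeit(functools.partial(func, 50_000), number=3)
        print(f"{func.__name__}: {seconds:.3f} s")
```

With the same seed both functions return identical totals, so any timing difference comes from name resolution alone. Expect a modest gain, in line with the roughly 3% reported in the comment.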
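The fix discussed in the third and fourth comments comes down to the fact that GSL's gsl_ran_gaussian(r, sigma) samples a normal distribution with mean zero, so a nonzero mean must be added explicitly. A minimal Python sketch of the same idea: zero_mean_gaussian is a hypothetical stand-in for the GSL call (implemented here with random.Random.gauss), and the conditional for y is the one written in the third comment.

```python
import math
import random
import statistics


def zero_mean_gaussian(rng, sigma):
    # Stand-in for gsl_ran_gaussian(r, sigma): always centered at 0.
    return rng.gauss(0.0, sigma)


def sample_y_given_x(rng, x):
    # Conditional from the comments: mean 1/(x+1), sd 1/sqrt(x+1).
    # The zero-mean draw has to be shifted by the mean explicitly.
    return zero_mean_gaussian(rng, 1.0 / math.sqrt(x + 1.0)) + 1.0 / (x + 1.0)


if __name__ == "__main__":
    rng = random.Random(12345)
    x = 1.0  # fixed x: target mean is 0.5, target sd is 1/sqrt(2)
    ys = [sample_y_given_x(rng, x) for _ in range(100_000)]
    print(statistics.mean(ys), statistics.stdev(ys))
```

With x fixed at 1.0, the shifted draws should have a sample mean near 0.5 and a standard deviation near 0.707; dropping the + 1.0 / (x + 1.0) term leaves the mean near 0, which is the bug the third comment spotted.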