Tuesday, September 30, 2008

Python Multi-processing

In this era of multiple cores everywhere, it kind of makes me nervous to see one of my computer's cores sitting idle while the other crunches away at some numerical simulation. Running heavy numerical simulations is the bread and butter of my work, so I am always on the lookout for ways to extract as much computational juice as I can from my CPUs.

Over the last couple of years I have played with the different approaches available to Python programmers, from tools in the standard library such as forking processes and the threading module, to external packages such as Parallel Python and IPython1. All of them have their pros and cons, and on many occasions I found myself wasting valuable compute time trying to get my simulations to run under the parallelization model inherent to each of the solutions listed above.

I will not go into detail about what I liked and disliked about each of them; rather, I will focus on the future of parallel processing in the Pythonsphere: the already available, and soon to be part of the standard library, processing module (renamed multiprocessing for the standard library).

It can be installed with a simple "easy_install processing". For those who don't know it yet, the processing/multiprocessing module is a multi-processing (duh!) module that uses the same API as the standard library's threading module.

The processing module takes a lot of the pain out of writing parallel code compared to the other methods. By using multiple processes, it saves you from the problems associated with sharing memory between tasks. This means you can elegantly bypass the GIL with essentially the same code you would write for a multithreaded application, minus the boilerplate you'd have to write to handle race conditions and whatnot. That is what sharing the same API as the threading module means in practice. Moreover, with processing, your code runs on Windows just as well as on Linux, which is something you couldn't do with fork.

Before processing, the (IMHO) best tool for "simple" multi-processing was Parallel Python, but I found it extremely painful to have to manually declare the global variables and modules each process would need access to.

I must say that so far my experience with processing is quite limited. However, I have the benefit of having implemented the exact same (simple) code on all of the platforms mentioned above except IPython1, and I can attest that for simple parallelizable problems, processing makes the task as simple as it can get.

In conclusion, if you can benefit from parallel processing in your application, I strongly suggest trying out the processing module.