Tuesday, September 30, 2008

Python Multi-processing

In this era of multiple cores everywhere, it kind of makes me nervous to see one of my computer's cores sitting idle while the other crunches away at some numerical simulation. Running heavy numerical simulations is the bread and butter of my work, so I am always on the lookout for ways to extract as much computational juice as I can from my CPUs.

Over the last couple of years I have played with the different approaches available to Python programmers, from tools in the standard library, such as forking processes and the threading module, to external packages such as Parallel Python and Ipython1. All of them have their pros and cons, and on many occasions I found myself wasting valuable computing time trying to get my simulations to run under the parallelization model inherent to each of the solutions listed above.

I will not go into detail about what I liked and disliked about each of them. Rather, I will focus on the future of parallel processing in the Pythonsphere: the processing module, which is already available and will soon be part of the standard library (renamed multiprocessing there).

It can be installed with a simple "easy_install processing". For those who don't know yet, the processing/multiprocessing module is a multi-processing (duh!) module whose API mirrors that of the standard library's threading module.

The processing module takes a lot of the pain out of writing parallel code when compared to other methods. By using multiple processes, it saves you from having to deal with the problems that come with sharing memory between tasks. This means you can elegantly bypass the GIL with the same code you would write for a multithreaded application, minus the boilerplate you'd have to write to handle race conditions and whatnot. This is what sharing an API with the threading module buys you. Moreover, with processing, your code runs on Windows just as well as on Linux, which is something you couldn't do with fork.
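As a rough illustration of what that shared API looks like, here is a minimal sketch. It assumes a current Python 3 and the standard-library name multiprocessing rather than the original processing package; simulate and run are hypothetical names made up for this example. The same worker function is started once with threads and once with processes, and only the worker class and the queue type change.

```python
import multiprocessing
import queue
import threading


def simulate(seed, out):
    """Hypothetical stand-in for a CPU-heavy simulation step."""
    total = sum((seed * i) % 7 for i in range(10000))
    out.put((seed, total))


def run(worker_cls, out, seeds):
    """Start one worker per seed, then collect one result per worker."""
    workers = [worker_cls(target=simulate, args=(s, out)) for s in seeds]
    for w in workers:
        w.start()
    # Drain the queue before joining so no worker blocks on a full pipe.
    results = dict(out.get() for _ in seeds)
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    seeds = [1, 2, 3, 4]
    threaded = run(threading.Thread, queue.Queue(), seeds)
    multi = run(multiprocessing.Process, multiprocessing.Queue(), seeds)
    print(threaded == multi)
```

The `if __name__ == "__main__":` guard matters on platforms without fork, such as Windows, because child processes there import the main module again.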

Before processing, the (IMHO) best tool for "simple" multi-processing was Parallel-Python, but I found it extremely painful to have to manually declare the global variables and modules that each process would need access to.

I must say that so far my experience with processing is quite limited. However, I have the benefit of having implemented the exact same (simple) code on all of the said platforms except for Ipython1, and I can attest that for simple parallelizable problems, processing makes the task as simple as it can get.
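For the simple case of many independent runs, a worker pool is about as short as it gets. The following is only a sketch under assumptions: it uses the standard-library multiprocessing name on a current Python 3, and run_simulation and workloads are hypothetical placeholders, not my actual simulation code.

```python
import math
from multiprocessing import Pool


def run_simulation(n_steps):
    """Hypothetical stand-in for one independent simulation run."""
    return sum(math.sin(i) ** 2 for i in range(n_steps))


if __name__ == "__main__":
    workloads = [100000, 200000, 300000, 400000]

    # With no argument, Pool sizes itself to the number of available CPUs.
    with Pool() as pool:
        parallel = pool.map(run_simulation, workloads)

    # The serial version, for comparison: same function, built-in loop.
    serial = [run_simulation(n) for n in workloads]
    print(parallel == serial)
```

pool.map keeps the results in the same order as the inputs, so it can stand in for a plain map or list comprehension as long as the function and its arguments can be pickled.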

In conclusion, if you can benefit from parallel processing in your application, I strongly suggest trying out the processing module.

13 comments:

Jesse said...

Thanks for the writeup! As an FYI, the API has changed a bit with the inclusion of the package in 2.6, and a handful of bugs have been addressed. Hopefully, I'll get a chance at pycon to do some sessions on the new package!

usagi said...

No problem Jesse!

One thing I forgot to point out is that both Parallel-Python and Ipython1 support multiprocessing across clusters of computers, which is something Processing doesn't yet support. I wonder if that is an appropriate feature for a standard library module, however.

Brandon Corfman said...

I've been using the multiprocessing lib in 2.6RC2, and IMO I don't think it's all that.

I'm working on a checkers game with a Tkinter GUI, and I wanted an easy way to do calculations in the background and keep the GUI active. I also thought you could terminate a process easier than a thread.

Here are some sample problems I found:

1) While the multiprocessing module follows threading closely, it's definitely not an exact match. One example: since parameters to a process must be "pickleable", I had to go through a lot of code changes to avoid passing Tkinter objects, since these aren't pickleable. This doesn't occur with the threading module.

2) process.terminate() doesn't really work after the first attempt. The second or third attempt simply hangs the interpreter, probably because data structures are corrupted (mentioned in the API, but this is little consolation).

Those are just a few off the top of my head. In my opinion, threading is easier to use and understand ... and I'd never thought I'd say that! But it's true in my case.

usagi said...

I have had many of the same problems you mention while trying to work with Parallel-Python. The pickleability requirement is really a pain in the "neck". However, passing data between processes is not an easy problem to solve. I try to minimize it by designing my solution around the need to share data among processes.

In your example, if all you need is to run computations in the background, I don't see why you would pass Tkinter objects around...

Brandon Corfman said...

@usagi -- I'm using an MVC architecture for my Checkers program, so my Model notifies the View of updates. When I pass a Model reference to the process, the process tries to pickle the View reference too and blows up with a run time exception.

So it just means the multiprocessing lib introduces a dependency on the pickle module, and it's not just a drop-in replacement for threading. I had to carefully decouple my model & view in order to use the processing library in my application, and it's a pretty big "gotcha" for what I'm doing.
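To make the gotcha concrete, here is a minimal sketch. It is not the actual Checkers code: View, CoupledModel, DecoupledModel, and can_pickle are hypothetical names, and a threading.Lock stands in for a Tkinter widget because it is similarly unpicklable. Dropping the view in `__getstate__` is just one possible way to decouple the two, shown here as an assumption.

```python
import pickle
import threading


class View:
    """Hypothetical stand-in for a GUI view. The lock makes it
    unpicklable, much as a Tkinter widget would be."""

    def __init__(self):
        self.handle = threading.Lock()


class CoupledModel:
    """Model that keeps a direct reference to its view."""

    def __init__(self, board, view=None):
        self.board = board
        self.view = view


class DecoupledModel(CoupledModel):
    """Same model, but the view is left out of the pickled state."""

    def __getstate__(self):
        state = self.__dict__.copy()
        state["view"] = None  # the other process receives data only
        return state


def can_pickle(obj):
    """Return True if obj can be pickled, as objects sent to
    another process generally must be."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


if __name__ == "__main__":
    board = [[0] * 8 for _ in range(8)]
    print(can_pickle(CoupledModel(board, View())))
    print(can_pickle(DecoupledModel(board, View())))
```

The original object keeps its view; only the copy that crosses the process boundary loses it.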

Jesse said...

@usagi - there is a clustering example in the 2.6 docs, and I'll be expanding on it using some work I am doing

@brandon - I'm not a GUI person, nor am I largely interested in GUI programming, therefore I can't offer any suggestions: but I am willing to take them to improve the multiprocessing package. The package is not a panacea, and won't help 100% of users - for *many* use cases: it is a drop in replacement (which is why I ran with it). Of course threading doesn't require pickling: It's all within the same memory space, which is not true for the mp package.

In your case, the threading module is more appropriate for the simple fact you don't have a problem for which the mp module meshes well. I'm open to suggestions and improvements though.

One day though - I hope people realize that the wide spread sharing of data between threads (and processes) that doesn't happen on a messaging or queue level is why they have so many problems.

Again, I'm open to improvements - but no, the mp package isn't going to fix 100% of problems. Some problems just aren't a fit.

Jesse said...

@usagi - Additionally, the mp package offers network-based managers for object sharing and message passing

Brandon Corfman said...

@jesse - My suggestion is to improve the documentation, putting the gotchas up front in the summary rather than embedded within the text. Right now it requires several reads from top to bottom (along with code changes, in my case) to understand whether I could use the lib effectively for my app. That is not typically how I read (or want to read) docs.

I also think you should have some recommendation in the docs on how to use process.terminate() correctly so that it doesn't hang ... e.g. recreate your pipes/queues if that actually works. I think terminate() should be removed if it isn't reliable.

Thanks for being open to improvements.

usagi said...

Thanks for the hint about support for clustering, Jesse.

illume said...

hi,

here's another simple option that may work for you...

For some stuff that releases the GIL, like pygame and numpy, using threads can be better, since you don't have to pickle massive amounts of memory.

Using the experimental threads module in pygame makes it just like using the python map function.

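import pygame.threads

# Start a pool of 2 worker threads.
pygame.threads.init(2)
tmap = pygame.threads.tmap

# func and data are placeholders for your own function and inputs.
result = tmap(func, [data, data])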

Jesse said...

@brandon - for the initial inclusion, I didn't want to grossly refactor the docs - but I'll file an enhancement for myself for 2.6.1/2.7 and 3.1 to do this.

Alex said...

Something of note is that the processing module works flawlessly with the stackless project.

One of my little side software dabbling projects is to see how hard/easy it is to utilize a multi-core system in python, as it comes up on the python list all the time it seems. Originally I had done manual forking and all that rot, which was painful. Once I found out that processing module (still running under 2.5) works with stackless, my code became a LOT simpler.

My testing code was a pseudo MUD, where mobs would interact and trade with each other. Very simplistic and unpolished. :)

I ended up with a master process that handled message routing from subprocesses. this used a python thread per subprocess to handle incoming messages on the queue object, and a dedicated thread that sent out ticks on a broadcast channel. The worker processes then handled a world 'town', and all the tasklets and objects associated with it. as mobs moved from town to town, they would be pickled, sent via the master process to their new worker process, unpickled and their channels reinitialized.

I was VERY surprised at the ease of getting this up and running using the processing module. Once you find a good demarcation point for your data exchange in your program, it becomes very easy to scale up to X cores.

much thanks goes out to all the teams behind python, stackless, and the processing module.

NIC1138 said...

That is great! Usually when I want to start lots of processes to analyse some data I use something like xargs, or write bash scripts... I always thought of either multiple threads or forking when thinking of "in-program" solutions. But this allows us to easily create producer/consumer processes! That is great. How does this IPC work anyway?...
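On the IPC question: in the standard-library multiprocessing module, a Queue is built on a pipe plus a few locks, and objects are pickled on the way in and unpickled on the way out. A minimal producer/consumer sketch, assuming a current Python 3, could look like the following; producer, consumer, and SENTINEL are hypothetical names chosen for this example.

```python
from multiprocessing import Process, Queue

SENTINEL = None  # hypothetical marker meaning "no more work"


def producer(tasks, n_items, n_consumers):
    """Put the work items on the queue, then one sentinel per consumer."""
    for i in range(n_items):
        tasks.put(i)
    for _ in range(n_consumers):
        tasks.put(SENTINEL)


def consumer(tasks, results):
    """Square each item until a sentinel arrives."""
    while True:
        item = tasks.get()
        if item is SENTINEL:
            break
        results.put(item * item)


if __name__ == "__main__":
    tasks, results = Queue(), Queue()
    n_items, n_consumers = 10, 2

    consumers = [Process(target=consumer, args=(tasks, results))
                 for _ in range(n_consumers)]
    for c in consumers:
        c.start()
    prod = Process(target=producer, args=(tasks, n_items, n_consumers))
    prod.start()

    # Collect every result before joining, so no process blocks on a full pipe.
    squares = sorted(results.get() for _ in range(n_items))
    prod.join()
    for c in consumers:
        c.join()
    print(squares)
```

Results arrive in whatever order the consumers finish, which is why the sketch sorts them before printing.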
