I can no longer remember the last computer I owned that had a single core/processor. Multiprocessing makes me much happier, since I know I can milk my hardware for all it has to offer. I hope numpy, scipy and related packages find ways to build parallelization into their libraries as soon as possible.
The goal of this article is to share the usage pattern I have adopted to parallelize my code; it is simple and works on both single- and multi-core systems. As a disclaimer, much of my day-to-day code involves repeated calculations of some sort, which lend themselves to being distributed asynchronously. This explains the pattern I am going to present here.
I start by setting up a process pool, as close as possible to the code that will be distributed, preferably in the same local namespace. This way I don't keep processes floating around after my parallel computation is done:
# -*- coding: utf-8 -*-
from numpy import random, linalg
from multiprocessing import Pool

counter = 0

def cb(r):  # callback: runs in the parent process as each result comes in
    global counter
    print counter, r
    counter += 1

def det(M):  # the job to distribute (determinant here, to match the linalg import)
    return linalg.det(M)

po = Pool()  # one worker process per available core by default
for i in xrange(1, 300):
    j = random.normal(1, 1, (100, 100))
    po.apply_async(det, (j,), callback=cb)
po.close()
po.join()
The call Pool() returns a pool with as many worker processes as there are cores/CPUs available. This makes the code perform optimally on any number of cores, even on single-core machines.
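If you want to see or override that default, a minimal sketch looks like this (cpu_count is the stdlib helper; the variable names are just illustrative):

from multiprocessing import Pool, cpu_count

print cpu_count()             # how many cores/CPUs multiprocessing detects
po = Pool()                   # default: one worker process per core
small_po = Pool(processes=2)  # or ask for an explicit number of workers
po.close(); small_po.close()
po.join(); small_po.join()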
The use of a callback function allows for true asynchronicity: apply_async returns immediately, so the loop never waits for a result, and the callback handles each result in the parent process as soon as it is ready.
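To make the difference concrete, here is a small self-contained sketch (square and collect are hypothetical names, not part of the pattern above) contrasting the blocking apply with apply_async plus a callback:

from multiprocessing import Pool

def square(x):
    return x * x

results = []

def collect(r):  # called in the parent process as soon as a worker finishes
    results.append(r)

po = Pool()
blocking = [po.apply(square, (i,)) for i in xrange(5)]  # each call waits for its result
for i in xrange(5):
    po.apply_async(square, (i,), callback=collect)      # returns immediately
po.close()
po.join()
print blocking, sorted(results)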
Finally, the po.close() and po.join() calls are essential to make sure that every job fired off at the pool finishes execution and that the worker processes are terminated. This also eliminates any footprints from your parallel execution, such as zombie processes left behind.
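As a sketch of one possible variation (not part of the pattern above), the submission loop can be guarded so the workers are also cleaned up if something goes wrong before close() and join() are reached:

from numpy import random, linalg
from multiprocessing import Pool

po = Pool()
try:
    for i in xrange(1, 300):
        j = random.normal(1, 1, (100, 100))
        po.apply_async(linalg.det, (j,))
    po.close()   # no new jobs are accepted after this point
    po.join()    # block until every submitted job has finished
except Exception:
    po.terminate()  # stop the workers immediately and clean up
    raise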
So this is my main pattern! What is yours? Please share it, along with any comments you may have on mine.