Python in Science: Set implementation performance

Monday, August 6, 2007

Set implementation performance

I recently, blogged about my concerns for CPython being replaced by IronPython in the browser platform. My main concerns in the other post were mainly of political nature. But now, as I was investigating the performance of set operations in Python for a project, I decided to compare CPython and IronPython on their set implementations.

So here is my simple code:


#Set implementations benchmark
import random,time

seta = set([random.randint(0,100000) for n in xrange(10000)])
setb =set([random.randint(0,100000) for n in xrange(10000)])

t0 = time.clock()
for i in xrange(1000):
    seta & setb
    seta | setb
    seta ^ setb
print "Time: %s seconds"%(time.clock()-t0)

and here are the timings:

$ python set_bench.py
Time: 9.45 seconds
$ ipy set_bench.py
Time: 141.460593000 seconds

CPython is simply 15 times faster than Iron Python!

I always like to have external tool for comparison. So I converted my little Python script to C++ with ShedSkin, compiled and ran it:

$ ./set_bench
Time: 30.66 seconds

CPython was still more than 3 times faster than the C++ generated by ShedSkin (0.0.21)!!

For the reference: I used IronPython 1.0.2467 on .NET 2.0.50727.42 on an Ubuntu machine. It would be nice if someone could re-run this on a Windows box.

If anyone knows of a faster solution for determining the intersection between two sets in Python (perhaps using dictionaries?), I would be very interested to know.

11 comments:

Unknown said...: On my laptop (2GHz Core 2 Duo) I see the following:

python set_bench.py:
Time: 3.41490350031 seconds

ipy set_bench.py:
Time: 30.5470964187 seconds; August 6, 2007 at 1:12 PM
Unknown said...: I should have followed up by saying I'm running 64-bit Vista, IronPython 1.1 on .NET 2.0.50727.312, and Python 2.5; August 6, 2007 at 1:14 PM
Marius Gedminas said...: You may want to use random.seed() to make the benchmark reproducible. Although with a set this big, the variation probably doesn't change the timings much.

Your timings also include the VM startup overhead. If you want to benchmark set operations specifically, I suggest using the timeit module.

Out of curiosity, why did you choose to publish the benchmark source code as an image? Doesn't blogger support <pre>?; August 6, 2007 at 5:08 PM
Marius Gedminas said...: Sorry, I didn't notice that you measure just the time of the for loop. For some reason I assumed you used the Unix 'time' command. Ignore the bits about VM overhead in my previous comment.; August 6, 2007 at 5:10 PM
Unknown said...: Unfortunately,Blogger does not offer any facilty for formating code. That's for short snippet I opt for posting screenshots of the code...; August 6, 2007 at 7:37 PM
Unknown said...: I see the same thing on my 32-bit MacBook w/ python 2.5. CPython runs 3 times faster than ShedSkin. Looking more closely, I found that ShedSkin reimplements the set operations using an STL hash_set. At least that part of your benchmark seems to be showing how well tuned the Python hashing implementation is.; August 6, 2007 at 9:37 PM
Unknown said...: Thanks for the corroboration Stanley. I notified Mark Dufour, the developer of Shed Skin, and I am waiting for his answer on this.; August 7, 2007 at 10:25 AM
srepmub said...: thanks for measuring set performance using shedskin! I'll optimize things a bit before the next release..

mark dufour (shedskin author); August 12, 2007 at 5:53 PM
srepmub said...: hmm.

I was able to speed the test up by about 40% on my computer, but it's still 3 times slower here (changes are in CVS).

it really looks like __gnu_cxx::hash_set (the underlying STL type I use) is just not that efficient. I might have a look at boost::unordered_set, google's sparse hash project, or wait for c++0x, which comes with a builtin unordered_set.; August 19, 2007 at 5:34 AM
rhettinger said...: FWIW, here is how CPython achieves high performance for sets:

1. Sets are implemented as hash tables modeled after the CPython's highly tuned code for dictionaries but with smaller entries that fit better into memory caches (sets only store a key/hash pair instead of a key/value/value triplet).

2. The intersection method takes advantage of commutativity and loops over the smaller of the two sets, "for k in set1: if k in set2: result[k] = True" after re-ordering the inputs so that len(set1) <= len(set2).

3. Since sets store key/hash pairs, all binary set operations (such as intersection) are run without any calls to the __hash__ method for individual set elements. This is much faster than the equivalent code using dictionaries where the "if k in set2" step and the "result[k] = True" step each make a fresh call to k.__hash__().

4. When looping over a set, the CPython code directly traverses the hash table instead of using an iterator (which would be a bit slower).; September 24, 2007 at 10:22 PM
srepmub said...: thanks for the explanation!

Shedskin's set implementation is now based on CPython; together with some of the optimizations you mention, and improvements in inlining, Shed Skin is now significantly faster than CPython for this benchmark.; July 13, 2009 at 8:51 AM

Python in Science

Monday, August 6, 2007

Set implementation performance

11 comments:

ccp

Amazon

Technorati