Monday, August 6, 2007

Set implementation performance

I recently, blogged about my concerns for CPython being replaced by IronPython in the browser platform. My main concerns in the other post were mainly of political nature. But now, as I was investigating the performance of set operations in Python for a project, I decided to compare CPython and IronPython on their set implementations.

So here is my simple code:

#Set implementations benchmark
import random,time

seta = set([random.randint(0,100000) for n in xrange(10000)])
setb =set([random.randint(0,100000) for n in xrange(10000)])

t0 = time.clock()
for i in xrange(1000):
seta & setb
seta | setb
seta ^ setb
print "Time: %s seconds"%(time.clock()-t0)

and here are the timings:

$ python
Time: 9.45 seconds
$ ipy
Time: 141.460593000 seconds

CPython is simply 15 times faster than Iron Python!

I always like to have external tool for comparison. So I converted my little Python script to C++ with ShedSkin, compiled and ran it:

$ ./set_bench
Time: 30.66 seconds

CPython was still more than 3 times faster than the C++ generated by ShedSkin (0.0.21)!!

For the reference: I used IronPython 1.0.2467 on .NET 2.0.50727.42 on an Ubuntu machine. It would be nice if someone could re-run this on a Windows box.

If anyone knows of a faster solution for determining the intersection between two sets in Python (perhaps using dictionaries?), I would be very interested to know.


Jeremiah said...

On my laptop (2GHz Core 2 Duo) I see the following:

Time: 3.41490350031 seconds

Time: 30.5470964187 seconds

Jeremiah said...

I should have followed up by saying I'm running 64-bit Vista, IronPython 1.1 on .NET 2.0.50727.312, and Python 2.5

Marius said...

You may want to use random.seed() to make the benchmark reproducible. Although with a set this big, the variation probably doesn't change the timings much.

Your timings also include the VM startup overhead. If you want to benchmark set operations specifically, I suggest using the timeit module.

Out of curiosity, why did you choose to publish the benchmark source code as an image? Doesn't blogger support <pre>?

Marius said...

Sorry, I didn't notice that you measure just the time of the for loop. For some reason I assumed you used the Unix 'time' command. Ignore the bits about VM overhead in my previous comment.

usagi said...

Unfortunately,Blogger does not offer any facilty for formating code. That's for short snippet I opt for posting screenshots of the code...

Stanley said...

I see the same thing on my 32-bit MacBook w/ python 2.5. CPython runs 3 times faster than ShedSkin. Looking more closely, I found that ShedSkin reimplements the set operations using an STL hash_set. At least that part of your benchmark seems to be showing how well tuned the Python hashing implementation is.

usagi said...

Thanks for the corroboration Stanley. I notified Mark Dufour, the developer of Shed Skin, and I am waiting for his answer on this.

srepmub said...

thanks for measuring set performance using shedskin! I'll optimize things a bit before the next release..

mark dufour (shedskin author)

srepmub said...


I was able to speed the test up by about 40% on my computer, but it's still 3 times slower here (changes are in CVS).

it really looks like __gnu_cxx::hash_set (the underlying STL type I use) is just not that efficient. I might have a look at boost::unordered_set, google's sparse hash project, or wait for c++0x, which comes with a builtin unordered_set.

rhettinger said...

FWIW, here is how CPython achieves high performance for sets:

1. Sets are implemented as hash tables modeled after the CPython's highly tuned code for dictionaries but with smaller entries that fit better into memory caches (sets only store a key/hash pair instead of a key/value/value triplet).

2. The intersection method takes advantage of commutativity and loops over the smaller of the two sets, "for k in set1: if k in set2: result[k] = True" after re-ordering the inputs so that len(set1) <= len(set2).

3. Since sets store key/hash pairs, all binary set operations (such as intersection) are run without any calls to the __hash__ method for individual set elements. This is much faster than the equivalent code using dictionaries where the "if k in set2" step and the "result[k] = True" step each make a fresh call to k.__hash__().

4. When looping over a set, the CPython code directly traverses the hash table instead of using an iterator (which would be a bit slower).

srepmub said...

thanks for the explanation!

Shedskin's set implementation is now based on CPython; together with some of the optimizations you mention, and improvements in inlining, Shed Skin is now significantly faster than CPython for this benchmark.