So here is my simple code:
#Set implementations benchmark
import random,time
seta = set([random.randint(0,100000) for n in xrange(10000)])
setb =set([random.randint(0,100000) for n in xrange(10000)])
t0 = time.clock()
for i in xrange(1000):
seta & setb
seta | setb
seta ^ setb
print "Time: %s seconds"%(time.clock()-t0)
and here are the timings:
$ python set_bench.py
Time: 9.45 seconds
$ ipy set_bench.py
Time: 141.460593000 seconds
CPython is simply 15 times faster than Iron Python!
I always like to have external tool for comparison. So I converted my little Python script to C++ with ShedSkin, compiled and ran it:
$ ./set_bench
Time: 30.66 seconds
CPython was still more than 3 times faster than the C++ generated by ShedSkin (0.0.21)!!
For the reference: I used IronPython 1.0.2467 on .NET 2.0.50727.42 on an Ubuntu machine. It would be nice if someone could re-run this on a Windows box.
If anyone knows of a faster solution for determining the intersection between two sets in Python (perhaps using dictionaries?), I would be very interested to know.
11 comments:
On my laptop (2GHz Core 2 Duo) I see the following:
python set_bench.py:
Time: 3.41490350031 seconds
ipy set_bench.py:
Time: 30.5470964187 seconds
I should have followed up by saying I'm running 64-bit Vista, IronPython 1.1 on .NET 2.0.50727.312, and Python 2.5
You may want to use random.seed() to make the benchmark reproducible. Although with a set this big, the variation probably doesn't change the timings much.
Your timings also include the VM startup overhead. If you want to benchmark set operations specifically, I suggest using the timeit module.
Out of curiosity, why did you choose to publish the benchmark source code as an image? Doesn't blogger support <pre>?
Sorry, I didn't notice that you measure just the time of the for loop. For some reason I assumed you used the Unix 'time' command. Ignore the bits about VM overhead in my previous comment.
Unfortunately,Blogger does not offer any facilty for formating code. That's for short snippet I opt for posting screenshots of the code...
I see the same thing on my 32-bit MacBook w/ python 2.5. CPython runs 3 times faster than ShedSkin. Looking more closely, I found that ShedSkin reimplements the set operations using an STL hash_set. At least that part of your benchmark seems to be showing how well tuned the Python hashing implementation is.
Thanks for the corroboration Stanley. I notified Mark Dufour, the developer of Shed Skin, and I am waiting for his answer on this.
thanks for measuring set performance using shedskin! I'll optimize things a bit before the next release..
mark dufour (shedskin author)
hmm.
I was able to speed the test up by about 40% on my computer, but it's still 3 times slower here (changes are in CVS).
it really looks like __gnu_cxx::hash_set (the underlying STL type I use) is just not that efficient. I might have a look at boost::unordered_set, google's sparse hash project, or wait for c++0x, which comes with a builtin unordered_set.
FWIW, here is how CPython achieves high performance for sets:
1. Sets are implemented as hash tables modeled after the CPython's highly tuned code for dictionaries but with smaller entries that fit better into memory caches (sets only store a key/hash pair instead of a key/value/value triplet).
2. The intersection method takes advantage of commutativity and loops over the smaller of the two sets, "for k in set1: if k in set2: result[k] = True" after re-ordering the inputs so that len(set1) <= len(set2).
3. Since sets store key/hash pairs, all binary set operations (such as intersection) are run without any calls to the __hash__ method for individual set elements. This is much faster than the equivalent code using dictionaries where the "if k in set2" step and the "result[k] = True" step each make a fresh call to k.__hash__().
4. When looping over a set, the CPython code directly traverses the hash table instead of using an iterator (which would be a bit slower).
thanks for the explanation!
Shedskin's set implementation is now based on CPython; together with some of the optimizations you mention, and improvements in inlining, Shed Skin is now significantly faster than CPython for this benchmark.
Post a Comment