Friday, September 21, 2007

The Wonders of Pyglet

I have been playing with Pyglet, and I am very happy with it. It's still in alpha, but it is developing fast and the alpha is already very stable.

The most important feature of Pyglet is that it's being designed from the ground up to be OS independent (Linux, Win, OSX) without external dependencies. For that, it uses the standard OpenGL implementation of each of these platforms via ctypes. This makes it my last best hope for a multi-platform graphical interface kit.
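To give you an idea of how little boilerplate is involved, here is roughly what opening a window with a hand-rolled event loop looks like in the alpha I have been playing with. Take this as a sketch, not gospel: the API may well shift between releases.

# Minimal sketch of a pyglet window with a manual event loop.
# Based on my reading of the alpha-era API; names may change between releases.
from pyglet import window
from pyglet.gl import glClearColor

win = window.Window(width=640, height=480, caption='Hello pyglet')
glClearColor(0.2, 0.2, 0.2, 1.0)   # a raw OpenGL call, exposed via ctypes

while not win.has_exit:            # loop until the user closes the window
    win.dispatch_events()          # process keyboard/mouse/window events
    win.clear()                    # clear the color buffer
    # ... OpenGL drawing calls would go here ...
    win.flip()                     # swap the front and back buffers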

It already has a growing widget library, support for laying out HTML documents, importing 3D models created with Wings3D, a scene2d module with support for sprites and collision detection, and a lot more. Most of this functionality is available only in the SVN version; the (stable) release is somewhat more limited.

Go check it out! It is (IMHO) one of the few truly exciting graphical libraries on the Python scene.

Monday, September 17, 2007

Parallel Processing in CPython

I am sick of hearing naive discussions about the GIL and how it precludes Python programs from taking advantage of multiple CPUs/cores. That is absolutely a non-issue, given the abundant ways in which we can write parallel programs in Python today (MPI4Py, IPython1, parallel-python, etc.). In this post I want to talk about parallel-python (PP) and pit it against threading solutions.

Before I go on, the usual disclaimer: I know that PP does multi-processing, not multi-threading, which is what the GIL won't let you do. But PP offers a very simple and intuitive API that can be used both for multi-core CPUs and for clusters. If, after seeing what PP can do for you, you still believe you need threads, use Jython!!
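Just to illustrate how terse that API is, here is a sketch of what submitting the same kind of job to a small cluster would look like. The host names below are made up, and each remote node would have to be running PP's ppserver.py; this is my reading of the API, not code from the PP site.

# Sketch of Parallel Python on a cluster; "node1"/"node2" are made-up
# host names, each assumed to be running ppserver.py on port 60000.
import pp

def part_sum(start, end):
    """Partial sum of the alternating series 1 - 1/2 + 1/3 - ..."""
    s = 0.0
    for x in xrange(start, end):
        if x % 2 == 0:
            s -= 1.0 / x
        else:
            s += 1.0 / x
    return s

ppservers = ("node1:60000", "node2:60000")   # remote workers
job_server = pp.Server(ppservers=ppservers)  # local CPUs are used as well

job = job_server.submit(part_sum, (1, 1000000))
print "partial sum:", job()                  # job() blocks until the result arrives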

All examples here were run on a Xeon quad core with 4GB of RAM, running Ubuntu Feisty. The Python interpreters used were CPython 2.5.1, Jython 2.1 on Java 1.6.0, and IronPython 1.0.2467.

Let's start with Parallel Python: I am using an example taken straight from PP's web site. Here is the code:


#!/usr/bin/python
# File: dynamic_ncpus.py
# Author: Vitalii Vanovschi
# Desc: This program demonstrates parallel computations with pp module
# and dynamic cpu allocation feature.
# Program calculates the partial sum 1-1/2+1/3-1/4+1/5-1/6+... (in the limit it is ln(2))
# Parallel Python Software: http://www.parallelpython.com

import math, sys, md5, time
import pp

def part_sum(start, end):
    """Calculates partial sum"""
    sum = 0
    for x in xrange(start, end):
        if x % 2 == 0:
            sum -= 1.0 / x
        else:
            sum += 1.0 / x
    return sum

print """Using Parallel Python"""
print

start = 1
end = 20000000

# Divide the task into 64 subtasks
parts = 64
step = (end - start) / parts + 1

# Create jobserver
job_server = pp.Server()

# Execute the same task with different amount of active workers and measure the time
for ncpus in (1, 2, 4, 8, 16, 1):
    job_server.set_ncpus(ncpus)
    jobs = []
    start_time = time.time()
    print "Starting ", job_server.get_ncpus(), " workers"
    for index in xrange(parts):
        starti = start+index*step
        endi = min(start+(index+1)*step, end)
        # Submit a job which will calculate partial sum
        # part_sum - the function
        # (starti, endi) - tuple with arguments for part_sum
        # () - tuple with functions on which function part_sum depends
        # () - tuple with module names which must be imported before part_sum execution
        jobs.append(job_server.submit(part_sum, (starti, endi)))

    # Retrieve all the results and calculate their sum
    part_sum1 = sum([job() for job in jobs])
    # Print the partial sum
    print "Partial sum is", part_sum1, "| diff =", math.log(2) - part_sum1

    print "Time elapsed: ", time.time() - start_time, "s"
    print
job_server.print_stats()


and here are the results:
Using Parallel Python

Starting 1 workers
Partial sum is 0.69314720556 | diff = -2.50000421476e-08
Time elapsed: 7.85552501678 s

Starting 2 workers
Partial sum is 0.69314720556 | diff = -2.50000421476e-08
Time elapsed: 4.37666606903 s

Starting 4 workers
Partial sum is 0.69314720556 | diff = -2.50000421476e-08
Time elapsed: 2.11173796654 s

Starting 8 workers
Partial sum is 0.69314720556 | diff = -2.50000421476e-08
Time elapsed: 2.06818294525 s

Starting 16 workers
Partial sum is 0.69314720556 | diff = -2.50000421476e-08
Time elapsed: 2.06896090508 s

Starting 1 workers
Partial sum is 0.69314720556 | diff = -2.50000421476e-08
Time elapsed: 8.11736106873 s

Job execution statistics:
 job count | % of all jobs | job time sum | time per job | job server
       384 |        100.00 |      67.1039 |     0.174750 | local
Time elapsed since server creation 27.0066168308

In order to compare it with threading code, I had to adapt the example to use threads. Before I fed the new code to Jython, I ran it through CPython to illustrate the fact that, under the GIL, threads are not executed in parallel but one at a time. This first run also serves as a baseline to compare the Jython results against.

The code is below. Since Jython 2.1 does not have the built-in sum function, I implemented it with reduce (there was no perceptible performance difference compared with the built-in sum).

#jython threads
import math, sys, time
import threading

psums = []  # shared list that each thread appends its partial sum to

def part_sum(start, end):
    """Calculates partial sum"""
    sum = 0
    for x in xrange(start, end):
        if x % 2 == 0:
            sum -= 1.0 / x
        else:
            sum += 1.0 / x
    psums.append(sum)

def sum(seq):
    # no built-in sum in Jython 2.1, we will use reduce
    return reduce(lambda x, y: x + y, seq)

print """Using: jython with threading module"""
print

start = 1
end = 20000000

# Divide the task into 64 subtasks
parts = 64
step = (end - start) / parts + 1
for ncpus in (1, 2, 4, 8, 16, 1):
    # Divide the task into n subtasks
    psums = []
    parts = ncpus
    step = (end - start) / parts + 1
    jobs = []
    start_time = time.time()
    print "Starting ", ncpus, " workers"
    for index in xrange(parts):
        starti = start+index*step
        endi = min(start+(index+1)*step, end)
        # Start a thread which will calculate a partial sum
        # part_sum - the function
        # (starti, endi) - tuple with arguments for part_sum
        t = threading.Thread(target=part_sum, name="", args=(starti, endi))
        t.start()
        jobs.append(t)
    # wait for threads to finish
    [job.join() for job in jobs]
    # Retrieve all the results and calculate their sum
    part_sum1 = sum(psums)
    # Print the partial sum
    print "Partial sum is", part_sum1, "| diff =", math.log(2) - part_sum1

    print "Time elapsed: ", time.time() - start_time, "s"
    print


and here are the results for CPython:
Using: CPython with threading module

Starting 1 workers
Partial sum is 0.69314720556 | diff = -2.50001152002e-08
Time elapsed: 8.17702198029 s

Starting 2 workers
Partial sum is 0.69314720556 | diff = -2.50001570556e-08
Time elapsed: 10.2990288734 s

Starting 4 workers
Partial sum is 0.69314720556 | diff = -2.50001127577e-08
Time elapsed: 11.1099839211 s

Starting 8 workers
Partial sum is 0.69314720556 | diff = -2.50001097601e-08
Time elapsed: 11.6850161552 s

Starting 16 workers
Partial sum is 0.69314720556 | diff = -2.50000701252e-08
Time elapsed: 11.8062999249 s

Starting 1 workers
Partial sum is 0.69314720556 | diff = -2.50001152002e-08
Time elapsed: 11.0002980232 s

Here are the results for Jython:

Using: jython with threading module

Starting 1 workers
Partial sum is 0.6931472055600734 | diff = -2.500012807882257E-8
Time elapsed: 4.14300012588501 s

Starting 2 workers
Partial sum is 0.6931472055601045 | diff = -2.500015916506726E-8
Time elapsed: 2.0239999294281006 s

Starting 4 workers
Partial sum is 0.6931472055600582 | diff = -2.5000112868767133E-8
Time elapsed: 2.1430001258850098 s

Starting 8 workers
Partial sum is 0.6931472055600544 | diff = -2.500010909400885E-8
Time elapsed: 1.6349999904632568 s

Starting 16 workers
Partial sum is 0.6931472055600159 | diff = -2.5000070569269894E-8
Time elapsed: 1.2360000610351562 s

Starting 1 workers
Partial sum is 0.6931472055600734 | diff = -2.500012807882257E-8
Time elapsed: 2.4539999961853027 s

And lastly, the results for IronPython:

Using: IronPython with threading module

Starting 1 workers
Partial sum is 0.6931472055601 | diff = -2.50001280788e-008
Time elapsed: 13.6127243042 s

Starting 2 workers
Partial sum is 0.6931472055601 | diff = -2.50001591651e-008
Time elapsed: 7.60165405273 s

Starting 4 workers
Partial sum is 0.6931472055601 | diff = -2.50001128688e-008
Time elapsed: 8.14302062988 s

Starting 8 workers
Partial sum is 0.6931472055601 | diff = -2.5000109205e-008
Time elapsed: 8.32349395752 s

Starting 16 workers
Partial sum is 0.6931472055600 | diff = -2.50000707913e-008
Time elapsed: 8.37589263916 s

Starting 1 workers
Partial sum is 0.6931472055601 | diff = -2.50001280788e-008
Time elapsed: 10.3567276001 s

Now on to some final considerations. The quality of a parallelization tool should be measured not by how fast it is, but by how well it scales. The attentive reader may have noticed that Jython threads were twice as fast as PP. But is that performance related to the threads? No, since Jython was already faster than CPython (with threading or with PP) on a single thread. PP scaled better up to the available number of cores, consistently halving the time when doubling the number of cores used. Jython halved the time when it went from one to two threads, but failed to halve it again when going to four threads. I'll give it a break here, since it recovered at 8 and 16 threads.
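To make the scaling argument concrete, here is a quick back-of-the-envelope calculation of speedup and parallel efficiency from the timings reported above (only the 1, 2 and 4 worker runs matter on a quad core):

# Speedup and efficiency from the timings reported above:
# speedup(n) = T(1)/T(n), efficiency(n) = speedup(n)/n
timings = {'Parallel Python': {1: 7.86, 2: 4.38, 4: 2.11},
           'Jython threads':  {1: 4.14, 2: 2.02, 4: 2.14}}
for name in ('Parallel Python', 'Jython threads'):
    t = timings[name]
    for n in (2, 4):
        speedup = t[1] / t[n]
        print "%s, %d workers: speedup %.2f, efficiency %.0f%%" % \
              (name, n, speedup, 100.0 * speedup / n)

On four cores PP comes out at roughly 93% efficiency, against about 48% for Jython threads, which is exactly the point I am trying to make.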

Threads alone are not the answer if they are not well implemented. Look at the results for IronPython: it seems unable to take advantage of more than two threads on a four-core system. Can anyone explain this? I'd be curious to know why.

Wednesday, September 12, 2007

ZODB vs Durus

Soon after I posted my last article about ZODB performance against SQLite3, I got a polite comment from Michael Watkins reminding me of Durus. Durus is a simpler object database inspired by ZODB. Despite not having many of the features of ZODB, such as multi-threaded storage access, multiple storage backends, asynchronous IO, versions, undo and conflict resolution (according to Durus's own FAQ), it is a very capable database. So I decided to adapt my benchmark script and pit Durus against ZODB. Please note that my benchmark code is very simple and does not explore the differences between Durus and ZODB very well. A better comparison is left as an exercise to the reader. ;-)

Despite the simplicity of my test code, there was one surprising result: both databases used files as storage, but the file size for Durus was 3.7MB for a million records, while the ZODB file was 23.7MB!!!

Both database systems offer the option of packing their storage to reduce its size, but this feature was not used. Besides, packing a ZODB storage file requires the same amount of free disk space as the file itself, which only makes matters worse for ZODB. Please also check Michael's blog for a very interesting benchmark of Durus vs cPickle.
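For reference, packing is essentially a one-liner in both systems. The sketch below shows what it would look like; it is not used in the benchmark that follows, and the Durus call is my reading of its API.

# Sketch of packing both storages (not part of the benchmark below).
from ZODB import FileStorage, DB
from durus.file_storage import FileStorage as FS
from durus.connection import Connection

# ZODB: pack() rewrites the FileStorage without old object revisions;
# it needs roughly as much free disk space as the storage file itself.
db = DB(FileStorage.FileStorage('testdb.fs'))
db.pack()
db.close()

# Durus: packing goes through the connection (assumed API).
conndurus = Connection(FS('test.durus'))
conndurus.pack()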

Here is the code:

import time, os, glob
import ZODB
from ZODB import FileStorage, DB
import pylab as P

from durus.file_storage import FileStorage as FS
from durus.connection import Connection


def zInserts(n):
    print "Inserting %s records into ZODB"%n
    for i in xrange(n):
        dbroot[i] = {'name':'John Doe','sex':1,'age':35}
    connection.transaction_manager.commit()

def DurusInserts(n):
    print "Inserting %s records into Durus"%n
    for i in xrange(n):
        Droot[i] = {'name':'John Doe','sex':1,'age':35}
    conndurus.commit()

recsize = [1000,5000,10000,50000,100000,200000,400000,600000,800000,1000000]
zperf = []
durusperf = []
for n in recsize:
    # remove old databases
    if os.path.exists('testdb.fs'):
        [os.remove(i) for i in glob.glob('testdb.fs*')]
    if os.path.exists('test.durus'):
        os.remove('test.durus')
    # setup ZODB storage
    dbpath = 'testdb.fs'
    storage = FileStorage.FileStorage(dbpath)
    db = DB(storage)
    connection = db.open()
    dbroot = connection.root()
    # Setting up durus database
    conndurus = Connection(FS("test.durus"))
    Droot = conndurus.get_root()
    # begin tests
    t0 = time.clock()
    zInserts(n)
    t1 = time.clock()
    # closing and reopening ZODB' database to make sure
    # we are reading from file and not from some memory cache
    connection.close()
    db.close()
    storage = FileStorage.FileStorage(dbpath)
    db = DB(storage)
    connection = db.open()
    dbroot = connection.root()
    t2 = time.clock()
    print "Number of records read from ZODB: %s"%len(dbroot.items())
    t3 = time.clock()
    ztime = (t1-t0)+(t3-t2)
    zperf.append(ztime)
    print 'Time for ZODB: %s seconds\n'%ztime
    t4 = time.clock()
    DurusInserts(n)
    t5 = time.clock()
    conndurus = Connection(FS("test.durus"))
    Droot = conndurus.get_root()
    t6 = time.clock()
    print "Number of records read from Durus: %s"%len(Droot.items())
    t7 = time.clock()
    Dtime = (t5-t4)+(t7-t6)
    durusperf.append(Dtime)
    print 'Time for Durus with db on Disk: %s seconds\n'%Dtime

P.plot(recsize,zperf,'-v',recsize,durusperf,'-^')
P.legend(['ZODB','Durus'])
P.xlabel('inserts')
P.ylabel('time(s)')
P.show()

Tuesday, September 11, 2007

ZODB vs Relational Database: a simple benchmark

Recently, I posted about relational database performance. In that experiment, I found SQLite3, a database that comes with the standard Python distribution, to be the second fastest database backend for multiple inserts.

Since this blog is about Python, I soon felt bad about not including ZODB in that comparison. At the time I justified that omission by telling myself that ZODB cannot be compared to standard DBs because it is an object database. Subconsciously, I thought ZODB would lose so badly in a race against relational databases that I feared for its reputation. Silly me.

The truth is: object databases such as ZODB can be a perfect replacement for relational databases in a large portion (if not the majority) of database-driven applications. Had I stopped to look more carefully at ZODB before, I would have saved countless hours of struggle with ORMs.

As you can see in the figure above, for up to 100,000 inserts per transaction ZODB's performance is comparable to SQLite3. And since ZODB allows you to store arbitrarily complex objects, you don't have to cook up complex SQL queries to get at the data you need: the relation between each datum is given by the design of the object you are storing. In some apps of mine, I have to write code to extract the data from my Python objects and put it in table format (to store in a relational db), and then, when I read it back, I need more code to put everything back where it belongs. With ZODB, none of that is necessary.
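As a rough illustration of that point (the Employee and Address classes below are invented just for this example), storing a nested object graph is a single assignment plus a commit:

# Rough illustration: storing a nested object graph directly in ZODB.
# Employee/Address and their fields are invented for this example.
from ZODB import FileStorage, DB

class Address:
    def __init__(self, street, city):
        self.street = street
        self.city = city

class Employee:
    def __init__(self, name, age, address):
        self.name = name
        self.age = age
        self.address = address          # nested object, no foreign key needed

db = DB(FileStorage.FileStorage('example.fs'))
connection = db.open()
root = connection.root()
root['jdoe'] = Employee('John Doe', 35, Address('Some St.', 'Rio'))
connection.transaction_manager.commit()   # the whole object graph is saved

# Reading it back is plain attribute access -- no JOINs, no row unpacking
print root['jdoe'].address.city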

ZODB stores your data in a file, like SQLite, but it also supports other storage types; see this table for a comparison of storage types.

ZODB is certainly one of the hidden jewels of Zope. Due to the lack of good documentation (this is an exception, though somewhat outdated), many Python programmers either don't know that ZODB can be used outside of Zope or don't know how to get started with it.

The goal of this post is not to serve as a tutorial on ZODB, since I am hardly an expert on the subject, but to spark interest in adopting ZODB for mundane applications outside Zope.

Let's get to the code:


import time, os, glob
import sqlite3
import ZODB
from ZODB import FileStorage, DB
import pylab as P

def zInserts(n):
    print "Inserting %s records into ZODB"%n
    for i in xrange(n):
        dbroot[i] = {'name':'John Doe','sex':1,'age':35}
    connection.transaction_manager.commit()

def zInserts2(n):
    print "Inserting %s records into ZODB"%n
    dbroot['employees'] = [{'name':'John Doe','sex':1,'age':35} for i in xrange(n)]
    connection.transaction_manager.commit()


def testSqlite3Disk(n):
    print "Inserting %s records into SQLite(Disk) with sqlite3 module"%n
    conn = sqlite3.connect('dbsql')
    c = conn.cursor()
    # Create table
    c.execute('''create table Person(name text, sex integer, age integer)''')
    persons = [('john doe', 1, 35) for i in xrange(n)]
    c.executemany("insert into Person(name, sex, age) values (?,?,?)", persons)
    c.execute('select * from Person')
    print "Number of records selected: %s"%len(c.fetchall())
    c.execute('drop table Person')


recsize = [1000,5000,10000,50000,100000,200000,400000,600000,800000,1000000]
zperf = []
sqlperf = []
for n in recsize:
    # remove old databases
    if os.path.exists('testdb.fs'):
        [os.remove(i) for i in glob.glob('testdb.fs*')]
    if os.path.exists('dbsql'):
        os.remove('dbsql')
    # setup ZODB storage
    dbpath = 'testdb.fs'
    storage = FileStorage.FileStorage(dbpath)
    db = DB(storage)
    connection = db.open()
    dbroot = connection.root()
    # begin tests
    t0 = time.clock()
    zInserts(n)
    t1 = time.clock()
    # closing and reopening ZODB' database to make sure
    # we are reading from file and not from some memory cache
    connection.close()
    db.close()
    storage = FileStorage.FileStorage(dbpath)
    db = DB(storage)
    connection = db.open()
    dbroot = connection.root()
    t2 = time.clock()
    print "Number of records read from ZODB: %s"%len(dbroot.items())
    t3 = time.clock()
    ztime = (t1-t0)+(t3-t2)
    zperf.append(ztime)
    print 'Time for ZODB: %s seconds\n'%ztime
    t4 = time.clock()
    testSqlite3Disk(n)
    t5 = time.clock()
    stime = (t5-t4)
    sqlperf.append(stime)
    print 'Time for Sqlite3 with db on Disk: %s seconds\n'%stime

P.plot(recsize,zperf,'-v',recsize,sqlperf,'-^')
P.legend(['ZODB','SQLite3'])
P.xlabel('inserts')
P.ylabel('time(s)')
P.show()

As you can see in this very simple example, using ZODB is no harder than using a dictionary, and it performs better than all the ORMs I know! Below are the numeric results for the beginning of the plot above.

ZODB allows for much more sophisticated usage than shown here; I chose to do it this way to keep the insert operations on ZODB and SQLite as similar as possible. I hope the ZODB gurus out there will get together and write an up-to-date, detailed tutorial on ZODB for Python programmers. ZODB deserves it. And so do we!
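Just to hint at what that more sophisticated usage looks like, here is a sketch of the pattern I understand to be the idiomatic one: subclass Persistent and keep your objects in a BTree. The Employee class is, again, just an example.

# Sketch of more idiomatic ZODB usage: Persistent subclasses in a BTree,
# so each object is loaded and written independently of the others.
# The Employee class and its fields are just an example.
from persistent import Persistent
from BTrees.OOBTree import OOBTree
from ZODB import FileStorage, DB

class Employee(Persistent):
    """Attribute changes are detected and saved on commit."""
    def __init__(self, name, sex, age):
        self.name = name
        self.sex = sex
        self.age = age

db = DB(FileStorage.FileStorage('people.fs'))
connection = db.open()
root = connection.root()
if 'employees' not in root:
    root['employees'] = OOBTree()          # scales better than one big dict

root['employees']['jdoe'] = Employee('John Doe', 1, 35)
root['employees']['jdoe'].age = 36         # just mutate the object...
connection.transaction_manager.commit()    # ...and commit the transaction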

Monday, September 3, 2007

PyconBrasil[03]

Last week I had the pleasure of attending, for the first time, Brasil's largest meeting of Python users: PyconBrasil[03]. My impression of the community couldn't be better: everyone was very nice and open, and the talks were awesome. I will make specific posts about the talks that impressed me most, which is not to say that the talks I don't mention were not great as well, but I really can't make any relevant comments on talks about business solutions, e-government, etc. If you are interested in those topics, I recommend watching the videos of the talks on Google Video (most of them are in Portuguese).

The first thing that impressed me positively was the number of science-related talks, which were of a very high level. My own talk was only mildly scientific, since I had planned it to preach about the importance of expanding the Python academic community. It turns out that the existing community is already highly aware of the scientific possibilities of Python. At the event, I met many full-time scientists among the "Pythonistas", and it was nice to notice that a large number of the community's members are involved with science as well. A good example is Fabiano Weimar, one of the exponents of the Brazilian Python scene, who is working towards his doctoral degree on speech recognition using Hidden Markov Models, if I understood correctly. It would be nice to see a good HMM implementation in Python, though I am not sure if that is in his plans. The funny thing is that I believed the last chapter of my book, about stochastic methods, would find almost no echo in the Python community, due to its drier scientific language and focus. Apparently I was wrong, which is great!

Even though PyconBrasil is in its third iteration, the Brazilian Python Association, a non-profit organized to promote Python in Brasil, was celebrating only three months of existence. I met their staff and found them very nice and open; I wish them all the success they deserve!

I want to close this post with big thanks to the Python community as a whole for receiving me and my book so well, and by letting everyone know that I will keep doing everything within my reach to help the community grow and become better known in the scientific world.
