Writing data to databases is something I have to do quite often. Over the last few years I have used many different databases and their Python interfaces: Pysqlite, Pysqlite2, MySQLdb, Psycopg and Psycopg2, SQLObject, and SQLAlchemy. In many cases, when the data is to be used only by a Python program, I have simply used pickle and cPickle.
Until now, my criteria for picking a database and interface were based mainly on convenience, or on the need to use a particular database backend. Recently, however, after choosing SQLObject for a project based solely on how neatly organized the resulting code looked, I stumbled upon a major performance issue: SQLObject is extremely slow for a large number of inserts.
After searching all over and asking on the SQLObject mailing list, I found out that trying to improve its insert speed was a lost cause. Apparently, something in the core design of SQLObject makes it impossible for it to do the equivalent of an executemany() when doing inserts. The executemany() method, which we find in (all?) the other database interfaces, gives a tremendous boost in insertion speed.
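For those who haven't used it, here is a minimal sketch of the difference, using the sqlite3 module (the table and the data are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE people (name TEXT, age INTEGER)")

rows = [("name%d" % i, i % 100) for i in range(100000)]

# Row by row: each insert is a separate execute() call,
# so the statement is dispatched 100000 times.
# for row in rows:
#     cur.execute("INSERT INTO people VALUES (?, ?)", row)

# executemany(): the statement is prepared once and run
# against every parameter tuple in a single call.
cur.executemany("INSERT INTO people VALUES (?, ?)", rows)
conn.commit()
```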
So I finally decided to do what I should have done years ago: write a simple benchmark script for all the interfaces I cared about.
The benchmark is simple: I created a table, inserted 100000 short records into the database, and selected them back to check that the insertion had actually completed. Then I dropped the table. I repeated this procedure for every interface; for the sqlite3 module, a run looks roughly like the sketch below.
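This is not the exact code of the benchmark script (the table name and record layout here are illustrative), but it shows the shape of a single run:

```python
import sqlite3
import time

N = 100000
records = [("name%d" % i, i) for i in range(N)]

start = time.time()
conn = sqlite3.connect(":memory:")  # a filename here gives the on-disk run
cur = conn.cursor()
cur.execute("CREATE TABLE people (name TEXT, age INTEGER)")
cur.executemany("INSERT INTO people VALUES (?, ?)", records)
conn.commit()

# Select everything back to make sure the inserts really happened.
cur.execute("SELECT * FROM people")
print("Number of records selected: %d" % len(cur.fetchall()))

cur.execute("DROP TABLE people")
conn.close()
print("Time: %.2f seconds" % (time.time() - start))
```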
Since I had never come across a straightforward comparison like this before, I decided to blog about it and save other people from learning these numbers the hard way, as I did.
So, without further ado, let's see the results:
Starting benchmarking with 100000 records (inserting and selecting)
Inserting 100000 records into SQLite(Memory) with SQLObject
Number of records selected: 100000
Time for SQLObject with db in memory: 68.07 seconds
Inserting 100000 records into SQLite(Memory) with sqlite3 module
Number of records selected: 100000
Time for Sqlite3 with db in memory: 2.87 seconds
Inserting 100000 records into PostgreSQL with PsycoPG2
Number of records selected: 100000
Time for PsycoPG2: 19.49 seconds
Inserting 100000 records into SQLite(Disk) with sqlite3 module
Number of records selected: 100000
Time for Sqlite3 with db on Disk: 3.02 seconds
Inserting 100000 records into SQLite(Disk) with SQLAlchemy
Number of records selected: 100000
Time for SQLAlchemy with SQLite db on Disk: 18.37 seconds
Inserting 100000 records into SQLite(Memory) with SQLAlchemy
Number of records selected: 100000
Time for SQLAlchemy with SQLite db in Memory: 18.02 seconds
Inserting 100000 records into a Pickle file with cPickle
Number of records loaded: 100000
Time for cPickle: 0.39 seconds
Inserting 100000 records into MySQL with MySQLdb
Number of records selected: 100000
Time for MySQL: 1.28 seconds

I was surprised at a few things (OK, at just about everything!):
- How slow SQLObject really is (not that this was much of a surprise; I guess I already expected it...)
- SQLite was not substantially faster with the database in memory than with it on disk.
- SQLite is pretty darn fast!
- MySQL is even faster!
- Psycopg2 is not as fast as people say...
- SQLAlchemy does an excellent job at everything it does without sacrificing much performance in the process!
Before running the script, make sure you have the MySQL and PostgreSQL servers running, and adjust the user names and passwords in the code. You also need to create a database named "People" on both servers.
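The connection setup is just the usual boilerplate; here is a sketch of what to adjust (the credentials below are placeholders, not the ones in the script):

```python
import MySQLdb
import psycopg2

# MySQL: change host, user and passwd to match your server.
my_conn = MySQLdb.connect(host="localhost", user="user",
                          passwd="secret", db="People")

# PostgreSQL: change the DSN in the same way.
pg_conn = psycopg2.connect("dbname=People user=user password=secret")
```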
One last observation: for SQLite, I am using the sqlite3 module from Python 2.5's standard library. I don't know whether or not it is based on Pysqlite, and I wonder if Pysqlite2 would give different performance results.
I would appreciate any comments, as well as suggestions on how to extract more performance from each interface. The source code for the benchmark is here.