Inspired by a series of slides Michael Schurter published on Tokyo Cabinet and PyTyrant, I thought I'd code up his examples using another database which can use a key-value approach, Durus.
Durus is a ZODB work-a-like which allows for easy persistence of Python objects, not just values. It's simple, fast, and useful.
Here's the baseline Tokyo Cabinet db example Michael published, using the pytc interface:
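import pytc

# HDB is Tokyo Cabinet's hash database, so open it with the HDB* flags
db = pytc.HDB()
db.open('test.tch', pytc.HDBOWRITER | pytc.HDBOREADER | pytc.HDBOCREAT)

# 256 x 256 writes, each followed by a read of the same key
for i in range(256):
    v = chr(i)
    for x in range(256):
        db.put(chr(x), v)
        db.get(chr(x))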
Running it:
$ time python test.py
real 0m0.168s
user 0m0.157s
sys 0m0.010s
And here is a Durus example, accessing a local file-based storage:
# Durus example 1 - File-based persistent dictionary
from durus.file_storage import FileStorage
from durus.connection import Connection

conn = Connection(FileStorage('test.durus'))
db = conn.get_root()  # the root object behaves like a dictionary

for i in range(256):
    v = chr(i)
    for x in range(256):
        db[chr(x)] = v
        db[chr(x)]
conn.commit()  # persist all changes in one transaction
Running it:
$ time python durus-test.py
real 0m0.197s
user 0m0.187s
sys 0m0.008s
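Timings this short include interpreter start-up and module imports, so it can help to time the workload from inside the process as well. The following is a minimal sketch, not part of the original benchmarks: it is written for Python 3, the helper names run_workload and time_workload are made up for illustration, and a plain dict stands in for the database so that it runs without Durus or pytc installed.

```python
import time


def run_workload(db):
    """Same access pattern as the examples: 256 x 256 writes, each followed by a read."""
    for i in range(256):
        v = chr(i)
        for x in range(256):
            db[chr(x)] = v
            db[chr(x)]


def time_workload(db, clock=time.perf_counter):
    """Return the wall-clock seconds spent running the workload against db."""
    start = clock()
    run_workload(db)
    return clock() - start


# A plain dict is only a stand-in here; any object that supports
# item assignment and lookup could be passed in instead.
store = {}
elapsed = time_workload(store)
print("workload took %.4f seconds" % elapsed)
```

Passing a Durus root or BTree instead of the dict would time just the mapping operations; a commit would still need to be timed separately.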
Now let's change to client-server operation, delivering more or less the same abilities as PyTyrant/Tokyo Cabinet. A minor change to durus-test.py gives us a client:
# Durus example 2 - Remote access to a File-based persistent dictionary
from durus.client_storage import ClientStorage
from durus.connection import Connection

# With no arguments, ClientStorage connects to the default host and port
conn = Connection(ClientStorage())
db = conn.get_root()

for i in range(256):
    v = chr(i)
    for x in range(256):
        db[chr(x)] = v
        db[chr(x)]
conn.commit()
Between runs we'll remove the database file. The client needs a server to talk to, so in another terminal let's fire one up:
$ rm test.durus
$ durus -s --file test.durus
Run the second example:
$ time python durus-remote-test.py
real 0m0.204s
user 0m0.189s
sys 0m0.013s
Let's use a more advanced container than a persistent dictionary: a BTree. First, Tokyo Cabinet/pytc:
import pytc

# BDB is Tokyo Cabinet's B+ tree database
db = pytc.BDB()
db.open('test.db', pytc.BDBOWRITER | pytc.BDBOREADER | pytc.BDBOCREAT)

for i in range(256):
    v = chr(i)
    for x in range(256):
        db.put(chr(x), v)
        db.get(chr(x))
Running pytc with the BTree:
$ time python test.py
real 0m0.169s
user 0m0.157s
sys 0m0.011s
Nice and fast - it's all C-based.
Now the Durus BTree code:
# Durus example 3 - File-based persistent BTree
from durus.file_storage import FileStorage
from durus.connection import Connection
from durus.btree import BTree

conn = Connection(FileStorage('test.durus'))
root = conn.get_root()
db = BTree()
root['db'] = db  # attach the BTree to the root so it is persisted

for i in range(256):
    v = chr(i)
    for x in range(256):
        db[chr(x)] = v
        db[chr(x)]
conn.commit()
Running this we see a significant performance delta compared to the C-based pytc/Tokyo Cabinet:
$ time python durus-btree.py
real 0m1.319s
user 0m1.308s
sys 0m0.011s
The delta will tip back into Durus's favour in the next two examples.
# Durus example 4 - client-server access to a persistent BTree
from durus.client_storage import ClientStorage
from durus.connection import Connection
from durus.btree import BTree

conn = Connection(ClientStorage())
root = conn.get_root()
db = BTree()
root['db'] = db  # attach the BTree to the root so it is persisted

for i in range(256):
    v = chr(i)
    for x in range(256):
        db[chr(x)] = v
        db[chr(x)]
conn.commit()
First, access the BTree-based "db" via client-server:
$ time python durus-remote-btree-adding.py
real 0m1.691s
user 0m1.681s
sys 0m0.010s
Next we see that read-only access, remote or local, remains fast, even with the BTree structure:
$ time python durus-remote-btree-ro.py
real 0m0.054s
user 0m0.040s
sys 0m0.012s
PyTyrant / Tokyo Cabinet has a nice, simple API for accessing the remote server:
import pytyrant

# Connect to the server on localhost, port 1978
t = pytyrant.PyTyrant.open('127.0.0.1', 1978)

for i in range(256):
    v = chr(i)
    for x in range(256):
        t[chr(x)] = v
        t[chr(x)]
PyTyrant client-server access to a BTree structure suggests there is room for future improvement:
$ time python pyt-test.py
real 0m11.151s
user 0m1.317s
sys 0m1.653s
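In that run the real time is far larger than user plus sys, which suggests the client spent most of its time waiting rather than computing. As a rough illustration, the sketch below subtracts the CPU time from the wall-clock time using the figures reported above; the helper names parse_time_field and off_cpu_seconds are made up for this example, and the result says nothing about where the waiting happened (server work, socket round trips, or something else).

```python
import re


def parse_time_field(text):
    """Convert a `time`-style duration such as '0m11.151s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError("unrecognised duration: %r" % text)
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def off_cpu_seconds(real, user, sys_):
    """Wall-clock time not accounted for by the client's own CPU usage."""
    cpu = parse_time_field(user) + parse_time_field(sys_)
    return parse_time_field(real) - cpu


# Figures copied from the pyt-test.py run above.
waiting = off_cpu_seconds("0m11.151s", "0m1.317s", "0m1.653s")
print(round(waiting, 3))  # 8.181
```

Roughly 8 of the 11 seconds were spent off the client's CPU, compared with a few milliseconds at most in the Durus client-server runs.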
Of course raw throughput isn't everything. Durus has persistent container types including Dictionary, BTree, Set and List. Keys in mappings can be any hashable object; values can be any pickleable object. Durus objects are Python objects, not merely strings or values.
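To make "hashable" and "pickleable" concrete, here is a small sketch that uses only the standard library, not Durus. The helpers is_hashable and is_pickleable are hypothetical names introduced for illustration; they simply try the operation and report whether it succeeded.

```python
import pickle


def is_hashable(obj):
    """True if obj can be used as a dictionary key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


def is_pickleable(obj):
    """True if the pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


print(is_hashable((1, "a")))             # tuples of hashables can be keys
print(is_hashable([1, "a"]))             # lists cannot
print(is_pickleable({"a": [1, 2, 3]}))   # nested builtin containers pickle fine
print(is_pickleable(lambda: None))       # lambdas do not
```

Under these checks a tuple is a valid key while a list is not, and ordinary nested containers are valid values while a lambda is not.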
Consider the following:
$ durus -c
Durus 127.0.0.1:2972
connection -> the Connection
root -> the root instance
>>> from durus.persistent_dict import PersistentDict
>>> names = PersistentDict()
>>> root['names'] = names
>>> connection.commit()
>>> mike = 'Mike Watkins'
>>> fred = 'Fred Astaire'
>>> ringo = 'Ringo Starr'
>>> names[1] = mike
>>> names[2] = fred
>>> names[3] = ringo
>>> names[22] = fred
>>> id(names[2])
3082202976
>>> id(names[22])
3082202976
>>> connection.commit()
When we reconnect, we should expect the values within the mapping at keys 2 and 22 to be the same object:
$ durus -c
Durus 127.0.0.1:2972
connection -> the Connection
root -> the root instance
>>> names = root['names']
>>> id(names[2])
3081790720
>>> id(names[22])
3081790720
>>> id(names[2]) == id(names[22])
True
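The session above is consistent with how Python's pickle module treats shared references: an object reachable twice within one pickled structure is restored as a single object. The sketch below shows that behaviour with the standard library alone. It does not use Durus, and it assumes that the values stored in a persistent mapping are pickled together as part of that mapping's state. Lists are used instead of strings so that string interning cannot blur the result.

```python
import pickle

fred = ["Fred Astaire"]        # stored under two keys, like names[2] and names[22]
lookalike = ["Fred Astaire"]   # equal in value, but a separate object

names = {2: fred, 22: fred, 3: lookalike}

# Round-trip the whole mapping, as a save followed by a later load would.
restored = pickle.loads(pickle.dumps(names))

same_object = restored[2] is restored[22]     # shared reference survives
distinct_object = restored[2] is restored[3]  # equal values stay separate
print(same_object, distinct_object)  # True False
```

The ids differ between the two sessions because the objects are rebuilt on load, but the sharing between keys 2 and 22 is preserved.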
Of late there seems to be plenty of interest in non-SQL database architectures, with CouchDB and Tokyo Cabinet, among others, getting attention, in part because they offer a language-agnostic solution.
For those many other times when a project will benefit from a persistence layer tightly coupled with the language, object databases like Durus or ZODB are worthy of consideration.