mike watkins dot ca : Entries tagged with “Durus”


August 25 2009

QP and Durus Updated

The folks at mems-exchange.org released a new version of the Python web application / site-management framework QP and supporting packages.

All of today's released packages support Python >= 2.4, which includes Python 3.1.

They also released an update to Durus, a compact and mature Python object database (at its core, a minimal ZODB/ZEO work-alike). The API to a Python object database is simply Python.

Key-value databases appear to be in vogue these days. Python developers with an interest in key-value databases may want to check out Durus (or ZODB): the key-value database you already know, and more.

More means more than key-value pairs and simple types: virtually any Python object or type. Keys need not be simple strings or integers, and values need not be simple values. Durus offers not only persistent dictionaries but also persistent lists and sets, and persistent objects of almost any design you may wish to implement.

Durus has no other package dependencies and is compact. Weighing in at less than 5000 lines of code, it is small enough to read in one sitting if you want to see how things tick. Or you can just dive in: start a server with durus -s, a client with durus -c, and play.

March 12 2009

Tyrannical Databases

Inspired by a series of slides Michael Schurter published on Tokyo Cabinet and PyTyrant, I thought I'd code up his examples using another database which can use a key-value approach, Durus.

Durus is a ZODB work-alike which allows for easy persistence of Python objects, not just values. It's simple, fast, and useful.

Here's the baseline Tokyo Cabinet db example Michael published, using the pytc interface:

import pytc
db = pytc.HDB()
# a hash database (HDB), so use the HDBO* open-mode flags
db.open('test.tch', pytc.HDBOWRITER | pytc.HDBOREADER | pytc.HDBOCREAT)
for i in range(256):
    v = chr(i)
    for x in range(256):
        db.put(chr(x), v)
        db.get(chr(x))

Running it:

$ time python test.py
real    0m0.168s
user    0m0.157s
sys     0m0.010s
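Every benchmark script in this post runs the same loop: 256 × 256 = 65,536 writes and as many reads, but over only 256 distinct keys, so the finished database holds just 256 entries, each mapped to chr(255). To replay that workload yourself, here is a minimal, hypothetical harness (Python 3, standard library only; run_workload and time_workload are my own names) that works with any object supporting store[key] = value. A plain dict gives an in-memory baseline. The pytc handles use put()/get() rather than item syntax, so they would need a small wrapper.

```python
import time

def run_workload(store, n=256):
    """Replay the benchmark loop: n * n writes and n * n reads over n keys."""
    for i in range(n):
        v = chr(i)
        for x in range(n):
            store[chr(x)] = v   # overwrite one of only n distinct keys
            store[chr(x)]       # read the value straight back
    return store

def time_workload(store, n=256):
    """Return the wall-clock seconds taken by one pass of run_workload."""
    start = time.perf_counter()
    run_workload(store, n)
    return time.perf_counter() - start

if __name__ == '__main__':
    final = run_workload({})
    print(len(final), set(final.values()) == {chr(255)})   # 256 True
    print('plain dict: %.3f s' % time_workload({}))
```

A Durus root or a PyTyrant handle could be passed in place of the dict, though for Durus you would still call conn.commit() afterwards, as the scripts here do.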

And here is a Durus example, accessing a local file-based storage:

# Durus example 1 - File-based persistent dictionary
from durus.file_storage import FileStorage
from durus.connection import Connection

conn = Connection(FileStorage('test.durus'))
db = conn.get_root()
for i in range(256):
    v = chr(i)
    for x in range(256):
        db[chr(x)] = v
        db[chr(x)]
conn.commit()

Running it:

$ time python durus-test.py
real        0m0.197s
user        0m0.187s
sys         0m0.008s

Now let's change to client-server operation, delivering more or less the same abilities as PyTyrant/Tokyo Cabinet. A minor change to durus-test.py gives us a client:

# Durus example 2 - Remote access to a File-based persistent dictionary
from durus.client_storage import ClientStorage
from durus.connection import Connection

conn = Connection(ClientStorage())
db = conn.get_root()
for i in range(256):
    v = chr(i)
    for x in range(256):
        db[chr(x)] = v
        db[chr(x)]
conn.commit()

Between runs we'll remove the database file. We'll also need a server running, so let's fire one up in another terminal:

$ rm test.durus
$ durus -s --file test.durus

Run the second example:

$ time python durus-remote-test.py
real        0m0.204s
user        0m0.189s
sys         0m0.013s

Let's use a more advanced container than a persistent dictionary: a BTree. First, Tokyo Cabinet/pytc:

import pytc
db = pytc.BDB()
db.open('test.db',  pytc.BDBOWRITER | pytc.BDBOREADER | pytc.BDBOCREAT)
for i in range(256):
    v = chr(i)
    for x in range(256):
        db.put(chr(x), v)
        db.get(chr(x))

Running pytc with the BTree:

$ time python test.py

real    0m0.169s
user    0m0.157s
sys     0m0.011s

Nice and fast: it's all C-based.

Now the Durus BTree code:

# Durus example 3 - File-based persistent BTree
from durus.file_storage import FileStorage
from durus.connection import Connection
from durus.btree import BTree

conn = Connection(FileStorage('test.durus'))
root = conn.get_root()
db = BTree()
root['db'] = db
for i in range(256):
    v = chr(i)
    for x in range(256):
        db[chr(x)] = v
        db[chr(x)]
conn.commit()

Running this we see a significant performance delta compared to the C-based pytc/Tokyo Cabinet:

$ time python durus-btree.py
real        0m1.319s
user        0m1.308s
sys         0m0.011s

The delta will tip back into Durus's favour in the next two examples.

# Durus example 4 - client-server access to a persistent BTree
from durus.client_storage import ClientStorage
from durus.connection import Connection
from durus.btree import BTree

conn = Connection(ClientStorage())
root = conn.get_root()
db = BTree()
root['db'] = db
for i in range(256):
    v = chr(i)
    for x in range(256):
        db[chr(x)] = v
        db[chr(x)]
conn.commit()

First, access the BTree-based "db" via client-server:

$ time python durus-remote-btree-adding.py
real        0m1.691s
user        0m1.681s
sys         0m0.010s

Next we see that read-only access, remote or local, remains fast, even with the BTree structure:

$ time python durus-remote-btree-ro.py
real        0m0.054s
user        0m0.040s
sys         0m0.012s

PyTyrant / Tokyo Cabinet has a nice, simple API for accessing the remote server:

import pytyrant

t = pytyrant.PyTyrant.open('127.0.0.1', 1978)
for i in range(256):
    v = chr(i)
    for x in range(256):
        t[chr(x)] = v
        t[chr(x)]

PyTyrant client-server access to a BTree structure suggests future room for improvement:

$ time python pyt-test.py

real    0m11.151s
user    0m1.317s
sys     0m1.653s

Of course raw throughput isn't everything. Durus provides persistent container types including dictionaries, BTrees, sets and lists. Keys in mappings can be any hashable object; values can be any pickleable object. Durus objects are Python objects, not merely strings or values.

Consider the following:

$ durus -c
Durus 127.0.0.1:2972
    connection -> the Connection
    root       -> the root instance
>>> from durus.persistent_dict import PersistentDict
>>> names = PersistentDict()
>>> root['names'] = names
>>> connection.commit()
>>> mike = 'Mike Watkins'
>>> fred = 'Fred Astaire'
>>> ringo = 'Ringo Starr'
>>> names[1] = mike
>>> names[2] = fred
>>> names[3] = ringo
>>> names[22] = fred
>>> id(names[2])
3082202976
>>> id(names[22])
3082202976
>>> connection.commit()

When we reconnect, we should expect the values within the mapping at keys 2 and 22 to be the same object:

$ durus -c
Durus 127.0.0.1:2972
    connection -> the Connection
    root       -> the root instance
>>> names = root['names']
>>> id(names[2])
3081790720
>>> id(names[22])
3081790720
>>> id(names[2]) == id(names[22])
True
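Why are they still the same object after a reconnect? As I understand it, Durus serializes each persistent object's state with Python's pickle machinery, and pickle writes an object that is referenced twice within one pickle only once, restoring both references as a single object. The sketch below shows that behaviour with plain pickle, using an ordinary dict as a stand-in for PersistentDict; it illustrates the mechanism and is not Durus code.

```python
import pickle

def round_trip(mapping):
    """Pickle a whole mapping in one go and load it back."""
    return pickle.loads(pickle.dumps(mapping))

fred = 'Fred Astaire'
names = {1: 'Mike Watkins', 2: fred, 3: 'Ringo Starr', 22: fred}
restored = round_trip(names)

print(restored == names)            # True: same contents
print(restored[2] is restored[22])  # True: one object, referenced twice
```

The flip side, assuming per-object pickling as described above: the same string held by two *different* persistent objects would presumably be written into two separate records and come back as two distinct objects.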

Of late there seems to be plenty of interest in non-SQL database architectures, with CouchDB and Tokyo Cabinet among those getting attention, in part because they offer a language-agnostic solution.

For those many other times when a project will benefit from a persistence layer tightly coupled with the language, object databases like Durus or ZODB are worthy of consideration.

December 03 2008

First Python 3 Web Application Framework?

From the QP mailing list, Wednesday December 3 2008:

Today the MEMS Exchange released updates of 5 software packages: Durus, QP, Qpy, Sancho, and Dulcinea.

You can find details and downloads at the usual location: http://www.mems-exchange.org/software/

These packages require Python 2.4 or higher, and yes, they even work with Python 3.0.

It does seem that QP 2.1 and friends are among the first, if not actually the first, web and database development packages available for Python 3.0, which was released today.

(Previously I've written about QP's templating system, Qpy, and a performance increase moving from Python 2.5 to 3.0.)

November 27 2007

QP and Durus on Nokia N800

The attached image is a screen shot of a web browser running on a Nokia N800, a Linux-based internet tablet. Weighing only ounces, the wireless device is a great platform for Python developers as the language has more or less become the default dynamic language for the device and, it would seem, for Nokia. GUI developers commonly employ pygtk/glade for Maemo applications; I'm not aware of much web development being done on the N series tablets as yet.

Pictured is the output of a highly Pythonic web framework and object database combination, QP and Durus - the app is merely a template provided by running mkqpapp.py.

I wanted to see what QP and this little tablet, underpowered by laptop or server standards, could do. Here is a benchmark run from a fairly fast Unix desktop, across a fairly slow wireless link, generating 10 concurrent request streams to the device:

frog# /home/mw% siege -b -c10 -t10s http://n800:8000/
Transactions:                164 hits
Availability:             100.00 %
Elapsed time:               9.85 secs
Data transferred:           0.06 MB
Response time:              0.58 secs
Transaction rate:          16.65 trans/sec
Throughput:                 0.01 MB/sec
Concurrency:                9.71
Successful transactions:     164
Failed transactions:           0
Longest transaction:        3.66
Shortest transaction:       0.19

Not bad, considering it's a full-stack web framework and object database running on a little machine weighing only ounces, which is also running what amounts to a Gnome environment, a browser, mail and other apps, all powered by a lowly TI 320MHz ARM CPU (armel architecture).

Out of curiosity I added a hit counter to exercise the object database and the transaction rate was a respectable 12.36/second.

World's smallest portable web application demo machine!

Once I figure out how to make Debian packages for the armel architecture I'll post a deb link for a one click install of Durus, QP, QPY and Dulcinea.

Incidentally, while sqlite is in common use on Nokia tablets, there's clearly no reason why Durus could not be used. N-series Python developers might find that to be an ideal persistence pairing to go along with their web or GTK apps.

July 12 2007

Python Database Interfaces

Python object databases need some love too

Flávio Coelho recently performed an examination of various Python database API and ORM interfaces to MySQL, Postgres, and SQLite, and included a benchmark for cPickle.

Here's an addition to Flávio's Fastest Python Database Interface article and script to include Durus: performance.py.

I also pointed out on Flávio's blog that his cPickle benchmark needed to include pickling 100,000 instances of a "Person" class in addition to 100,000 simple tuples, to show the overhead of class instantiation and serialization/deserialization that all of the ORMs and object databases share in some form or another. An example of both can be found in performance.py.
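To see the gap between pickling tuples and pickling class instances for yourself, a rough sketch along these lines will do. It is neither Flávio's script nor performance.py: the Person class and round_trip_seconds are hypothetical, and each item is pickled and unpickled individually, roughly as a record-at-a-time store would handle it.

```python
import pickle
import time

class Person(object):
    """Hypothetical record class, standing in for the "Person" class above."""
    def __init__(self, name, age):
        self.name = name
        self.age = age

def round_trip_seconds(items):
    """Pickle and unpickle each item separately; return (seconds, restored items)."""
    start = time.perf_counter()
    restored = [pickle.loads(pickle.dumps(item, pickle.HIGHEST_PROTOCOL))
                for item in items]
    return time.perf_counter() - start, restored

if __name__ == '__main__':
    n = 100000
    tuples = [('person%d' % i, i % 90) for i in range(n)]
    people = [Person('person%d' % i, i % 90) for i in range(n)]
    tuple_secs, _ = round_trip_seconds(tuples)
    people_secs, _ = round_trip_seconds(people)
    print('%d tuples: %.3f s' % (n, tuple_secs))
    print('%d Person instances: %.3f s' % (n, people_secs))
```

The instance case has to record the class reference and the instance's attribute dictionary, and on load recreate the object and restore its state; that is the overhead the tuple-only benchmark leaves out.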

June 08 2007

Python Web Application Diary, Part Six

In part five of this series we dove deep into QP and looked at the fundamentals of any QP application - SitePublisher and SiteDirectory - as well as explored the use of QPY templating. We also built a rudimentary UI for our Entry object.

In this installment of our web application diary we'll work more with the Durus object database: injecting some data into it, exploring the interactive interpreter (certainly one of the cool features of Durus), and starting on a conversion script to take weblog data in PyBlosxom format and insert it into our blog application database.

Tip

Before going further, install Pyrepl - this is required to support QP / Durus interactive interpreter features, and adds significant functionality (optional) to Python's own interactive interpreter.

To see Pyrepl at work with regular Python launch:

pythoni

Durus, the database you already know

Now I know what you are thinking. I think. Well, that is my theory and it is mine and I own that theory. My theory is that you are thinking:

"Object database? what sort of weird and strange alchemy is that? Fear the unknown! Down with the unknown! Destroy the unknown with DELETE FROM queries!" -- you

While object databases are not exactly in commonplace use in the IT industry, the Python community has a long history of kinship with them, ZODB (the Zope Object Database) arguably being the best-known example.

Durus is patterned after ZODB, and indeed was written by developers who had used ZODB extensively. Visit the Durus pages for more information on their rationale for reinventing this particular wheel; from my own experience I can only say that Durus is small and easy to read and understand.

What exactly is an object database? Put simply, Durus and ZODB allow you to persist your Python objects. It's more than pickle, though not unlike pickle in some respects.
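Here is the "not unlike pickle" part, and its limit, shown with plain pickle (Note, save, load and snapshot_demo are hypothetical names). Pickle writes a snapshot of an object graph at the moment you dump it, so a change made afterwards is lost unless you dump the whole graph again. Durus, by contrast, keeps track of which persistent objects have changed and writes them when you call connection.commit(), as the session that follows demonstrates.

```python
import os
import pickle
import tempfile

class Note(object):
    """Hypothetical plain class; with Durus you would subclass a persistent base."""
    def __init__(self, text):
        self.text = text

def save(obj, path):
    # pickle.dump writes a snapshot of obj and everything it references
    with open(path, 'wb') as f:
        pickle.dump(obj, f)

def load(path):
    with open(path, 'rb') as f:
        return pickle.load(f)

def snapshot_demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'note.pickle')
        note = Note('draft')
        save(note, path)
        note.text = 'final'          # changed in memory only
        return note.text, load(path).text

if __name__ == '__main__':
    print(snapshot_demo())   # ('final', 'draft')
```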

Tip

Launch a log viewer in another terminal window so you can watch what happens as we make changes to the Durus database. qp -l blog

Demonstrating Durus, Interactively

QP and Durus let you work directly with the Durus object database. Let's fire up an interactive session to show Durus basics.

% qp -i blog
Profile, connection, publisher, root, sessions, site, users
->>

Working within the interactive session: Pyrepl provides very useful search and command history capabilities. Control-P and Control-N step through previously entered lines. Control-R starts a reverse history search: start typing something you've entered before (substrings match too), then press Control-R again to step through the hits, if any.

Term expansion is perhaps my favorite Pyrepl enhancement; it's certainly the one I use most. Try it now by entering a couple of letters:

->> pu

And press Tab - you'll be rewarded with either publisher or a list of terms in the namespace which match the letters entered so far. A real timesaver.

Access to objects: The interactive session provides us with access to QP objects (connection, site, publisher), application objects (sessions, users), a Profile testing class, but the most relevant to our discussion right now is root.

By convention our application data lives under root, which is itself a persistent object. Changes to root persist from session to session provided connection.commit() has been called to commit them to the database. Let's try some simple examples.

->> from durus.persistent_dict import PersistentDict
->> mydict = PersistentDict()
->> root['test'] = mydict
->> connection.commit()
->>
%

Control-D exits the interactive session, as it also exits a standard Python interpreter. Restart the interpreter to see if our object was 'saved' or persisted.

% qp -i blog
Profile, connection, publisher, root, sessions, site, test, users
->>

Very good: test now shows up in our display. Objects living at the root level are conveniently listed as a reminder when we fire up an interactive session. Let's put some data in test, but first, what was test?

->> test
<PersistentDict 17020>
->> test.items()
[]

Right, now I remember. Ok, add some data.

->> test[1] = 'My first persistent data'
->> connection.commit()
->>

Control-D to quit, and restart again to satisfy any fears that you may have about your important data.

% qp -i blog
Profile, connection, journals, publisher, root, sessions, site, test, users
->> test.items()
[(1, 'My first persistent data')]
->>

By now you can see that we are using Python to manage our data, and, by subclassing one of Durus's persistent object classes, we can make our Python objects full partners in the Durus object database.

Durus is the database you already know. No object relational mappers to learn, no SQL to learn or work around.

Durus Mini FAQ

What about performance?
This is too difficult a question to answer simply, but in my experience I have been able to use Durus instead of a SQL database (Postgres is my personal favorite among the open source databases) far more often than not. You won't put an online banking system processing millions of transactions a day on Durus or ZODB, but you might build a complex company inventory system on Durus, even one with hundreds of thousands of items and their history. Third-party solutions extend Durus even further by using a relational database as its storage back-end, transparently to the application (I'm told ZODB has similar approaches).
What about SQL / queries? How will I ever live?
One of the challenging things for a SQL-oriented developer (that was me, some time ago) is to start thinking in pure Python again. It's not hard, but it does take some realignment of thought before it comes naturally, at least for me. Being able to dispense with relational thinking in the SQL sense brings a lot of design freedom.
What about sharing data with other systems?
My approach has been to export data as CSV or DIF for import into other systems' SQL databases, to provide APIs such as XML-RPC or REST/JSON for other applications, or to publish RSS or Atom feeds when that makes sense.
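As an example of the CSV route, exporting objects takes only a few lines with the standard csv module. This is a sketch, not code from any of my applications: Item stands in for whatever persistent objects you would pull out of root (for instance the values of a PersistentDict), and export_csv is a hypothetical helper.

```python
import csv
import io

class Item(object):
    """Hypothetical record type standing in for a persistent application object."""
    def __init__(self, sku, title, qty):
        self.sku = sku
        self.title = title
        self.qty = qty

def export_csv(objects, fields, out):
    """Write a header row, then one CSV row per object from the named attributes."""
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for obj in objects:
        writer.writerow({name: getattr(obj, name) for name in fields})

if __name__ == '__main__':
    buf = io.StringIO()
    export_csv([Item('A1', 'Widget', 3), Item('B2', 'Gadget, large', 0)],
               ['sku', 'title', 'qty'], buf)
    print(buf.getvalue())
```

The csv module takes care of quoting values that contain commas; when writing to a real file, open it with newline='' as the csv documentation recommends.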

The bottom line: Durus objects are Python objects. You've already invested in learning Python, so you already know Durus, and there is no time-to-learn downside to spending some time with it now. Let's press on.

Entries with no home

In part three of this series we turned a simple Entry object into a full partner of a Durus database merely by subclassing PersistentObject instead of the standard Python new-style class object. In part four we kicked things up a notch by fleshing out our Entry object with specifications provided by the QP module qp.lib.spec.

What we have not done yet is provide a place for our journal entries to 'live'. We need a container for Entry, and early on we decided to call that container Journal. We are really going to kick things up a notch by leveraging functionality provided by QP in qp.lib.keep. A Keep is a mapping of Keyed items using an integer as a key. Let's enhance Entry first, then we'll write some unit tests for Journal, and then write Journal itself.
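To make the idea concrete, here is a toy, in-memory version of a Keep-style container. It only illustrates the concept described above and is not qp.lib.keep's implementation or API. SimpleKeep and SimpleKeyed are hypothetical names, keys start at 1 (an assumption that matches the get_entry(1) test below), and nothing here is persistent.

```python
class SimpleKeyed(object):
    """Hypothetical item that is given an integer key when it joins a keep."""
    def __init__(self):
        self.key = None

class SimpleKeep(object):
    """Hypothetical in-memory mapping of integer keys to keyed items."""
    def __init__(self):
        self._items = {}
        self._next_key = 1

    def add(self, item):
        # Hand out keys in increasing order, starting at 1.
        if item.key is not None:
            raise ValueError('item already has key %r' % (item.key,))
        item.key = self._next_key
        self._next_key += 1
        self._items[item.key] = item
        return item.key

    def get(self, key, default=None):
        return self._items.get(key, default)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        # Yield items in key order, which is also the order they were added.
        for key in sorted(self._items):
            yield self._items[key]
```

The item carries its own key, so code holding an item can always find its place in the container again, which is handy for building URLs from keys.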

All the code for the finished objects will be available at the conclusion of this series, but for those of you following along at home, let's dive in, re-edit our journal.py and clean up our Entry object first. For brevity's sake I have included imports relevant to both Entry and the Journal object we will be writing.

from dulcinea.base import DulcineaPersistent
from dulcinea.sort import attr_sort
from qp.lib.keep import Keep, Keyed, Stamped
from qp.lib.spec import add_getters_and_setters, boolean, both, datetime_with_tz
from qp.lib.spec import init, pattern, string, spec
from qp.pub.user import User


class Entry(DulcineaPersistent, Keyed, Stamped):
    """
    An entry in a journal.
    """
    title_is = spec(
        (string, None),
        "A string briefly describing the Entry")
    text_is = spec(
        (string, None),
        "The entry content")
    published_is = spec(
        boolean,
        "Boolean indicating if Entry can be published")
    author_is = spec(
        User,
        "User responsible for creating entry")
    created_is = datetime_with_tz

    def __init__(self, author):
        Keyed.__init__(self)
        Stamped.__init__(self)
        init(self, author=author, created=self.stamp, published=False)

add_getters_and_setters(Entry)

Let's now write Journal, but first let's write the tests we want it to pass, and then write the object. Typically you might write only some of these tests, at least until you become familiar with the various features of the QP and Dulcinea libraries. In our ./test/utest_journal.py we'll add another test.

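from sancho.utest import UTest
from qp.pub.user import User
from parlez.journal import Entry, Journal

# EntryTest, written earlier in this series, is defined above in this file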

class JournalTest(UTest):
    # we'll write this first, and then write Journal

    def _pre(self):
        # set up a journal which we'll use for most tests.
        self.j = Journal('science', User('einstein'))
        # it is automatically taken down following each individual test

    def init_test(self):
        # we want Journal to have a URL name and an owner, so force it
        Journal('musings', User('joe'))

    def create_entry_test(self):
        assert isinstance(self.j.create_entry(), Entry)

    def add_test(self):
        e = self.j.create_entry()
        self.j.add(e)
        assert e in self.j.get_all_entries()
        assert e == self.j.get_entry(1)

    def only_published_test(self):
        # the journal starts out empty
        assert self.j.get_all_entries() == []
        e = self.j.create_entry()
        self.j.add(e)
        e_published = self.j.create_entry()
        e_published.set_published(True)
        self.j.add(e_published)
        assert e not in self.j.get_entries()
        assert e_published in self.j.get_entries()
        # publish e now
        e.set_published(True)
        assert e in self.j.get_entries()
        # both should be in reverse sorted result, e last
        assert [e_published, e] == self.j.get_recent_entries()

if __name__ == '__main__':
    EntryTest()
    JournalTest()

I've kept this briefer than I'd like it to be, as there are some other tests we need to write to completely cover our Journal object, but these tests of primary functionality - add, retrieve, retrieve all and sort - should give you the spirit of what we are trying to achieve here.
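To get a feel for what will satisfy these tests, here is a plain-Python sketch of Journal. It is a hypothetical stand-in, not the version this series will build: it is not persistent, uses none of the QP or Dulcinea classes, takes a plain string as the owner instead of a User, and includes a stripped-down Entry with just enough behaviour for the tests. Entries are numbered from 1 as they are added.

```python
import itertools

class Entry(object):
    """Hypothetical, non-persistent stand-in for the Entry class above."""
    _counter = itertools.count()

    def __init__(self, author):
        self.author = author
        self.key = None
        self.published = False
        # increasing number used for ordering, in place of Stamped's timestamp
        self.stamp = next(Entry._counter)

    def set_published(self, value):
        self.published = bool(value)

class Journal(object):
    """Hypothetical in-memory Journal exposing the methods the tests call."""
    def __init__(self, name, owner):
        self.name = name
        self.owner = owner
        self._entries = {}

    def create_entry(self):
        return Entry(self.owner)

    def add(self, entry):
        entry.key = len(self._entries) + 1
        self._entries[entry.key] = entry

    def get_entry(self, key):
        return self._entries.get(key)

    def get_all_entries(self):
        return [self._entries[k] for k in sorted(self._entries)]

    def get_entries(self):
        """Published entries only."""
        return [e for e in self.get_all_entries() if e.published]

    def get_recent_entries(self):
        """Published entries, newest first."""
        return sorted(self.get_entries(), key=lambda e: e.stamp, reverse=True)
```

The counter stands in for a timestamp because two entries created in quick succession might otherwise carry identical timestamps, leaving the newest-first order ambiguous.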

PyBlosxom to Journal Conversion

A common challenge: you've got data in one system and need to move it into a Durus database. A script to perform this task will be included in full at the end of this series. For now let's sketch out what we need to do, and look at how to access an application's Durus database from a script.

Pyblosxom maintains its files in a hierarchy that looks something like this:

../entries/categoryname/file1.txt
../entries/categoryname/someotherfile.rst
../entries/python/2007-06-08-08-44.rst

And so on. My particular installation uses a plugin which parses the entry date from the file name if it is formatted as a datetime in the form of yyyy-mm-dd-hh-mm.ext, so for files formatted like that I can set Entry.created to a datetime parsed from the filename. Otherwise, I need to stat the file and get its creation date from the operating system, which isn't always reliable (in the case of edits and hapless administrators).
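A sketch of the date logic, assuming the yyyy-mm-dd-hh-mm.ext naming described above (FILENAME_DATE and entry_datetime are hypothetical names). For the fallback it uses the file's modification time: on most Unix file systems st_ctime is the inode change time rather than a creation time, so mtime is usually the closer approximation, with the same caveat about edits.

```python
import datetime
import os
import re

# yyyy-mm-dd-hh-mm followed by an extension, e.g. 2007-06-08-08-44.rst
FILENAME_DATE = re.compile(r'^(\d{4})-(\d{2})-(\d{2})-(\d{2})-(\d{2})\.\w+$')

def entry_datetime(path):
    """Return a naive datetime for a Pyblosxom entry file.

    Prefers the date encoded in the file name; otherwise falls back to
    the file's modification time as reported by os.stat().
    """
    match = FILENAME_DATE.match(os.path.basename(path))
    if match:
        return datetime.datetime(*[int(part) for part in match.groups()])
    return datetime.datetime.fromtimestamp(os.stat(path).st_mtime)
```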

The file contents are simple for me to parse: content is either plain text or, in my case, mostly Textile formatted with a sprinkling of reST and Markdown:

Some article title
#author Mike Watkins
The article content.

h2. A subtitle

More content. Etc.

I never used the #author directive; some files use the #parser directive to indicate which formatter should be used; most rely on file extensions (.rst, .txt, .mkd).
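The split into title, directives and body can be sketched as follows. This assumes the layout shown above: the first line is the title, any lines immediately following it that begin with # are directives, and the rest is content. The extension-to-format table is a guess to adjust for your own installation, and parse_entry is a hypothetical name. One limitation: a body whose first line starts with # would be misread as a directive.

```python
import os

# Hypothetical extension-to-format table; adjust to match your installation.
EXTENSION_FORMATS = {'.rst': 'rest', '.mkd': 'markdown', '.txt': 'plain'}

def parse_entry(path, text):
    """Split Pyblosxom entry text into (format, title, content).

    The first line is the title; lines immediately after it that start
    with '#' are directives such as '#parser'; the remainder is content.
    """
    lines = text.splitlines()
    title = lines[0].strip() if lines else ''
    directives = {}
    body_start = 1
    for line in lines[1:]:
        if not line.startswith('#'):
            break
        name, _, value = line[1:].partition(' ')
        directives[name] = value.strip()
        body_start += 1
    # An explicit #parser directive wins over the file extension.
    ext = os.path.splitext(path)[1].lower()
    fmt = directives.get('parser') or EXTENSION_FORMATS.get(ext, 'plain')
    content = '\n'.join(lines[body_start:])
    return fmt, title, content
```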

Ultimately my script needs to deliver to me:

  • Entry date
  • Format
  • Title
  • Content

And, if I intend to preserve the URLs (I'm debating this now... I really dislike the existing bloxsom / Pyblosxom URL design), I'll need to carry that information forward too. For now, let's assume we have a mapping with file paths as keys and, as values, a list of the four data elements noted above, and write a script to import that information into Durus.

Importing data to Durus

Working with a QP application's Durus database is easy; remember, it's just Python.

from qp.lib.site import Site
from parlez.journal import Entry, Journal

def bloxsom_to_mapping(entrypath):
    # here you'll deal with the specifics - see a future article
    data = {}
    # ...
    return data

def add_journal_entries(data, journal):
    for path, entry_data in data.items():
        # path I might store, or some component of it, in the Entry
        # object to facilitate mapping old to new URLs in the future.
        # for now, just ignoring it
        created, format, title, content = entry_data
        entry = journal.create_entry()
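        # assumes Entry gains a format attribute (e.g. a format_is spec);
        # the Entry shown above does not define one yet
        entry.set_format(format)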
        entry.set_title(title)
        entry.set_text(content)
        # normally we don't bypass getters/setters
        entry.created = created
        entry.stamp = created
        journal.add(entry)

if __name__ == '__main__':
    BLOXSOM_ENTRY_PATH = '/home/mw/bloxsom/entries'
    APP_NAME = 'blog'
    JOURNAL_NAME = 'mw'
    USER_ID = 'mw'

    # the Site object gives us the ability to access
    # configuration information and live objects
    site = Site(APP_NAME)
    pub = site.get_publisher()
    root = pub.get_root()
    users = root['users']
    # make sure I exist in Users
    if USER_ID in users:
        user = users.get(USER_ID)
    else:
        user = pub.create_user(USER_ID)
        users.add(user)
    # create the journal on the first run, reuse it afterwards
    if 'journal' not in root:
        root['journal'] = Journal(JOURNAL_NAME, user)
    journal = root['journal']
    # move bloxsom data into Entry/Journal
    add_journal_entries(bloxsom_to_mapping(BLOXSOM_ENTRY_PATH),
                        journal)
    # made it here, commit everything to the database
    pub.get_connection().commit()
    # that's it!

Next Installment

When we return in part seven of this series we will further flesh out our UI objects for Entry and Journal, adding methods for creating and editing objects. At that point we'll have a basic journal or weblog application ready to deploy to the world. Subsequent articles will add more functionality.