mike watkins dot ca : February 26 2005 Archives

February 26 2005

Object DB vs Relational DB

My oh my, I’m treading on religious-war and flamefest territory lately, although I have no intent to raise hackles. Ryan observed:

There are a whole range of situations for which these issues aren’t really issues at all and relational persistence has its own set of issues. But in my experience, making the decision between ZODB and relational has been less about what’s better for the code and more about what’s better for the data.

Absolutely.

There are a great many situations where data might be better off in a SQL db, particularly when integration with other applications not of our making, or other tools, is required. This isn’t so much an argument against Object Databases as it is a comment on the ubiquity of SQL (and tools which support it) in the enterprise.

On the other hand, if I don’t have to play nice with other apps, and if the app doesn’t call for a relational approach, I sure enjoy designing objects without having to think about SQL. Maybe it’s just a nice change of pace for me. If anything, being forced to think about what I miss when SQL isn’t at my right hand is probably influencing my object design for the better.

Regarding Ryan’s comment “let’s put everything in ZODB”: I know he didn’t mean it in this context, but there are some things I would never put in ZODB or Durus, including:

  • Big lumps of content, especially relatively static content
  • Big blobby items like images and many binary formats. That’s what file systems are for!
  • My children

Taking Textile Out

The plot thickens: following up on my response to Ryan, I ran a series of cheap-thrills performance tests using ApacheBench (ab) against the Quixote app and Instiki, with display-time Textile rendering both enabled and disabled.

To disable Textile, I modified app/models/chunk/engines.rb to return

    content

rather than

    RedCloth.new(text,content.options[:engine_opts]).to_html


Scenario                 Requests per second
                         Textile disabled   Textile enabled
-----------------------------------------------------------
Instiki, no content           21.05              17.75
Quixote, no content          250.63             149.37
Instiki, content               9.66               9.57
Quixote, content             248.14              18.67

I wasn’t able to get Instiki to churn out more than 22 pages per second across a number of test runs. The machine I’m running this on is a 2.4GHz box with 1GB RAM and 7400 RPM EIDE drives, and it also happens to be running X and gosh knows what else. At least it’s running all of that during both sets of tests.
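Requests per second compress very different costs into one number. It can help to convert the table into milliseconds per request and then subtract. The sketch below does this using only the figures from the table. The helper names are my own, and the conversion assumes the runs used ab's default concurrency of 1, so requests were served one at a time.

```python
# Requests per second from the table above: (Textile disabled, Textile enabled).
RESULTS = {
    "Instiki, no content": (21.05, 17.75),
    "Quixote, no content": (250.63, 149.37),
    "Instiki, content": (9.66, 9.57),
    "Quixote, content": (248.14, 18.67),
}


def ms_per_request(rps: float) -> float:
    """Convert throughput to mean time per request (assumes serial, concurrency-1 runs)."""
    return 1000.0 / rps


def textile_cost_ms(scenario: str) -> float:
    """Extra milliseconds per request attributable to enabling Textile."""
    disabled, enabled = RESULTS[scenario]
    return ms_per_request(enabled) - ms_per_request(disabled)


def slowdown(scenario: str) -> float:
    """How many times fewer requests per second are served with Textile enabled."""
    disabled, enabled = RESULTS[scenario]
    return disabled / enabled


if __name__ == "__main__":
    for name in RESULTS:
        print(f"{name:22s} {slowdown(name):6.2f}x slower  +{textile_cost_ms(name):6.2f} ms/request")
```

Read this way, enabling Textile costs the Quixote content page roughly 50 ms per request and the Instiki content page about 1 ms. On these numbers, most of the Quixote slowdown is rendering time rather than framework overhead.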

I’m not surprised to see the big hit from Textile coming through in the Quixote numbers. A while back I did some raw text processing benchmarks comparing Textile, Markdown and reStructuredText and discovered that if Markdown was suitable for the task, it could be rendered at access time and a site might still survive a slashdotting. For example, using the same content and scenario as above, except modifying the content slightly to use Markdown ‘markup’:


Scenario                 Requests per second
                         Markdown engine
--------------------------------------------
Quixote, no content           215.98
Quixote, content              106.60

Pretty useful! Of course, Markdown isn’t for everyone or every purpose, but if one faces a toss-up over which to use, Markdown’s simplicity and the raw rendering speed that follows from it may win hearts.
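A raw text-processing comparison like the one mentioned above can be approximated with a small harness built on Python's standard timeit module. This is a sketch and not the original benchmark code, which isn't shown here. Any function that takes markup and returns HTML can be plugged in. Using a particular renderer package, such as the third-party Python-Markdown package's markdown.markdown, is an assumption about your environment.

```python
import html
import timeit
from typing import Callable, Dict

Renderer = Callable[[str], str]


def renders_per_second(render: Renderer, text: str, number: int = 200) -> float:
    """Time `number` renders of `text` and return the rate in renders per second."""
    seconds = timeit.timeit(lambda: render(text), number=number)
    return number / seconds


def compare(renderers: Dict[str, Renderer], text: str, number: int = 200) -> Dict[str, float]:
    """Benchmark several markup-to-HTML functions on the same input."""
    return {name: renders_per_second(fn, text, number) for name, fn in renderers.items()}


if __name__ == "__main__":
    sample = "Some *emphasised* text with a [link](http://example.com).\n\n" * 50
    # html.escape is only a do-nothing-much baseline; swap in real renderers such as
    # markdown.markdown (assumes the third-party Python-Markdown package is installed).
    for name, rate in compare({"escape-only baseline": html.escape}, sample).items():
        print(f"{name}: {rate:.1f} renders/second")
```

Timing the renderer in isolation like this separates markup cost from web-framework cost, which is the distinction the ab numbers blur.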

But Textile vs Markdown is not what this song is about; it’s about Alice. Remember Alice? Oops… I mean, the point of digging out ab was a quick comparison of frameworks. That comparison failed to account for rendering, which led to one thing and then another, and it now accounts for formatting and… still… I am puzzled.

I’m puzzled by the stark difference in raw performance between Ruby/Instiki and Python/Quixote once Textile is taken out of the picture. Either I’ve missed something in Ruby/Instiki (entirely possible, although the content certainly isn’t being rendered with formatting!), or there is some deep juju at work, or not at work, as the case may be.

Ryan's Questions

Ryan asks a question about the test setup I used to come up with the numbers I published this week. I’m glad someone poked at this issue and gave me the motivation to dig deeper, since I am also keen to avoid a flame-fest.

(Update: Do also check out Taking Textile Out for more on the comparison)

So I dug deeper. To make a long story short, here is what I’ve found: on an apples-to-apples basis, performance is much closer than my first-blush, midnight look suggested. However, some questions remain outstanding, and they suggest that a closer look at performance would be time well spent for anyone making a toolkit and language decision.

Quixote still appears to edge ahead, and here’s the scoop:

What was wrong

I took a quick look at Instiki’s code. To my eye, unpracticed in Ruby, it appears that content is stored in its native markup format and rendered at display time. In this case, the default format is Textile.

My application does the same, but it can optionally also store a pre-rendered version of the content, and that version is what gets served. I’d left this optional feature on by default. As a result, my last look at this did not factor in the cost of rendering Textile- or Markdown-formatted content into HTML.
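The difference between the two storage strategies is easy to see in a minimal sketch. The WikiPage class and its render callable are hypothetical illustrations of the idea. They are not code from my application or from Instiki. With pre-rendering on, markup is converted once when the page is saved. With it off, every request pays for rendering.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class WikiPage:
    """Hypothetical page storing source markup plus an optional pre-rendered HTML copy."""

    source: str
    render: Callable[[str], str]  # e.g. a Textile or Markdown renderer (assumed)
    prerender: bool = True
    _html: Optional[str] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        self.set_source(self.source)

    def set_source(self, source: str) -> None:
        """Save new markup; render it once now if pre-rendering is enabled."""
        self.source = source
        self._html = self.render(source) if self.prerender else None

    def html(self) -> str:
        """Return HTML for display."""
        if self._html is not None:
            return self._html  # pre-rendered copy: no markup processing per request
        return self.render(self.source)  # display-time rendering on every request
```

Setting prerender=False gives the render-on-every-request behaviour that both applications share in the apples-to-apples test. The trade-off of leaving it on is storing a second copy of every page.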

Naturally, this makes a big difference in performance. Throughput in my application drops from 240-odd requests per second to anywhere between 25 and 175 requests per second, depending on the scenario (more on this later).

The original intent of my ab exploits was a quick benchmark of my own solution, not to throw stones elsewhere. Mea culpa for drawing inappropriate conclusions, and thanks, Ryan, for giving me the motivation to dig deeper. As you’ll find out, though, questions remain unanswered even after all that digging.

What else have I learned?

Instiki isn’t using Rails per se. When I ran the test, I wasn’t aware that the current version of Instiki doesn’t use Rails. Rather, it seems that writing Instiki inspired the Rails web application framework, and that code was undoubtedly refactored into a general-purpose application framework.

FYI, the group supporting Instiki seems to be moving current parts of Rails back into the code. Their goal is to reach a version 2 in which Instiki is a model web app for Rails itself.

Updated performance comparison. Here are the environment and application scenario:

  • Ruby/Instiki versions: Ruby 1.8.2, recently built; Instiki 0.9.2
  • Python/Quixote versions: Python 2.4; Quixote 2.0a4

Application – Instiki:

  • Created a new Instiki instance
  • Backend is a file system; server is Ruby’s WEBrick
  • Ran Instiki as a daemon (the default). Checking the command-line options with --help, I did not see any option related to performance tweaks.
  • Loaded content into two wiki pages. The first contains Textile-formatted content; the second contains only the HTML entity for a non-breaking space (&nbsp;).
  • Being a wiki, the page path is simple and flat; the URL looks like http://localhost:2500/wiki/show/SomePage
  • Test – 100 retrievals: ab -n100 http://localhost:2500/wiki/show/SomePage

Application – Quixote-based content management solution with a Wiki component:

  • Backend is a Durus object database accessed over the network, although the app and the Durus server are on the same machine for this test.
  • Web server is a pure Python server
  • Same content and scenario as the corresponding Instiki pages
  • Disabled pre-rendered storage
  • The URL path is somewhat more complex and takes more than one lookup to reach the end-point object, because my solution provides for multiple wikis. I could map mywiki so that it is called directly off the root of the application, but for comparison’s sake that would not make a big difference. URL: http://localhost:8080/wikis/mywiki/SomePage
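To make the lookup cost concrete, here is a minimal sketch of resolving a URL path by walking an object tree one component at a time. That is the general idea behind object-publishing frameworks like Quixote. The Folder class and traverse function are hypothetical and deliberately simplified. They are not Quixote's or Dulcinea's actual API.

```python
from typing import Any, Tuple


class Folder:
    """Hypothetical container whose children are found by URL path component."""

    def __init__(self, **children: Any) -> None:
        self.children = dict(children)


def traverse(root: Folder, path: str) -> Tuple[Any, int]:
    """Resolve `path` from `root`, returning the end-point object and the number of lookups."""
    node: Any = root
    lookups = 0
    for component in path.strip("/").split("/"):
        if not isinstance(node, Folder) or component not in node.children:
            raise KeyError(f"cannot resolve {component!r} in {path!r}")
        node = node.children[component]
        lookups += 1
    return node, lookups


if __name__ == "__main__":
    root = Folder(wikis=Folder(mywiki=Folder(SomePage="<page object>")))
    page, lookups = traverse(root, "/wikis/mywiki/SomePage")
    print(page, lookups)  # one lookup per path component
```

Mapping mywiki directly off the root would save one dictionary lookup per request. That is why I don't expect it to change the comparison much.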

Results, averaged over three runs:

  • Instiki/Ruby/Rails precursor:
    • With content: 9.5 pages/second
    • Without content: 18 pages/second
  • Quixote/Python/Dulcinea + Durus-based app:
    • With content: 54 pages/second
    • Without content: 147 pages/second
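Averaging over several runs is easy to script. The sketch below runs ApacheBench through Python's subprocess module. It pulls the mean from the "Requests per second:" line that ab prints in its summary. The helper names are hypothetical, and the script assumes ab is installed and on the PATH.

```python
import re
import subprocess
from statistics import mean
from typing import Iterable

# ab's summary includes a line of the form "Requests per second:    N.NN [#/sec] (mean)".
RPS_LINE = re.compile(r"^Requests per second:\s+([\d.]+)", re.MULTILINE)


def run_ab(url: str, requests: int = 100) -> str:
    """Run one ApacheBench test (assumes `ab` is installed) and return its text output."""
    completed = subprocess.run(
        ["ab", "-n", str(requests), url], capture_output=True, text=True, check=True
    )
    return completed.stdout


def requests_per_second(ab_output: str) -> float:
    """Extract the mean requests-per-second figure from ab's output."""
    match = RPS_LINE.search(ab_output)
    if match is None:
        raise ValueError("no 'Requests per second' line found in ab output")
    return float(match.group(1))


def average_rps(outputs: Iterable[str]) -> float:
    """Average the requests-per-second figures from several ab runs."""
    return mean(requests_per_second(text) for text in outputs)


if __name__ == "__main__":
    url = "http://localhost:8080/wikis/mywiki/SomePage"
    print(f"{average_rps(run_ab(url) for _ in range(3)):.1f} requests/second")
```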

That’s still a substantial difference in performance. Perhaps I’ll dig through Instiki’s code some more and cut out the Textile rendering to get a comparison without that variable in the mix. I will also speak to Durus vs SQL at a later date. I’m actually an old (older than I’d like!) SQL hack and have no misconceptions about either.