rawtech

Wednesday, 08 February

12:54

We released Stoq 1.2 last week, and this release includes quite a few new features:

Calendar application

It’s now possible to list payments, purchase orders and client calls in a graphical view:

 

It might look familiar: it uses the fantastic JavaScript library fullcalendar. We really wanted to use a normal GtkWidget for the calendar, but that would have meant ripping out half of Evolution, which would have been a lot more work. If there are any other options that can match fullcalendar’s functionality, we’d be open to switching, as embedding WebKit, jQuery and fullcalendar in a Gtk+ application is not ideal.

Configurable keyboard shortcuts

This is something that has been requested many times over the years. It makes it easier to remap the keyboard bindings you use often to other keys, such as the function keys. There’s still an open task to redo all the existing keybindings that aren’t uniform enough.

Configurable form fields

Some companies do not use all the form fields (fax, anyone?) that we show by default, so Stoq now has a configuration interface where you can make fields non-mandatory and even hide them if you don’t wish to see them. This is perfect for the first steps of localization.
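To make the idea concrete, here is a minimal sketch of per-field settings. It is not Stoq’s actual code: `FieldSetting`, `CLIENT_FORM` and the helper functions are hypothetical names. The sketch assumes each field carries only a visible flag and a mandatory flag, and that a hidden field is never treated as required.

```python
from dataclasses import dataclass
from typing import Dict, List, Mapping


@dataclass
class FieldSetting:
    visible: bool = True
    mandatory: bool = False


# Hypothetical configuration for a client form; not Stoq's real field names.
CLIENT_FORM: Dict[str, FieldSetting] = {
    "name": FieldSetting(visible=True, mandatory=True),
    "phone": FieldSetting(visible=True, mandatory=False),
    "fax": FieldSetting(visible=False, mandatory=False),  # fax, anyone?
}


def visible_fields(settings: Mapping[str, FieldSetting]) -> List[str]:
    """Fields the form should actually display."""
    return [name for name, setting in settings.items() if setting.visible]


def missing_fields(settings: Mapping[str, FieldSetting],
                   values: Mapping[str, str]) -> List[str]:
    """Mandatory fields the user left empty; hidden fields are never required."""
    return [name for name, setting in settings.items()
            if setting.visible and setting.mandatory and not values.get(name)]
```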

New manual

One of our interns rewrote the old DocBook manual in Mallard. It looks beautiful and is now well integrated into the application. You can find the online version here. The rewrite involved removing a lot of screenshots and text, because the manual will be easier to update in the future without screenshots. He also fixed the interface: there are now help buttons throughout the application that go to the help section describing that part.

Localization support

It’s now possible to configure some of the fields that are specific to each region/country. The only one that made it into this release was the company identification number (Brazil: CNPJ, Sweden: Organisationsnummer, US: Employer Identification Number). However, the person identification number and the list of states have landed in the code repository since the release. We still need someone to step up and start doing the actual localization for this, so be the hero of the day: download Stoq and start localizing it!
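Each country’s number has its own format and checksum, which is why these fields need to be configurable per region. The sketch below illustrates this and is not Stoq’s implementation. The registry `COMPANY_ID_VALIDATORS` and the function names are hypothetical. It assumes the commonly documented rules: a CNPJ ends in two mod-11 check digits, a Swedish organisationsnummer ends in a Luhn check digit, and for the EIN it only checks the NN-NNNNNNN shape.

```python
import re
from typing import Callable, Dict, List


def _digits(value: str) -> List[int]:
    """Keep only the digits, so "11.222.333/0001-81" and "11222333000181" match."""
    return [int(c) for c in re.sub(r"\D", "", value)]


def _cnpj_check_digit(digits: List[int], weights: List[int]) -> int:
    remainder = sum(d * w for d, w in zip(digits, weights)) % 11
    return 0 if remainder < 2 else 11 - remainder


def valid_cnpj(value: str) -> bool:
    """Brazil: 14 digits, the last two being mod-11 check digits."""
    digits = _digits(value)
    if len(digits) != 14 or len(set(digits)) == 1:  # reject e.g. all zeros
        return False
    first = _cnpj_check_digit(digits[:12], [5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2])
    second = _cnpj_check_digit(digits[:12] + [first],
                               [6, 5, 4, 3, 2, 9, 8, 7, 6, 5, 4, 3, 2])
    return digits[12:] == [first, second]


def valid_organisationsnummer(value: str) -> bool:
    """Sweden: 10 digits, the last one a Luhn check digit."""
    digits = _digits(value)
    if len(digits) != 10:
        return False
    total = 0
    for index, digit in enumerate(digits):
        if index % 2 == 0:  # double every other digit, starting from the left
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def valid_ein(value: str) -> bool:
    """US: this sketch only checks the NN-NNNNNNN shape."""
    return re.fullmatch(r"\d{2}-?\d{7}", value.strip()) is not None


# Hypothetical registry keyed by ISO 3166-1 country code.
COMPANY_ID_VALIDATORS: Dict[str, Callable[[str], bool]] = {
    "BR": valid_cnpj,
    "SE": valid_organisationsnummer,
    "US": valid_ein,
}
```

A form would then look up the validator for the configured country instead of hard-coding the Brazilian rules.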

Boleto Bancário (Bank invoice)

Brazilian banks support a kind of invoice with a barcode and a printed number sequence, called boleto bancário. It’s semi-standardized: most of the data is similar, but you need to special-case each bank that should be supported. There are two kinds, with and without cobrança (for eventually sending to a collection agency). There are a couple of hundred active banks and about 15 major ones. Stoq currently supports 7: Banco do Brasil, Banco Real, Banco Santander, Banco Bradesco, Caixa Econômica, Banrisul and Banco Itaú. All of these are without cobrança, though. Support for that will come in a future release.
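The “mostly similar, but special-case each bank” structure can be sketched as follows. This assumes the common 44-digit FEBRABAN barcode layout: bank code, currency code 9, a general mod-11 check digit, a due-date factor counted from 1997-10-07, the amount in cents, and a 25-digit free field. Only the free field is bank-specific. Bank code "999" and its free-field layout below are imaginary placeholders, not any real bank’s rules, and this is not Stoq’s code.

```python
from datetime import date
from typing import Callable, Dict

# Bank-specific "free field" builders, keyed by 3-digit bank code.
FREE_FIELD_BUILDERS: Dict[str, Callable[..., str]] = {}


def register_bank(code: str):
    def decorator(func):
        FREE_FIELD_BUILDERS[code] = func
        return func
    return decorator


@register_bank("999")  # imaginary bank, for illustration only
def imaginary_bank_free_field(agency: str, account: str, our_number: str) -> str:
    # Hypothetical layout: 4-digit agency, 8-digit account, 13-digit "nosso número".
    return f"{agency:0>4}{account:0>8}{our_number:0>13}"


def due_date_factor(due: date) -> int:
    """Days between the 1997-10-07 base date and the due date."""
    return (due - date(1997, 10, 7)).days


def mod11_check_digit(digits: str) -> int:
    """General barcode check digit: weights 2..9 repeating from the right."""
    total, weight = 0, 2
    for ch in reversed(digits):
        total += int(ch) * weight
        weight = 2 if weight == 9 else weight + 1
    dv = 11 - total % 11
    return 1 if dv >= 10 else dv


def build_barcode(bank_code: str, due: date, amount_cents: int,
                  **bank_fields: str) -> str:
    free_field = FREE_FIELD_BUILDERS[bank_code](**bank_fields)
    if len(free_field) != 25:
        raise ValueError("free field must be exactly 25 digits")
    # The 43 digits without the check digit, which then goes into position 5.
    body = f"{bank_code}9{due_date_factor(due):04d}{amount_cents:010d}{free_field}"
    return body[:4] + str(mod11_check_digit(body)) + body[4:]
```

Supporting another bank then means registering one more free-field builder, while the shared layout and check digit stay in one place.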

Call for volunteers

Stoq has initially been targeting the Brazilian market, since that’s what’s close to the current development team. But there is no longer an excuse for not trying to use it. We can barely keep up with the legal requirements in Brazil, so we’d need volunteer help to make it usable in other countries. We’re very proud of the application, so we wouldn’t want to stop you just because you live outside of Brazil!

So, why don’t you grab the code and get started? It’s all Python (and a tiny bit of JavaScript), so it shouldn’t be hard to get going.

Don’t be discouraged that the website and manual are only in Portuguese. We use gettext and Rosetta, and the code is modular and easy to understand.
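With gettext, user-visible strings are wrapped in a lookup function and translated through message catalogs, which are the files Rosetta lets people edit. Here is a minimal sketch using Python’s standard gettext module. The "stoq" domain and the "locale" directory are assumptions for illustration. With `fallback=True`, the original English strings are returned when no catalog is installed.

```python
import gettext

# Assumed domain name and catalog directory; Stoq's real setup may differ.
DOMAIN = "stoq"
LOCALE_DIR = "locale"


def load_translations(language: str):
    """Return gettext/ngettext for `language`, falling back to the source strings."""
    translations = gettext.translation(
        DOMAIN, localedir=LOCALE_DIR, languages=[language], fallback=True)
    return translations.gettext, translations.ngettext


_, ngettext = load_translations("sv")

# Mark every user-visible string so it can be extracted into a .pot template.
title = _("Payments")
summary = ngettext("%d open payment", "%d open payments", 3) % 3
```

Running xgettext over the sources collects the marked strings into a template that translators can then work on.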

Supporting localization in different countries will take a lot of work: company/person formats, states, taxes and other things we don’t know about yet. Let us know, and we’ll try to find a solution.

Just send me an email or drop in on our shiny new web chat: http://chat.stoq.com.br/ (aka #stoq on freenode)

PyCon 2012 will be the biggest PyCon yet. Amazing talks, tutorials, posters - robots - we are going to have it all for you. The volunteer team is working on welcoming committees, social events and many other things.

Each year there are quite a few new people, and with record attendance, we expect this year to be no different. So we thought that at this point it might be good to lay out the virtual welcome mat for everyone coming to PyCon and point out a few of the ways to make your PyCon unforgettable.

If we could point to just one thing that makes PyCon different, it is that at PyCon you come to contribute. If you want to have an extraordinary time and make PyCon your favorite conference all year, pick three of the items below, get involved, and contribute! Want to volunteer? Please sign up to pycon-organizers.

Stuff a Bag: For those who haven’t been to PyCon before, one of the most fun events takes place Wednesday evening.  Stand shoulder to shoulder with fifty or one hundred of your fellow Pythonistas to help stuff the attendee bags. Want to know who has the best swag? Want to see what people will be giving away in the Expo Hall? Want to just have fun? Come stuff bags.

Chair a Session: PyCon talks are arranged in groups of two or three, called sessions. (Look at the schedule to see what I mean). Session chairs help run the session, introduce speakers, call time, and help run the room for a short period of time. If you want to be in the front row at one particular talk, sign up to be session chair! There will be a sign-up board at PyCon.

Run a Race: Many Pythonistas are active runners. More are probably waiting for a kick in the pants to get up, get out the door, and start running. Well, here's your chance! Whether it's part of your regular training, a New Year's resolution, or whatever -- we hope you'll join us for the inaugural PyCon 5k.

Get a Job: A short while ago you may have seen a similar announcement for an online job board for our sponsors with open positions, located at https://us.pycon.org/2012/sponsors/jobs/. Sponsors have enjoyed this benefit and we think the community has as well. However, we’re taking this job fair one step further: into real life. On Sunday March 11 from 10:00 to 12:00, the expo hall will be running a job fair for all sponsors seeking to hire Python developers.

This job fair will run concurrently with the always excellent Poster Session, and will occur during the morning snack break. Grab a drink and a cookie and mingle with this year’s list of incredible sponsors, from small startups to big corporations, from the east coast to the west coast, local workers to telecommuters -- there are a lot of organizations to choose from. With 122 sponsors on board, we think you’d have trouble not finding a company that interests you.

Give a talk: One of PyCon’s traditions - one that we aren’t ashamed to admit that we picked up from the Perl community - is having Lightning Talks. Lightning Talks are five-minute, rapid-fire talks about something that interests you. Maybe you've never given a talk before, and you'd like to start small. For a Lightning Talk, you don't need to make slides, and if you do decide to make slides, you only need to make three. Sign up quickly, though - spots go fast.

Check out the Hallway Track: Many of the PyCon old-timers are most fond of the “hallway track” - the spontaneous meetings and discussions that occur when you bring together interesting, intelligent people (like all PyCon attendees!). There have been projects and businesses launched, friendships made, and problems solved in the hallways at PyCon.

Organize an Open Space: PyCon sets aside rooms for “Open Space” discussions and meetings. Anyone can lead an open space - just sign up for the room and the time slot and it is yours. Do you play an instrument? Each night at PyCon usually has a music jam open space. Want to work on a quick idea with someone? Follow up on a talk? Plan to take over the world? Open space.

Attend a BoF: Some of our open spaces have grown up into semi-regular Birds of a Feather (BoF) sessions. The best-known is probably the Testing in Python (TiP) BoF, but we usually also have Board Game BoFs, Science BoFs, Whiskey BoFs, Newbie BoFs, “Teach me” BoFs, and many more.

Sprint: If you are still making your traveling plans, one of the best ways to take advantage of PyCon is attending the sprints. Development sprints are a key part of PyCon, a chance for the contributors to open-source projects to get together face-to-face for up to four days of intensive learning, development and camaraderie. Newbies sit with gurus, go out for lunch and dinner together, and have a great time while advancing their project. Have you ever wanted to hack on Python-core? Twisted? Django? SciPy? The leaders of each project will be there during the sprints, and you will be able to contribute in a meaningful way.

Sponsor PyCon: Ok, we had to say it. There are over 120 companies sponsoring PyCon, the most yet. We have filled up the Expo Hall, but you can still show your support (and participate in the Job Fair) with your sponsorship. If you are still considering sponsoring PyCon - now is the time to reach out to us - jnoller@python.org!

Come contribute to PyCon. It will be your favorite conference all year.

Pugs

12:18

Update 2/8/2012: Fixed the code sample (some HTML markup had gotten filtered out by my blog editor).

Plone does great at in-place editing: navigate to the thing you want to edit, then click the button and edit it. However, this paradigm breaks down as soon as a page needs multiple editable areas, such as a homepage or section landing page.

At Groundwire, we used to deal with this problem by creating an ongoing series of very similar hacky one-off templates: the sort of template that would have several areas which each pulled in the content from some item in a hidden folder of page components. Unfortunately this approach did not scale very well. It was tedious for us to set up new templates, and it was cumbersome for editors to remember how everything was set up in order to successfully make changes.

Last year we worked on the Net Impact website, which has a different multi-part layout for each section landing page, and we realized that we needed to come up with a better solution. The requirements:

  • Someone writing a template should be able to define an editable area of that template very simply, by just adding a line to the template that specifies the name of the area.
  • There should be support for different types of editable areas; each type may have different settings when editing the area.
  • Editing an area should be triggered by a pencil icon that shows up while hovering over the area, for users who have permission to edit the area.
  • All this should be done in a way that is simple to reuse for new sites.

Tiles to the rescue

We realized right away that our requirements were very similar to the functionality provided by the Deco project's implementation of "tiles." Deco is an ambitious project to provide drag-and-drop layout capabilities within Plone. Deco as a whole was not mature enough for us to feel comfortable using it, but I knew that the tile rendering was one of the older and more mature parts of Deco, and we realized that it would not take a lot of effort to use tile rendering without the rest of Deco.

A tile is a snippet that can be inserted into a template as a div with a data-tile attribute, like this:

<div data-tile="/Plone/@@mytile" />

Then some machinery in the publisher provided by plone.app.blocks performs the following steps:

  1. It finds all the divs with a data-tile attribute (let's call them tile placeholders).
  2. For each one, it performs a subrequest to fetch the contents of the tile. Using a URI makes tiles very flexible: a tile could be a browser view, or it could come from some external system.
  3. The tile placeholder is replaced with the contents of the tile's body tag. If the tile has a head tag, its contents will be appended to the head of the page that includes the tile.
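The three steps above can be illustrated with a simplified, standard-library-only sketch. It is not plone.app.blocks’ implementation. It assumes a well-formed XHTML page without namespaces. The `fetch` callable is a hypothetical stand-in for the subrequest and maps a tile URI to the tile’s full HTML document.

```python
import xml.etree.ElementTree as ET
from typing import Callable


def _splice_body(parent: ET.Element, placeholder: ET.Element, body: ET.Element) -> None:
    """Replace `placeholder` (a child of `parent`) with the contents of `body`."""
    index = list(parent).index(placeholder)
    children = list(body)
    leading = body.text or ""
    trailing = placeholder.tail or ""
    if not children:
        leading += trailing
    # Loose text before the first inserted element belongs to the previous
    # sibling's tail, or to the parent's text if the placeholder came first.
    if index == 0:
        parent.text = (parent.text or "") + leading
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + leading
    parent.remove(placeholder)
    for offset, child in enumerate(children):
        parent.insert(index + offset, child)
    if children:
        children[-1].tail = (children[-1].tail or "") + trailing


def render_tiles(page: str, fetch: Callable[[str], str]) -> str:
    """Expand every <div data-tile="..."> in a well-formed XHTML page."""
    root = ET.fromstring(page)
    page_head = root.find("head")
    parents = {child: parent for parent in root.iter() for child in parent}
    # 1. Find all the tile placeholders.
    placeholders = [el for el in root.iter("div") if "data-tile" in el.attrib]
    for placeholder in placeholders:
        # 2. Fetch the tile (stand-in for plone.app.blocks' subrequest).
        tile = ET.fromstring(fetch(placeholder.get("data-tile")))
        # 3. Replace the placeholder with the tile's <body> contents ...
        body = tile.find("body")
        _splice_body(parents[placeholder], placeholder,
                     body if body is not None else ET.Element("body"))
        # ... and append the tile's <head> contents to the page's <head>.
        tile_head = tile.find("head")
        if page_head is not None and tile_head is not None:
            page_head.extend(list(tile_head))
    return ET.tostring(root, encoding="unicode")
```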

That's a great start! As it turns out, other parts of the tiles implementation also help support our use case:

  • plone.tiles has a tile implementation which supports having multiple tile types. A tile turns out to basically be a browser view that also happens to have some associated data. (This is a lot like a portlet renderer, but one that can be added anywhere with a line in a template rather than needing to mess around with portlet managers.) Each type of tile can specify a different schema for its data, and that data can be persisted in different ways.
  • plone.app.tiles provides an edit form that takes care of editing the data for a particular instance of a tile.

In practice: adding a rich text tile

So let's see how this plays out in practice. We are going to:

  1. Set up the basic tile rendering machinery.
  2. Implement a rich text tile that can be added anywhere, and that stores its contents in an annotation of the context where it is added.
  3. Make sure that editors see a pencil icon that brings up a modal overlay to edit the tile.

The basics

Okay, let's get the basics set up.

  1. Create a package that declares dependencies on: lxml, plone.app.blocks, plone.app.textfield, plone.app.tiles, and plone.tiles.
  2. At the time of this writing, you need trunk checkouts of plone.app.blocks, plone.app.tiles, and plone.tiles.
  3. Make sure that your configure.zcml includes <includeDependencies package="."/>.
  4. Make sure the metadata.xml in your package's GenericSetup profile runs the default profiles from plone.app.blocks and plone.app.tiles as dependencies.
  5. Install your package.

The tile

Add a tile.py with the following:

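from zope.interface import Interface

from plone import tiles
from plone.app.textfield import RichText
from plone.app.textfield.interfaces import ITransformer


class IRichTextTileData(Interface):
    """Schema for the data stored with each rich text tile instance."""

    text = RichText(title=u'Text')


class RichTextTile(tiles.PersistentTile):
    """A tile whose data is persisted in an annotation on its context."""

    def __call__(self):
        text = ''
        # Use .get() in case no text has been stored for this tile yet.
        value = self.data.get('text')
        if value:
            transformer = ITransformer(self.context, None)
            if transformer is not None:
                # Run the stored text through the safe HTML transform.
                text = transformer(value, 'text/x-html-safe')
        # The <body> contents are what gets merged into the page.
        return '<html><body>%s</body></html>' % text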

In configure.zcml, add:

  <plone:tile
      name="groundwire.tiles.richtext"
      title="Groundwire rich text tile"
      description="A tile containing rich text"
      add_permission="cmf.ModifyPortalContent"
      schema=".tile.IRichTextTileData"
      class=".tile.RichTextTile"
      permission="zope2.View"
      for="*"
      />

This defines a new tile type, called groundwire.tiles.richtext. This tile type has a schema with a single rich text field, and when it is rendered the tile will run the configured text through the safe HTML transform to make sure it is safe.

Wiring in the edit form

Now we just need to make sure that editors will have a way to access the edit interface for tiles.

Add the following JavaScript. Make sure you put a condition on it like "python:object.portal_membership.checkPermission('Modify portal content', object)" so that it will only run and add the edit links for users who have permission to edit.

jQuery(function($) {
  $('div[data-tile]').each(function() {
      $(this).addClass('tile-editable');
      var href = $(this).attr('data-tile');
      var edithref = href.replace(/@@/, '@@edit-tile/');
      $('<a class="tile-edit-link" href="' + edithref + '"><img height="16" src="pencil_icon.png" width="16" /></a>')
        .appendTo($(this))
        .prepOverlay({
            subtype: 'iframe',
            config: {
                onClose: function() { location.reload(); }
            }
        });
  });
  
  // Check if tiledata is available and valid
  if (typeof(tiledata) !== 'undefined') {

      // Check action
      if (tiledata.action === 'cancel' || tiledata.action === 'save') {
          // Close dialog
          window.parent.jQuery('.link-overlay').each(function() {
              try {
                  window.parent.jQuery(this).overlay({api: true}).close();
              } catch(e) { }
          });
      }
  }
  
});

This adds an edit link to all the divs that have data-tile attributes. It also handles the "tiledata" which is how the plone.app.tiles edit form controls when the overlay it appears in should close.

And finally we need a bit of CSS to style the tiles and edit links:

.tile-editable {
    position: relative;
    outline: 2px dashed #e8e8e8;
    min-height: 1.5em;
}

.tile-editable:hover {
    outline: 2px dashed #b8b8b8;
}

.tile-edit-link {
    display: none !important;
    position: absolute;
    right: 1px;
    bottom: 1px;
    z-index: 500;
}

.tile-editable:hover .tile-edit-link {
    display: block !important;
}

Adding a tile

Okay, now let's add one of these to a template. Pick your favorite template and add:

<div tal:attributes="data-tile string:${context/absolute_url}/@@groundwire.tiles.richtext/hello-world" />

Here's what it looks like in my instance (I added it to the document_view template):

A tile

And here's the editing interface that shows up when I click on the pencil:

Tile editing

(If you want to see how this code all comes together, look at the code in https://groundwire.devguard.com/svn/public/groundwire.tiles/branches/davisagli-blocks)

In conclusion

We are very happy with the way the tile approach turned out for the Net Impact site. Once we had mastered the basic technique for landing pages, we soon realized that tiles provided a useful way to add user-editable content areas anywhere in the site. Site needs a doormat in the footer? Use a tile with the Plone site as its context so it appears the same throughout the site and the client can edit the links. Client wants a block of text they can edit on the login form to promote registering for the site? No problem, just add a tile. Client is repeatedly asking for minor edits to the text introducing a custom form? No problem, we turned it into a tile and told them how to edit it. Since the presentation of tiles in the UI is simple and consistent, the barrier to entry for the client to learn how to edit a new tile was very low.

The approach as described here isn't perfect. One thing that needs some care is cache invalidation. In our case, we wrote an ObjectModified event handler for tiles that updates the modified time of the page on which the tile appears. Another limitation is that text in tiles won't be included in the fulltext index unless you go to extra lengths. Whether that's a feature or a bug depends on your use case.

Overall though, we love the technique and have also started using tiles in other sites. I know that Six Feet Up has also successfully used tiles with at least one client. If you want to expand your Plone layout repertoire without using experimental technology like Deco or removing control over content from your clients, I encourage you to give it a try!

11:09

Laurence Tratt from King's College London has written a long and detailed introduction to the goals and significance of RPython over on his blog. Laurie has been implementing his Converge language in RPython over the last few months. He is one of the first people outside the PyPy team to push a sizeable RPython-based VM quite far, adding and tuning JIT hints. The post describes some of that work and his impressions of RPython and PyPy.

"RPython, to my mind, is an astonishing project. It has, almost single-handedly, opened up an entirely new approach to VM implementation. As my experience shows, creating a decent RPython VM is not a huge amount of work (despite some frustrations). In short: never again do new languages need come with unusably slow VMs. That the the PyPy / RPython team have shown that these ideas scale up to a fast implementation of a large, real-world language (Python) is another feather in their cap."

08:09

I'm currently working on a project which centres around pulling in data from an external website, "mashing" it up with some additional content, and then displaying it on a website.

The website is going to be interactive and reasonably complex, so I decided to use Django. There isn't a web service for acquiring the external data, so I'm stuck parsing HTML (and Excel spreadsheets, but that's a separate story). Scrapy seemed ideal for this, and although I wish I had used some approach other than XPath, it largely has been.

Having set up my database models in Django and built my spider in Scrapy, the next step was putting the data from the spider into the database. There are plenty of posts detailing how to use the Django ORM from outside a Django project, even some specific to Scrapy, but they didn't seem to work for me.

The issue was the way I handled development and production environment settings.

Read more . . .

The Utah Python will be meeting on Thursday, Feb 9th at 7pm. Amji will be doing a short presentation on a memoization decorator and Eric will be giving a preview of his PyCon talk “Interfaces and Python”. Cheers.

People are invariably surprised when they hear it’s hardly ever necessary to invert a matrix. It’s very often necessary to solve linear systems of the form Ax = b, but in practice you almost never do this by inverting A. This post will give an example of avoiding matrix inversion. I will explain how the Newton-Conjugate Gradient method works, as implemented in SciPy by the function fmin_ncg.

If a matrix A is large and sparse, it may be possible to solve Ax = b but impossible to even store the matrix A⁻¹, because there isn’t enough memory to hold it. Sometimes it’s sufficient to be able to form matrix-vector products Ax. Notice that this doesn’t mean you have to store the matrix A. You only have to produce the product Ax as if you had stored A and multiplied it by x.

Very often there are physical reasons why the matrix A is sparse, i.e. most of its entries are zero and there is an exploitable pattern to the non-zero entries. There may be plenty of memory to store the non-zero elements of A, even though there would not be enough memory to store the entire matrix. Also, it may be possible to compute Ax much faster than it would be if you were to march along the full matrix, multiplying and adding a lot of zeros.

Iterative methods of solving Ax = b, such as the conjugate gradient method, create a sequence of approximations that converge (in theory) to the exact solution. These methods require forming products Ax and updating x as a result. They can be very useful for a couple of reasons.

  1. You only have to form products of a sparse matrix and a vector.
  2. If you don’t need a very accurate solution, you may be able to stop very early.
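Here is a minimal, pure-Python sketch of conjugate gradient that only ever touches A through a matrix-vector product function. The tridiagonal example matrix (2 on the diagonal, -1 next to it) is an assumption chosen for illustration. It is never stored. `max_iter` lets you stop early, as in point 2.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(matvec: Callable[[Vector], Vector], b: Vector,
                       tol: float = 1e-10,
                       max_iter: Optional[int] = None) -> Tuple[Vector, int]:
    """Solve A x = b for symmetric positive definite A, given only x -> A x."""
    n = len(b)
    max_iter = n if max_iter is None else max_iter
    x = [0.0] * n
    r = list(b)          # residual b - A x, with x = 0
    d = list(r)          # first search direction
    rr = dot(r, r)
    iterations = 0
    while iterations < max_iter and rr ** 0.5 > tol:
        ad = matvec(d)
        alpha = rr / dot(d, ad)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, ad)]
        rr_new = dot(r, r)
        d = [ri + (rr_new / rr) * di for ri, di in zip(r, d)]
        rr = rr_new
        iterations += 1
    return x, iterations


def tridiagonal_matvec(v: Vector) -> Vector:
    """A v for A = tridiag(-1, 2, -1), computed without storing A."""
    n = len(v)
    return [2.0 * v[i]
            - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]
```

Capping `max_iter` at a small number gives a rough answer at a fraction of the cost. That trade-off is the one the Newton-CG method exploits.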

In Newton’s optimization method, you have to solve a linear system in order to find a search direction. In practice this system is often large and sparse. The ultimate goal of Newton’s method is to minimize a function, not to find perfect search directions, so you can save time by finding only approximate solutions to the problem of finding search directions. Maybe an exact solution would in theory take 100,000 iterations, but you can stop after only 10 iterations! This is the idea behind the Newton-Conjugate Gradient optimization method.

The function scipy.optimize.fmin_ncg can take as an argument a function fhess that computes the Hessian matrix H of the objective function. More importantly, it lets you instead provide a function fhess_p that computes the product of H with a vector. You don’t have to supply the actual Hessian matrix because the fmin_ncg method doesn’t need it. It only needs a way to compute matrix-vector products Hx to find approximate Newton search directions.
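To show the idea without depending on SciPy, here is a simplified truncated-Newton sketch in pure Python. It is not SciPy’s fmin_ncg, only an illustration of the same principle. `hess_p(x, p)` plays the role of fhess_p. The inner CG loop stops after a few iterations or once its residual is small relative to the gradient. A simple backtracking line search keeps each step a descent. The test objective, the sum of exp(x_i) - 2 x_i plus a chain coupling term, is minimized at x_i = ln 2. It is an assumption chosen for illustration.

```python
import math
from typing import Callable, List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def truncated_cg(hvp: Callable[[Vector], Vector], g: Vector,
                 max_iter: int, tol: float) -> Vector:
    """Approximately solve H p = -g using only Hessian-vector products."""
    p = [0.0] * len(g)
    r = [-gi for gi in g]            # residual of H p = -g at p = 0
    d = list(r)
    rr = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rr) < tol:
            break
        hd = hvp(d)
        curvature = dot(d, hd)
        if curvature <= 0:           # H not positive definite along d: stop
            break
        alpha = rr / curvature
        p = [pi + alpha * di for pi, di in zip(p, d)]
        r = [ri - alpha * hi for ri, hi in zip(r, hd)]
        rr_new = dot(r, r)
        d = [ri + (rr_new / rr) * di for ri, di in zip(r, d)]
        rr = rr_new
    return p if any(p) else [-gi for gi in g]  # fall back to steepest descent


def newton_cg(f, grad, hess_p, x0: Vector, cg_iters: int = 10,
              gtol: float = 1e-8, max_newton: int = 50) -> Vector:
    x = list(x0)
    for _ in range(max_newton):
        g = grad(x)
        gnorm = math.sqrt(dot(g, g))
        if gnorm < gtol:
            break
        # Inexact inner solve: a looser tolerance far from the optimum.
        p = truncated_cg(lambda v: hess_p(x, v), g, cg_iters,
                         tol=min(0.5, math.sqrt(gnorm)) * gnorm)
        step, fx, slope = 1.0, f(x), dot(g, p)
        # Backtracking (Armijo) line search along p.
        while step > 1e-10 and f([xi + step * pi for xi, pi in zip(x, p)]) > fx + 1e-4 * step * slope:
            step *= 0.5
        x = [xi + step * pi for xi, pi in zip(x, p)]
    return x


# Example objective: sum(exp(x_i) - 2 x_i) + 0.5 * sum((x_{i+1} - x_i)^2).
def f(x):
    return (sum(math.exp(xi) - 2 * xi for xi in x)
            + 0.5 * sum((x[i + 1] - x[i]) ** 2 for i in range(len(x) - 1)))


def laplacian(v):
    n = len(v)
    return [(v[i] - v[i - 1] if i > 0 else 0.0) + (v[i] - v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def grad(x):
    return [math.exp(xi) - 2 + li for xi, li in zip(x, laplacian(x))]


def hess_p(x, v):
    # H v = diag(exp(x)) v + L v; the Hessian matrix itself is never formed.
    return [math.exp(xi) * vi + li for xi, vi, li in zip(x, v, laplacian(v))]
```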

For more information, see the SciPy documentation for fmin_ncg.

02:30

Since the last WHATWG Weekly, almost a month ago now, over a hundred changes have been committed to the HTML standard. This is the WHATWG Weekly and it will cover those changes so you don’t have to. Also, remember kids, that fancy email regular expression is non-normative.

translate attribute

To aid translators and automated translation, HTML has had a translate attribute since revision 6971. By default everything can be translated. You can override that by setting the translate attribute to the "no" value. This can be used for names, computer code, expressions that only make sense in a given language, etc.
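For tools that extract text for translation, honouring the attribute means tracking an inherited yes/no state, because translate="no" applies to an element’s descendants unless they override it. Here is a minimal sketch using Python’s standard html.parser. The class and function names and the simplified handling of void elements are assumptions and are not part of the specification.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Elements without end tags never push a translate state.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class TranslatableTextCollector(HTMLParser):
    """Split text nodes into translatable and non-translatable lists."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[Tuple[str, bool]] = [("#document", True)]  # default: yes
        self.translatable: List[str] = []
        self.skipped: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        state = self.stack[-1][1]              # inherit from the parent
        attributes = dict(attrs)
        if "translate" in attributes:
            value = (attributes["translate"] or "").lower()
            if value in ("", "yes"):
                state = True
            elif value == "no":
                state = False
            # Any other value is invalid and keeps the inherited state.
        self.stack.append((tag, state))

    def handle_endtag(self, tag):
        # Pop back to the matching open element, tolerating sloppy markup.
        if any(open_tag == tag for open_tag, _ in self.stack[1:]):
            while self.stack[-1][0] != tag:
                self.stack.pop()
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            (self.translatable if self.stack[-1][1] else self.skipped).append(text)


def split_translatable(html: str) -> Tuple[List[str], List[str]]:
    collector = TranslatableTextCollector()
    collector.feed(html)
    collector.close()
    return collector.translatable, collector.skipped
```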

Selector and CSS related changes

In revision 6888 the :valid and :invalid pseudo-classes were made applicable to the form element. This way you can determine whether all controls in a given form are correctly filled in.

Revision 6898 made the wbr element less magical. Well, it defined the element fully in terms of CSS rather than using prose.

A new CSS feature was introduced in revision 6935. The @global at-rule lets selectors “escape” scoped stylesheets, as it were, by letting them apply to the whole document. It will likely be moved out of HTML and into a CSS specification once a suitable location has been found.

APIs; teehee!

It turns out that clearTimeout() and clearInterval() can be used interchangeably. Revision 6949 makes sure that new implementors make it work that way too.

At the request of Adrian Bateman, revision 6957 added a fourth argument to the window.onerror callback, providing scripts with the column position of the script error.

Speaking of scripts, in revision 6964 script elements gained two new events: beforescriptexecute, which is dispatched before the script executes and can be cancelled to prevent execution altogether, and afterscriptexecute, which is dispatched when script execution has completed.

Revision 6966 implemented a change that allows browsers to not execute alert(), showModalDialog(), and friends during pagehide, beforeunload, and unload events. This can improve the end user experience.

IF YOU’VE spent any time building responsive websites with fluid grids, you will have encountered the shock of seeing your beautiful portrait layout distort when viewed in landscape mode (or vice versa).

This happens because whilst the layout and embedded content (images, video etc) are sized in relation to the pixel width of the viewport, the typography is not. And whilst it isn’t too difficult to design with enough affordance for the variation caused by the iPad’s 4:3 aspect ratio – most (if not all) Android tablets have 16:9 displays. These screens make the orientation difference even more pronounced.
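A quick back-of-the-envelope calculation shows why 16:9 makes things worse. Rotating the device multiplies the viewport width by the aspect ratio, so a fluid column with fixed-size type gains that much line length: about 1.33x on a 4:3 iPad (768 to 1024 px) versus about 1.78x on a 16:9 screen. The 720x1280 tablet, the 90% column width and the 0.5em average character width below are illustrative assumptions.

```python
from fractions import Fraction


def rotation_growth(width: int, height: int) -> Fraction:
    """Factor by which viewport width grows going from portrait to landscape."""
    return Fraction(max(width, height), min(width, height))


def chars_per_line(viewport_px: int, column_share: float = 0.9,
                   font_px: float = 16.0, avg_char_em: float = 0.5) -> int:
    """Rough characters per line for a fluid column with fixed-size type."""
    # avg_char_em = 0.5 is a rough assumption for a typical body typeface.
    return int(viewport_px * column_share // (font_px * avg_char_em))


for name, (w, h) in {"iPad (4:3)": (768, 1024), "16:9 tablet": (720, 1280)}.items():
    print(f"{name}: x{float(rotation_growth(w, h)):.2f} wider, "
          f"{chars_per_line(min(w, h))} -> {chars_per_line(max(w, h))} chars per line")
```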

Responsive News – Fluid grids, orientation & resolution independence.


Tuesday, 07 February

17:09

So I've been playing with cutting down OpenGLContext into something like a modern scenegraph engine. The first step there is to eliminate the old tree-traversal rendering mechanism, as the "flat" rendering engine is both simpler and much more easily optimized. No big problem, really. A lot of OpenGLContext's demos/tests just ran with only minimal changes. A few of the very old ones were using customization points (e.g. Background) that depended on the old rendering model, but they could generally be ported to the new model by moving a few lines of code into their "Render" method. The surprising corner case was one of the most recent tutorials, namely shadows. As I was writing that tutorial I took a shortcut by using the legacy visitor in the middle of the rendering process to traverse and output the geometry for each sub-pass. The modifications to the flat renderer mean that the "Context" is now a rendering node... which means the flat renderer includes it in the set of things to render when I query the scenegraph for what should be rendered... cue infinite recursion. Oh well, it gives me something to work on tomorrow :-)

Sentry really ought to use UDP, not TCP, because you don't want logging functionality to stall or even slow down your main application. At the moment, it doesn't support that, although there have been some promising commits.

For my usage (a web application), this means that you can really only use Sentry for logging exceptions, and not for anything less important.

However, there are some alternatives to UDP that make Sentry usable for more than exceptions. You could use a queue process like Celery or RabbitMQ (apparently what they use at Disqus).

A more lightweight alternative, however, is an asynchronous client that does its work in the background and so doesn't block your web server thread.

There is some hopeful looking code in raven.contrib.async, but unfortunately it currently has a critical bug (Raven 1.4.2).

However, using that code I cobbled together my own, and this one subclasses DjangoClient, which is what I need:

from raven.contrib.django import DjangoClient
from raven.contrib.async import AsyncWorker


class AsyncDjangoClient(DjangoClient):
    """
    This client uses a single background thread to dispatch errors.
    """
    def __init__(self, *args, **kwargs):
        self.worker = AsyncWorker()
        super(AsyncDjangoClient, self).__init__(*args, **kwargs)

    def send_sync(self, **kwargs):
        super(AsyncDjangoClient, self).send(**kwargs)

    def send(self, **kwargs):
        self.worker.queue.put_nowait((self.send_sync, kwargs))

Then you need to set SENTRY_CLIENT in your settings to point to this class.

(If you're not using Django, you should be able to do something similar.)
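For the non-Django case, the same pattern can be written with just the standard library. This is a generic sketch, not Raven’s API. `BackgroundSender` is a hypothetical name. `send_func` stands for whatever blocking call actually talks to your Sentry server, for example a synchronous client’s send. When the queue is full, events are dropped rather than blocking, on the assumption that losing a log entry is better than stalling a request.

```python
import queue
import threading
import time
from typing import Any, Callable, Dict


class BackgroundSender:
    """Hand events to a single daemon thread so callers never wait on the network."""

    def __init__(self, send_func: Callable[..., Any], maxsize: int = 1000) -> None:
        self._send = send_func
        self._queue: "queue.Queue[Dict[str, Any]]" = queue.Queue(maxsize=maxsize)
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, **kwargs: Any) -> bool:
        """Enqueue an event and return at once; False if it had to be dropped."""
        try:
            self._queue.put_nowait(kwargs)
            return True
        except queue.Full:
            return False

    def flush(self) -> None:
        """Block until every queued event has been processed (e.g. at shutdown)."""
        self._queue.join()

    def _run(self) -> None:
        while True:
            kwargs = self._queue.get()
            try:
                self._send(**kwargs)
            except Exception:
                pass  # a failed log delivery must not kill the worker thread
            finally:
                self._queue.task_done()


# Example: a deliberately slow "server" no longer delays the caller.
sent = []


def slow_send(**kwargs: Any) -> None:
    time.sleep(0.05)  # stand-in for a round trip to a remote Sentry server
    sent.append(kwargs)


sender = BackgroundSender(slow_send)
start = time.monotonic()
for i in range(5):
    sender.send(message=f"404: /page/{i}")
enqueue_seconds = time.monotonic() - start
sender.flush()
```

A single worker thread keeps events in order and limits load on the Sentry server. The bounded queue caps memory use if the server is unreachable.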

This is working fine for me - I can now enable the Sentry 404 middleware and not see any slowdown on my app, as opposed to the synchronous client which was slowing down 404 responses massively because my Sentry server is not on the same box as my main web app.

I should say this is use-at-your-own-risk: the AsyncClient in Raven is undocumented as well as broken, so I don't know whether it is considered a sensible approach or not!

Need a Python programming language book? Want to see a comparison of the ones I own and use? Check out my Must-Have Python Programming Books comparison grid.


Let's drill down and take a closer look at one of the items on the page, in this case Doug Hellmann's amazing The Python Standard Library by Example. The product detail pages include the ability to add pros and cons and attach said products to comparison grids and specialized lists like 'my wishlist' and 'my possessions'.

Speaking of wishlists, check out my own:


To add items, like footy pajamas, I click on the 'add' button and paste the Amazon (or BestBuy) URL into the form:



At this time we just handle Amazon USA and BestBuy USA. In the future we plan on adding more affiliate providers, including non-USA providers to support our non-USA friends.

There's a lot more than that...

In addition to weekly infographics, comparison grids, lists, and products, Consumer Notebook also awards points, coins, badges, and a growing privilege set to participating users. We even implemented an energy bar which regenerates over time, designed to match the pace of human users and serve as one of the brakes on scripts and bots.


Technology

I built this with Audrey Roy using Python, Django, jQuery, PostgreSQL, Memcached, and RabbitMQ. I'll be blogging in depth about the technical side in an upcoming post.

Genesis


It was the summer of 2010 and we were brainstorming ideas for a coding contest called Django Dash. The one we settled on was a listing and comparison site for Django called Django Packages. The result has been a very useful tool for the Django community. Eventually, with the help of several dozen people, we turned the code into the Open Comparison framework and launched Pyramid and Plone implementations. Time permitting this year, we plan to do Python, Flask, Twisted, Node, jQuery, and other implementations.

Since then we've wanted to do something similar, but in the context of products. And we wanted to do it right - elegant design combined with an ad-free space. So we cooked up Consumer Notebook, launching today!

We'll be adding features and enhancements in the months to come. We've acquired a community manager, and even have a blog. We would love for you to check out the site, share it with your friends and family, and send us your commentary, suggestions, and advice.

14:00

Overview: The ISC Security Dashboard can be found at https://isc.sans ...

12:54

As the codemotion.es page states, codemotion is:

Codemotion es el evento que reunirá en España a técnicos, desarrolladores y estudiantes de todas las comunidades y lenguajes. Por primera vez se celebrará en España después de 5 años de éxito en Italia.

Which badly translates to:

Codemotion is the event that will gather technicians, developers and students of different communities and languages. For the first time the event will take place in Spain after 5 years in Italy.

Python is among the languages that will take part in the event, thanks to the Python Madrid group. The group will go over the different talk proposals at its next meeting and will try to bring you the best talks possible. So if you want to take part, you can do the following:

Let's try to make Python a better-known language in Spain!

We have lots of MathML from the DocBook manuals for PyOpenGL. The latest versions of the manuals now use MathJax to render the matrices, equations, etc. (see, for example, the glRotate manual page). Previously we had to tell people to update or replace their browser, or to install a MathML-specific plugin. This solution will (hopefully) eliminate that need. A single script is added to each manual section that includes any MathML-prefixed element. The script is served from MathJax's CDN and "just works" (at least, that's the theory; my limited tests seem to bear it out).
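The post doesn't show how the script gets added. A minimal sketch of the idea, assuming the manual pages are well-formed XHTML: only pages that actually contain an element in the MathML namespace get a script tag in their head. The `add_mathjax` name is hypothetical, and `MATHJAX_SRC` is a placeholder, not the real CDN URL.

```python
import xml.etree.ElementTree as ET

XHTML_NS = 'http://www.w3.org/1999/xhtml'
MATHML_NS = 'http://www.w3.org/1998/Math/MathML'
MATHJAX_SRC = 'https://example.invalid/MathJax.js'  # hypothetical placeholder URL

# Keep readable prefixes when serializing (otherwise ElementTree emits ns0:, ns1:)
ET.register_namespace('', XHTML_NS)
ET.register_namespace('mml', MATHML_NS)


def add_mathjax(xhtml):
    """Return *xhtml* with a MathJax script tag added if it contains MathML."""
    root = ET.fromstring(xhtml)
    has_mathml = any(
        isinstance(el.tag, str) and el.tag.startswith('{%s}' % MATHML_NS)
        for el in root.iter())
    head = root.find('{%s}head' % XHTML_NS)
    if not has_mathml or head is None:
        return xhtml  # pages without MathML are left untouched
    script = ET.SubElement(head, '{%s}script' % XHTML_NS,
                           {'type': 'text/javascript', 'src': MATHJAX_SRC})
    script.text = ' '  # avoid a self-closing <script/>, which browsers mishandle
    return ET.tostring(root, encoding='unicode')
```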

Using Python to get rid of .doc

I'll be appearing at Software Passion to speak about using Python for protocol specifications, instead of writing the specification in an external document and then trying to implement it from there (or, perhaps more commonly, implementing it and then trying to keep the document up to date).

A while ago at Visual Units, the situation was this: There was a protocol to transfer data over TCP from fleet management black boxes running J2ME to a server running Python, which then stored that data so interesting things could be done with it. Accompanying the protocol was an ever-slightly-out-of-date protocol specification, and a client implementation in Python used for testing the server.

This means that we had four different implementations of the protocol: one in Java, two in Python, and one in English. If one of those was not updated when the others were, the system was no longer consistent, and might break in interesting ways.

Since this created a lot of work for me, I set out to change things. First, I searched for viable existing solutions, but the need to keep the protocol compact (telematics data transfer is expensive) and to support J2ME meant I did not find anything to use off the shelf.

Instead, I started to implement my own solution, with the vision of implementing the protocol once and using it everywhere: Java, Python, and English. In the end, with a couple of hundred lines of Python, we can now specify a protocol like this:

message = string
timestamp = i64
timediff = i32

ping = ("A ping, with a time and message",
         timestamp, message)

pong = ("A pong, with message, timestamp and perceived lag",
        timestamp, timediff, message)

...and from this, we create Java source code for the terminals, the Python clients and servers use it directly when packing and parsing messages, and the documentation for the poor souls who might want to read English instead of Python is generated.
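The post deliberately doesn't reveal how the real implementation works (that's what the talk is for). To make the idea concrete, here is a hypothetical sketch under these assumptions: each primitive type is a `Field` object wrapping a big-endian `struct` format, strings carry a 2-byte length prefix, and a message spec is just a `(docstring, *fields)` tuple. `Field`, `pack`, `unpack` and `describe` are invented names. Java generation and a message-type tag on the wire are left out.

```python
import struct


class Field:
    """A primitive wire type: a name plus a struct format (hypothetical)."""

    def __init__(self, name, fmt=None):
        self.name = name
        self.fmt = fmt  # None means a length-prefixed UTF-8 string

    def pack(self, value):
        if self.fmt is None:
            data = value.encode('utf-8')
            return struct.pack('>H', len(data)) + data
        return struct.pack(self.fmt, value)

    def unpack(self, buf, offset):
        """Decode one value at *offset*; return (value, new_offset)."""
        if self.fmt is None:
            (length,) = struct.unpack_from('>H', buf, offset)
            start = offset + 2
            return buf[start:start + length].decode('utf-8'), start + length
        (value,) = struct.unpack_from(self.fmt, buf, offset)
        return value, offset + struct.calcsize(self.fmt)


# Primitive types: big-endian, no field names or tags on the wire
string = Field('string')
i64 = Field('i64', '>q')
i32 = Field('i32', '>i')

# The specification itself, as in the post
message = string
timestamp = i64
timediff = i32

ping = ("A ping, with a time and message",
        timestamp, message)

pong = ("A pong, with message, timestamp and perceived lag",
        timestamp, timediff, message)


def pack(spec, *values):
    _doc, *fields = spec
    if len(values) != len(fields):
        raise ValueError('expected %d values, got %d' % (len(fields), len(values)))
    return b''.join(f.pack(v) for f, v in zip(fields, values))


def unpack(spec, buf):
    _doc, *fields = spec
    values, offset = [], 0
    for field in fields:
        value, offset = field.unpack(buf, offset)
        values.append(value)
    return tuple(values)


def describe(name, spec):
    """Generate a one-line English description of a message."""
    doc, *fields = spec
    return '%s: %s (%s)' % (name, doc, ', '.join(f.name for f in fields))
```

For example, `pack(ping, 1328700000, 'hello')` yields 15 bytes (8 for the timestamp, 2 for the length prefix, 5 for the text), and `describe('ping', ping)` gives `ping: A ping, with a time and message (i64, string)`.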

Want to know how this was made possible, see some code, and point and laugh at my miserable failed attempts? Want to know why metaclasses were absolutely vital, or not? Register for Software Passion, where I'll be talking about this. If you use the promotion code 'BLAAG' when registering, you'll even get a 10% discount!

11:09

ZeroMQ (or ØMQ or ZMQ) is an intelligent messaging framework and has been described as “sockets on steroids”. That is, they look like normal TCP sockets but actually work as you’d expect sockets to work. PyZMQ adds even more convenience to them, which makes it a really good choice if you want to implement a distributed application. Another big plus for ØMQ is that you can integrate sub-systems written in C, Java or any other language ØMQ supports (and there are a lot of them).

If you’ve never heard of ØMQ before, I recommend reading ZeroMQ an Introduction by Nicholas Piël before you go on with this article.

The ØMQ Guide and PyZMQ’s documentation are really good, so you can easily get started. However, when we began to implement a larger application with it (a distributed simulation framework), several questions arose which were not covered by the documentation:

  • What’s the best way to design our application?
  • How can we keep it readable, flexible and maintainable?
  • How do we test it?

I didn’t find anything like a best-practices article that answered my questions. So in this series of articles, I’m going to talk about what I’ve learned during the last few months. I’m not a PyZMQ expert (yet ;-)), but what I’ve done so far works quite well, and I have never had more tests in a project than I do now.

You’ll find the source for the examples at bitbucket. They are written in Python 3.2 and tested under Mac OS X Lion, Ubuntu 11.10 and Windows 7, 64 bit in each case. If you have any suggestions or improvements, please fork me or just leave a comment.

In this first article, I’m going to talk a bit about how you could generally design your application to be flexible, maintainable and testable. The second part will be about unit testing, and finally, I’ll cover process and system testing.

Comparison of Different Approaches

There are basically three possible ways to implement a PyZMQ application: one that’s easy but of limited practical use, one that’s more flexible but not really pythonic, and one that needs a bit more setup but is flexible and pythonic.

All three examples feature a simple ping process and a pong process with varying complexity. I use multiprocessing to run the pong process, because that’s what you should usually do in real PyZMQ applications (you don’t want to use threads and if both processes are running on the same machine, there’s no need to invoke both of them separately).

All of the examples will have the following output:

(zmq)$ python blocking_recv.py
Pong got request: ping 0
Ping got reply: pong 0
...
Pong got request: ping 4
Ping got reply: pong 4

Let’s start with the easy one first. You just use one of the socket’s recv methods in a loop:

# blocking_recv.py
import multiprocessing
import zmq


addr = 'tcp://127.0.0.1:5678'


def ping():
    """Sends ping requests and waits for replies."""
    context = zmq.Context()
    sock = context.socket(zmq.REQ)
    sock.bind(addr)

    for i in range(5):
        sock.send_unicode('ping %s' % i)
        rep = sock.recv_unicode()  # This blocks until we get something
        print('Ping got reply:', rep)


def pong():
    """Waits for ping requests and replies with a pong."""
    context = zmq.Context()
    sock = context.socket(zmq.REP)
    sock.connect(addr)

    for i in range(5):
        req = sock.recv_unicode()  # This also blocks
        print('Pong got request:', req)
        sock.send_unicode('pong %s' % i)


if __name__ == '__main__':
    pong_proc = multiprocessing.Process(target=pong)
    pong_proc.start()

    ping()

    pong_proc.join()

So this is very easy and not much code. The problem is that it only works well if your process uses just one socket. Unfortunately, in larger applications that is rarely the case.

A way to handle multiple sockets per process is polling. In addition to your context and socket(s), you need a poller. You also have to tell it which events on which socket you are going to poll:

# polling.py
import zmq


addr = 'tcp://127.0.0.1:5678'


def pong():
    """Waits for ping requests and replies with a pong."""
    context = zmq.Context()
    sock = context.socket(zmq.REP)
    sock.bind(addr)

    # Create a poller and register the events we want to poll
    poller = zmq.Poller()
    poller.register(sock, zmq.POLLIN|zmq.POLLOUT)

    for i in range(10):
        # Get all sockets that can do something
        socks = dict(poller.poll())

        # Check if we can receive something (events are a bit mask)
        if socks.get(sock, 0) & zmq.POLLIN:
            req = sock.recv_unicode()
            print('Pong got request:', req)

        # Check if we can send something
        if socks.get(sock, 0) & zmq.POLLOUT:
            sock.send_unicode('pong %s' % (i // 2))

    poller.unregister(sock)

As you can see, our pong function got pretty ugly. You need 10 iterations to do five ping-pongs, because in each iteration you can either receive or send. And each socket you add to your process adds two more if statements. You could improve that design by creating a base class that wraps the polling loop, so that inheriting classes just register sockets and callbacks.
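A minimal sketch of such a base class, assuming callbacks only react to incoming messages and send their replies directly (which is fine for a REP socket, since it may reply right after receiving). `PollingLoop`, `PongLoop` and `run_once` are hypothetical names, not part of PyZMQ; only `zmq.Poller` and the socket methods are real API.

```python
import zmq


class PollingLoop:
    """Hypothetical base class that hides the poller behind callbacks."""

    def __init__(self):
        self.poller = zmq.Poller()
        self.handlers = {}  # socket -> callback(sock, frames)

    def register(self, sock, callback):
        """Poll *sock* for incoming messages and pass them to *callback*."""
        self.handlers[sock] = callback
        self.poller.register(sock, zmq.POLLIN)

    def run_once(self, timeout=None):
        """Poll once (timeout in ms) and dispatch; return the number handled."""
        events = dict(self.poller.poll(timeout))
        handled = 0
        for sock, callback in list(self.handlers.items()):
            if events.get(sock, 0) & zmq.POLLIN:
                callback(sock, sock.recv_multipart())
                handled += 1
        return handled


class PongLoop(PollingLoop):
    """The pong side of the example: one socket, one callback, no if-chains."""

    def __init__(self, sock):
        super().__init__()
        self.count = 0
        self.register(sock, self.handle_ping)

    def handle_ping(self, sock, frames):
        print('Pong got request:', frames[0].decode())
        # A REP socket may reply right after receiving, so no POLLOUT is needed
        sock.send_unicode('pong %s' % self.count)
        self.count += 1
```

Each ping-pong now takes one iteration instead of two, and adding a socket means one `register()` call rather than two more if statements. The trade-off is that this sketch never waits for POLLOUT, so it assumes sends won't block.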

That brings us to our final example. PyZMQ comes with an adapted Tornado event loop that handles the polling and works with ZMQStreams, which wrap sockets and add some functionality:

# eventloop.py
import multiprocessing

from zmq.eventloop import ioloop, zmqstream
import zmq


addr = 'tcp://127.0.0.1:5678'


class Pong(multiprocessing.Process):
    """Waits for ping requests and replies with a pong."""
    def __init__(self):
        super().__init__()
        self.loop = None
        self.stream = None
        self.i = 0

    def run(self):
        """
        Initializes the event loop, creates the sockets/streams and
        starts the (blocking) loop.

        """
        context = zmq.Context()
        self.loop = ioloop.IOLoop.instance()  # This is the event loop

        sock = context.socket(zmq.REP)
        sock.bind(addr)
        # We need to create a stream from our socket and
        # register a callback for recv events.
        self.stream = zmqstream.ZMQStream(sock, self.loop)
        self.stream.on_recv(self.handle_ping)

        # Start the loop. It runs until we stop it.
        self.loop.start()

    def handle_ping(self, msg):
        """Handles ping requests and sends back a pong."""
        # msg is a list of byte strings (one per message frame)
        req = msg[0].decode()
        print('Pong got request:', req)
        self.stream.send_unicode('pong %s' % self.i)

        # We’ll stop the loop after 5 pings
        self.i += 1
        if self.i == 5:
            self.stream.flush()
            self.loop.stop()

This adds even more boilerplate code, but it will pay off if you use more sockets, and most of the stuff in run() can be moved into a base class. Another drawback is that the IOLoop only uses recv_multipart(), so you always get a list of byte strings, which you have to decode or deserialize on your own. However, you can use all the send methods a socket offers (like send_unicode() or send_json()). You can also stop the loop from within a message handler.
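Since on_recv callbacks always get raw frames, one way to keep the decoding out of every handler is a small decorator. This is a hypothetical helper, not part of PyZMQ, and it assumes every frame holds one UTF-8 JSON document (so it doesn't fit ROUTER sockets, whose first frame is a binary identity).

```python
import functools
import json


def json_frames(handler):
    """Decorate an on_recv callback so it receives decoded JSON, not bytes."""
    @functools.wraps(handler)
    def wrapper(self, msg):
        # msg is the list of byte strings passed in by ZMQStream.on_recv
        return handler(self, [json.loads(frame.decode('utf-8')) for frame in msg])
    return wrapper


class Pong:
    """Minimal stand-in for the Pong process, without any sockets."""

    def __init__(self):
        self.requests = []

    @json_frames
    def handle_ping(self, msg):
        self.requests.append(msg[0])
```

With this in place, `handle_ping` from the example above could work with `['ping', 0]` directly instead of calling `msg[0].decode()` itself.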

In the next sections, I’ll discuss how you could implement a PyZMQ process that uses the event loop.

Communication Design

Before you start to implement anything, you should think about what kind of processes you need in your application and which messages they exchange. You should also decide what kind of message format and serialization you want to use.

PyZMQ has built-in support for Unicode (send() sends raw bytes, which map to Python byte objects, so there’s a separate send_unicode() method for Unicode strings), JSON (send_json()) and pickle (send_pyobj()).

JSON is nice, because it’s fast and lets you integrate processes written in other languages into your application. It’s also a bit safer, because you cannot receive arbitrary objects as with pickle. The most straightforward syntax for JSON messages is to make them triples [msg_type, args, kwargs], where msg_type maps to a method name and args and kwargs get passed as positional and keyword arguments.
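A minimal sketch of that triple format, using only the standard json module so it can be tested without sockets. `encode` and `dispatch` are hypothetical helper names; refusing names that start with an underscore is an added assumption so that a peer can't call internals like `__init__`.

```python
import json


def encode(msg_type, *args, **kwargs):
    """Serialize a [msg_type, args, kwargs] triple to bytes."""
    return json.dumps([msg_type, list(args), kwargs]).encode('utf-8')


def dispatch(handler, raw):
    """Decode a triple and call the matching method on *handler*."""
    msg_type, args, kwargs = json.loads(raw.decode('utf-8'))
    if not isinstance(msg_type, str) or msg_type.startswith('_'):
        raise RuntimeError('Invalid message type: %r' % (msg_type,))
    method = getattr(handler, msg_type, None)
    if not callable(method):
        raise RuntimeError('Unknown message type: %s' % msg_type)
    return method(*args, **kwargs)


class PingHandler:
    def ping(self, count, note=''):
        return ['pong', count, note]
```

For example, `dispatch(PingHandler(), encode('ping', 3, note='hi'))` calls `PingHandler().ping(3, note='hi')` and returns `['pong', 3, 'hi']`.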

I strongly recommend documenting each chain of messages your application sends to perform a certain task. I do this with fancy PowerPoint graphics and with even fancier ASCII art in Sphinx. Here is how I would document our ping-pong:

Sending pings
-------------

* If the ping process sends a *ping*, the pong process responds with a
  *pong*.
* The number of pings (and pongs) is counted. The current ping count is
  sent with each message.

::

    PingProc      PongProc
     [REQ] ---1--> [REP]
           <--2---


    1 IN : ['ping, count']
    1 OUT: ['ping, count']

    2 IN : ['pong, count']
    2 OUT: ['pong, count']

First, I write some bullet points that explain how the processes behave and why they behave this way. This is followed by a kind of sequence diagram that shows which process sends which message, when, and over which socket type. Finally, I write down what the messages look like. The IN line of each numbered message is what you would pass to send_multipart, and the OUT line is what the other side receives from recv_multipart. If one of the participating sockets is a ROUTER or DEALER, IN and OUT will differ (though that’s not the case in this example). Everything in single quotation marks (') represents a JSON-serialized list.

If our pong process used a ROUTER socket instead of the REP socket, it would look like this:

1 IN : ['ping, count']
1 OUT: [ping_uuid, '', 'ping, count']

2 IN : [ping_uuid, '', 'pong, count']
2 OUT: ['pong, count']

This seems like a lot of tedious work, but trust me, it really helps a lot when you need to change something a few weeks later!
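To check that the ROUTER variant of the IN/OUT notation matches what PyZMQ actually delivers, a quick in-process experiment helps. This sketch uses an `inproc://` address instead of TCP so it runs in a single process; `router_roundtrip` is a hypothetical helper, and the identity frame stands in for `ping_uuid`.

```python
import json

import zmq


def router_roundtrip(count, addr='inproc://router-demo'):
    """Send one ping from REQ to ROUTER and return (router_frames, reply)."""
    ctx = zmq.Context()
    router = ctx.socket(zmq.ROUTER)
    router.bind(addr)  # inproc: bind before connect
    req = ctx.socket(zmq.REQ)
    req.connect(addr)
    try:
        req.send_json(['ping', count])            # 1 IN : ['ping, count']
        frames = router.recv_multipart()          # 1 OUT: [ping_uuid, '', 'ping, count']
        identity, empty, payload = frames
        _, n = json.loads(payload.decode('utf-8'))
        reply = json.dumps(['pong', n]).encode('utf-8')
        router.send_multipart([identity, b'', reply])  # 2 IN : [ping_uuid, '', 'pong, count']
        return frames, req.recv_json()            # 2 OUT: ['pong, count']
    finally:
        req.close(linger=0)
        router.close(linger=0)
        ctx.term()
```

The three frames the ROUTER receives are exactly the identity, the empty delimiter and the JSON payload from the documentation, and the REQ side never sees the envelope.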

Application Design

In the examples above, the Pong process was responsible for setting everything up, for receiving/sending messages and for the actual application logic (counting incoming pings and creating a pong).

Obviously, this is not a very good design. What we can do about this is to put most of that nasty setup stuff into a base class which all your processes can inherit from, and to put all the actual application logic into a separate (PyZMQ independent) class.

ZmqProcess – The Base Class for all Processes

The base class basically implements two things:

  • a setup method that creates a context and a loop
  • a stream factory method for streams with an on_recv callback. It creates a socket and can connect/bind it to a given address or bind it to a random port (that’s why it returns the port number in addition to the stream itself).

It also inherits from multiprocessing.Process so that it is easier to spawn it as a sub-process. Of course, you can also just call its run() method from your main().

# zmqproc.py
import multiprocessing

from zmq.eventloop import ioloop, zmqstream
import zmq


class ZmqProcess(multiprocessing.Process):
    """
    This is the base for all processes and offers utility functions
    for setup and creating new streams.

    """
    def __init__(self):
        super().__init__()

        self.context = None
        """The ØMQ :class:`~zmq.Context` instance."""

        self.loop = None
        """PyZMQ's event loop (:class:`~zmq.eventloop.ioloop.IOLoop`)."""

    def setup(self):
        """
        Creates a :attr:`context` and an event :attr:`loop` for the process.

        """
        self.context = zmq.Context()
        self.loop = ioloop.IOLoop.instance()

    def stream(self, sock_type, addr, bind, callback=None, subscribe=b''):
        """
        Creates a :class:`~zmq.eventloop.zmqstream.ZMQStream`.

        :param sock_type: The ØMQ socket type (e.g. ``zmq.REQ``)
        :param addr: Address to bind or connect to formatted as *host:port*,
                *(host, port)* or *host* (bind to random port).
                If *bind* is ``True``, *host* may be:

                - the wild-card ``*``, meaning all available interfaces,
                - the primary IPv4 address assigned to the interface, in its
                  numeric representation or
                - the interface name as defined by the operating system.

                If *bind* is ``False``, *host* may be:

                - the DNS name of the peer or
                - the IPv4 address of the peer, in its numeric representation.

                If *addr* is just a host name without a port and *bind* is
                ``True``, the socket will be bound to a random port.
        :param bind: Binds to *addr* if ``True`` or tries to connect to it
                otherwise.
        :param callback: A callback for
                :meth:`~zmq.eventloop.zmqstream.ZMQStream.on_recv`, optional
        :param subscribe: Subscription pattern for *SUB* sockets, optional,
                defaults to ``b''``.
        :returns: A tuple containing the stream and the port number.

        """
        sock = self.context.socket(sock_type)

        # addr may be 'host:port' or ('host', port)
        if isinstance(addr, str):
            addr = addr.split(':')
        host, port = addr if len(addr) == 2 else (addr[0], None)

        # Bind/connect the socket
        if bind:
            if port:
                sock.bind('tcp://%s:%s' % (host, port))
            else:
                port = sock.bind_to_random_port('tcp://%s' % host)
        else:
            sock.connect('tcp://%s:%s' % (host, port))

        # Add a default subscription for SUB sockets
        if sock_type == zmq.SUB:
            sock.setsockopt(zmq.SUBSCRIBE, subscribe)

        # Create the stream and add the callback
        stream = zmqstream.ZMQStream(sock, self.loop)
        if callback:
            stream.on_recv(callback)

        return stream, int(port)

PongProc – The Actual Process

The PongProc inherits ZmqProcess and is the main class for our process. It creates the streams, starts the event loop and dispatches all messages to the appropriate handlers:

# pongproc.py
from zmq.utils import jsonapi as json
import zmq

import zmqproc


host = '127.0.0.1'
port = 5678


class PongProc(zmqproc.ZmqProcess):
    """
    Main process for the Ponger. It handles ping requests and sends back
    a pong.

    """
    def __init__(self, bind_addr):
        super().__init__()

        self.bind_addr = bind_addr
        self.rep_stream = None

        # Make sure this is pickle-able (e.g., not using threads)
        # or it won't work on Windows. If it's not pickle-able, instantiate
        # it in setup().
        self.ping_handler = PingHandler()

    def setup(self):
        """Sets up PyZMQ and creates all streams."""
        super().setup()

        self.rep_stream, _ = self.stream(zmq.REP, self.bind_addr, bind=True,
                callback=self.handle_rep_stream)

    def run(self):
        """Sets up everything and starts the event loop."""
        self.setup()
        self.loop.start()

    def stop(self):
        """Stops the event loop."""
        self.loop.stop()

    def handle_rep_stream(self, msg):
        """
        Handles messages from a Pinger:

        *ping*
            Send back a pong.

        *plzdiekthxbye*
            Stop the ioloop and exit.

        """
        msg_type, data = json.loads(msg[0])

        if msg_type == 'ping':
            rep = self.ping_handler.make_pong(data)
            self.rep_stream.send_json(rep)

        elif msg_type == 'plzdiekthxbye':
            self.stop()

        else:
            raise RuntimeError('Received unknown message type: %s' % msg_type)

There are a couple of things to note here:

  • I instantiated the PingHandler in the process’ __init__ method. If you are going to start this process as a sub-process via start, make sure everything you instantiate in __init__ is pickle-able or it won’t work on Windows (Linux and Mac OS X use fork to create a sub-process and fork just makes a copy of the main process and gives it a new process ID. On Windows, there is no fork and the context of your main process is pickled and sent to the sub-process).

  • In setup, call super().setup() before you create a stream or you won’t have a loop instance for them. You don’t call setup in the process’ __init__, because the context must be created within the new system process. So we call setup in run.

  • The stop method is not really necessary in this example, but it can be used to send stop messages to sub-processes when the main process terminates and to do other kinds of clean-up. You can also call it if you catch a KeyboardInterrupt after calling run.

  • handle_rep_stream is the message dispatcher for the process’ REP stream. It parses the message and calls the appropriate handler for that message (or raises an error if the message type is invalid). If your if and elif branches all do the same thing, you might consider replacing them with a dict that maps each message type to its handler:

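    handlers = {
        'msg': self.handler_for_msg,
    }
    # Only the lookup is guarded, so a KeyError raised inside a handler
    # is not misreported as an unknown message type
    try:
        handler = handlers[msg_type]
    except KeyError:
        raise RuntimeError('Received unknown message type: %s' % msg_type)
    rep = handler(data)
    self.rep_stream.send_json(rep)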

PingHandler – The Application Logic

The PingHandler contains the actual application logic (which is not much, in this example). The make_pong method just gets the number of pings sent with the ping message and creates a new pong message. The serialization is done by PongProc, so our Handler does not depend on PyZMQ:

class PingHandler(object):

    def make_pong(self, num_pings):
        """Creates and returns a pong message."""
        print('Pong got request number %s' % num_pings)

        return ['pong', num_pings]

Summary

Okay, that’s it for now. I showed you three ways to use PyZMQ. If you have a very simple process with only one socket, you can easily use its blocking recv methods. If you need more than one socket, I recommend using the event loop. And polling … you don’t want to use that.

If you decide to use PyZMQ’s event loop, you should separate the application logic from all the PyZMQ stuff (like creating streams, sending/receiving messages and dispatching them). If your application consists of more than one process (which is usually the case), you should also create a base class with shared functionality for them.

In the next part, I’m going to talk about how you can test your application.

This is the second part of the series Designing and Testing PyZMQ Applications. In the first part, I wrote about designing a PyZMQ application, so this time it’s all about (unit) testing (remember, if it’s not tested, it’s broken). I also updated the repository for this article with the new code examples.

My favorite testing tools are pytest by Holger Krekel and Mock by Michael Foord. Pytest is particularly awesome because of its re-evaluation of assert statements: if your test contains assert spam == 'eggs' and the assert fails, pytest re-evaluates it and prints the value of spam. That’s really helpful, and you don’t need any boilerplate code for it. Mock is really nice for mocking external dependencies and asserting that your code called them in the correct way.
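To give a feel for the Mock style used in the tests below, here is a tiny self-contained example. It uses `unittest.mock`, the standard-library version of the Mock package (included since Python 3.3, same API); `send_pong` is a hypothetical function under test.

```python
from unittest import mock


def send_pong(stream, count):
    """Hypothetical code under test: replies with a JSON pong."""
    stream.send_json(['pong', count])


stream = mock.Mock()  # stands in for a real ZMQStream
send_pong(stream, 3)

# call_args compares equal to an (args, kwargs) tuple
assert stream.send_json.call_args == ((['pong', 3],), {})
# Equivalent check using Mock's own assertion helper
stream.send_json.assert_called_once_with(['pong', 3])
```

No socket is ever created: the mock records every call, and the test only inspects what was recorded.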

If you cloned the repository for this article, just run py.test from its root directory:

$ pip install pytest mock
...
Successfully installed pytest mock
Cleaning up...
$ py.test
=================== test session starts ====================
platform darwin -- Python 3.2.2 -- pytest-2.2.3
collected 11 items

test/test_pongproc.py .......
test/test_zmqproc.py ....

================ 11 passed in 0.12 seconds =================

Unit Testing

The probability that PyZMQ works correctly is very high. The probability that your code will call a PyZMQ function in such a way that it blocks forever and halts your test runner is also very high. Therefore, it’s a good idea to mock everything PyZMQ-related for your unit tests. And since your application logic might also not be implemented when you start testing your process, you should mock that, too.

What you’ll actually end up testing is the following:

  • Does your message handler call your application logic in the right way given a certain input message?
  • Does your message handler create and send the correct reply based on the return value of your application logic?

ZmqProcess

Let’s start with ZmqProcess again. After all, everything else depends on it. Testing its setup method is easy. We just check that it creates a context and a loop:

# test/test_zmqproc.py
from zmq.eventloop import ioloop
import mock
import pytest
import zmq

import zmqproc


class TestZmqProcess(object):
    """Tests for :class:`zmqproc.ZmqProcess`."""

    def test_setup(self):
        zp = zmqproc.ZmqProcess()
        zp.setup()

        assert isinstance(zp.context, zmq.Context)
        assert isinstance(zp.loop, ioloop.IOLoop)

Testing stream is more complicated. We need to test if it can handle various address formats, if it connects or binds correctly and if it performs a default subscription for SUB sockets.

Pytest 2.2 introduced a parametrize decorator, which helps you call a test multiple times with varying inputs. You just define one or more arguments for your test function and a list of values for these arguments. For test_stream, I only need a kwargs parameter containing the arguments for the stream call:

# test/test_zmqproc.py

    @pytest.mark.parametrize('kwargs', [
        dict(sock_type=23, addr='127.0.0.1:1234', bind=True,
              callback=mock.Mock()),
        dict(sock_type=23, addr='127.0.0.1', bind=True,
              callback=mock.Mock()),
        dict(sock_type=zmq.SUB, addr=('localhost', 1234), bind=False,
              callback=mock.Mock(), subscribe=b'ohai'),
    ])
    def test_stream(self, kwargs):

The next step is to create an instance of ZmqProcess and patch some of its attributes. We also need to set a defined return value for the socket’s bind_to_random_port method:

# test/test_zmqproc.py

        zp = zmqproc.ZmqProcess()

        # Patch the ZmqProcess instance
        zp.context = mock.Mock(spec_set=zmq.Context)
        zp.loop = mock.Mock(spec_set=ioloop.IOLoop)
        sock_mock = zp.context.socket.return_value
        sock_mock.bind_to_random_port.return_value = 42

For the actual test, we also need to patch ZMQStream. Although mock.patch can work as a function decorator, we need to use it as a context manager if we also use pytest funcargs (e.g., via the parametrize decorator; I don’t know if it’s even possible to use both mock.patch as a decorator and pytest funcargs in one test).

# test/test_zmqproc.py

        # Patch ZMQStream and start testing
        with mock.patch('zmq.eventloop.zmqstream.ZMQStream') as zmqstream_mock:
            stream, port = zp.stream(**kwargs)

Finally, we can check the return values of our stream method and whether it made the correct calls to create the stream:

# test/test_zmqproc.py

            # Assert that the return values are correct
            assert stream is zmqstream_mock.return_value
            if isinstance(kwargs['addr'], tuple):
                assert port == kwargs['addr'][1]
            elif ':' in kwargs['addr']:
                assert port == int(kwargs['addr'][-4:])
            else:
                assert port == sock_mock.bind_to_random_port.return_value

            # Check that the socket was created correctly
            assert zp.context.socket.call_args == ((kwargs['sock_type'],), {})
            if kwargs['bind'] and ':' in kwargs['addr']:
                assert sock_mock.bind.call_args == (
                        ('tcp://%s' % kwargs['addr'],), {})
            elif kwargs['bind']:
                assert sock_mock.bind_to_random_port.call_args == (
                        ('tcp://%s' % kwargs['addr'],), {})
            else:
                assert sock_mock.connect.call_args == (
                        ('tcp://%s:%s' % kwargs['addr'],), {})

            # Check creation of the stream
            assert zmqstream_mock.call_args == ((sock_mock, zp.loop), {})
            assert zmqstream_mock.return_value.on_recv.call_args == (
                    (kwargs['callback'],), {})

            # Check default subscription
            if 'subscribe' in kwargs:
                assert sock_mock.setsockopt.call_args == (
                        (zmq.SUBSCRIBE, kwargs['subscribe']), {})

Note: You may have noticed that I use assert my_mock.call_args == ... rather than my_mock.assert_called_with(...). The reason is simply that assert statements are highlighted but ordinary function calls are not. This makes it easier for me to find all assertions in a test.

PongProc

Testing the PongProc is not much different from testing its base class. pytest_funcarg__pp will instantiate a PongProc instance for each test that has a pp argument. The tests for setup, run and stop are easy to do. We create a few mocks and then ask them if the tested function called them correctly:

# test/test_pongproc.py
from zmq.utils import jsonapi as json
import mock, pytest, zmq

import pongproc

host, port = '127.0.0.1', 5678

def pytest_funcarg__pp(request):
    """Creates a PongProc instance."""
    return pongproc.PongProc((host, port))


class TestPongProc(object):
    """Tests :class:`pongproc.PongProc`."""

    def test_setup(self, pp):
        pp.stream = mock.Mock(side_effect=lambda *a, **k: (a[0], mock.Mock()))

        with mock.patch('zmqproc.ZmqProcess.setup') as setup_mock:
            pp.setup()
            assert setup_mock.call_count == 1

        # Assert that all streams were created
        assert pp.stream.call_args_list == [
            ((zmq.REP, (host, port)),
                dict(bind=True, callback=pp.handle_rep_stream)),
        ]
        assert pp.rep_stream == zmq.REP

    def test_run(self, pp):
        pp.setup = mock.Mock()
        pp.loop = mock.Mock()

        pp.run()

        assert pp.setup.call_count == 1
        assert pp.loop.start.call_count == 1

    def test_stop(self, pp):
        pp.loop = mock.Mock()
        pp.stop()
        assert pp.loop.stop.call_count == 1

The callbacks for streams (e.g., PongProc.handle_rep_stream in our case) can get a bit more complicated, so I’ve split the test up into one test per message type plus one extra test that checks if invalid messages are handled correctly. If all your callbacks behave the same in that case (e.g., they all raise an error or just print something), you can handle them with one test case and the parametrize decorator:

# test/test_pongproc.py

    @pytest.mark.parametrize(('handler', 'msg'), [
        ('handle_rep_stream', ['["spam", []]']),
        # You can add more handlers here
    ])
    def test_handle_bad_msg(self, pp, handler, msg):
        pytest.raises(RuntimeError, getattr(pp, handler), msg)

Testing if stop and ping messages are handled correctly is now straightforward. We perform some mocking (for the application logic and the stream that sends the reply), pass our message to the handler and then just check if it did the right things right:

# test/test_pongproc.py

    def test_stop_msg(self, pp):
        pp.stop = mock.Mock()
        pp.handle_rep_stream([b'["plzdiekthxbye", null]'])
        assert pp.stop.call_count == 1

    def test_ping(self, pp):
        msg = ['ping', 1]  # Input message
        retval = 'spam'  # Return value for PingHandler.make_pong
        pp.ping_handler = mock.Mock(spec_set=pongproc.PingHandler)
        pp.ping_handler.make_pong.return_value = retval
        pp.rep_stream = mock.Mock()

        pp.handle_rep_stream([json.dumps(msg)])

        assert pp.ping_handler.make_pong.call_args == ((msg[1],), {})
        assert pp.rep_stream.send_json.call_args == ((retval,), {})

PingHandler

When we are done with all that network stuff, we can finally test the application logic. Easy-peasy in our case:

# test/test_pongproc.py

def pytest_funcarg__ph(request):
    """Creates a PingHandler instance."""
    return pongproc.PingHandler()

class TestPingHandler(object):
    def test_make_pong(self, ph):
        ping_num = 23
        ret = ph.make_pong(ping_num)
        assert ret == ['pong', ping_num]

Summary

Thanks to the Mock library, unit testing PyZMQ apps is really not that hard and not much different from normal unit testing. However, all we know now is that our process should work in theory. We haven’t yet started it and sent real messages to it.

The next and final part of this series will show you how you can automate testing complete processes. Until then, you should get your test coverage up to 100% to protect yourself from nasty surprises when you start with process testing.

With GObject introspection it is very simple to change your system’s settings through Python. First, let’s use the command line to find out our current settings:

gsettings list-recursively org.gnome.system.proxy

The following script allows you to retrieve the http proxy settings that you are currently using:

from gi.repository import Gio

def get_settings():
    """Get proxy settings."""
    http_settings = Gio.Settings.new('org.gnome.system.proxy.http')
    host = http_settings.get_string('host')
    port = http_settings.get_int('port')
    if http_settings.get_boolean('use-authentication'):
        # GSettings key names use hyphens, not underscores.
        username = http_settings.get_string('authentication-user')
        password = http_settings.get_string('authentication-password')
    else:
        username = password = None
    return host, port, username, password
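The tuple returned by get_settings is usually handed on to other code. As a hedged usage sketch, the hypothetical helper below turns it into a proxy URL for the standard library's urllib. It takes the values as plain arguments, so it runs without GNOME; the host and credentials in the usage lines are made-up examples:

```python
from urllib.parse import quote
from urllib.request import ProxyHandler, build_opener


def proxy_url(host, port, username=None, password=None):
    """Build an http:// proxy URL from (host, port, username, password).

    Hypothetical helper: credentials are percent-encoded so that
    characters such as '@' or ':' cannot break the URL.
    """
    auth = ''
    if username is not None:
        auth = '%s:%s@' % (quote(username, safe=''), quote(password or '', safe=''))
    return 'http://%s%s:%d' % (auth, host, port)


# Usage with values shaped like the ones get_settings() returns:
url = proxy_url('proxy.example.com', 3128, 'alice', 'p@ss')
opener = build_opener(ProxyHandler({'http': url, 'https': url}))
```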

Setting them is as easy as getting them:

from gi.repository import Gio

def set_settings(host, port, username=None, password=None):
    """Set proxy settings."""
    http_settings = Gio.Settings.new('org.gnome.system.proxy.http')
    http_settings.set_string('host', host)
    http_settings.set_int('port', port)
    if username is not None:
        http_settings.set_boolean('use-authentication', True)
        http_settings.set_string('authentication-user', username)
        http_settings.set_string('authentication-password', password or '')
    else:
        # Without credentials, turn authentication off so that stale
        # ones are not sent to the proxy.
        http_settings.set_boolean('use-authentication', False)

This is not complicated at all, but I noticed that there are not many examples out there, so there you go. None of this code is hard, but I’d like to point out that if you use the get_value method of the Settings object, you have to call the appropriate get_* method on the returned GVariant; that is:

host = http_settings.get_string('host')

is equal to the following:

host = http_settings.get_value('host').get_string()

As part of writing my PhD I am currently thinking about the relationship between PyPy's meta-tracing approach and various previous ideas to automatically get a (JIT-)compiler from only an interpreter of a language. One of the most-researched ideas along these lines is that of partial evaluation. Partial evaluation has basically the same goals as PyPy when it comes to compilers: Write an interpreter, and get a compiler for free. The methods for reaching that goal are a bit different. In this series of blog posts, I am trying to explore the similarities and differences of partial evaluation and PyPy's meta-tracing.

A Flowgraph Language

To clearly understand what "partial evaluation" and "meta-tracing" are, I will show an "executable model" of both. To that end, I am defining a small imperative language and will then show what a partial evaluator and a tracer for that language look like. All this code will be implemented in Prolog. (Any pattern-matching functional language would do, but I happen to know Prolog best. Backtracking is not used, so you can read things simply as functional programs.) In this post I will start with the definition of the language, and a partial evaluator for it. The full code of this blog post can be found here: http://paste.pocoo.org/show/541004/

The language is conceptually similar to PyPy's flow graphs, but a bit more restricted. It does not have function calls, only labelled basic blocks that consist of a series of linearly executed operations, followed by a conditional or an unconditional jump. Every operation assigns a value to a variable; the value is computed by applying some operation to some arguments.

A simple program to raise x to the yth power in that language looks like this:

power:
    res = 1
    if y goto power_rec else goto power_done

power_rec:
    res = res * x
    y = y - 1
    if y goto power_rec else goto power_done

power_done:
    print_and_stop(res)

To represent the same program as Prolog data structures, we use the following Prolog code:

block(power, op1(res, same, const(1),
             if(y, power_rec, power_done))).
block(power_rec, op2(res, mul, var(res), var(x),
                 op2(y, sub, var(y), const(1),
                 if(y, power_rec, power_done)))).
block(power_done, print_and_stop(var(res))).

Every rule of block declares one block by first giving the label of the block, followed by the code. Code is a series of op1 or op2 statements terminated by a jump, an if or a print_and_stop. op1 statements are operations with one argument of the form op1(res_variable, operation_name, argument, next_statement); op2 statements take two arguments and have the form op2(res_variable, operation_name, argument1, argument2, next_statement). Arguments can be either variables in the form var(name) or constants in the form const(value).

To run programs in this flowgraph language, we first need some helper functionality. The first few helper functions are concerned with the handling of environments, the data structures the interpreter uses to map variable names occurring in the program to the variables' current values. In Python, dictionaries would be used for this purpose, but in Prolog we have to emulate these by lists of key/value pairs (not very efficient, but good enough):

lookup(X, [], _) :- throw(key_not_found(X)).
lookup(Key, [Key/Value | _], Value) :- !.
lookup(Key, [_ | Rest], Value) :- lookup(Key, Rest, Value).

write_env([], X, V, [X/V]).
write_env([Key/_ | Rest], Key, Value, [Key/Value | Rest]) :- !.
write_env([Pair | Rest], Key, Value, [Pair | NewRest]) :- write_env(Rest, Key, Value, NewRest).

remove_env([], _, []).
remove_env([Key/_ | Rest], Key, Rest) :- !.
remove_env([Pair | Rest], Key, [Pair | NewRest]) :- remove_env(Rest, Key, NewRest).

resolve(const(X), _, X).
resolve(var(X), Env, Y) :- lookup(X, Env, Y).

The implementation of these functions is not too important. The lookup function finds a key in an environment list, the write_env function adds a key/value pair to an environment (replacing an existing binding for that key), and remove_env removes a key. The resolve function takes either a constant or a variable and returns a value: if it's a constant, the value of that constant is returned; if it's a variable, it is looked up in the environment. Note how the last argument of lookup and resolve is actually a return value, which is the typical approach in Prolog.
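For comparison, here is a minimal Python sketch of the same helpers. It assumes environments are plain lists of (key, value) tuples and that arguments are ('const', value) or ('var', name) tuples standing in for const(X) and var(X); Prolog's output arguments simply become return values:

```python
class KeyNotFound(Exception):
    """Raised by lookup, like throw(key_not_found(X)) in the Prolog code."""


def lookup(key, env):
    """Return the value bound to key in env, a list of (key, value) pairs."""
    for k, v in env:
        if k == key:
            return v
    raise KeyNotFound(key)


def write_env(env, key, value):
    """Return a new env with key bound to value (replaced or appended)."""
    for i, (k, _) in enumerate(env):
        if k == key:
            return env[:i] + [(key, value)] + env[i + 1:]
    return env + [(key, value)]


def remove_env(env, key):
    """Return a new env without the binding for key (if any)."""
    for i, (k, _) in enumerate(env):
        if k == key:
            return env[:i] + env[i + 1:]
    return list(env)


def resolve(arg, env):
    """Turn ('const', value) or ('var', name) into a plain value."""
    tag, x = arg
    if tag == 'const':
        return x
    return lookup(x, env)
```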

So far we have not specified what the primitive operations that can occur in the program actually mean. For that we define a do_op function which executes primitive operations:

do_op(same, X, X).
do_op(mul, X, Y, Z) :- Z is X * Y.
do_op(add, X, Y, Z) :- Z is X + Y.
do_op(sub, X, Y, Z) :- Z is X - Y.
do_op(eq, X, Y, Z) :- X == Y -> Z = 1; Z = 0.
do_op(ge, X, Y, Z) :- X >= Y -> Z = 1; Z = 0.
do_op(readlist, L, I, X) :- nth0(I, L, X).
do_op(Op, _, _, _) :- throw(missing_op(Op)).

Again the last argument is an output variable.
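A hedged Python counterpart of do_op, assuming the operation name is looked up in a dictionary (OPS and MissingOp are illustrative names, not part of the original code):

```python
import operator

# One entry per operation name. 'same' takes one argument, the others two.
# eq and ge return 1 or 0 like the Prolog clauses; readlist is 0-based like
# nth0. Note: Python's == treats 1 and 1.0 as equal, Prolog's == does not.
OPS = {
    'same': lambda x: x,
    'mul': operator.mul,
    'add': operator.add,
    'sub': operator.sub,
    'eq': lambda x, y: 1 if x == y else 0,
    'ge': lambda x, y: 1 if x >= y else 0,
    'readlist': lambda lst, i: lst[i],
}


class MissingOp(Exception):
    """Raised for unknown operations, like throw(missing_op(Op))."""


def do_op(op, *args):
    """Apply primitive operation op to already-resolved argument values."""
    if op not in OPS:
        raise MissingOp(op)
    return OPS[op](*args)
```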

Now we can start executing simple operations. For that an interp predicate is defined. It takes as its first argument the current environment and as the second argument the operation to execute. E.g. to execute primitive operations with one or two arguments:

interp(op1(ResultVar, Op, Arg, Rest), Env) :-
    resolve(Arg, Env, RArg),
    do_op(Op, RArg, Res),
    write_env(Env, ResultVar, Res, NEnv),
    interp(Rest, NEnv).

interp(op2(ResultVar, Op, Arg1, Arg2, Rest), Env) :-
    resolve(Arg1, Env, RArg1),
    resolve(Arg2, Env, RArg2),
    do_op(Op, RArg1, RArg2, Res),
    write_env(Env, ResultVar, Res, NEnv),
    interp(Rest, NEnv).

First the arguments are resolved into values. Afterwards the operation is executed, and the result is written back into the environment. Then interp is called on the rest of the program. Similarly easy are the unconditional jump and print_and_stop:

interp(jump(L), Env) :-
    block(L, Block),
    interp(Block, Env).


interp(print_and_stop(Arg), Env) :-
    resolve(Arg, Env, Val),
    print(Val), nl.

In the unconditional jump we simply get the target block and continue executing that. To execute print_and_stop we resolve the argument, print the value and then are done.

The conditional jump is only slightly more difficult:

interp(if(V, L1, L2), Env) :-
    lookup(V, Env, Val),
    (Val == 0 ->
        block(L2, Block)
    ;
        block(L1, Block)
    ),
    interp(Block, Env).

First the variable is looked up in the environment. If the variable is zero, execution continues at the second block, otherwise it continues at the first block.

Given this interpreter, we can execute the above example program like this, on a Prolog console:

$ swipl -s cfglang.pl
?- block(power, Block), interp(Block, [x/10, y/10]).
10000000000
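The same interpreter can be sketched in Python, assuming the Prolog terms are written as nested tuples and, as suggested earlier, a dict is used for the environment. Because Python has no tail-call elimination, the recursive interp calls become a loop. This is an illustrative translation, not code from the original post:

```python
import operator

# The power program, with Prolog terms as nested tuples:
# ('op1', res, op, arg, rest), ('op2', res, op, a1, a2, rest),
# ('if', var, l1, l2), ('jump', l), ('print_and_stop', arg).
BLOCKS = {
    'power': ('op1', 'res', 'same', ('const', 1),
              ('if', 'y', 'power_rec', 'power_done')),
    'power_rec': ('op2', 'res', 'mul', ('var', 'res'), ('var', 'x'),
                  ('op2', 'y', 'sub', ('var', 'y'), ('const', 1),
                   ('if', 'y', 'power_rec', 'power_done'))),
    'power_done': ('print_and_stop', ('var', 'res')),
}

OPS = {'same': lambda x: x, 'mul': operator.mul,
       'add': operator.add, 'sub': operator.sub}


def resolve(arg, env):
    tag, x = arg
    return x if tag == 'const' else env[x]


def interp(code, env, blocks=BLOCKS):
    """Run code starting from env (a dict); print and return the result."""
    env = dict(env)  # copy, so the caller's env is left untouched
    while True:
        kind = code[0]
        if kind == 'op1':
            _, res, op, arg, code = code
            env[res] = OPS[op](resolve(arg, env))
        elif kind == 'op2':
            _, res, op, a1, a2, code = code
            env[res] = OPS[op](resolve(a1, env), resolve(a2, env))
        elif kind == 'jump':
            code = blocks[code[1]]
        elif kind == 'if':
            _, var, l1, l2 = code
            code = blocks[l2] if env[var] == 0 else blocks[l1]
        elif kind == 'print_and_stop':
            val = resolve(code[1], env)
            print(val)
            return val
        else:
            raise ValueError('unknown statement %r' % (kind,))
```

Calling interp(BLOCKS['power'], {'x': 10, 'y': 10}) prints and returns 10000000000, matching the Prolog session.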

Partial Evaluation of the Flowgraph Language

Let's look at what a partial evaluator for this simple flowgraph language would look like. Partial evaluation (PE), also called specialization, is a program manipulation technique. PE takes an input program and transforms it into a (hopefully) simpler and faster output program. It does this by assuming that some variables in the input program are constants. All operations that act only on such constants can be folded away. All other operations need to remain in the output program (called residual program). Thus the partial evaluator proceeds much like an interpreter, just that it cannot actually execute some operations. Also, its output is not just a value, but also a list of remaining operations that could not be optimized away.

The partial evaluator cannot use normal environments, because unlike the interpreter it does not know the values of all variables. It will therefore work on partial environments, which store just the known variables. For these partial environments, some new helper functions are needed:

plookup(Key, [], var(Key)).
plookup(Key, [Key/Value | _], const(Value)) :- !.
plookup(Key, [_ | Rest], Value) :- plookup(Key, Rest, Value).

presolve(const(X), _, const(X)).
presolve(var(V), PEnv, X) :- plookup(V, PEnv, X).

The function plookup takes a variable and a partial environment and returns either const(Value) if the variable is found in the partial environment or var(Key) if it is not. Similarly, presolve is like resolve, except that it uses plookup instead of lookup.
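In the same tuple-based Python style, the two partial-environment helpers might look like this (a hypothetical translation, with the partial environment as a list of (name, value) pairs):

```python
def plookup(key, penv):
    """Return ('const', v) if key is bound in penv, else ('var', key).

    penv is a list of (name, value) pairs holding only the known variables.
    Unlike lookup, a missing key is not an error: it just stays symbolic.
    """
    for k, v in penv:
        if k == key:
            return ('const', v)
    return ('var', key)


def presolve(arg, penv):
    """Like resolve, but returns a const/var pair instead of a bare value."""
    tag, x = arg
    if tag == 'const':
        return arg
    return plookup(x, penv)
```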

With these helpers we can start writing a partial evaluator. The following two rules are where the main optimization in the form of constant folding happens. The idea is that when the partial evaluator sees an operation that involves only constant arguments, it can constant-fold the operation, otherwise it can't:

pe(op1(ResultVar, Op, Arg, Rest), PEnv, NewOp) :-
    presolve(Arg, PEnv, RArg),
    (RArg = const(C) ->
        do_op(Op, C, Res),
        write_env(PEnv, ResultVar, Res, NEnv),
        RestResidual = NewOp
    ;
        remove_env(PEnv, ResultVar, NEnv),
        NewOp = op1(ResultVar, Op, RArg, RestResidual)
    ),
    pe(Rest, NEnv, RestResidual).

pe(op2(ResultVar, Op, Arg1, Arg2, Rest), PEnv, NewOp) :-
    presolve(Arg1, PEnv, RArg1),
    presolve(Arg2, PEnv, RArg2),
    (RArg1 = const(C1), RArg2 = const(C2) ->
        do_op(Op, C1, C2, Res),
        write_env(PEnv, ResultVar, Res, NEnv),
        RestResidual = NewOp
    ;
        remove_env(PEnv, ResultVar, NEnv),
        NewOp = op2(ResultVar, Op, RArg1, RArg2, RestResidual)
    ),
    pe(Rest, NEnv, RestResidual).

The pe predicate takes the current operation and a partial environment, and returns the residual code for it. To partially evaluate a simple operation, its arguments are looked up in the partial environment. If all the arguments are constants, the operation can be executed, and no new operation is produced. Otherwise, we need to produce a new residual operation which is exactly like the one currently looked at, except that its arguments are replaced by what presolve returned. Also, the result variable needs to be removed from the partial environment, because it was just overwritten by an unknown value.

The potentially generated residual operation is stored into the output argument NewOp. The output argument of the recursive call is the last argument of the newly created residual operation, which will then be filled by the recursive call. This is a typical approach in Prolog, but may look strange if you are not familiar with it.

Note how the first case of these two rules is just like interpretation. The second case doesn't really do anything, it just produces a residual operation. This relationship between normal evaluation and partial evaluation is very typical.
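Stripped of the Prolog plumbing, the folding rule of the two pe clauses fits in a short loop. The sketch below assumes a simplified, flattened representation (a list of (result_var, op_name, args) triples and a dict as the partial environment) rather than the nested terms, so it only covers straight-line code:

```python
import operator

OPS = {'same': lambda x: x, 'mul': operator.mul,
       'add': operator.add, 'sub': operator.sub}


def presolve(arg, penv):
    """Replace a known variable by its constant; leave everything else."""
    tag, x = arg
    if tag == 'var' and x in penv:
        return ('const', penv[x])
    return arg


def fold(ops, penv):
    """Partially evaluate straight-line ops of the form (res, op, args).

    Returns the residual ops and the final partial environment (a dict
    holding only the variables whose values are known).
    """
    penv = dict(penv)
    residual = []
    for res, op, args in ops:
        rargs = [presolve(a, penv) for a in args]
        if all(tag == 'const' for tag, _ in rargs):
            # Every argument is known: execute now, just like interp.
            penv[res] = OPS[op](*[value for _, value in rargs])
        else:
            # Emit the op with known arguments substituted, and forget
            # res, because it now holds a value unknown at PE time.
            penv.pop(res, None)
            residual.append((res, op, rargs))
    return residual, penv
```

Starting from {'y': 5, 'res': 1}, the two operations of power_rec leave one residual multiplication of const 1 by var x, and the partial environment {'y': 4}.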

The unconditional jump and print_and_stop are not much more complex:

pe(jump(L), PEnv, jump(LR)) :-
    do_pe(L, PEnv, LR).

pe(print_and_stop(Arg), Env, print_and_stop(RArg)) :-
    presolve(Arg, Env, RArg).

To partially evaluate an unconditional jump we again produce a jump. The target label of that residual jump is computed by asking the partial evaluator to produce residual code for the label L with the given partial environment. print_and_stop is simply turned into a print_and_stop. We will see the code for do_pe soon.

Conditional jumps are more interesting:

pe(if(V, L1, L2), PEnv, NewOp) :-
    plookup(V, PEnv, Val),
    (Val = const(C) ->
        (C = 0 ->
            L = L2
        ;
            L = L1
        ),
        do_pe(L, PEnv, LR),
        NewOp = jump(LR)
    ;
        do_pe(L1, PEnv, L1R),
        do_pe(L2, PEnv, L2R),
        NewOp = if(V, L1R, L2R)
    ).

First we look up the value of the condition variable. If it is a constant, we can produce better code, because we know statically that only one path is reachable. Thus we produce code for that path, and then emit an unconditional jump there. If the condition variable is not known at partial evaluation time, we need to partially evaluate both paths and produce a conditional jump in the residual code.

This rule is the one that causes the partial evaluator to potentially do much more work than the interpreter, because after an if sometimes both paths need to be explored. In the worst case this process never stops, so a real partial evaluator would need to ensure somehow that it terminates. There are many algorithms for doing that, but I will ignore this problem here.

Now we need to understand what the do_pe predicate is doing. Its most important task is to make sure that we don't do the same work twice by memoizing code that was already partially evaluated in the past. For that it keeps a mapping from pairs of label and partial environment to the label of the residual code:

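% code_cache/3 only gets clauses through assert, so it must be declared
% dynamic. block/2 also needs a ':- dynamic block/2.' directive placed
% before the program's block facts; otherwise SWI-Prolog refuses to
% assert the residual blocks into a static predicate.
:- dynamic code_cache/3.

do_pe(L, PEnv, LR) :-
    (code_cache(L, PEnv, LR) ->
        true
    ;
        gensym(L, LR),
        assert(code_cache(L, PEnv, LR)),
        block(L, Code),
        pe(Code, PEnv, Residual),
        assert(block(LR, Residual))
    ).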

If the code cache indicates that label L was already partially evaluated with partial environment PEnv, then the previously generated residual code label is returned in LR. Otherwise, a new label is generated with gensym, the code cache is informed of that new label with assert, then the block is partially evaluated and the residual code is added to the database.

For those who know partial evaluation terminology: This partial evaluator is a polyvariant online partial evaluator. "Polyvariant" means that for every label, several specialized versions of the block can be generated. "Online" means that no preprocessing is done before the partial evaluator runs.
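Putting the pieces together, here is a hedged Python sketch of the whole partial evaluator, mirroring the pe and do_pe rules. A dict replaces the asserted code_cache facts, a per-label counter stands in for gensym, and partial environments are tuples of pairs so they can be part of a dictionary key. Like the Prolog version, it makes no attempt to guarantee termination:

```python
import operator
from collections import defaultdict

OPS = {'same': lambda x: x, 'mul': operator.mul,
       'add': operator.add, 'sub': operator.sub}

# The power program, with Prolog terms written as nested tuples.
BLOCKS = {
    'power': ('op1', 'res', 'same', ('const', 1),
              ('if', 'y', 'power_rec', 'power_done')),
    'power_rec': ('op2', 'res', 'mul', ('var', 'res'), ('var', 'x'),
                  ('op2', 'y', 'sub', ('var', 'y'), ('const', 1),
                   ('if', 'y', 'power_rec', 'power_done'))),
    'power_done': ('print_and_stop', ('var', 'res')),
}


# Partial environments: tuples of (name, value) pairs, ordered like the
# Prolog lists, so that they are hashable.
def write_env(penv, key, value):
    for i, (k, _) in enumerate(penv):
        if k == key:
            return penv[:i] + ((key, value),) + penv[i + 1:]
    return penv + ((key, value),)


def remove_env(penv, key):
    return tuple(pair for pair in penv if pair[0] != key)


def presolve(arg, penv):
    """('var', x) becomes ('const', v) if x is known; otherwise unchanged."""
    if arg[0] == 'var':
        for k, v in penv:
            if k == arg[1]:
                return ('const', v)
    return arg


class PartialEvaluator(object):
    def __init__(self, blocks):
        self.blocks = dict(blocks)    # residual blocks are added here
        self.code_cache = {}          # (label, penv) -> residual label
        self.counters = defaultdict(int)

    def gensym(self, base):
        # power -> power1, power2, ... (one counter per base label)
        self.counters[base] += 1
        return '%s%d' % (base, self.counters[base])

    def do_pe(self, label, penv):
        key = (label, penv)
        if key not in self.code_cache:
            new_label = self.gensym(label)
            # Register the label before specializing, so that a jump back
            # to the same (label, penv) reuses it instead of recursing.
            self.code_cache[key] = new_label
            self.blocks[new_label] = self.pe(self.blocks[label], penv)
        return self.code_cache[key]

    def pe(self, code, penv):
        kind = code[0]
        if kind in ('op1', 'op2'):
            res, op, args, rest = code[1], code[2], code[3:-1], code[-1]
            rargs = tuple(presolve(a, penv) for a in args)
            if all(tag == 'const' for tag, _ in rargs):
                # All arguments known: constant-fold, emit nothing.
                value = OPS[op](*[v for _, v in rargs])
                return self.pe(rest, write_env(penv, res, value))
            # Otherwise emit a residual op; res is no longer known.
            rest_residual = self.pe(rest, remove_env(penv, res))
            return (kind, res, op) + rargs + (rest_residual,)
        if kind == 'jump':
            return ('jump', self.do_pe(code[1], penv))
        if kind == 'print_and_stop':
            return ('print_and_stop', presolve(code[1], penv))
        if kind == 'if':
            _, var, l1, l2 = code
            val = presolve(('var', var), penv)
            if val[0] == 'const':
                # Condition known: only one successor is reachable.
                return ('jump', self.do_pe(l2 if val[1] == 0 else l1, penv))
            return ('if', var, self.do_pe(l1, penv), self.do_pe(l2, penv))
        raise ValueError('unknown statement %r' % (kind,))
```

Calling PartialEvaluator(BLOCKS).do_pe('power', (('y', 5),)) returns 'power1' and fills in the residual blocks power1, power_rec1 to power_rec5 and power_done1: an unrolled loop with five multiplications by x.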

Partial Evaluation Example

With this code we can look at the classical example of partial evaluation (it's probably the "Hello World" of partial evaluation). We can ask the partial evaluator to compute a power function, where the exponent y is a fixed number, e.g. 5, and the base x is unknown:

?- do_pe(power, [y/5], LR).
LR = power1.

To find out which code was produced, we can use listing:

?- listing(code_cache).
code_cache(power, [y/5], power1).
code_cache(power_rec, [y/5, res/1], power_rec1).
code_cache(power_rec, [y/4], power_rec2).
code_cache(power_rec, [y/3], power_rec3).
code_cache(power_rec, [y/2], power_rec4).
code_cache(power_rec, [y/1], power_rec5).
code_cache(power_done, [y/0], power_done1).

?- listing(block).
.... the block definition of the user program ....
block(power_done1, print_and_stop(var(res))).
block(power_rec5, op2(res, mul, var(res), var(x), jump(power_done1))).
block(power_rec4, op2(res, mul, var(res), var(x), jump(power_rec5))).
block(power_rec3, op2(res, mul, var(res), var(x), jump(power_rec4))).
block(power_rec2, op2(res, mul, var(res), var(x), jump(power_rec3))).
block(power_rec1, op2(res, mul, const(1), var(x), jump(power_rec2))).
block(power1, jump(power_rec1)).

The code_cache tells which residual labels correspond to which original labels under which partial environments. Thus, power1 contains the code of power under the assumption that y is 5. Looking at the block listing, the label power1 corresponds to code that simply multiplies res by x five times without using the variable y at all. The loop that was present in the original program has been fully unrolled, and the loop variable y has disappeared. Hopefully this is faster than the original program.

Conclusion

In this blog post we saw an interpreter for a simple flow graph language in Prolog, together with a partial evaluator for it. The partial evaluator essentially duplicates every rule of the interpreter. If all the arguments of the current operation are known, it acts like the interpreter, otherwise it simply copies the operation into the residual code.

Partial evaluation can be used for a variety of applications, but the most commonly cited one is that of applying it to an interpreter. To do that, the program that the interpreter runs is assumed to be constant by the partial evaluator. Thus a specialized version of the interpreter is produced that does not use the input program at all. That residual code can be seen as a compiled version of the input program.

In the next blog post in this series we will look at writing a simple tracer for the same flowgraph language.

Part 3 of Comparing Partial Evaluation to Tracing

This is the third blog post in a series about comparing partial evaluation and tracing. In the first post of the series I introduced a small flow-graph language together with an interpreter for it. Then I showed a partial evaluator for the language. In the second post of the series I showed how a tracer for the same language works and how it relates to both execution and to partial evaluation. Then I added support for promotion to that tracer.

In this post I will show how to optimize the traces that are produced by the tracer and compare the structure of the optimizer to that of partial evaluation.

The code from this post can be found here: http://paste.pocoo.org/show/547304/

Optimizing Traces

In the last post we saw how to produce a linear trace with guards by interpreting a control flow graph program in a special mode. A trace always ends with a loop statement, which jumps to the beginning. The tracer just logs the operations that are done while interpreting, so the trace can contain superfluous operations. On the other hand, the trace also contains some of the runtime values through promotions and some decisions made on them, which can be exploited by optimization. An example for this is the trace produced by the promotion example from the last post:

op2(c,ge,var(i),const(0),
guard_true(c,[],l_done,
guard_value(x,5,[],b2,
op2(x2,mul,var(x),const(2),
op2(x3,add,var(x2),const(1),
op2(i,sub,var(i),var(x3),
loop))))))

After the guard_value(x, 5, ...) operation, x is known to be 5: if it isn't 5, execution falls back to the interpreter. Therefore, operations on x after the guard can be constant-folded. To do that sort of constant-folding, an extra optimization step is needed. That optimization step walks along the trace and uses a partial environment to remember which variables are constants and what their values are. The optimizer removes operations that have only constant arguments and leaves the others in the trace. This process is actually remarkably similar to partial evaluation: Some variables are known to be constants, operations on only constant arguments are optimized away, the rest remains.

The code for optimizing operations looks as follows:

optimize(op1(ResultVar, Op, Arg, Rest), PEnv, NewOp) :-
    presolve(Arg, PEnv, RArg),
    (RArg = const(C) ->
        do_op(Op, C, Res),
        write_env(PEnv, ResultVar, Res, NEnv),
        NewOp = RestResidual
    ;
        remove_env(PEnv, ResultVar, NEnv),
        NewOp = op1(ResultVar, Op, RArg, RestResidual)
    ),
    optimize(Rest, NEnv, RestResidual).

optimize(op2(ResultVar, Op, Arg1, Arg2, Rest), PEnv, NewOp) :-
    presolve(Arg1, PEnv, RArg1),
    presolve(Arg2, PEnv, RArg2),
    (RArg1 = const(C1), RArg2 = const(C2) ->
        do_op(Op, C1, C2, Res),
        write_env(PEnv, ResultVar, Res, NEnv),
        NewOp = RestResidual
    ;
        remove_env(PEnv, ResultVar, NEnv),
        NewOp = op2(ResultVar, Op, RArg1, RArg2, RestResidual)
    ),
    optimize(Rest, NEnv, RestResidual).

Just like partial evaluation! It even reuses the helper function presolve from the partial evaluator and a partial environment PEnv. When the arguments of the operation are known constants in the partial environment, the operation can be executed at optimization time and removed from the trace. Otherwise, the operation has to stay in the output trace. The result variable (as in the partial evaluator) needs to be removed from the partial environment, because it was just overwritten by an unknown result.

Now we need to deal with guards in the trace.

optimize(guard_true(V, [], L, Rest), PEnv, NewOp) :-
    plookup(V, PEnv, Val),
    (Val = const(C) ->
        NewOp = RestResidual
    ;
        NewOp = guard_true(V, PEnv, L, RestResidual)
    ),
    optimize(Rest, PEnv, RestResidual).

optimize(guard_false(V, [], L, Rest), PEnv, NewOp) :-
    plookup(V, PEnv, Val),
    (Val = const(C) ->
        NewOp = RestResidual,
        NEnv = PEnv
    ;
        write_env(PEnv, V, 0, NEnv),
        NewOp = guard_false(V, PEnv, L, RestResidual)
    ),
    optimize(Rest, NEnv, RestResidual).

When the variable that is being guarded is actually known to be a constant, we can remove the guard. Note that a guard on such a constant cannot fail: the tracer recorded the operation while running with real values, therefore the guards have to succeed for values the optimizer discovers to be constant.

guard_false is slightly different from guard_true: after the former we know that the argument is actually 0. After guard_true we only know that it is not equal to zero, but not which precise value it has.

Another point to note in the optimization of guards is that the second argument of the guard operation, which was so far always just an empty list, is now replaced by the partial environment PEnv. I will discuss further down why this is needed.

Optimizing guard_value is very similar, except that it really gives precise information about the variable involved:

optimize(guard_value(V, C, [], L, Rest), PEnv, NewOp) :-
    plookup(V, PEnv, Val),
    (Val = const(C1) ->
        NewOp = RestResidual,
        NEnv = PEnv
    ;
        write_env(PEnv, V, C, NEnv),
        NewOp = guard_value(V, C, PEnv, L, RestResidual)
    ),
    optimize(Rest, NEnv, RestResidual).

This operation is the main way in which the optimizer learns constant values that it then exploits to do constant-folding on later operations. This is a chief difference from partial evaluation: there the partial evaluator knows the value of some variables from the start. When optimizing traces, at the beginning the value of no variable is known. Knowledge about some variables is only gained later, through guards.

Now we are missing what happens with the loop statement. In principle, it is turned into a loop statement again. However, at the loop statement a few additional operations need to be emitted. The reason is that we optimized away operations and thus assignments when the result value of the variable was a constant. That means the involved variable still potentially has some older value. The next iteration of the loop would continue with this older value, which is obviously wrong. Therefore we need to emit some assignments before the loop statement, one per entry in the partial environment:

optimize(loop, PEnv, T) :-
    generate_assignments(PEnv, T).

generate_assignments([], loop).
generate_assignments([Var/Val | Tail], op1(Var, same, const(Val), T)) :-
    generate_assignments(Tail, T).

To see how generate_assignments works, consider the partial environment [x/5, y/10], for which the following assignments are generated:

?- generate_assignments([x/5, y/10], Out).
Out = op1(x, same, const(5), op1(y, same, const(10), loop)).

That's all the code of the optimizer. While the basic structure is quite similar to partial evaluation, it's a lot less complex as well. What made the partial evaluator hard was that it needs to deal with control flow statements and with making sure that code is reused if the same block is partially evaluated with the same constants. Here, all these complexities go away. The tracer has already removed all control flow and replaced it with guards and one loop operation at the end. Thus, the optimizer can simply do one pass over the operations, removing some (with some extra care around the loop statement).

With this machinery in place, we can optimize the trace from the promotion example of the last post:

?- optimize(
    guard_value(x,3,[],b2,
    op2(x2,mul,var(x),const(2),
    op2(x3,add,var(x2),const(1),
    op2(i,sub,var(i),var(x3),
    op2(c,ge,var(i),const(0),
    guard_true(c,[],l_done, loop)))))),
    [],
    LoopOut).
LoopOut = guard_value(x, 3, [], b2, op2(i, sub, var(i), const(7), op2(c, ge, var(i), const(0), guard_true(c, [x/3, x2/6, x3/7], l_done, op1(x, same, const(3), op1(x2, same, const(6), op1(x3, same, const(7), loop)))))))

More readably, the optimized version is:

guard_value(x, 3, [], b2,
op2(i, sub, var(i), const(7),
op2(c, ge, var(i), const(0),
guard_true(c, [x/3, x2/6, x3/7], l_done,
op1(x, same, const(3),
op1(x2, same, const(6),
op1(x3, same, const(7),
loop)))))))

As intended, the operations on x after the guard_value have all been removed. However, some additional assignments (to x, x2, x3) at the end have been generated as well. The assignments look superfluous, but the optimizer does not have enough information to easily recognize this. That can be fixed, but only at the cost of additional complexity. (A real system would transform the trace into static single assignment form to answer such questions.)
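The optimizer also translates directly to Python. This is a hedged sketch under the same assumptions as the Prolog code (terms as nested tuples, partial environments as tuples of (name, value) pairs, and the loop statement written as ('loop',)); the names are illustrative. Feeding it the trace above reproduces the optimized version, including the resume data [x/3, x2/6, x3/7] on guard_true and the assignments before the loop:

```python
import operator

OPS = {'same': lambda x: x, 'mul': operator.mul, 'add': operator.add,
       'sub': operator.sub, 'ge': lambda x, y: 1 if x >= y else 0}


# Partial environments: tuples of (name, value) pairs, like the Prolog lists.
def write_env(penv, key, value):
    for i, (k, _) in enumerate(penv):
        if k == key:
            return penv[:i] + ((key, value),) + penv[i + 1:]
    return penv + ((key, value),)


def remove_env(penv, key):
    return tuple(pair for pair in penv if pair[0] != key)


def presolve(arg, penv):
    if arg[0] == 'var':
        for k, v in penv:
            if k == arg[1]:
                return ('const', v)
    return arg


def generate_assignments(penv):
    """Re-materialize every known constant right before the loop."""
    code = ('loop',)
    for var, val in reversed(penv):
        code = ('op1', var, 'same', ('const', val), code)
    return code


def optimize(trace, penv=()):
    kind = trace[0]
    if kind in ('op1', 'op2'):
        res, op, args, rest = trace[1], trace[2], trace[3:-1], trace[-1]
        rargs = tuple(presolve(a, penv) for a in args)
        if all(tag == 'const' for tag, _ in rargs):
            value = OPS[op](*[v for _, v in rargs])
            return optimize(rest, write_env(penv, res, value))
        rest_residual = optimize(rest, remove_env(penv, res))
        return (kind, res, op) + rargs + (rest_residual,)
    if kind in ('guard_true', 'guard_false'):
        _, var, _, label, rest = trace
        if presolve(('var', var), penv)[0] == 'const':
            return optimize(rest, penv)   # a guard on a constant cannot fail
        # After guard_false the variable is known to be 0; after guard_true
        # we only know it is non-zero. The guard keeps penv as resume data.
        after = write_env(penv, var, 0) if kind == 'guard_false' else penv
        return (kind, var, penv, label, optimize(rest, after))
    if kind == 'guard_value':
        _, var, const, _, label, rest = trace
        if presolve(('var', var), penv)[0] == 'const':
            return optimize(rest, penv)
        return ('guard_value', var, const, penv, label,
                optimize(rest, write_env(penv, var, const)))
    if kind == 'loop':
        return generate_assignments(penv)
    raise ValueError('unknown trace operation %r' % (kind,))
```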

Resuming to the Interpreter

Why does the code above need to add the partial environment to the guards that cannot be optimized away? The reason is related to why we needed to generate assignments before the loop statement. The problem is that the optimizer removes assignments to variables when it knows the values of these variables. That means that when switching back from running the optimized trace to the interpreter, a number of variables are not updated in the environment, making the execution in the interpreter incorrect.

In the example above, this applies to the variables x2 and x3. When the second guard fails, they have not been assigned in the optimized case. Therefore, the guard lists them and their (always constant) values.

When switching back these assignments need to be made. Thus we need to adapt the resume_interp function from the last blog post as follows:

write_resumevars([], Env, Env).
write_resumevars([Key / Value | Rest], Env, NEnv) :-
    write_env(Env, Key, Value, Env1),
    write_resumevars(Rest, Env1, NEnv).

resume_interp(Env, ResumeVars, L) :-
    write_resumevars(ResumeVars, Env, NEnv),
    block(L, Block),
    interp(Block, NEnv).

On resuming, the ResumeVars (a former partial environment) are simply added back to the normal environment before going back to the interpreter.
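A hedged Python rendering of write_resumevars, again assuming environments are lists of (key, value) pairs; the environment in the usage lines is made up to show an outdated x2 and a missing x3:

```python
def write_env(env, key, value):
    """Return a new env (list of (key, value) pairs) with key bound to value."""
    for i, (k, _) in enumerate(env):
        if k == key:
            return env[:i] + [(key, value)] + env[i + 1:]
    return env + [(key, value)]


def write_resumevars(resume_vars, env):
    """Write the values saved on a failing guard back into the full env.

    Saved values overwrite outdated bindings and add missing ones; resuming
    would then continue interpreting the guard's target block in this env.
    """
    for key, value in resume_vars:
        env = write_env(env, key, value)
    return env


# Hypothetical env when guard_true(c, [x/3, x2/6, x3/7], l_done) fails:
# x2 holds an outdated value and x3 was never written by the trace.
env = [('x', 3), ('x2', 4), ('i', -2), ('c', 0)]
resumed = write_resumevars([('x', 3), ('x2', 6), ('x3', 7)], env)
```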

The data attached to guards about what needs to be done to resume to the interpreter when the guard fails is often a very complex part of a tracing system. The data can become big, yet most guards never fail. Therefore, most real systems try hard to compress the attached data or try to share it between subsequent guards.

Summary

In this post we have shown how to optimize traces by applying a variant of the partial evaluation principle: Perform all the operations that have only constant arguments, leave the others alone. However, optimizing traces is much simpler, because no control flow is involved. All the questions about control flow have already been solved by the tracing component.

In the next and final post of the series I will show a larger example of how tracing and partial evaluation can be used to optimize a small bytecode interpreter.

When I first created SQLAlchemy, I knew I wanted to create something significant. It was by no means the first ORM or database abstraction layer I'd written; by 2005, I'd probably written about a dozen abstraction layers in several languages, including in Java, Perl, C and C++ (really bad C and even worse C++, one that talked to ODBC and another that communicated with Microsoft's ancient DB-LIB directly). All of these abstraction layers were in the range of awful to mediocre, and certainly none were anywhere near release-quality; even by late-90's to early-2000's standards. They were all created for closed-source applications written on the job, but each one did its job very well.

It was the repetitive creation of the same patterns over and over again that made apparent the kinds of things a real toolkit should have, as well as increased the urge to actually go through with it, so that I wouldn't have to invent new database interaction layers for every new project, or worse, be compelled by management to use whatever mediocre product they had read about the week before (keeping in mind I was made to use such disasters as EJB 1.0). But at the same time it was apparent to me that I was going to need to do some research up front as well. The primary book I used for this research was Patterns of Enterprise Application Architecture by Martin Fowler. When reading this book, about half the patterns were ones that I'd already used implicitly, and the other half were ones that I was previously not entirely aware of.

Sometimes I read comments from new users expressing confusion or frustration with SQLAlchemy's concepts. Maybe some of these users are not only new to SQLAlchemy but are new to database abstraction layers in general, and some maybe even to relational databases themselves. What I'd like to lay out here is just how many of POEAA's patterns SQLAlchemy is built upon. If you're new to SQLAlchemy, my hope is that this list might help to de-mystify where these patterns come from.

These links are from the Catalog of Patterns of Enterprise Application Architecture.

  • Data Mapper - The key to this pattern is that object-relational mapping is applied to a user-defined class in a transparent way, keeping the details of persistence separate from the public interface of the class. SQLAlchemy's classical mapping system, which is the usage of the mapper() function to link a class with table metadata, implemented this pattern as fully as possible. In modern SQLAlchemy, we use the Declarative pattern which combines table metadata with the class' declaration as a shortcut to using mapper(), but the persistence API remains separate.
  • Unit of Work - This pattern is where the system transparently keeps track of changes to objects and periodically flushes all those pending changes out to the database. SQLAlchemy's Session implements this pattern fully in a manner similar to that of Hibernate.
  • Identity Map - This is an essential pattern that establishes unique identities for each object within a particular session, based on database identity. No ORM should be without this feature, as working with object structures in applications of even moderate complexity is vastly simplified and made more efficient with this pattern in place.
  • Metadata Mapping - this chapter in the book is where the name MetaData comes from. The exact correspondence to Fowler's pattern would be the combination of mapper() and Table.
  • Query Object - Both the ORM Query and the Core select() construct are built on this pattern.
  • Repository - An interface that serves as the gateway to the database, in terms of object-relational mappings. This is the SQLAlchemy Session.
  • Lazy Load - Load a related collection or object as you need it. SQLAlchemy, like Hibernate, has a lot of options in how attributes can load things.
  • Identity Field - Represent the primary key of a table's row within the object that represents it.
  • Foreign Key Mapping - Database foreign keys are represented using relationships in the object model.
  • Association Table Mapping - A class can be mapped that represents information about how two objects are related to each other. Use the Association Object for this pattern.
  • Embedded Value - a value inline on an object represents multiple columns. SQLAlchemy provides the Composite pattern here.
  • Serialized LOB - Sometimes you just want to stuff all the objects into a BLOB. Use the PickleType or roll a JSON type.
  • Inheritance Mappers - Represent class hierarchies within database tables. See Inheritance Mapping.
  • Optimistic Offline Lock - Set up a version id on your mapping to enable this feature in SQLAlchemy.
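To make the Identity Map and Unit of Work entries above a little more concrete, here is a toy sketch in plain Python over the standard sqlite3 module. ToySession and User are hypothetical names. This is not how SQLAlchemy's Session is implemented: it detects changes through instrumented attributes rather than needing an explicit mark_dirty() call. The sketch only shows the shape of the two patterns.

```python
import sqlite3


class User:
    """A plain class with no persistence code inside it (the Data Mapper idea)."""

    def __init__(self, id, name):
        self.id = id        # Identity Field: the row's primary key lives on the object
        self.name = name


class ToySession:
    """Hypothetical toy Identity Map + Unit of Work; not SQLAlchemy's code."""

    def __init__(self, conn):
        self.conn = conn
        self.identity_map = {}  # primary key -> the single User object for that row
        self.dirty = set()      # primary keys of objects with unflushed changes

    def get(self, user_id):
        # Identity Map: hand back the object already loaded for this key, if any.
        if user_id in self.identity_map:
            return self.identity_map[user_id]
        row = self.conn.execute(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if row is None:
            return None
        user = User(*row)
        self.identity_map[user_id] = user
        return user

    def mark_dirty(self, user):
        # Unit of Work: remember the change now, write it later.
        self.dirty.add(user.id)

    def flush(self):
        # Unit of Work: send every pending change in one transaction.
        with self.conn:  # commits on success, rolls back on error
            for user_id in sorted(self.dirty):
                user = self.identity_map[user_id]
                self.conn.execute(
                    "UPDATE users SET name = ? WHERE id = ?", (user.name, user.id)
                )
        self.dirty.clear()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'ed')")
conn.commit()

session = ToySession(conn)
first = session.get(1)
second = session.get(1)
print(first is second)  # True: one object per database identity

first.name = "edward"
session.mark_dirty(first)
session.flush()
print(conn.execute("SELECT name FROM users WHERE id = 1").fetchone())  # ('edward',)
```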

Thanks for reading!

08:09

Drawing inspiration from this blog post on title virality, I wanted to investigate what makes the all-time top 10,000 reddit titles the best of their breed. Which are the best superlatives? Who/what’s the most popular subject? Let’s start with some statistics:

  • On Feb. 03, 14:10:45 (UTC) the all-time top 10,000 submissions on reddit (/r/all) had a total of 82,751,429 upvotes and 62,655,532 downvotes (56.9% liked it).
  • 5.2 years between the oldest and newest submission
  • 8,331,382 comments. That’s about 833 comments per submission.
  • The #1 post has 26,758 – 4,882 = 21,876 points
  • The #10,000 post has 15,166 - 13,679 = 1,487 points
  • And now some graphs….
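The headline figures above are simple ratios. Here is a quick Python check that uses only the numbers quoted in the list. "Points" means upvotes minus downvotes, as the list computes them.

```python
# Figures quoted in the list above (snapshot of Feb. 03, 14:10:45 UTC).
upvotes = 82_751_429
downvotes = 62_655_532
comments = 8_331_382
submissions = 10_000

liked_share = upvotes / (upvotes + downvotes)
comments_per_submission = comments / submissions


def points(ups, downs):
    """Score as used in the list: upvotes minus downvotes."""
    return ups - downs


print(f"{liked_share:.1%} liked it")                           # 56.9% liked it
print(f"{comments_per_submission:.0f} comments per submission")  # 833 comments per submission
print(points(26_758, 4_882), points(15_166, 13_679))           # 21876 1487
```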

Adjectives – reddit loves “new”, “old”, “good” and “right”


Top Adjective, Superlative – “Best” is the best

Questions – reddit loves “how”


What’s reddit talking about? People.

Or news, the president, man…

Reddit appreciates personal content about you, this, it and I.

Even NLTK doesn’t understand these…

I’m pretty sure you don’t need example links for these…

The top 10,000 seem to have been submitted mostly around 17:00 UTC and rarely around 12:00 UTC.

This isn’t exactly the probability of reaching the front page, since it’s not clear at what time of day the number of submissions is highest. But it’s something.

An apology

This is my first time using NLTK and though I’m ok at coding I most certainly have no idea how to parse natural language. Here’s hoping this was somewhat insightful.

I have no idea what I'm doing

Appendix


We’ve just released a second bugfix update for PyCharm 2.0, version 2.0.2.

The update includes a number of Django specific fixes and minor features, improvements for the debugger and some important fixes in the IDE platform. Check out the full release notes.

As usual, the new version is available for download from the JetBrains site.

And if you’re looking forward to new features and not just bugfixes, stay tuned — the Early Access for PyCharm 2.1 is coming soon!

The multiprocessing module includes a generic Process class, which can be used to wrap a simple function.

The function must be designed to work with Queues, Pipes or other synchronization techniques.

There's an advantage, however, to defining a class which gracefully handles generator functions.  If we have Generator-Aware multi-processing, we can (1) write our algorithms as generators and then (2) trivially connect Processes with Queues to improve scalability.

We're looking at creating processing "pipelines" using Queues.  That way we can easily handle multiple-producer and multiple-consumer (fan-in, fan-out) processing that enhances concurrency.

We have three use cases:  Producer, Consumer and Consumer-Producer.

Producer

A Producer gets data from somewhere and populates a queue with it.  This is the source that feeds data into the pipeline.


from multiprocessing import Process

class ProducerProcess( Process ):
    """Produces items into a Queue.

    The "target" must be a generator function which yields
    picklable items.
    """
    def __init__( self, group=None, target=None, name=None, args=None, kwargs=None, output_queue=None, consumers=0 ):
        super( ProducerProcess, self ).__init__( name=name )
        self.target= target
        self.args= args if args is not None else []
        self.kwargs= kwargs if kwargs is not None else {}
        self.output_queue= output_queue
        self.consumers= consumers
    def run( self ):
        target= self.target
        for item in target(*self.args, **self.kwargs):
            self.output_queue.put( item )
        # One sentinel per downstream consumer, so each one knows we're done.
        for x in range(self.consumers):
            self.output_queue.put( None )
        self.output_queue.close()


This class will wrap a "target" function which must be a generator.   Every value yielded is put into the "output_queue".  When the source data runs out, enough sentinel tokens are put into the queue to satisfy all consumers.

Consumer

A Consumer gets data from a queue and does some final processing.  Perhaps it loads a database, or writes a file.  It is the sink that consumes data on the pipeline.


class ConsumerProcess( Process ):
    """Consumes items from a Queue.

    The "target" must be a function which expects an iterable as its
    only argument. Therefore, the args value is not used here.
    """
    def __init__( self, group=None, target=None, name=None, kwargs=None, input_queue=None, producers=0 ):
        super( ConsumerProcess, self ).__init__( name=name )
        self.target= target
        self.kwargs= kwargs if kwargs is not None else {}
        self.input_queue= input_queue
        self.producers= producers
    def items( self ):
        # Keep reading until every upstream producer has sent its sentinel.
        while self.producers != 0:
            for item in iter( self.input_queue.get, None ):
                yield item
            self.producers -= 1
    def run( self ):
        target= self.target
        target( self.items(), **self.kwargs )


This class will wrap a "target" function which must be ready to work with any iterable.  Every value from the queue will be provided to the target function for processing.  When enough sentinel tokens have been consumed from producers, it terminates processing.
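The sentinel bookkeeping is the subtle part, so it is worth exercising on its own. This sketch copies the loops from ProducerProcess.run() and ConsumerProcess.items() into plain functions. It uses the standard queue.Queue, which offers the same put()/get() calls as multiprocessing.Queue, so everything runs in one process. count_up is a hypothetical producer target.

```python
import queue


def produce(target, args, output_queue, consumers):
    """Same steps as ProducerProcess.run(), without the Process."""
    for item in target(*args):
        output_queue.put(item)
    # One None sentinel per downstream consumer.
    for _ in range(consumers):
        output_queue.put(None)


def consume_items(input_queue, producers):
    """Same loop as ConsumerProcess.items(): stop after every producer's sentinel."""
    while producers != 0:
        for item in iter(input_queue.get, None):
            yield item
        producers -= 1


def count_up(start, stop):
    # Hypothetical generator used as a producer target.
    yield from range(start, stop)


# Fan-in: two producers feed one shared queue and a single consumer.
q = queue.Queue()
produce(count_up, (0, 3), q, consumers=1)
produce(count_up, (10, 12), q, consumers=1)
result = list(consume_items(q, producers=2))
print(result)  # [0, 1, 2, 10, 11]
```

If the consumer were told producers=1, it would stop at the first sentinel and see only [0, 1, 2], leaving the second producer's items in the queue. With real processes the two producers' items may interleave, but the sentinel count still tells the consumer when both have finished.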

Consumer-Producer

The middle of a processing pipeline consists of consumer-producer processes, which consume from one queue and then produce to another queue.


        
class ConsumerProducerProcess( Process ):
    """Consumes items from a Queue and produces items onto a Queue.

    The "target" must be a generator function which yields
    picklable items and which expects an iterable as its
    only argument. Therefore, the args value is not used here.
    """
    def __init__( self, group=None, target=None, name=None, kwargs=None, input_queue=None, producers=0, output_queue=None, consumers=0 ):
        super( ConsumerProducerProcess, self ).__init__( name=name )
        self.target= target
        self.kwargs= kwargs if kwargs is not None else {}
        self.input_queue= input_queue
        self.producers= producers
        self.output_queue= output_queue
        self.consumers= consumers
    def items( self ):
        # Input side: count sentinels until every producer is done.
        while self.producers != 0:
            for item in iter( self.input_queue.get, None ):
                yield item
            self.producers -= 1
    def run( self ):
        target= self.target
        for item in target(self.items(), **self.kwargs):
            self.output_queue.put( item )
        # Output side: one sentinel per downstream consumer.
        for x in range(self.consumers):
            self.output_queue.put( None )
        self.output_queue.close()


This class will wrap a "target" function which must be a generator function that consumes an iterable.
Every value from the queue is provided to the target generator.  Every value yielded by the generator is sent to the output queue.  The input side counts sentinels to know when to stop.  The output side produces enough sentinels to alert downstream processes.

Target Functions

A producer function must be a generator function of this form:


def prod( *args ):
    for item in some_function(*args):
        yield item


A consumer function looks like this:


def cons( source ):
    for item in source:
        final_disposition(item)


Finally, a consumer-producer function looks like this:

def cons_prod( source ):
    for item in source:
        next_value= transform(item)
        yield next_value


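These functions can be tested and debugged in a single process like this:


# Inspect the values that would reach the consumer...
for final in cons_prod( prod( *args ) ):
    print( final )
# ...then run the whole chain, as the processes would.
cons( cons_prod( prod( *args ) ) )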


That way we're confident that our algorithm is correct before attempting to scale it with multiprocessing.
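Filling in the placeholders makes that in-process test concrete. In this sketch the sample text, the word-normalizing transform, and the Counter-based final disposition are hypothetical stand-ins for some_function(), transform() and final_disposition().

```python
from collections import Counter


def prod(*lines):
    # Producer: yield raw words from some hypothetical input lines.
    for line in lines:
        for word in line.split():
            yield word


def cons_prod(source):
    # Consumer-producer: transform each item (strip punctuation, lower-case it).
    for item in source:
        yield item.strip(".,!?").lower()


def cons(source):
    # Consumer: final disposition (here, tally the words and return the tally).
    return Counter(source)


# Debug the whole algorithm in one process before adding any Processes.
totals = cons(cons_prod(prod("Spam, spam, eggs.", "Eggs and SPAM!")))
print(totals.most_common(2))  # [('spam', 3), ('eggs', 2)]
```

One difference to keep in mind: ConsumerProcess.run() discards the target's return value. In the real pipeline, cons() would need to write its result somewhere, such as a file or a database, rather than return it.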


03:45

Like other Content Management Systems (CMS), Plone allows users to easily create and manage online content through the web by leveraging content types such as pages, blog posts, news items, calendar events, comments, etc. Those content types are typically custom-developed by programmers and require a formal code release to add the new types to a website.

And now Plone goes even further: the Dexterity add-on allows non-technical content contributors not only to use pre-existing content types, but also to define and create their own content types through the web (see screenshot).

Using Legos as an analogy, this means users can do more than just create a structure based on a given number of basic bricks: they can also design and use their own custom bricks.

Dexterity is Plone 4 compatible, and the Six Feet Up development team is using it on most of our projects. It has several advantages over the classic "Archetypes" content types, but the main improvement lies in the ability users now have to create content types in a WYSIWYG fashion. Many of our clients appreciate that they can create their own content types, or tweak them on their own, without having to ask for a developer's help every time they want to add or arrange a simple field.

Dexterity is also very developer friendly. For example, it is easy for a developer to export the content types created by a content contributor via Dexterity and reintegrate them into the code for optimal site efficiency. This also makes the content types data-independent and easily accessible to other web development projects.

With Dexterity, Plone now closes the feature gap with other well-known open source CMSs such as WordPress and Drupal, which already offer this ability.

02:15

We have to start somewhere. Something has to come first.

In 2001, I started working for the BBC in Cardiff. I worked alongside journalists and project managers for four years on all manner of web sites and applications; ranging from small niche content sites about surfing, through to redesigns of the homepage. All of these projects were approached Content First (but not this Content First), but not one of them had the content, first.

Having worked alongside news and media organisations for the past ten years, I’ve absorbed a lot about the editorial processes across three media: television, print and the web (and a bit of radio, too). I’ve rejoiced at the commonalities and grimaced with pain at the differences in how things are done, both in the organisations themselves and in the different output they produce. Some of those things we’ve taken straight to the web with some success. Others, we’ve tried and failed. Like trying to make websites like CD-ROMs. Remember those?

The model that we took right at the birth of the web from print – the templated page and publishing system – is now under attack. It’s under attack from the premise that you need to know your content before you can design it. Anyone who’s worked in publishing, or had to design a highly scalable branding system or a wayfinding system, will know that is nonsense. You don’t need Content First. You need Structure First. Then you need Content all of the time.

Let’s be really clear about this. It is unrealistic to write your content – or ask your client to write the content – before you design it. Most of the time. Content needs to be structured, and structuring alters your content. So does designing. It’s not ‘content then design’, or ‘content or design’. It’s ‘content and design’.

Designing a magazine or newspaper system requires the designer to exercise rigorous restraint with a hugely variable melting pot of content. Working directly with an editorial team, you have to define what types of content you need to design for. News articles, opinion pieces, features – all of these require slightly different design treatment to communicate to the reader that they are different types of content. Design variance must be limited. Why? Well, time mostly. Newspapers and magazines run on incredibly tight timescales, and content is literally poured into a mould – trimmed and teased a little, but the templates leave little room for movement. This is not bad practice. This is how content lives and breathes in a fast-moving editorial environment. It has to be fluid. It can’t be locked down early. Content First would not work here.

Content as structure

There has been an emerging fallacy in our industry recently: the idea that you cannot create good design without knowing your content. Even I said that a while ago. But that’s only half true. Newspapers, magazines and many other periodicals and publications in different media prove that assumption wrong every day.

You can create good experiences without knowing the content. What you can’t do is create good experiences without knowing your content structure. What is your content made from, not what your content is. An important distinction.

So you have to start with the structure, not the words. What exactly is an opinion piece? What are the variables? Can we even define them? Images? How many? (To which the response is always: ‘I don’t know, it depends!’) We can design around fluidity, but it means letting go of control. Again. How do we do that, then? How do we design around the fluidity? Well, we define structure: of our content, and of the templates that content inhabits. We define the rules of the system to display the content in different ways (if we can) to help the reader understand the content better. As Erik Spiekermann says: ‘we give content form’.

Designers as Content Directors

Designers have always been involved with content. We’re not just concerning ourselves with what is visual. So how can we help our clients understand that when we say:

‘Content First!’

We don’t really mean:

‘I’m going to sit on my hands right here unless you give me my content. Finalised, proofed and signed off. Thank you very much.’

No, we don’t mean that. What we mean is:

‘We’d really like to understand the type and structure of the content for this project. Don’t worry, you don’t have to write anything yet, just help us understand.’

We can do this any number of ways, but recently I’ve found this broad process works for me:

1. Talking. And Post-it notes. Get a room – preferably lock the door – and talk about the website or application. Not the content (yet). As you’re talking, jot down words that resonate. Then, after you’ve talked, stick all the words up on a wall and start to make connections between them. There’s probably some fancy UX term for this exercise; I’m not sure what it is.

2. Messy Mess. You should end up with a pile of loosely related words (not content at this point) about some stuff. This stuff is the very DNA of the website: the feel, the tone, the brand, the message, through to the nitty gritty of content types, taxonomies, tags and technology. Now you need to sort it all out.

3. Iterate. At some point in that sorting, you will need to start tightening up the structure. Again, I’ve found doing this iteratively and collaboratively the most fruitful.

4. Structure. Now you should have some structure. During step three, there will come a point when you or your client will say: ‘Yes, but what is this?’ pointing at a Post-it note with a word on it. At that point, you get into detail and start fleshing out what it is. This is defining the structure of your content. And it doesn’t stop here.

5. Page tables. A page table is basically a form for your content. Fill it in. Use this one from Relly, as it’s brilliant. This tool can really help your client later in the ‘oh my God, I’ve got so much content to write and I don’t know where to start!’ phase. It helps focus.

Structure First. Content Always.

Let’s stop siloing content, shall we? We did it for a while when we were designing a largely brochureware, templated web. Now, we’re trying to move that silo from one end of the process to the other. Let’s focus on structure to begin with, and think about content all the time. There is a symbiotic relationship between content and design. One cannot thrive without the other.

Let’s start with structure. Let’s know what our content is made from. Not, necessarily, what it is.

Monday, 06 February

23:09


I've pretty much finished replacing Kid with Genshi for directdocs (the project that generates the PyOpenGL documentation by combining the upstream docbook files with the pydoc-like introspection of PyOpenGL, as well as generating the OpenGLContext tutorial files). It was pretty much painless. I got caught for a bit on a ${ [ do(x) for x in y] } expression not working as expected where do() is a py:def function: in Genshi it returned generator objects instead of producing text. A few py:for tags solved that. I also spent a bit of time figuring out how to copy lxml-produced ETree nodes directly into the result:

<?python
from genshi.input import ET
?>
and then insert the elements with ${ET(element)}.

In our first post on Selenium in Python, we saw how to prepare your continuous integration environment on Debian. Now it's time to have a look at how to add Selenium tests to your existing web project.

Sample project

This tutorial is based on the poll application from the Django tutorial. It describes all the required steps to run Selenium tests on a Django project. The source code can be found here.

This sample project is organized as follows:


The following features are implemented:
  • Poll management via the Django admin site,
  • Vote submission.


Don't forget to install Django!
pip install Django

Testing tools

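To write and execute Selenium tests, the following tools are used:
  • selenose, which provides the Selenium Server and Selenium Web Driver plugins for nose,
  • djangosanetesting, which provides the Django plugins for nose, including the CherryPy live server,
  • CherryPy, the HTTP server that runs the project during Selenium tests,
  • coverage, which produces the coverage reports.

Install them all by typing: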
pip install selenose djangosanetesting CherryPy coverage

Configure tests

The test configuration is located in tests/setup.cfg and will be loaded by nose at runtime:
[nosetests]
with-xunit = true
with-coverage = true
cover-package = djangotutorial,polls
with-django = true
with-cherrypyliveserver = true
with-selenium-server = true
with-selenium-driver = true
  • with-xunit generates test result reports,
  • with-coverage and cover-package generate coverage reports for the djangotutorial and polls packages,
  • with-django enables djangosanetesting, which, for instance, sets up the test database,
  • with-cherrypyliveserver starts a CherryPy server on djangotutorial for Selenium tests,
  • with-selenium-server starts a Selenium Server,
  • with-selenium-driver provides a Selenium Web Driver to the tests.

The test database is configured in djangotutorial/settings.py with TEST_DATABASE_NAME (used by djangosanetesting for live server) and TEST_NAME:
TEST_DATABASE_NAME = 'djangotutorialtest.db'
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        # Other configuration
        # ...
        'TEST_NAME': TEST_DATABASE_NAME,
    }
}

Write tests

Selenium tests are located in the tests folder. Here is a test template:
from selenose.cases import SeleniumTestCase

from djangosanetesting.cases import HttpTestCase

class SampleTestCase(SeleniumTestCase, HttpTestCase):

    def test(self):
        self.driver.get('http://localhost:8000/')
        # Your test here...

This test inherits from:
  • selenose.cases.SeleniumTestCase to flag this test as a Selenium test for the Selenium Driver Plugin of selenose (the one providing a Web Driver),
  • djangosanetesting.cases.HttpTestCase to flag this test as an HTTP test for djangosanetesting, to start the CherryPy server.

Then write tests as usual, with the benefit of having the self.driver directly available.

The following sections contain the test examples of two major features:
  • vote submission,
  • login on admin site.

Test vote submission

The project defines a sample poll, “How is selenose?”, in the djangotutorial/polls/fixtures/initial_data.json fixture, which is loaded at database initialization.


This test verifies that the voting process is working for this poll:
import time

from selenose.cases import SeleniumTestCase
from djangosanetesting.cases import HttpTestCase

class PollsTestCase(SeleniumTestCase, HttpTestCase):

    def test_vote(self):
        self.driver.get('http://localhost:8000/polls/')
        poll = self.driver.find_element_by_link_text('How is selenose?')
        poll.click()
        time.sleep(2) # Should use accurate WebDriverWait
        choices = self.driver.find_elements_by_name('choice')
        self.assertEqual(2, len(choices))
        choices[1].click()
        choices[1].submit()
        lis = self.driver.find_elements_by_tag_name('li')
        self.assertEqual(2, len(lis))
        self.assertEqual('Cool? -- 0 votes', lis[0].text)
        self.assertEqual('Super cool? -- 1 vote', lis[1].text)
  • First open the page listing the polls: http://localhost:8000/polls/,
  • Find the link pointing to the How is selenose? poll, and click on it,
  • Verify that two choices are available, then select the last one before submitting the form,
  • Finally verify the vote results on the poll result page.

Test admin site login

The project also defines an administrator account, admin/admin (username/password), in the djangotutorial/polls/fixtures/initial_data.json fixture, which is loaded at database initialization.


This test checks that the admin user can log into the admin site:
import time

from selenose.cases import SeleniumTestCase
from djangosanetesting.cases import HttpTestCase

class AdminTestCase(SeleniumTestCase, HttpTestCase):

    def test_login(self):
        self.driver.get('http://localhost:8000/admin/')
        self.driver.find_element_by_id('id_username').send_keys('admin')
        password = self.driver.find_element_by_id('id_password')
        password.send_keys('admin')
        password.submit()
        time.sleep(2) # Should use accurate WebDriverWait
        self.assertTrue(self.driver.find_element_by_id('user-tools').text.startswith('Welcome'))
  • First open the administration interface login page: http://localhost:8000/admin/,
  • Then look for the field dedicated to username (has an id id_username) and type admin,
  • Then look for the field dedicated to password (has an id id_password) and type admin before submitting the form,
  • Finally assert that the user is welcomed.
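Both tests pause with time.sleep(2), and their comments point at WebDriverWait instead. Selenium's WebDriverWait(driver, timeout).until(condition) calls the condition repeatedly until it returns something truthy or the timeout expires. The sketch below shows that polling idea in plain Python, so it runs without a browser. wait_until is a hypothetical helper, not part of selenose or Selenium.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Poll condition() until it returns a truthy value or timeout expires.

    Hypothetical helper illustrating the idea behind Selenium's
    WebDriverWait(driver, timeout).until(...); not Selenium's code.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # hand back whatever the condition found
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(interval)
```

In a test it could replace the fixed sleep, for example wait_until(lambda: self.driver.find_elements_by_name('choice'), timeout=5). The find_elements_* calls return an empty, falsy list until the elements appear, so the wait ends as soon as the page is ready.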

Run tests

The script tests/run.py is the entry point to run the tests. It:
  • Adds the project folder and its djangotutorial folder (for the polls module) to sys.path,
  • Exports DJANGO_SETTINGS_MODULE in os.environ to define the Django settings module to use in tests,
  • Changes the current directory to the tests folder,
  • Calls nose.
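Here is a possible tests/run.py that follows those four steps. The settings module name djangotutorial.settings is an assumption based on the djangotutorial/settings.py path mentioned earlier, and setup_environment and main are hypothetical names. nose.main() reads its options from sys.argv, which is how --selenium-driver=firefox reaches the plugins.

```python
import os
import sys


def setup_environment(project_dir, tests_dir,
                      settings_module='djangotutorial.settings'):
    """Prepare sys.path, the environment and the cwd before nose runs."""
    # The project folder, plus djangotutorial/ so that `polls` is importable.
    for path in (project_dir, os.path.join(project_dir, 'djangotutorial')):
        if path not in sys.path:
            sys.path.insert(0, path)
    # Tell Django which settings module the tests should use (assumed name).
    os.environ['DJANGO_SETTINGS_MODULE'] = settings_module
    # nose reads the [nosetests] section of setup.cfg from the working directory.
    os.chdir(tests_dir)


def main():
    tests_dir = os.path.dirname(os.path.abspath(__file__))
    setup_environment(os.path.dirname(tests_dir), tests_dir)
    import nose  # imported late, once the paths above are in place
    nose.main()  # uses sys.argv, e.g. --selenium-driver=firefox


# At the bottom of tests/run.py:
#     if __name__ == '__main__':
#         main()
```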

To run all tests with the firefox Web Driver environment (see selenose documentation for more information on Web Driver environments), execute:
$ python tests/run.py --selenium-driver=firefox

Or to run a single test:
$ python tests/run.py --selenium-driver=firefox test_admin.py

What's next?

In our next blog post: how to integrate this project with Jenkins!

Decorators are useful in Python for a variety of reasons. Some common uses are:

  • caching
  • logging
  • timing
  • authorization
  • and lots more …

I mentioned in my recent SCALE10x talk that I would have a handout/cheatsheet for Python decorators. Here is a pdf version of the Decorator cheatsheet. It has templates for normal decorators and parameterized decorators. Enjoy!

edit — I apologize for the typos uploaded previously. I ran the handout through doctest and it should be good to go now.
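The cheatsheet itself is a PDF. To give a rough idea of what such templates look like, here is a sketch of my own (not the handout's code) of a plain decorator (timing) and a parameterized one (a hypothetical retry(times) decorator). Both use functools.wraps so the wrapped function keeps its name and docstring.

```python
import functools
import time


def timed(func):
    """Plain decorator: takes a function, returns a wrapper around it."""
    @functools.wraps(func)  # copy __name__, __doc__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Store the duration of the most recent call on the wrapper itself.
            wrapper.last_elapsed = time.perf_counter() - start
    wrapper.last_elapsed = None
    return wrapper


def retry(times):
    """Parameterized decorator: an outer function takes the arguments
    and returns the actual decorator."""
    if times < 1:
        raise ValueError("times must be at least 1")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == times:
                        raise  # out of attempts: re-raise the last error
        return wrapper
    return decorator


@timed
def add(a, b):
    """Add two numbers."""
    return a + b


calls = []


@retry(times=3)
def flaky():
    # Hypothetical function that fails twice, then succeeds.
    calls.append(1)
    if len(calls) < 3:
        raise ValueError("not yet")
    return "ok"


print(add(2, 3), add.__name__, add.__doc__)  # 5 add Add two numbers.
print(flaky(), len(calls))                   # ok 3
```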

19:45

If you have been reading this blog over the past five years, I am sure you have read a post or five about my desire to bootstrap importlib into Python as the implementation of __import__. Well, as of today I'm willing to say that the difficult technological hurdles have been scaled! At this point the only things holding me back from taking my code from https://hg.python.org/sandbox/bcannon#bootstrap_importlib and making importlib drive import statements are some small compatibility issues, better integration into the build process, a code review, and python-dev sign-off. In other words, all of the interesting problems have been solved, so I'm finally ready to write a blog post discussing how I pulled off what I have.

So how exactly do you import __import__? To begin, as with any bootstrap challenge, you need to figure out what is available to you so you know what your design parameters are. In my case I knew I couldn't import anything that required filesystem access since half of import is handling the search for a module (the other half is the actual importing); if I wanted to import a file I would need to essentially write half of import in C to work properly. This restriction also has unexpected side-effects, e.g. you can't rely on open() because that is part of the io module which is a Python module.

That meant I could only rely on built-in modules. If you look at sys.builtin_module_names, you will discover what is available directly within the CPython binary. The question then becomes whether that is enough. It turns out that yes, those built-in modules are enough to perform an import. OK, so you know you have the bare minimum modules required to do an import, but how the heck do you get the built-in modules into the global scope of the module implementing import, since you can't use an import statement?
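You can check what your own interpreter has compiled in. This short sketch looks up a few names from this post in sys.builtin_module_names on a current CPython. Its output says nothing about how the bootstrap code itself finds them.

```python
import sys

# Modules compiled into the interpreter binary itself. They can be loaded
# without searching the filesystem, which is what makes them usable while
# import is still being bootstrapped.
builtin_names = set(sys.builtin_module_names)

for name in ("sys", "marshal", "io"):
    where = "built-in" if name in builtin_names else "not a built-in module"
    print(name, "->", where)
```

On CPython, sys and marshal show up as built-in, while io is a Python module. That is why open() is off-limits during the bootstrap.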

This is when Python's dynamism comes in handy. Since the import statement doesn't do much more than pull in the module object and assign it to a variable at the global scope of the module, I just needed to get the module object for importlib and assign to its __dict__ the built-in modules I needed. Turns out that sys and imp are enough to allow importlib to handle the import of the rest of the built-in modules needed for import to work, so that kept this bit of code short.

But this brings up the next quandary: how do I create a module object for importlib? If I had to search for importlib on sys.modules, I would have ended up implementing a decent chunk of import itself. So how could I get the module object? This is where frozen modules come into play.

A frozen module is just a C array containing the marshaled code for a module (which is what a .pyc file is, sans magic number, timestamp, and now the file size of the source). Since marshal is a built-in module, frozen modules can be loaded without issue. That means you can load a frozen module without using import (much like importing built-in modules).

And those are all the parts needed to import importlib without import. =) To summarize, you get importlib set as __import__ by doing the following:

  1. Import the frozen module (i.e. read in a C array of a marshaled module object and unmarshal it)
  2. Import sys and imp (built-in modules, so done in C code by calling key C functions which return module objects) and set them on the module object
  3. Call Python code to import the rest of the built-in modules using sys and imp
  4. Set Python-based __import__ on the builtins module
And voila! __import__ ends up implemented in pure Python code. Now I just need to clean up the code, fix the compatibility issues, rip out the old C code, and get python-dev to sign off. =) Hopefully I will get far enough that I will have a lightning talk at PyCon with benchmark numbers to show this is actually all a good thing (including ripping out a ton of C code, especially if I can re-implement chunks of imp in pure Python =).
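The first two steps can be imitated from ordinary Python to see why they work. This sketch is only an analogy. The real frozen module is a C array generated at build time, and the real code injects sys and imp from C. imp has since been removed from the standard library, so only sys is injected here. FROZEN_SOURCE and load_frozen are hypothetical names.

```python
import marshal
import sys
import types

# Hypothetical module source. Note that it never imports sys itself.
FROZEN_SOURCE = '''
def describe():
    return "bootstrapped on Python %d.%d" % sys.version_info[:2]
'''

# Step 1 analogue: a "frozen" module is just marshaled code-object bytes.
frozen_bytes = marshal.dumps(compile(FROZEN_SOURCE, "<frozen_demo>", "exec"))


def load_frozen(name, data, injected):
    """Build a module object from marshaled code, with no import statement."""
    module = types.ModuleType(name)
    # Step 2 analogue: put built-in modules straight into the module's globals.
    module.__dict__.update(injected)
    exec(marshal.loads(data), module.__dict__)
    return module


demo = load_frozen("frozen_demo", frozen_bytes, {"sys": sys})
print(demo.describe())  # e.g. "bootstrapped on Python 3.x"
```

The loaded module never goes through the import machinery: it is absent from sys.modules, and its sys global is simply the object that was placed there.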

Happy New Year

Welcome back to Python4Kids! I am sorry for being a little slack since the end of last year, but hopefully I’m over that now and we can start powering on again!

Towards the end of last year we started working on classes and a GUI toolkit called Tkinter.   In this tutorial we will recap classes.  Hopefully, we’ll recap Tkinter next.  If you have data and functions which are related to each other in some way, classes allow you to group them together.  This makes managing them easier, particularly as your programs get bigger.

Example:

>>> class allDads(object):
...    def __init__(self,age=28):
...       self.age = age
...
>>> dad1 = allDads()
>>> dad1.age
28
>>> dad2 = allDads(35)
>>> dad2.age
35
>>> dad1.age
28

This class has one method (called __init__()) and one attribute (called age).  It inherits from object.

Recap bit

The main things we have learned about classes are:

  • they are based on (“inherit from”) some Python object (usually object, meaning the Python thing called object, as opposed to an object in its ordinary sense)
  • classes are defined with the class statement.  The class statement creates an archetype* – that is, a kind of template – from which specific instances are created.   Thus the class allDads was representing all dads in the world, while myDad was a specific instance of a dad – that is, my dad in particular.
  • data which is stored in a class or instance is called an attribute. 
  • functions which are part of a class or instance are called methods
  • if you have an instance of a class (eg myDad), then the attributes and methods of myDad are identified by the dot operator “.”.  So, the appearance attribute of the instance myDad is identified by myDad.appearance (see the dot?).  If it had a method called makesARobot() (the parentheses indicate it is a function or method), that would be identified by myDad.makesARobot().
  • the attributes of an instance are (typically) specific to that instance.  If two instances have an attribute of the same name (which will always be the case for attributes defined by the class), changing the attribute for an instance does not affect the attribute of another instance.
  • classes usually define a special method called __init__() (“dunder init“, among other pronunciations).  This method is run whenever an instance of the class is created.  It INITialises the instance.
  • classes make use of a special variable called self.  self is a bit tricky to explain, but it will become clear when you start using it a bit.  self is used when defining the class as a way for each instance to refer to that instance of the class, rather than to the class as a whole.
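Here is a short example that puts those pieces together. It reuses the appearance attribute and the makesARobot() method from the list; the class name Dad and the strings are made up for this example.

```python
class Dad(object):
    def __init__(self, appearance="tall"):
        # self is the particular dad (instance) being created
        self.appearance = appearance

    def makesARobot(self):
        # a method: self lets it use this dad's own attributes
        return "a robot built by a " + self.appearance + " dad"


myDad = Dad("tall")
yourDad = Dad("short")
print(myDad.makesARobot())   # a robot built by a tall dad
yourDad.appearance = "bearded"
print(myDad.appearance)      # tall (changing yourDad did not change myDad)
print(yourDad.appearance)    # bearded
```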

Python’s classes are a core component of the language.  You will end up using them all the time.  In fact, they are also a core part of object oriented programming.  I have introduced classes at the same time as Tkinter because Tkinter needs them.  In order to program Tkinter, you basically need to use classes!

Homework:

Part 1:

Create a class with a method called __init__(self, firstName = None).  Make the __init__ method assign the value of the firstName parameter to an attribute which is also called firstName.  (Hint: check the example above, and use self.firstName.)

Part 2:

Create some instances of your class, passing your first name as a string (ie put it in quotation marks) to the class. 

Part 3

Print the firstName attribute of the instances you have created.

Notes:

* an “archetype” is “a universally understood symbol or term or pattern of behavior, a prototype upon which others are copied, patterned, or emulated.”


A friend asked on G+ recently about tabs vs. spaces. A lot of people agreed with what I said so I thought I’d turn it into a proper post.

There’s a good summary here: http://www.jwz.org/doc/tabs-vs-spaces.html. This is also a link Jeff Atwood has in his post on the subject.

So why are spaces preferred over tabs? Tabs have the nice feature of being more compact, and the display of the code in an IDE can be customized (I prefer shorter indents, some prefer larger). Spaces are more verbose in a lot of ways. But I’m not going to go over the pros and cons of each because, frankly, they’re not the reason.

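Spaces are preferable to tabs because, like the Zen of Python says, explicit is better than implicit. A space means the same thing in every editor, terminal and tool, while the width of a tab depends on how each one is configured, so spaces are compatible in more places.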

PEP8 tells us to limit lines to 79 columns, because our code may be running on fixed-width terminal windows, and python is a scripting language, so people would be looking at the actual code on those terminal windows. As opposed to compiled code, where you’re generally not going to look at or edit the code on those terminal windows.
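If you want a quick check of your own files, here is a minimal sketch of two checks: tab characters in indentation, and lines longer than 79 characters. check_source is a made-up helper for illustration only. The pep8 tool (now called pycodestyle) does this properly, reporting W191 for tab indentation and E501 for long lines.

```python
def check_source(text, max_length=79):
    """Return (line_number, message) pairs for tab indentation and long lines."""
    problems = []
    for number, line in enumerate(text.splitlines(), 1):
        # The indentation is whatever leading spaces/tabs precede the code.
        stripped = line.lstrip(" \t")
        indent = line[:len(line) - len(stripped)]
        if "\t" in indent:
            problems.append((number, "tab in indentation"))
        if len(line) > max_length:
            problems.append((number, "line longer than %d characters" % max_length))
    return problems


if __name__ == "__main__":
    import sys
    # Usage: python check_source.py file1.py file2.py ...
    for path in sys.argv[1:]:
        with open(path) as handle:
            for number, message in check_source(handle.read()):
                print("%s:%d: %s" % (path, number, message))
```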

Speaking of terminals. There are a lot of times we’re editing code in unfamiliar places. That’s not just something like a terminal window. It is an unfamiliar text editor. It is an editor embedded into some program. It is a diff tool. It is any number of places we may need to write or debug code outside of our primary editor/IDE. Who knows what happens when you hit ‘tab’? How are things configured? Why bother with the ambiguity?

Well nothing is stopping you from requiring tabs for your studio, and breaking python’s PEP rules, and educating and configuring everyone’s editors to use tabs. However, the first time you need to go in and edit some code you find on the internet or download through pip or easy_install, you’re going to screw up and create a syntax error. Not only that but nearly every IDE can be easily configured to use spaces instead of tabs for both indenting and dedenting. And where you’re not sure of the default, or don’t want to configure it, you can just use spaces and backspace.
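You can see the "syntax error" part for yourself with the snippet below, which compiles two versions of the same function. has_tab_error is just an illustrative helper. In the mixed version, one line is indented with eight spaces and the next with a single tab, so the two only line up if the editor assumes 8-column tabs. Python 3 rejects this with TabError, a subclass of SyntaxError. Python 2 accepted it silently unless run with -t (warning) or -tt (error).

```python
def has_tab_error(source):
    """Return True if Python rejects `source` for mixing tabs and spaces."""
    try:
        compile(source, "<example>", "exec")
    except TabError:
        return True
    return False


# Eight spaces on one line, a single tab on the next.
mixed = "def f():\n        x = 1\n\treturn x\n"
spaces = "def f():\n    x = 1\n    return x\n"

print(has_tab_error(mixed))   # True on Python 3
print(has_tab_error(spaces))  # False
```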

So for Python, there’s no reason to use tabs. Just don’t do it. You’re using a language that depends on whitespace for code structure. You need to take it seriously and remember that you and your code are part of the larger Python community. It isn’t about preference, it is about compatibility.

If you aren’t using a whitespace-dependent language, feel free to establish a standard and enforce it. Just never do it with python.

18:00

A few months ago, the good folks at No Starch Press sent me a review copy of Chris Sanders' book Pra ...(more)...

17:09

TTFQuery has been updated to feel a tiny bit more modern.  It's still pretty old-school, but it now has some actual documentation, and should be more friendly for pip installation.

Found the bug that has been holding up my testing on PyOpenGL/OpenGLContext.  The shadow demos had been broken, which I considered a blocker.  It turns out not to have been related to the double->float conversion; instead, I had left out the rotation axis normalization in the new transformation-matrix-calculation library. Duh!
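Here is a sketch of why that normalization matters. It is not PyOpenGL's code, just the standard axis-angle (Rodrigues) rotation formula written with plain lists. The formula assumes a unit-length axis. Feed it an unnormalized one and the result is no longer a pure rotation, so geometry gets scaled and skewed.

```python
import math


def rotation_matrix(angle, axis):
    """3x3 rotation matrix (row-major nested lists) for a rotation of
    `angle` radians about `axis`, using Rodrigues' formula."""
    x, y, z = axis
    length = math.sqrt(x * x + y * y + z * z)
    if length == 0:
        raise ValueError("rotation axis must be non-zero")
    # The easy-to-forget step: the formula below assumes a unit-length axis.
    x, y, z = x / length, y / length, z / length
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    return [
        [t * x * x + c,     t * x * y - s * z, t * x * z + s * y],
        [t * x * y + s * z, t * y * y + c,     t * y * z - s * x],
        [t * x * z - s * y, t * y * z + s * x, t * z * z + c],
    ]


def transform(matrix, point):
    """Multiply a 3x3 matrix by a 3-component point."""
    return [sum(row[j] * point[j] for j in range(3)) for row in matrix]


# A quarter turn about a deliberately non-unit z axis still maps x onto y
# (approximately [0, 1, 0], up to floating-point rounding).
print(transform(rotation_matrix(math.pi / 2, (0, 0, 2)), (1, 0, 0)))
```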

Just ran into the little wonder that is "pip install PIL", which results in a useless PIL (i.e. one without JPEG, PNG or TrueType support, but which imports, so it doesn't trigger the "I don't have PIL" code paths).  There are other cases, e.g. PostgreSQL drivers or complex GUI libraries, where if you don't happen to have a toolchain installed you are out of luck on the install, and really, you'd rather just use the .deb version anyway.  What I'd really like is something like this:

pip install --link-system PIL psycopg2 pygame

where the 3 packages there would be looked up in the system Python corresponding to the Python in the virtualenv and links would be created from venv/lib/pythonX.Y/site-packages to the system-level packages (if it's more reliable, copy the files, I really don't mind).

Does something already do this elegantly/cleanly?  I have a stupid little script that works for the few libraries I have (basically just imports them, printing out the paths, which I then script to create the links), but I really want something where I can just add a flag (or whatever) to the requirements file so that the whole thing is transparent (save that if the library isn't installed at the system level the user gets told to install it manually (or something)).
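For illustration, here is roughly what such a linking script could look like. It is a sketch, not the author's script. Its assumptions:

- link_system_packages and system_module_path are made-up names.
- /usr/bin/python is an assumed location for the system interpreter.
- Packages that distributions install via .pth files are ignored.

It asks the system interpreter where each module lives, then symlinks that file or package directory into the virtualenv's site-packages.

```python
import os
import subprocess

# Assumption: the system interpreter matching the virtualenv's Python version.
SYSTEM_PYTHON = "/usr/bin/python"


def system_module_path(name, system_python=SYSTEM_PYTHON):
    """Ask the system interpreter where `name` is installed.

    Returns the package directory for packages, or the module file itself
    (a .py file or a compiled extension) for plain modules.
    """
    code = (
        "import os, {0} as m; f = m.__file__; "
        "print(os.path.dirname(f) if os.path.basename(f).startswith('__init__.') else f)"
    ).format(name)
    output = subprocess.check_output([system_python, "-c", code])
    return output.decode("utf-8").strip()


def link_system_packages(names, site_packages, system_python=SYSTEM_PYTHON):
    """Symlink system-level modules/packages into a virtualenv's site-packages."""
    links = []
    for name in names:
        source = system_module_path(name, system_python)
        target = os.path.join(site_packages, os.path.basename(source))
        if not os.path.lexists(target):  # leave existing entries alone
            os.symlink(source, target)
        links.append(target)
    return links

# Hypothetical usage, run with the virtualenv active:
#   import sysconfig
#   link_system_packages(["PIL", "psycopg2", "pygame"], sysconfig.get_paths()["purelib"])
```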

12:54

The next meeting of the Python User Group Austria (PyUGAT) will take place this Sunday, February 12th, 2012 around 6 p.m. at the Metalab in Vienna (how to get there).

You can also find us on Meetup.

The Agenda for this evening can be found in our wiki. This time there will be a couple of small presentations. Bring your projects or questions to discuss with the crowd! Afterwards there is time for chit-chat and a beer.

If you happen to be in or around Vienna this Sunday, come and join us at the Metalab!

We can accommodate non German-speaking guests by switching to English, so don't be afraid to drop by!

You can also follow us on Twitter and Identica for up-to-date news. You can find logs of our previous meetings on our website.

See you on Sunday!

This year's CUSL edition adds a new option to the project data: in addition to the blog, the RSS feed and the project code, you can now add a Twitter account for your project.

I think that is a great idea from the contest organizers. Any free and open source project needs to build a rich and diverse user community, and social networks are a good place to do it. Everyone working on open source projects knows how difficult it is to keep a project alive, and we must take advantage of social networks.

To keep up with the participating projects' news, we've created a Twitter list with all the available project accounts here. Members of the list are updated regularly so you don't miss anything :)

Still don't have a Twitter account for your project? What are you waiting for? Create an account and follow our list of participants!


08:27

It has been a while, but testtools 0.9.13 is finally out! Lots more matchers and bug fixes, as well as improved error reporting. Full release notes on Launchpad.

Thanks to James Westby, Graham Binns, Francesco Banconi and Robert Collins for making this release possible.

08:09

eGenix is pleased to announce the immediate availability of the eGenix mx Base Distribution 3.2.3 for Python 2.4 - 2.7.

Hacking Health is on February 24 and 25 in Montreal. Our goal is to pair health and technological entrepreneurs so that they can develop solutions to front-line healthcare problems.

By emphasizing hands-on work on small projects which can be tackled in a short period of time, participants can quickly test assumptions, build teams, and generate momentum for promising ideas in healthcare.

There’s a limited number of places and they are going very quickly: http://physiqa.wufoo.com/forms/hackers-form/

For more information or to sign up, visit http://www.hackinghealth.ca.

Today starts the countdown to the start of a new phase of my
career. After working at Racemi for just over ten years, I have
decided that it is time to move on to new challenges. Starting next
Monday, 13 February, I will be a "Senior Cloud Developer" for
Dreamhost.

Ten years is a long time to be with one group of developers, and
leaving Racemi was not an easy decision. I have been through a lot
with this team. I traveled to customer sites when the company was
small enough that we sent developers on sales calls. I learned a lot
about data center automation with them, and had an opportunity to work
with some unusual hardware that came before its time. Together we
created some crazy-cool product code that I could never have imagined
on my own. We work well together, even when we disagree, and because
of that, through it all, I enjoyed working with them. I am proud of
what we have built together and I hope their successful trend
continues. But the time has come for me to push myself in a different
direction.

The enthusiasm of the people I met while visiting Dreamhost's offices
in LA was infectious, and a big part of why I decided this move would
be right for me. I look forward to the opportunity to learn an entire
stack of new technologies and make major product design decisions on
some exciting new projects. As part of the work we will be doing, I
anticipate being able to contribute to more open source projects, too,
and will write more about work-related projects as a result.

I am looking forward to joining Jonathan, Duncan, and the rest of
the talented team at Dreamhost!

05:00

Yes, sometimes I still use the command line for git (even with EGit getting better and an integration on Aptana Studio 3).

This is an update to a previous post about git (with msysgit).

[Update] October 31st, 2011: Improved gitdd command a bit
[Update] February 6th, 2012: Added a command to preview incoming changes on a branch

1. Configure running shell

Usually I use TCC/LE as a shell, so there's an alias to start the git bash shell from it (TCC lets you edit a 4START.BAT file, which runs when it starts, so the configuration may be added there):

alias gitshell=path_to_git/bin/sh.exe --login -i

Other aliases:

alias git1=git log --graph --oneline --decorate

Shows the log with nice graph.

1a Use pageant/plink so that the password does not need to be entered all the time

Download pageant/plink, set environment variable GIT_SSH to the plink location, start pageant and load the private ssh key with pageant.

1b Global git configurations

git config --global core.autocrlf false
git config --global user.name "fabioz"

2. Fix history behavior

I find it annoying that the default history behavior (when up/down is used) always fetches the previous command instead of completing based on what's already typed. To fix that, create a .inputrc file in your user home directory with the following contents:

## By default up/down are bound to previous-history
## and next-history respectively. The following does the
## same but gives the extra functionality where if you
## type any text (or more accurately, if there is any text
## between the start of the line and the cursor),
## the subset of the history starting with that text
## is searched (like 4dos for e.g.).
## Note to get rid of a line just Ctrl-C
"\e[B": history-search-forward
"\e[A": history-search-backward

$if Bash
## F10 toggles mc on and off
## Note Ctrl-o toggles panes on and off in mc
"\e[21~": "mc\C-M"

##do history expansion when space entered
Space: magic-space
$endif

## Include system wide settings which are ignored
## by default if one has their own .inputrc
$include /etc/inputrc


3. Diff changes

For diffing the current changes with WinMerge, there's a file called gitdd (added to the git bin dir) with the contents below:

#!/bin/sh

# usage: gitdd [path]
# Compares the current differences in WinMerge with links to the original files.
# Note that it must be executed at the root of the git repository.

SUBDIRECTORY_OK=1

O=".git-winmerge-tmp-$$"
V=HEAD
list="$O/list"
# Delete everything created here on exit
trap 'rm -rf "$O"' 0
mkdir -p "$O/WORKINGCOPY" "$O/HEAD"
# Dump the names of the changed files (NUL-separated, so spaces are safe)
git diff $V --name-only -z $1 > "$list"
# Create links to the changed files inside the temp folder
# (changes made to these links will be made to the originals)
# and extract the $V version of each file next to them
tr '\0' '\n' < "$list" | while IFS= read -r i; do
    PPATH=$(dirname "$i")
    mkdir -p "$O/WORKINGCOPY/$PPATH" "$O/HEAD/$PPATH"
    ln "$(pwd)/$i" "$(pwd)/$O/WORKINGCOPY/$i"
    git show "$V:$i" > "$O/HEAD/$i"
done
# Alternative (disabled): copy HEAD versions of changed files to temp folder
# cat $list | xargs -0 git archive --prefix=HEAD/ $V | tar xf - -C $O
# Execute WinMerge, which must be on the system path
WinMergeU.exe //r //u //wr //dl WORKINGCOPY //dr HEAD "$O/WORKINGCOPY" "$O/HEAD"

3a. WinMerge tree view

I also always like to see things as a tree in WinMerge (Menu View > Mode: Alt+V M) and expanded (Menu View > Expand All Subfolders: Alt+V X). It'll remember to show things as a tree, but unfortunately the tree has to be expanded again every time a compare is done (i.e.: no auto-expand).

4. Show things in a compact way

git config format.pretty "%h %ct %ad %Cgreen%aN%Creset %s"
git config log.date short

5. The commands used to get the contents of a pull request (create local branch, merge, get dev, merge with dev):

git checkout -b PullFromBranch-dev development
git pull https://github.com/pull_from_user/Pydev.git PullFromBranch
git checkout development
git merge PullFromBranch-dev --no-commit --no-ff

Then, to accept the merge, commit; to reject it, run:
git merge --abort
or
git reset --merge


6. When creating a feature in a branch:

git checkout -b FeatureBranch-dev development
git checkout development
git merge --no-ff FeatureBranch-dev


7. Preview incoming changes:

Create a bash file with the contents below (usually just hardcoding ${current_branch} to the development branch). The commands below stash local changes, create two branches from the current branch, pull the incoming changes into one of them, merge it into the other without committing, compare with the external tool (gitdd) and then revert everything again.

current_branch=development  # hardcode the branch you want to preview
echo Saving local changes just in case... git stash apply may be done later!
git stash
git checkout -b preview-branch
git checkout -b preview-branch2
git pull origin ${current_branch}
git checkout preview-branch
git merge --no-ff --no-commit preview-branch2
gitdd
git reset --hard
git checkout ${current_branch}
git branch -D preview-branch
git branch -D preview-branch2


8. Common commands

git status
git commit -a -m "Message"
git push origin master
git checkout
git log -n 6
git log -n 3 --format=full
git commit --amend (change last commit message)
git show (see what happened on a commit)

And some of the commands I had to discover how to use in the msysgit bash are:

Alt+Space+E+K: mark contents for copy
Insert: paste contents
Alt+Backspace: erase until whitespace


Doing a merge:

Steps: create a new branch based on the current development branch, pull the merge request into that branch, then check out the development branch, merge it there without committing (to do a proper review) and then accept or reject the merge.

git checkout -b branch_to_do_merge_dev development
git pull https://github.com/user/Pydev.git jonahkichwacoders:branch_to_do_merge
git checkout development
git merge branch_to_do_merge_dev --no-commit --no-ff

Then, to accept the merge, commit; to reject it, run:
git merge --abort
or
git reset --merge


During January, we hosted a contest to promote the Kivy framework. The goal was simple: create a game using Kivy, without external non-pure-Python dependencies. All the entries were submitted on GitHub, one of our sponsors (thanks also to NotionInk), under a compatible OSS licence. The contest registered 21 entries, and 11 submissions were valid.

Our winners are:

  1. Deflectouch, from Cyril Stoller
  2. FishLife, from Zogg
  3. memoryKivy, from Niavlys
  4. Centripetal, from Dilon Cower
  5. Flingy, from Andy Wilson

In terms of numbers, everything is growing:

  • 19919 unique visitors for the website (vs 9772 in December)
  • +56 subscribers on the mailing list (192 in total)
  • 229 messages on kivy-users (vs 94 in December)
  • 23 pull requests (vs 6 in December)

It was a great event, with perfect timing: Kivy is now one year old! (The first release was on 1 February 2011.)

02:00

We’re building a new service at Services called the Token Server. The idea is simple: give us a Browser ID assertion and a service name, and the Token Server will send you back a token that’s good for 30 minutes to use for that specific service.

That indirection makes it easier for us to manage user authentication and resource allocation for our services. A few examples:

  • when a new user wants to use Firefox Sync, we can check which server has the smallest number of allocated users, and tell the user to go there
  • we can manage a user from a central place
  • we can manage a user we’ve never heard about before without asking her to register specifically to each service — that’s the whole point of Browser ID

I won’t get into more details because that’s not the intent of this blog post. But if you are curious the full draft spec is here - https://wiki.mozilla.org/Services/Sagrada/TokenServer

What this post is really about is how to build this token server.

The server is a single web service that gets a Browser ID assertion and does the following:

  1. verify the assertion
  2. create a token, which is a simple JSON mapping
  3. encrypt and sign the token
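The three steps above can be sketched with nothing but the standard library. This is not the Token Server's actual code, and it departs from it in several ways:

- Assertion verification (step 1) is left out.
- The token fields (uid, service, node, expires) and the shared secret are hypothetical.
- The token is only signed with HMAC-SHA256, not encrypted and signed.

The 30-minute lifetime comes from the description above.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical: a secret shared between the Token Server and the service nodes.
SECRET = b"not-a-real-secret"
TOKEN_LIFETIME = 30 * 60  # tokens are good for 30 minutes


def make_token(uid, service, node, secret=SECRET, now=None):
    """Build a JSON token (step 2) and sign it (the signing half of step 3)."""
    now = time.time() if now is None else now
    payload = json.dumps({"uid": uid, "service": service, "node": node,
                          "expires": int(now + TOKEN_LIFETIME)},
                         sort_keys=True).encode("utf-8")
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload) + b"." +
            base64.urlsafe_b64encode(signature)).decode("ascii")


def check_token(token, secret=SECRET, now=None):
    """What a service node might do: check the signature, then the expiry."""
    now = time.time() if now is None else now
    payload_b64, signature_b64 = token.encode("ascii").split(b".")
    payload = base64.urlsafe_b64decode(payload_b64)
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    # Constant-time comparison, so timing does not leak the signature.
    if not hmac.compare_digest(expected, base64.urlsafe_b64decode(signature_b64)):
        raise ValueError("bad signature")
    data = json.loads(payload.decode("utf-8"))
    if data["expires"] < now:
        raise ValueError("token expired")
    return data
```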

The GIL, Gevent, greenlet and the like

Implementing this using Cornice and a crypto lib is quite simple, but it has one major issue: the crypto work is CPU-intensive, and even if the libraries we can use have C code under the hood, it seems that the GIL is not released often enough to let your threads really use several cores. For example, we benchmarked M2Crypto and it was obvious that a multi-threaded app was locked by the GIL.

But we don’t use threads in our Python servers; we use Gevent workers, which are based on greenlets. While greenlets help with I/O-bound calls, they won’t help with CPU-bound work: you’re tied to a single thread in this case, and each greenlet that does some CPU work blocks the other ones.

It’s easy to demonstrate — see http://tarek.pastebin.mozilla.org/1476644  If I run it on my Mac Book Air, the pure Python synchronous version is always faster (huh, the gevent version is *much* slower, not sure why..)

So the sanest option is to use separate processes and set up a messaging queue between the web service that needs some crypto work to be done and specialized crypto workers.
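Here is a standard-library sketch of why separate processes help. It uses Python 3's concurrent.futures (or the futures backport on Python 2). This is not what the Token Server uses, which is ZeroMQ plus C++ workers, as described below. The sketch runs the same pure-Python CPU-bound function on a thread pool and on a process pool. burn() is a made-up stand-in for crypto work. On a standard CPython build on a multi-core machine, the process pool should typically finish noticeably faster, because the threads have to take turns holding the GIL.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def burn(n):
    """Stand-in for CPU-bound work: a pure-Python loop that holds the GIL."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_map(executor_class, jobs, workers=4):
    """Run burn() over jobs with the given executor; return (results, seconds)."""
    start = time.time()
    with executor_class(max_workers=workers) as pool:
        results = list(pool.map(burn, jobs))
    return results, time.time() - start


if __name__ == "__main__":
    jobs = [1000000] * 8
    _, in_threads = timed_map(ThreadPoolExecutor, jobs)
    _, in_processes = timed_map(ProcessPoolExecutor, jobs)
    print("threads:   %.2fs" % in_threads)
    print("processes: %.2fs" % in_processes)
```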

In that case, we’re back to our beloved 100% I/O-bound model, which we know how to scale using NGinx + GUnicorn + GEvent.

For the crypto workers, we want them to be as fast as possible, so we started to look at Crypto++, which seems promising because it uses CPU-specific calls in ASM. There’s the pycryptopp binding for Crypto++, but we happen to need some features that are not available in that lib yet, like HKDF.

Yeah, at that point it became obvious we’d use pure C++ for that part, and drive it from Python.

Message passing

Back to our Token server: we need to send crypto work to our workers and get back the result. The first option that comes to mind is to use multiprocessing to spawn our C++ workers and feed them with work.

The model is quite simple, but now that we have one piece in C++, it’s getting harder to use the built-in tools in multiprocessing to communicate with our workers — we need to be lower level and start to work with signals or sockets. And well, I am not sure what would be left of multiprocessing then.

This is doable but a bit of a pain to do correctly (and in a portable way). Moreover, if we want a robust system, we need things like a heartbeat, which requires more inter-process message passing. And now I need to code it in both Python and C++.

Hold on — Let me summarize my requirements:

  • inter-process communication
  • something less painful than signals or sockets
  • very very very fast

I got tempted by Memory Mapped Files, but the drawbacks I’ve read here and there scared me.

ZeroMQ

It turns out zeromq is perfect for this job – there are clients in Python and C++, and defining a protocol to exchange data from the Python web server to the crypto workers is quite simple.

In fact, this can be done as a reusable library that takes care of passing messages to workers and getting back results. It has been done hundreds of times, and there are many examples on the zmq website, but I have failed to find any packaged Python library that would let me push some work to workers transparently via a simple execute() call. If you know of one, tell me!

So I am building one since it’s quite short and simple –  The project is called PowerHose and is located here : https://github.com/mozilla-services/powerhose.

Here are its design points and limitations:

  • Powerhose is based on a single master and multiple workers protocol
  • The Master opens a socket and waits for workers to register themselves into it
  • The worker registers itself with the master, provides the path to its own socket, and waits for some work on it.
  • Workers perform the work synchronously and send back the result immediately.
  • The master load-balances across available workers, and if all are busy, it waits a bit before timing out.
  • The worker pings the master on a regular basis and exits if it’s unable to reach it. It attempts to reconnect several times to give the master a chance to come back.
  • Workers are language-agnostic, and a master could run heterogeneous workers (one in C, one in Python, etc.)
  • Powerhose does not serialize/deserialize the data; it sends plain strings. That is the responsibility of the program that uses it.
  • Powerhose is not responsible for respawning a master or a worker that dies. I plan to use daemontools for this, and maybe provide a script that runs all workers at once.
  • Powerhose does not queue work and relies only on zeromq sockets.

The library implements this protocol and gives two tools to use it:

  • A JobRunner class in Python, you can use to send some work to be done
  • A Worker class in Python and C++, you can use as a base class to implement workers

Here’s an example of using Powerhose:

For the Token server, we’ll have:

  • A JobRunner in our Cornice application
  • A C++ worker that uses Crypto++

The first benchmarks look fantastic: probably faster than anything I’d have implemented myself using plain sockets :)

I’ll try to package Powerhose so other projects at Mozilla can use it. I am wondering if this could be useful to more people, since I failed to find that kind of tool.  How do you scale your CPU-bound web apps ?


Filed under: mozilla, python

Sunday, 05 February

23:09

localshop - "really, really alpha" but promising local PyPI mirror / private repository. Yes, another one. This one might just be the one to meet my specific requirements though...

pytagcloud - is one to watch: make tag clouds as PNG images or HTML. Usage is a bit fiddly at the moment and I couldn't replicate the results they got. I think the key is having a good tag (interesting word) extractor. This bit of code might come in handy when experimenting with it:
import re
import collections

import bs4
import requests
from roundup.backends.indexer_common import STOPWORDS  # compared in upper case below

html = requests.get('http://www.python.org/about/').text
text = bs4.BeautifulSoup(html, 'html.parser').find('div', id='content-body').get_text()
# Count the interesting words: skip stop words and anything of 2 letters or fewer
counts = collections.defaultdict(int)
for word in re.split(r'\W+', text):
    if word.upper() not in STOPWORDS and len(word) > 2:
        counts[word.lower()] += 1
# Keep the 30 most frequent words as (word, count) pairs
words = sorted((count, word) for word, count in counts.items())
tags = [(word, count) for count, word in words[-30:]]

from pytagcloud import make_tags, create_tag_image
create_tag_image(make_tags(tags), 'cloud.png')
Sadly it doesn't quite work for me. I suspect something might be up with my pygame/platform's TTF support. I also had to add a Font object cache to stop it blowing up on my system (git pull request submitted :-)

slumber - call web RESTful (HTTP) APIs from Python code. Supports JSON, and YAML (with pyyaml installed) and is built on top of the awesome requests. While looking at slumber I picked up this tip for validating and pretty-printing JSON:
$ echo '{"json":"obj"}' | python -m json.tool
{
    "json": "obj"
}
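The same validate-and-pretty-print trick works from inside Python. json.tool essentially loads the text and dumps it again with a 4-space indent. pretty_json is just an illustrative helper name. Invalid input raises ValueError, which is how json.tool tells you the JSON is broken.

```python
import json


def pretty_json(text):
    """Validate `text` as JSON and return it re-indented with 4 spaces."""
    return json.dumps(json.loads(text), indent=4)


print(pretty_json('{"json":"obj"}'))
```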

Suppose you have a PostgreSQL database like the Pagila sample with 14 tables, each with a last_update timestamp column to record the date and time each row was modified, and it is now a requirement to capture which user effected each change. Or perhaps you have several tables without such audit trail columns and need to add them quickly. Or maybe you have decided to denormalize your design by adding a calculated column, e.g., extended price = unit price times quantity ordered, or a derived column, e.g., carrying the customer name in the invoice table.

If you have some experience as a DBA, the word “drudgery” may have come to mind at the prospect of implementing the above features. It’s possible that, over time, you’ve developed an approach for dealing with some of them but still wish there were some way to automate these thankless tasks.

You may have looked at the Andromeda project’s “automations” which provide some of these capabilities. However, in order to take advantage of the automations, you’ll first have to manually describe your database in a YAML format (and you’ll have to install Apache and PHP). Or you could have tried to use the follow-on project, Triangulum, but essentially you’d still have to create a YAML schema (no need for Apache, but you still need PHP).

Some relief is forthcoming. Following discussions prompted by my Business Logic in the Database post, I have been collaborating with Roger Hunwicks on a potential solution to these common DBA needs. The new Pyrseas tool is tentatively named dbextend[1] and its initial documentation is available in the Pyrseas extender branch. This is how I envision dbextend being used.

Consider the opening example. The DBA would create a simple YAML file such as the (abbreviated) one below, listing the tables and the needed features:

schema public:
  table actor:
    audit_columns: default
  table category:
    audit_columns: default
...
  table store:
    audit_columns: default

The DBA would then use this file, say audext.yaml, as input to dbextend, e.g.,

dbextend pagiladb audext.yaml

dbextend reads the PostgreSQL catalogs (using code shared with dbtoyaml and yamltodb), building its internal representation. It also reads the YAML extensions file and builds a parallel (albeit much smaller) structure. Thirdly, it reads extension configuration information: for example, a definition of what columns need to be added for “audit_columns: default” (such as modified_timestamp and modified_by_user), what trigger(s) to add, and what function(s) to create.

The output of dbextend is a YAML schema file, just like the one output by dbtoyaml, which can be piped directly to yamltodb to generate SQL to implement the desired features.
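Here is a rough sketch of the merge step described above. It uses plain Python dicts instead of YAML files, so it runs without pyyaml. Everything in it is hypothetical:

- The dict layout is only loosely modelled on dbtoyaml output (a columns list of single-key mappings).
- The column types are guesses.
- EXTENSION_CONFIG and extend_schema are made-up names, not Pyrseas code.

It also skips the trigger and function definitions the real tool would add.

```python
import copy

# Hypothetical expansion rules for each extension keyword. The column names
# come from the post; the column types are assumptions.
EXTENSION_CONFIG = {
    "audit_columns": {
        "default": [
            {"modified_by_user": {"type": "character varying(63)", "not_null": True}},
            {"modified_timestamp": {"type": "timestamp with time zone", "not_null": True}},
        ],
    },
}


def extend_schema(db_schema, extensions, config=EXTENSION_CONFIG):
    """Return a copy of db_schema with the columns requested in extensions added."""
    result = copy.deepcopy(db_schema)
    for schema_key, tables in extensions.items():
        for table_key, features in tables.items():
            columns = result[schema_key][table_key]["columns"]
            # Each column is a single-key dict: {column_name: attributes}
            present = set(name for col in columns for name in col)
            for feature, variant in features.items():
                for new_col in config[feature][variant]:
                    if not present.intersection(new_col):  # skip existing columns
                        columns.append(copy.deepcopy(new_col))
                        present.update(new_col)
    return result


# Sample input shaped like a (much abbreviated) catalog dump plus audext.yaml.
catalog = {"schema public": {"table actor": {"columns": [
    {"actor_id": {"type": "integer", "not_null": True}},
    {"first_name": {"type": "character varying(45)", "not_null": True}},
]}}}
extensions = {"schema public": {"table actor": {"audit_columns": "default"}}}
extended = extend_schema(catalog, extensions)
```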

In case you’re wondering, dbextend —like other Pyrseas tools— will require Python, psycopg2 and pyyaml.

What features would you like to see automated? What are your suggested best practices for automating these common needs?


Picture credit: Thanks to Mr. O’Brien, a fourth-grade teacher in Minnesota.

[1] We’re still receptive to some other suitable name.


Filed under: Database tools, PostgreSQL, Python

21:09



“The Cleveland Tourism Board gave me 14 million dollars about 8 months ago to make a promotional video to bring people to Cleveland. As usual, I waited till the last minute and I ended up having to shoot and edit it in about an hour yesterday afternoon.” — bishopvids

20:09


This Thursday in NYC I’m talking about Python, MongoDB, and asynchronous web frameworks at a meetup called For the Love of Python: Wine tasting, Red velvet cupcakes, and Tech Talks. The talk is a work in progress. To be strictly accurate, I have not yet started working on the talk, because the code I’ll be talking about is itself a work in progress. But come anyway, because I’ve been thinking a lot on this subject for the last few months, and I intend to present:

  • A high-level discussion of what an async web framework is and when you need it, or don’t. I think there’s a lot of sloppiness on this subject, and I want to work with the audience on tightening up our thinking.
  • A review of pymongo, pthreads, Tornado, asyncmongo, and gevent. You won’t be disappointed.
  • For the first time ever, I will present an exclusive sneak peek at my own experimental Python driver for MongoDB and Tornado, built on top of the official pymongo driver. It’s pretty snazzy, it uses greenlets, and it’s an example of a general pattern for asynchronizing synchronous database drivers that might inspire you to write your own database driver in Python. Buckle your seatbelts, we’re going deep.

    PyATL is having a Jam Session this Tuesday at 7PM, for those in the area and interested. In addition to presentations on getting started with a few web frameworks (Django, Bottle), there will be some hackage on various projects. I'll be there, looking to help people with boto, or to work on it.

    If you're interested in coming, RSVP on the Meetup page. If you're wanting to hack on, or get help with boto,  shoot a tweet at me and let me know so I can be ready for you.

    17:09

    Amon - Python-powered server monitoring, logging, and error reporting with JSON API:

    Amon from Martin Rusev is a simple yet flexible way to add server monitoring, logging, and error tracking to your web stack. Amon consists of three parts: a collector daemon, a Python web app, and JSON API.

    • Collector daemon - Amon’s server and process monitoring is a thin wrapper on top of Unix tools to record metrics and store them in the MongoDB backend.
    • API - Shipping with language bindings for Python, Ruby, and JavaScript, Amon’s JSON API makes it easy to record your own application events.
    • Web interface - The web app provides a friendly user interface for viewing logs and visualizing data in charts.


    The Amon documentation site is a great place to get started with installation and usage.

    My year-old macbook installation was showing its age. Or rather, there were some things wrong with it:

    • The original OS was 10.6, snow leopard. I upgraded it to lion (10.7) half a year ago. This was an in-place upgrade, not a fresh install. I wanted a fresh install to clean some stuff up and because it started to feel slow. I heard that a clean install would help a lot regarding speed.
    • I work a lot with geographic libraries, Django and geodjango. So originally I installed everything via the kyngchaos packages: Mapnik, gdal, spatialite and so on. But after the Lion upgrade, I couldn't compile any Python packages with C extensions anymore, as gcc 4.0 (which everything had been built with) had been replaced by 4.2. And spatialite never worked right anyway. So I wanted to replace this.
    • I used homebrew as a package manager for the gnu/unix side of things instead of macports I'd been using before. It works, but I missed some things, like Quantum GIS (QGIS), which is included in macports. I hoped to get everything python+gis related done with one package manager, in my case macports.

    So I made sure my backups were OK, that my code was all committed, that my repositories were cleaned up, that all my dotfiles in my homedir were in version control and so on. Most of it was already OK, but of course there were some small things left. I'll do a write-up later on of my backup strategy and how I handle my dotfiles and so on.

    Time for the actual lion reinstall. How does that work? I bought Lion from the app store, so it was downloaded and installed by my mac: I didn't have an install DVD. Turns out to be easy: just restart and press command-r during bootup and you'll get a "lion recovery" menu. Choose the reinstall option and it will download the latest full version and install it for you. Simple and works.

    The big surprise came when the computer rebooted. I expected a dialog to set up a main user. Instead, I got the regular login screen. Ok... Logging in... Hey! All my stuff is still there! All the settings, all my documents, all my music... No need to restore backups.

    So: an OSX lion restore wipes only the OS and reinstalls it. Including xcode, btw. The rest (your own data, applications, settings) is retained. Actually pretty handy.

    This did mean I had to clean up the kyngchaos packages and homebrew by hand. Just a matter of deleting some directories, telling homebrew to erase itself and adjusting my paths.

    12:54

    I know some people think that Pyramid is a complex framework and that aspiring Python web developers should start with something else and maybe come back to Pyramid later.

    But I truly believe that even if you're a total beginner, Pyramid can serve you really well, and once you grow more experienced, you'll appreciate the power that you get with the Pyramid framework.

    Take a look at this “hello world” application in hello.py:

    from wsgiref.simple_server import make_server
    from pyramid.config import Configurator
    
    def hello(request):
        return 'Hello %(name)s!' % request.matchdict
    
    if __name__ == '__main__':
        config = Configurator()
        config.add_route('main', '/hello/{name}')
        config.add_view(hello, route_name='main', renderer='string')
        server = make_server('', 8080, config.make_wsgi_app())
        server.serve_forever()
    

    It wasn’t that difficult.

    And this is how to install and run it:

    $ pip install pyramid
    $ python hello.py
    

    Then in your browser type http://localhost:8080/hello/fred and see the result. Easy peasy.
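    To see what the Configurator is doing for you, here is a rough, hand-written equivalent as a bare WSGI application using only the standard library. It is an illustrative sketch, not how Pyramid works internally: the `HELLO_ROUTE` regex is my stand-in for the `/hello/{name}` route pattern (a `{name}` placeholder matches a single path segment), and `hello_app` and `call` are hypothetical names.

```python
import re
from wsgiref.util import setup_testing_defaults

# Hand-written stand-in for config.add_route('main', '/hello/{name}'):
# the {name} placeholder matches one path segment (no slashes).
HELLO_ROUTE = re.compile(r'^/hello/(?P<name>[^/]+)$')


def hello_app(environ, start_response):
    """Plain WSGI app roughly equivalent to the Pyramid example (no URL decoding)."""
    match = HELLO_ROUTE.match(environ.get('PATH_INFO', ''))
    if match is None:
        start_response('404 Not Found', [('Content-Type', 'text/plain; charset=utf-8')])
        return [b'Not Found']
    # Roughly what renderer='string' does: turn the view's str into a text response.
    body = ('Hello %(name)s!' % match.groupdict()).encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain; charset=utf-8'),
                              ('Content-Length', str(len(body)))])
    return [body]


def call(app, path):
    """Invoke a WSGI app in-process and return (status, body), no server needed."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ['PATH_INFO'] = path
    captured = {}

    def start_response(status, headers):
        captured['status'] = status

    body = b''.join(app(environ, start_response))
    return captured['status'], body


if __name__ == '__main__':
    print(call(hello_app, '/hello/fred'))
```

    Since `config.make_wsgi_app()` also returns a WSGI callable, `call()` could in principle be pointed at the Pyramid app for quick in-process checks.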

    So what are you waiting for? :) Grab it here and give it a try!

    12:36

    A simple example of how to use Flask-Mail with Google or Google Apps email accounts. Other sources:

    -Flask-mail docs: http://packages.python.org/flask-mail/

    -Google support: http://support.google.com/mail/bin/answer.py?hl=en&answer=78799

    @kfk

    from flask import Flask
    # Older Flask-Mail releases exposed this as flaskext.mail;
    # current releases use the flask_mail package name.
    from flask_mail import Mail, Message
    
    app = Flask(__name__)
    
    app.config.update(
        DEBUG=True,
        # EMAIL SETTINGS
        MAIL_SERVER='smtp.gmail.com',
        MAIL_PORT=465,
        MAIL_USE_SSL=True,
        MAIL_USERNAME='you@google.com',
        MAIL_PASSWORD='GooglePasswordHere',
    )
    
    # Create the Mail object only after the MAIL_* settings are in place:
    # it reads them from app.config when it is initialised.
    mail = Mail(app)
    
    @app.route("/")
    def index():
        msg = Message(
            'Hello',
            sender='you@google.com',
            recipients=['recipient@recipient_domain.com'])
        msg.body = "This is the email body"
        mail.send(msg)
        return "Sent"
    
    if __name__ == "__main__":
        app.run()
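    If you want to check the SMTP settings themselves outside Flask, the same Gmail setup (SSL on port 465, as in the config above) can be sketched with only the standard library's `smtplib` and `email` modules. This is not what Flask-Mail does internally, just an equivalent under the same assumptions; `build_message` and `send` are hypothetical helper names, and `send` needs real credentials and network access.

```python
import smtplib
from email.message import EmailMessage

# Same server settings as the Flask config above (assumed, not verified here).
GMAIL_SMTP_HOST = 'smtp.gmail.com'
GMAIL_SMTP_SSL_PORT = 465


def build_message(sender, recipients, subject, body):
    """Build a plain-text message; no network access involved."""
    msg = EmailMessage()
    msg['Subject'] = subject
    msg['From'] = sender
    msg['To'] = ', '.join(recipients)
    msg.set_content(body)
    return msg


def send(msg, username, password):
    """Send msg over an SSL-wrapped SMTP connection (like MAIL_USE_SSL=True)."""
    with smtplib.SMTP_SSL(GMAIL_SMTP_HOST, GMAIL_SMTP_SSL_PORT) as server:
        server.login(username, password)
        server.send_message(msg)


if __name__ == '__main__':
    message = build_message('you@google.com', ['recipient@recipient_domain.com'],
                            'Hello', 'This is the email body')
    print(message)
    # send(message, 'you@google.com', 'GooglePasswordHere')  # needs real credentials
```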
    

    11:09

    The subprocess module provides some very useful functionality for working with external programs from Python applications, but is often complained about as being harder to use than it needs to be. See, for example, Kenneth Reitz’s Envoy project, which aims to provide an ease-of-use wrapper over subprocess. There’s also Andrew Moffat’s pbs project, which aims to let you do things like

    from pbs import ifconfig
    print ifconfig("eth0")

    It does this by replacing sys.modules['pbs'] with a subclass of the module type which overrides __getattr__ to look for programs in the path. That's nice, and I can see that it would be useful in some contexts, but I don't find that wc(ls("/etc", "-1"), "-l") is more readable than call("ls /etc -1 | wc -l") in the general case.
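    To make the pbs trick concrete, here is a minimal sketch of the idea in Python 3. It is not pbs's actual code: the `CommandModule` class and the `hypothetical_pbs` module name are made up for illustration, and it assumes `shutil.which` (Python 3.3+) is an acceptable way to "look for programs in the path".

```python
import shutil
import subprocess
import sys
import types


class CommandModule(types.ModuleType):
    """Hypothetical pbs-style module: unknown attributes become program runners."""

    def __getattr__(self, name):
        # Only called for attributes not found normally. Leave dunders alone so
        # the import machinery (e.g. its check for __path__) behaves as usual.
        if name.startswith('__'):
            raise AttributeError(name)
        path = shutil.which(name)  # search PATH for an executable called `name`
        if path is None:
            raise AttributeError(name)

        def run(*args):
            # Run the program without a shell and return its stdout as text.
            return subprocess.check_output([path] + [str(a) for a in args],
                                           universal_newlines=True)

        run.__name__ = name
        return run


# Register the instance so that `from hypothetical_pbs import ls` works.
sys.modules['hypothetical_pbs'] = CommandModule('hypothetical_pbs')
```

    With that in place, `from hypothetical_pbs import ls` followed by `print(ls('/etc'))` runs whichever `ls` is first on your PATH.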

    I’ve been experimenting with my own wrapper for subprocess, called sarge. The main things I need are:

    • I want to use command pipelines, but using subprocess out of the box often leads to deadlocks because pipe buffers get filled up.
    • I want to use bash-style pipe syntax on Windows as well as Posix, but Windows shells don’t support some of the syntax I want to use, like &&, ||, |& and so on.
    • I want to process output from commands in a flexible way, and communicate() is not always flexible enough for my needs - for example, if I need to process output a line at a time.
    • I want to avoid shell injection problems by having the ability to quote command arguments safely, and I want to minimise the use of shell=True, which I generally have to use when using pipelined commands.
    • I don’t want to set arbitrary limits on passing data between processes, such as Envoy’s 10MB limit.
    • subprocess allows you to let stderr be the same as stdout, but not the other way around - and I sometimes need to do that.
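    The deadlock in the first point usually comes from the parent reading one pipe while a child blocks writing to another, full one. A common way to avoid it with plain subprocess (essentially the "replacing shell pipeline" recipe from the subprocess documentation, not sarge's approach) is to connect the children to each other directly and let `communicate()` drain the last one. The `pipeline` helper name is my own; the example uses the Python interpreter itself as producer and consumer so it should run on both Posix and Windows.

```python
import subprocess
import sys


def pipeline(first_cmd, second_cmd):
    """Run the equivalent of `first_cmd | second_cmd` without a shell.

    Returns (stdout of second_cmd as bytes, [returncode1, returncode2]).
    """
    p1 = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
    # p2 reads directly from p1's pipe; the parent never touches that data.
    p2 = subprocess.Popen(second_cmd, stdin=p1.stdout, stdout=subprocess.PIPE)
    # Drop the parent's copy of the read end so p1 sees a broken pipe if p2 exits early.
    p1.stdout.close()
    # communicate() keeps draining p2's stdout, so no pipe buffer can fill up.
    out, _ = p2.communicate()
    p1.wait()
    return out, [p1.returncode, p2.returncode]


if __name__ == '__main__':
    # Roughly 590 KB of output, far more than a typical pipe buffer holds.
    producer = [sys.executable, '-c', 'for i in range(100000): print(i)']
    counter = [sys.executable, '-c', 'import sys; print(sum(1 for _ in sys.stdin))']
    out, codes = pipeline(producer, counter)
    print(out.decode().strip(), codes)
```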

    I’ve been working on supporting these use cases, so sarge offers the following features:

    • A simple run function which allows a rich subset of Bash-style shell command syntax, but parsed and run by sarge so that you can run cross-platform on Posix and Windows without cygwin:

      >>> p = run('false && echo foo')
      >>> p.commands
      [Command('false')]
      >>> p.returncodes
      [1]
      >>> p.returncode
      1
      >>> p = run('false || echo foo')
      foo
      >>> p.commands
      [Command('false'), Command('echo foo')]
      >>> p.returncodes
      [1, 0]
      >>> p.returncode
      0
    • The ability to format shell commands with placeholders, such that variables are quoted to prevent shell injection attacks:

      >>> from sarge import shell_format
      >>> shell_format('ls {0}', '*.py')
      "ls '*.py'"
      >>> shell_format('cat {0}', 'a file name with spaces')
      "cat 'a file name with spaces'"
    • The ability to capture output streams without requiring you to program your own threads. You just use a Capture object and then you can read from it as and when you want:

      >>> from sarge import Capture, run
      >>> with Capture() as out:
      ...     run('echo foobarbaz', stdout=out)
      ...
      <sarge.Pipeline object at 0x175ed10>
      >>> out.read(3)
      'foo'
      >>> out.read(3)
      'bar'
      >>> out.read(3)
      'baz'
      >>> out.read(3)
      '\n'
      >>> out.read(3)
      ''

      A Capture object can capture the output from multiple commands:

      >>> from sarge import run, Capture
      >>> p = run('echo foo; echo bar; echo baz', stdout=Capture())
      >>> p.stdout.readline()
      'foo\n'
      >>> p.stdout.readline()
      'bar\n'
      >>> p.stdout.readline()
      'baz\n'
      >>> p.stdout.readline()
      ''

      Delays in commands are honoured in asynchronous calls:

      >>> from sarge import run, Capture
      >>> cmd = 'echo foo & (sleep 2; echo bar) & (sleep 1; echo baz)'
      >>> p = run(cmd, stdout=Capture(), async=True) # returns immediately
      >>> p.close() # wait for completion
      >>> p.stdout.readline()
      'foo\n'
      >>> p.stdout.readline()
      'baz\n'
      >>> p.stdout.readline()
      'bar\n'
      >>>

      Here, the sleep commands ensure that the asynchronous echo calls occur in the order foo (no delay), baz (after a delay of one second) and bar (after a delay of two seconds); the capturing works as expected.
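    For comparison with shell_format, a rough standard-library analogue is to run every substituted value through `shlex.quote` (Python 3.3+) before formatting. This is a sketch under my own assumptions, not sarge's implementation: `quoted_format` is a hypothetical name, it applies POSIX sh quoting rules only (Windows cmd.exe quoting is different), and I haven't checked that sarge quotes every input identically.

```python
import shlex


def quoted_format(fmt, *args, **kwargs):
    """Like str.format, but shell-quote every substituted value (POSIX sh rules)."""
    quoted_args = [shlex.quote(str(a)) for a in args]
    quoted_kwargs = {k: shlex.quote(str(v)) for k, v in kwargs.items()}
    return fmt.format(*quoted_args, **quoted_kwargs)


if __name__ == '__main__':
    print(quoted_format('ls {0}', '*.py'))  # ls '*.py'
    print(quoted_format('cat {0}', 'a file name with spaces'))
    # The injected "; echo pwned" stays inside the quotes as a literal file name.
    print(quoted_format('rm {path}', path='x; echo pwned'))
```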

    Sarge hasn’t been released yet, but it’s not far off being ready. It’s meant for Python >= 2.6.5 and is tested on 2.6, 2.7, 3.1, 3.2 and 3.3 on Linux, Mac OS X, Windows XP and Windows 7 (not all versions are tested on all platforms, but the overall test coverage is comfortably over 90%).

    I have released the sarge documentation on Read The Docs; I’m hoping people will read this and give some feedback about the API and feature set being proposed, so that I can fill in any gaps where possible and perhaps make it more useful to other people. Please add your comments here, or via the issue tracker on the BitBucket project for the docs.

    Saturday, 04 February

    22:45

    In working on this project, I've had to source my own resources for learning. As a software guy, my hardware knowledge is limited to whatever I can still remember from high school physics, which is 15 years ago now. Granted, I don't seem to need much more than high school physics here, but 15 years is a long time between study sessions.

    I discovered that a great many electronics books are structured as follows: fluffy introduction, definition of all terms, history of physics, all of physics, now do it. Unfortunately, I can't assimilate knowledge that way. It's impossible (for me) to integrate an abstract set of definitions and history lessons, and come out with a working knowledge of building circuits. It's hard just to get through the introduction without falling asleep, frankly. So here is where I go:

    Book: "Make: Electronics: Learning by Discovery"
    http://books.google.com.au/books/about/Make_electronics.html?id=PQzYdC3BtQkC

    This video series by Bucky Roberts, an intro to circuits from the absolute, total beginning:
    http://thenewboston.org/list.php?cat=41

    This video series, specifically covering the Arduino, by Jeremy Blum: http://www.youtube.com/watch?v=fCxzA9_kg6s

    I'm sure I'll end up assembling some more resources as I learn and need more advanced topics, but these constitute a really great start with a gentle learning curve.

    Cheers,
    -Tennessee

    19:45

    As many of us have seen in the media recently, the United States and other world governments are dee ...

    17:09

    I read an article by some guy called SM on the subject of jokers; he says the world is full of jokers - people who talk a lot but do little.

    I am a fuck-up at my current workplace - I handle sick leaves poorly, I show up for work five minutes late rather than five minutes early; I am a fuck-up at house chores - I rarely do the dishes, laundry is everywhere, cleaning is the last thing I think about; I sometimes fuck up with friends - I miss out on keeping in touch, I borrow money and forget about it, I hit on some poor guy's ex, the list goes on.

    I am not a fuck-up in my true nature, in fact I'm probably more of an over-zealous Asperger kid inside. I don't give up before it's too late, and I find a way when I need to. I move heaven and earth, as SM puts it.

    At first the logic seems counter-intuitive, but really it's an age-old problem: you have an infinite set of chores, and a limited rate of chore churning. How do you balance the workload; what do you do well, half-assed and not at all? More often than not, there is a conflict of interest between the various aspects of life. You have to call the shots.

    The todo list is the only way to avoid being a joker. You will have to defer tasks. That's just reality. You will sometimes defer tasks up to a point where you realize, "ah man wish I was going to do this but I'm not." That's not being a joker, that's just you being rational.

    So while I agree that it's a good thing to go into tunnel vision mode and just churn out a product in no time, it's also not a viable lifestyle. SM makes it seem as if the only way to live is 150% speed all the time and get rich.

    Call me complicated, but I want more out of life than that. If what it takes to make piles of money is complete tunnel vision, then I shall have none of it. Let me sit smug-faced in my middle-class bed and enjoy life before it flashes by.


    12:54


    There is a small window between learning and knowing where it's possible to surprise yourself with a good design. Sometimes you are confronted with a new set of technologies and a new problem domain, while armed with a solid background in the fundamentals of software engineering. You lay out the plans and begin work taking the best things from your previous experience, but only some of it applies. The rest is subject to your intuition and inherent skill, something that has nothing to do with experience with toolkits and successful or failed projects.

    You trudge on almost blindly, building the system to match the "requirements" as well as you understand them. Gradually the behavioral nuances of the new technology bubble to the surface, and little by little your skill set rises to meet them.

    You proceed conservatively and test thoroughly, hoping for the best on the big day when you'll pull the switch. But your experience keeps tugging at your sleeve with the reminder that no matter how hard you design and test, the fact is that you simply can't be completely sure until it's all over.

    But the system rolls out and you move on. You've done well. Clients are happy, and minor bugs roll in. Your hard work is rewarded with service requests that fall well within the realm of fixable flaws. You charge for them, and a wish list grows.

    Then some day down the road you finally have a chance to look back and get a feel for the process. "Wow," you think, "it's kind of amazing how all of this works." Amazing how easy it is to get caught up in the confusion of the process while it's happening. But all in all, well done. Hard work always pays off.

    07:45

    • Go to a Python-related conference in North America, South America, Europe, Asia, Africa, Australia, and New Zealand.
    • Attend at least one JavaScript-related conference or event.
    • Upload all my outstanding pictures to Flickr!
    • Make Consumer Notebook profitable.
    • Find more ways to make Audrey Roy happy.
    • Pull off an Aú sem Mão during a Capoeira Roda.
    • Attend my first Capoeira Batizado.
    • See a place in the USA I've never been.
    • Work out at least three times a week.
    • Drop to a 32 waist
    • Visit friends and family back east. Been over a year since I've seen my sister!
    • Blog once a week. That is at least 52 blog entries!
    • Visit a Theme park.
    • Learn how to surf or snowboard.
    • Implement something in node.js, backbone.js, and handlebars.js
    • Take a high-level Python class from the likes of Raymond Hettinger or David Beazley.
    • Teach some Python or Django.
    • Have a beer with Thomas, Andy, Andy, Tony, Garrick, Bernd, and the rest of Ye Aulde Gange.
    • See my old DC area friends such as Eric, Chris, Steve, Beth, Sarah, Daye, Renee, Kenneth, Leslie, Whitney, Dave, and many others.
    • Visit my Son.

    Friday, 03 February

    20:09

    credit: redshirtjosh, http://www.flickr.com/photos/43273828@N06/4111258568/

    On behalf of Logilab, I put a lot of effort into including a new core feature named phases in Mercurial 2.1. Phases are a system for tracking which changesets have been or should be shared. This helps to prevent common mistakes when modifying history (for instance, with the mq or rebase extensions). It will transparently benefit all users. This concept is the first step towards simple, safe and powerful history-rewriting mechanisms in Mercurial.

    This series of three blog entries will explain:

    1. how phases will help mercurial users,
    2. how one can control them,
    3. how older mercurial versions interact with newer versions that support phases.

    Preventing erroneous history rewriting

    credit: anita.priks, http://www.flickr.com/photos/46785534@N06/6358218623/

    History rewriting is a common practice in DVCS. However, when it is done the wrong way, the most common result is duplicated history. The phase concept aims to make rewriting history safer. For this purpose Mercurial 2.1 introduces a distinction between the "past" part of your history (which is expected to stay there forever) and the "present" part of the history (which you are currently evolving). The old, immutable part is called public and the mutable part of your history is called draft.

    Let's see how this happens using a simple scenario.


    A new Mercurial user clones a repository:

    babar@Chessy ~ $ hg clone http://hg.celesteville.com/palace
    requesting all changes
    adding changesets
    adding manifests
    adding file changes
    added 2 changesets with 2 changes to 2 files
    updating to branch default
    2 files updated, 0 files merged, 0 files removed, 0 files unresolved
    babar@Chessy ~ $ cd palace
    babar@Chessy ~/palace $ hg log --graph
    @  changeset:   1:2afbcfd2af83
    |  tag:         tip
    |  user:        Celeste the Queen <Celeste@celesteville.com>
    |  date:        Wed Jan 25 16:41:56 2012 +0100
    |  summary:     We need a kitchen too.
    |
    o  changeset:   0:898889b143fb
       user:        Celeste the Queen <Celeste@celesteville.com>
       date:        Wed Jan 25 16:39:07 2012 +0100
       summary:     First description of the throne room
    

    The repository already contains some changesets. Our user makes some improvements and commits them:

    babar@Chessy ~/palace $ echo The wall shall be Blue >> throne-room
    babar@Chessy ~/palace $ hg ci -m 'Add wall color'
    babar@Chessy ~/palace $ echo In the middle stands a three meters round table >> kitchen
    babar@Chessy ~/palace $ hg ci -m 'Add a table in the kichen'
    

    But when he tries to push new changesets, he discovers that someone else already pushed one:

    babar@Chessy ~/palace $ hg push
    pushing to http://hg.celesteville.com/palace
    searching for changes
    abort: push creates new remote head bcd4d53319ec!
    (you should pull and merge or use push -f to force)
    babar@Chessy ~/palace $ hg pull
    pulling from http://hg.celesteville.com/palace
    searching for changes
    adding changesets
    adding manifests
    adding file changes
    added 1 changesets with 1 changes to 1 files (+1 heads)
    (run 'hg heads' to see heads, 'hg merge' to merge)
    babar@Chessy ~/palace $ hg log --graph
    o  changeset:   4:0a5b3d7e4e5f
    |  tag:         tip
    |  parent:      1:2afbcfd2af83
    |  user:        Celeste the Queen <Celeste@celesteville.com>
    |  date:        Wed Jan 25 16:58:23 2012 +0100
    |  summary:     Some bedroom description.
    |
    | @  changeset:   3:bcd4d53319ec
    | |  user:        Babar the King <babar@celesteville.com>
    | |  date:        Wed Jan 25 16:52:02 2012 +0100
    | |  summary:     Add a table in the kichen
    | |
    | o  changeset:   2:f9f14815935d
    |/   user:        Babar the King <babar@celesteville.com>
    |    date:        Wed Jan 25 16:51:51 2012 +0100
    |    summary:     Add wall color
    |
    o  changeset:   1:2afbcfd2af83
    |  user:        Celeste the Queen <Celeste@celesteville.com>
    |  date:        Wed Jan 25 16:41:56 2012 +0100
    |  summary:     We need a kitchen too.
    |
    o  changeset:   0:898889b143fb
       user:        Celeste the Queen <Celeste@celesteville.com>
       date:        Wed Jan 25 16:39:07 2012 +0100
       summary:     First description of the throne room
    

    Note

    From here on this scenario becomes very unlikely. Mercurial is simple enough for a new user not to be that confused by such a trivial situation. But we keep the example simple to focus on phases.

    Recently, our new user read some hyped blog post about "rebase" and the benefits of linear history. So he decides to rewrite his history instead of merging.

    Despite reading the wonderful rebase help, our new user makes the wrong decision when it comes to using it. He decides to rebase the remote changeset 0a5b3d7e4e5f:"Some bedroom description." on top of his local changeset.

    With previous versions of mercurial, this mistake was allowed and would result in a duplication of the changeset 0a5b3d7e4e5f:"Some bedroom description."

    babar@Chessy ~/palace $ hg rebase -s 4 -d 3
    babar@Chessy ~/palace $ hg push
    pushing to http://hg.celesteville.com/palace
    searching for changes
    abort: push creates new remote head bcd4d53319ec!
    (you should pull and merge or use push -f to force)
    babar@Chessy ~/palace $ hg pull
    pulling from http://hg.celesteville.com/palace
    searching for changes
    adding changesets
    adding manifests
    adding file changes
    added 1 changesets with 1 changes to 1 files (+1 heads)
    (run 'hg heads' to see heads, 'hg merge' to merge)
    babar@Chessy ~/palace $ hg log --graph
    @  changeset:   5:55d9bae1e1cb
    |  tag:         tip
    |  parent:      3:bcd4d53319ec
    |  user:        Celeste the Queen <Celeste@celesteville.com>
    |  date:        Wed Jan 25 16:58:23 2012 +0100
    |  summary:     Some bedroom description.
    |
    | o  changeset:   4:0a5b3d7e4e5f
    | |  parent:      1:2afbcfd2af83
    | |  user:        Celeste the Queen <Celeste@celesteville.com>
    | |  date:        Wed Jan 25 16:58:23 2012 +0100
    | |  summary:     Some bedroom description.
    | |
    o |  changeset:   3:bcd4d53319ec
    | |  user:        Babar the King <babar@celesteville.com>
    | |  date:        Wed Jan 25 16:52:02 2012 +0100
    | |  summary:     Add a table in the kichen
    | |
    o |  changeset:   2:f9f14815935d
    |/   user:        Babar the King <babar@celesteville.com>
    |    date:        Wed Jan 25 16:51:51 2012 +0100
    |    summary:     Add wall color
    |
    o  changeset:   1:2afbcfd2af83
    |  user:        Celeste the Queen <Celeste@celesteville.com>
    |  date:        Wed Jan 25 16:41:56 2012 +0100
    |  summary:     We need a kitchen too.
    |
    o  changeset:   0:898889b143fb
       user:        Celeste the Queen <Celeste@celesteville.com>
       date:        Wed Jan 25 16:39:07 2012 +0100
       summary:     First description of the throne room
    

    In more complicated setups it's a fairly common mistake, even in big and successful projects and with other DVCSs.

    In the new Mercurial version the user won't be able to make this mistake anymore. Trying to rebase the wrong way will result in:

    babar@Chessy ~/palace $ hg rebase -s 4 -d 3
    abort: can't rebase immutable changeset 0a5b3d7e4e5f
    (see hg help phases for details)
    

    The correct rebase still works as expected:

    babar@Chessy ~/palace $ hg rebase -s 2 -d 4
    babar@Chessy ~/palace $ hg log --graph
    @  changeset:   4:139ead8a540f
    |  tag:         tip
    |  user:        Babar the King <babar@celesteville.com>
    |  date:        Wed Jan 25 16:52:02 2012 +0100
    |  summary:     Add a table in the kichen
    |
    o  changeset:   3:0d1feb1bca54
    |  user:        Babar the King <babar@celesteville.com>
    |  date:        Wed Jan 25 16:51:51 2012 +0100
    |  summary:     Add wall color
    |
    o  changeset:   2:0a5b3d7e4e5f
    |  user:        Celeste the Queen <Celeste@celesteville.com>
    |  date:        Wed Jan 25 16:58:23 2012 +0100
    |  summary:     Some bedroom description.
    |
    o  changeset:   1:2afbcfd2af83
    |  user:        Celeste the Queen <Celeste@celesteville.com>
    |  date:        Wed Jan 25 16:41:56 2012 +0100
    |  summary:     We need a kitchen too.
    |
    o  changeset:   0:898889b143fb
       user:        Celeste the Queen <Celeste@celesteville.com>
       date:        Wed Jan 25 16:39:07 2012 +0100
       summary:     First description of the throne room
    

    What is happening here:

    • Changeset 0a5b3d7e4e5f from Celeste was set to the public phase because it was pulled from the outside. The public phase is immutable.
    • Changesets 139ead8a540f and 0d1feb1bca54 have been committed locally and haven't been transmitted from this repository to another. As such, they are still in the draft phase. Unlike the public phase, the draft phase is mutable.

    Let's watch the whole action in slow motion, paying attention to phases:

    babar@Chessy ~ $ cat >> ~/.hgrc << EOF
    [ui]
    username=Babar the King <babar@celesteville.com>
    logtemplate='[{phase}] {desc} ({node|short})\\n'
    EOF
    

    First, changesets cloned from a public server are public:

    babar@Chessy ~ $ hg clone --quiet http://hg.celesteville.com/palace
    babar@Chessy ~ $ cd palace
    babar@Chessy ~/palace $ hg log --graph
    @  [public] We need a kitchen too. (2afbcfd2af83)
    |
    o  [public] First description of the throne room (898889b143fb)
    

    Second, new changesets committed locally are in the draft phase:

    babar@Chessy ~/palace $ echo The wall shall be Blue >> throne-room
    babar@Chessy ~/palace $ hg ci -m 'Add wall color'
    babar@Chessy ~/palace $ echo In the middle stand a three meters round table >> kitchen
    babar@Chessy ~/palace $ hg ci -m 'Add a table in the kichen'
    babar@Chessy ~/palace $ hg log --graph
    @  [draft] Add a table in the kichen (bcd4d53319ec)
    |
    o  [draft] Add wall color (f9f14815935d)
    |
    o  [public] We need a kitchen too. (2afbcfd2af83)
    |
    o  [public] First description of the throne room (898889b143fb)
    

    Third, changesets pulled from a public server are public:

    babar@Chessy ~/palace $ hg pull --quiet
    babar@Chessy ~/palace $ hg log --graph
    o  [public] Some bedroom description. (0a5b3d7e4e5f)
    |
    | @  [draft] Add a table in the kichen (bcd4d53319ec)
    | |
    | o  [draft] Add wall color (f9f14815935d)
    |/
    o  [public] We need a kitchen too. (2afbcfd2af83)
    |
    o  [public] First description of the throne room (898889b143fb)
    

    Note

    Rebase preserves the phase of rebased changesets.

    babar@Chessy ~/palace $ hg rebase -s 2 -d 4
    babar@Chessy ~/palace $ hg log --graph
    @  [draft] Add a table in the kichen (139ead8a540f)
    |
    o  [draft] Add wall color (0d1feb1bca54)
    |
    o  [public] Some bedroom description. (0a5b3d7e4e5f)
    |
    o  [public] We need a kitchen too. (2afbcfd2af83)
    |
    o  [public] First description of the throne room (898889b143fb)
    

    Finally, once pushed to the public server, changesets are set to the public (immutable) phase:

    babar@Chessy ~/palace $ hg push
    pushing to http://hg.celesteville.com/palace
    searching for changes
    adding changesets
    adding manifests
    adding file changes
    added 2 changesets with 2 changes to 2 files
    babar@Chessy ~/palace $ hg log --graph
    
    @  [public] Add a table in the kichen (139ead8a540f)
    |
    o  [public] Add wall color (0d1feb1bca54)
    |
    o  [public] Some bedroom description. (0a5b3d7e4e5f)
    |
    o  [public] We need a kitchen too. (2afbcfd2af83)
    |
    o  [public] First description of the throne room (898889b143fb)
    

    To summarize:

    • Changesets exchanged with the outside are public and immutable.
    • Changesets committed locally are draft until exchanged with the outside.
    • As a user, you should not worry about phases. Phases move transparently.
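    The three rules above can be modelled in a few lines of Python. This is only a toy sketch to illustrate the bookkeeping, under my own assumptions: the `Repo` class and its `commit`, `pull_from`, `push_to` and `rebase` methods are hypothetical, not Mercurial's internals or API, and every remote is assumed to be publishing (the default).

```python
from enum import IntEnum


class Phase(IntEnum):
    # Ordered public < draft < secret
    PUBLIC = 0
    DRAFT = 1
    SECRET = 2


class ImmutableChangesetError(Exception):
    pass


class Repo:
    """Toy model of a repository that only tracks changeset id -> Phase.

    Assumes every remote it talks to is publishing.
    """

    def __init__(self):
        self.phases = {}

    def commit(self, node):
        # Changesets committed locally start out as draft.
        self.phases[node] = Phase.DRAFT

    def pull_from(self, remote):
        # Whatever comes from a publishing repository is public here.
        for node, phase in remote.phases.items():
            if phase is not Phase.SECRET:
                self.phases[node] = Phase.PUBLIC

    def push_to(self, remote):
        # Pushed changesets become public on both sides; secret ones stay home.
        for node, phase in list(self.phases.items()):
            if phase is not Phase.SECRET:
                self.phases[node] = Phase.PUBLIC
                remote.phases[node] = Phase.PUBLIC

    def rebase(self, node, new_node):
        # Rewriting is refused for public (immutable) changesets...
        if self.phases[node] is Phase.PUBLIC:
            raise ImmutableChangesetError("can't rebase immutable changeset %s" % node)
        # ...and the rewritten changeset keeps the phase of the original.
        self.phases[new_node] = self.phases.pop(node)
```

    Replaying the scenario: after `babar.pull_from(server)` and `babar.commit(...)`, rebasing the pulled changeset raises the error, rebasing the local one works, and `babar.push_to(server)` turns it public.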

    Preventing premature exchange of history

    credit: Richard Elzey, http://www.flickr.com/photos/elzey/3516256055/

    The public phase prevents users from accidentally rewriting public history. It's a good step forward, but phases can go further: they can prevent you from accidentally making history public in the first place.

    For this purpose, a third phase is available, the secret phase. To explain it, I'll use the mq extension which is nicely integrated with this secret phase:

    Our fellow user enables the mq extension:

    babar@Chessy ~/palace $ vim ~/.hgrc
    babar@Chessy ~/palace $ cat ~/.hgrc
    [ui]
    username=Babar the King <babar@celesteville.com>
    [extensions]
    # enable the mq extension included with Mercurial
    hgext.mq=
    [mq]
    # Enable secret phase integration.
    # This integration is off by default for backward compatibility.
    secret=true
    

    New patches (not general commits) are now created as secret:

    babar@Chessy ~/palace $ echo A red carpet on the floor. >> throne-room
    babar@Chessy ~/palace $ hg qnew -m 'add a carpet' carpet.diff
    babar@Chessy ~/palace $ hg log --graph
    
    @  [secret] add a carpet (3c1b19d5d3f5)
    |
    o  [public] Add a table in the kichen (139ead8a540f)
    |
    o  [public] Add wall color (0d1feb1bca54)
    |
    
    

    This secret changeset is excluded from outgoing and push:

    babar@Chessy ~/palace $ hg outgoing
    comparing with http://hg.celesteville.com/palace
    searching for changes
    no changes found (ignored 1 secret changesets)
    babar@Chessy ~/palace $ hg push
    pushing to http://hg.celesteville.com/palace
    searching for changes
    no changes found (ignored 1 secret changesets)
    

    And other users do not see it:

    celeste@Chessy ~/palace $ hg incoming ~babar/palace/
    comparing with ~babar/palace
    searching for changes
    [public] Add wall color (0d1feb1bca54)
    [public] Add a table in the kichen (139ead8a540f)
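    Secret changesets stay invisible because they are simply left out when computing what to send. A toy model of that filtering (a hypothetical `outgoing` function and data layout, not Mercurial's discovery code) could look like the following; it ignores the changeset graph, which I assume is harmless here because a changeset's phase can't be lower than its parents', so descendants of a secret changeset are secret too.

```python
from enum import IntEnum


class Phase(IntEnum):
    # Ordered public < draft < secret
    PUBLIC = 0
    DRAFT = 1
    SECRET = 2


def outgoing(local_phases, remote_nodes):
    """Toy model of `hg outgoing` with secret changesets excluded.

    local_phases: dict mapping changeset id -> Phase
    remote_nodes: set of changeset ids the remote already has
    Returns (ids that would be sent, number of secret ids ignored).
    """
    missing = [node for node in local_phases if node not in remote_nodes]
    to_send = [node for node in missing if local_phases[node] is not Phase.SECRET]
    return to_send, len(missing) - len(to_send)


if __name__ == '__main__':
    babar = {
        '0d1feb1bca54': Phase.PUBLIC,
        '139ead8a540f': Phase.PUBLIC,
        '3c1b19d5d3f5': Phase.SECRET,  # the mq patch "add a carpet"
    }
    server = {'0d1feb1bca54', '139ead8a540f'}
    print(outgoing(babar, server))  # ([], 1), like "ignored 1 secret changesets"
```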
    

    The mq integration takes care of phase movement for the user. Changesets are made draft by qfinish:

    babar@Chessy ~/palace $ hg qfinish .
    babar@Chessy ~/palace $ hg log --graph
    @  [draft] add a carpet (3c1b19d5d3f5)
    |
    o  [public] Add a table in the kichen (139ead8a540f)
    |
    o  [public] Add wall color (0d1feb1bca54)
    |
    
    

    And changesets are made secret again by qimport:

    babar@Chessy ~/palace $ hg qimport -r 3c1b19d5d3f5
    babar@Chessy ~/palace $ hg log --graph
    @  [secret] add a carpet (3c1b19d5d3f5)
    |
    o  [public] Add a table in the kichen (139ead8a540f)
    |
    o  [public] Add wall color (0d1feb1bca54)
    |
    
    

    As expected, mq refuses to qimport public changesets:

    babar@Chessy ~/palace $ hg qimport -r 139ead8a540f
    abort: revision 4 is not mutable
    

    In the next part I'll detail how to control phase movement.

    This is the second part of a series of posts about the new phases feature we implemented for mercurial 2.1. The first part talks about how phases will help mercurial users, this second part explains how to control them.

    Controlling automatic phase movement

    Sometimes it may be desirable to push and pull changesets in the draft phase to share unfinished work. Below are some cases:

    • pushing to continuous integration,
    • pushing changesets for review,
    • user has multiple machines,
    • branch clone.

    You can disable publishing behavior in a repository configuration file [1]:

    [phases]
    publish=False

    When a repository is set to non-publishing, people push changesets without altering their phase. Draft changesets are pushed as draft and public changesets are pushed as public:

    celeste@Chessy ~/palace $ hg showconfig phases
    phases.publish=False
    
    babar@Chessy ~/palace $ hg log --graph
    @  [draft] add a carpet (3c1b19d5d3f5)
    |
    o  [public] Add a table in the kichen (139ead8a540f)
    |
    o  [public] Add wall color (0d1feb1bca54)
    |
    
    babar@Chessy ~/palace $ hg outgoing ~celeste/palace/
    [public] Add wall color (0d1feb1bca54)
    [public] Add a table in the kichen (139ead8a540f)
    [draft] add a carpet (3c1b19d5d3f5)
    babar@Chessy ~/palace $ hg push ~celeste/palace/
    pushing to ~celeste/palace/
    searching for changes
    adding changesets
    adding manifests
    adding file changes
    added 3 changesets with 3 changes to 2 files
    babar@Chessy ~/palace $ hg log --graph
    @  [draft] add a carpet (3c1b19d5d3f5)
    |
    o  [public] Add a table in the kichen (139ead8a540f)
    |
    o  [public] Add wall color (0d1feb1bca54)
    |
    
    celeste@Chessy ~/palace $ hg log --graph
    o  [draft] add a carpet (3c1b19d5d3f5)
    |
    o  [public] Add a table in the kichen (139ead8a540f)
    |
    o  [public] Add wall color (0d1feb1bca54)
    |

    And pulling gives the phase as in the remote repository:

    celeste@Chessy ~/palace $ hg up 139ead8a540f
    celeste@Chessy ~/palace $ echo The wall will be decorated with portraits >> bedroom
    celeste@Chessy ~/palace $ hg ci -m 'Decorate the wall.'
    created new head
    celeste@Chessy ~/palace $ hg log --graph
    @  [draft] Decorate the wall. (3389164e92a1)
    |
    | o  [draft] add a carpet (3c1b19d5d3f5)
    |/
    o  [public] Add a table in the kichen (139ead8a540f)
    |
    o  [public] Add wall color (0d1feb1bca54)
    |
    
    ---
    babar@Chessy ~/palace $ hg pull ~celeste/palace/
    pulling from ~celeste/palace/
    searching for changes
    adding changesets
    adding manifests
    adding file changes
    added 1 changesets with 1 changes to 1 files (+1 heads)
    babar@Chessy ~/palace $ hg log --graph
    @  [draft] Decorate the wall. (3389164e92a1)
    |
    | o  [draft] add a carpet (3c1b19d5d3f5)
    |/
    o  [public] Add a table in the kichen (139ead8a540f)
    |
    o  [public] Add wall color (0d1feb1bca54)
    |

    Phase information is exchanged during pull and push operations. When a changeset exists on both sides but within different phases, its phase is unified to the lowest [2] phase. For instance, if a changeset is draft locally but public remotely, it is set public:

    celeste@Chessy ~/palace $ hg push -r 3389164e92a1
       pushing to http://hg.celesteville.com/palace
       searching for changes
       adding changesets
       adding manifests
       adding file changes
       added 1 changesets with 1 changes to 1 files
       celeste@Chessy ~/palace $ hg log --graph
       @  [public] Decorate the wall. (3389164e92a1)
       |
       | o  [draft] add a carpet (3c1b19d5d3f5)
       |/
       o  [public] Add a table in the kichen (139ead8a540f)
       |
       o  [public] Add wall color (0d1feb1bca54)
       |
       
       ---
       babar@Chessy ~/palace $ hg pull ~celeste/palace/
       pulling from ~celeste/palace/
       searching for changes
       no changes found
       babar@Chessy ~/palace $ hg log --graph
       @  [public] Decorate the wall. (3389164e92a1)
       |
       | o  [draft] add a carpet (3c1b19d5d3f5)
       |/
       o  [public] Add a table in the kichen (139ead8a540f)
       |
       o  [public] Add wall color (0d1feb1bca54)
       |
       
       
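    The unification rule can be modelled in a few lines of Python. This is a toy sketch, not Mercurial's implementation: the `Phase` enum and the `unify`/`sync` helpers are hypothetical names, and phase data is assumed to be a plain dict mapping changeset hashes to phases. `sync` returns the receiving side's phases only, leaving its inputs untouched.

```python
from enum import IntEnum
from typing import Dict


class Phase(IntEnum):
    # Same ordering as footnote [2]: public < draft < secret
    PUBLIC = 0
    DRAFT = 1
    SECRET = 2


def unify(local: Phase, remote: Phase) -> Phase:
    """A changeset known on both sides ends up in the lower of its two phases."""
    return min(local, remote)


def sync(local: Dict[str, Phase], remote: Dict[str, Phase]) -> Dict[str, Phase]:
    """Return the receiving side's phases after an exchange.

    Changesets present on only one side keep their phase; `local` is not modified.
    """
    updated = dict(local)
    for node, remote_phase in remote.items():
        if node in updated:
            updated[node] = unify(updated[node], remote_phase)
    return updated


# Babar still has "Decorate the wall." as draft, while Celeste's copy is
# already public after her push to the publishing server.
babar = {"3389164e92a1": Phase.DRAFT, "3c1b19d5d3f5": Phase.DRAFT}
celeste = {"3389164e92a1": Phase.PUBLIC, "3c1b19d5d3f5": Phase.DRAFT}
after_pull = sync(babar, celeste)
print({node: phase.name for node, phase in after_pull.items()})
# {'3389164e92a1': 'PUBLIC', '3c1b19d5d3f5': 'DRAFT'}
```

    Using an `IntEnum` makes the ordering explicit, so "lowest phase" is simply `min`.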

    Note

    pull is a read-only operation and does not alter phases in remote repositories.

    You can also control the phase in which a new changeset is committed. If you don't want new changesets to be pushed without explicit consent, update your repository configuration [1] with:

    [phases]
       new-commit=secret
       

    You will need to move these changesets to another phase manually before you can push them; see the next section for details.

    Note

    With what has been done so far for 2.1, the most practical way to make a single new commit secret is to use:

       hg commit --config phases.new-commit=secret
       
    [1] You can use this setting in your user hgrc too.
    [2] Phases are ordered as follows: public < draft < secret

    Manual phase movement

    Most phase movements should be automatic and transparent. However, it is still possible to move phases manually using the hg phase command:

    babar@Chessy ~/palace $ hg log --graph
       @    [draft] merge with Celeste works (f728ef4eba9f)
       |\
       o |  [draft] add a carpet (3c1b19d5d3f5)
       | |
       | o  [public] Decorate the wall. (3389164e92a1)
       |/
       o  [public] Add a table in the kichen (139ead8a540f)
       |
       
       babar@Chessy ~/palace $ hg phase --public 3c1b19d5d3f5
       babar@Chessy ~/palace $ hg log --graph
       @    [draft] merge with Celeste works (f728ef4eba9f)
       |\
       o |  [public] add a carpet (3c1b19d5d3f5)
       | |
       | o  [public] Decorate the wall. (3389164e92a1)
       |/
       o  [public] Add a table in the kichen (139ead8a540f)
       |
       
       

    Changesets only move to lower [2] phases during normal operation. By default, the phase command enforces this rule:

    babar@Chessy ~/palace $ hg phase --draft 3c1b19d5d3f5
       no phases changed
       babar@Chessy ~/palace $ hg log --graph
       @    [draft] merge with Celeste works (f728ef4eba9f)
       |\
       o |  [public] add a carpet (3c1b19d5d3f5)
       | |
       | o  [public] Decorate the wall. (3389164e92a1)
       |/
       o  [public] Add a table in the kichen (139ead8a540f)
       |
       
       

    If you are confident in what you are doing, you can still use the --force switch to override this behavior:

    Warning

    Phases are designed to avoid forcing people to use hg phase --force. If you need to use --force on a regular basis, you are probably doing something wrong. Read the previous section again to see how to configure your environment for automatic phase movement suitable to your needs.

    babar@Chessy ~/palace $ hg phase --verbose --force --draft 3c1b19d5d3f5
       phase change for 1 changesets
       babar@Chessy ~/palace $ hg log --graph
       @    [draft] merge with Celeste works (f728ef4eba9f)
       |\
       o |  [draft] add a carpet (3c1b19d5d3f5)
       | |
       | o  [public] Decorate the wall. (3389164e92a1)
       |/
       o  [public] Add a table in the kichen (139ead8a540f)
       |
       
       

    Note that a phase defines a consistent set of revisions in your history graph. This means that for a changeset to be public (immutable), all its ancestors must be immutable too. Likewise, once you have a secret (not exchanged) changeset, all its descendants will be secret too.

    This means that changing the phase of a changeset may result in phase movement for other changesets:

    babar@Chessy ~/palace $ hg phase -v --public f728ef4eba9f # merge with Celeste works
       phase change for 2 changesets
       babar@Chessy ~/palace $ hg log --graph
       @    [public] merge with Celeste works (f728ef4eba9f)
       |\
       o |  [public] add a carpet (3c1b19d5d3f5)
       | |
       | o  [public] Decorate the wall. (3389164e92a1)
       |/
       o  [public] Add a table in the kichen (139ead8a540f)
       |
       
       babar@Chessy ~/palace $ hg phase -vf --draft 3c1b19d5d3f5 # add a carpet
       phase change for 2 changesets
       babar@Chessy ~/palace $ hg log --graph
       @    [draft] merge with Celeste works (f728ef4eba9f)
       |\
       o |  [draft] add a carpet (3c1b19d5d3f5)
       | |
       | o  [public] Decorate the wall. (3389164e92a1)
       |/
       o  [public] Add a table in the kichen (139ead8a540f)
       |
       
       
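    The behaviour of hg phase shown in this section can be modelled as a small graph walk: moving a changeset to a lower phase drags its ancestors along, while moving it to a higher phase is refused unless forced, and then drags its descendants along. This is a hypothetical Python sketch for illustration only, not Mercurial's code; `move_phase`, `reachable` and the dict-based graph (child hash to parent hashes) are assumptions. Replaying the session above against it yields the same changeset counts that hg printed.

```python
from enum import IntEnum
from typing import Dict, List, Set


class Phase(IntEnum):
    PUBLIC = 0
    DRAFT = 1
    SECRET = 2


def reachable(start: str, edges: Dict[str, List[str]]) -> Set[str]:
    """`start` plus every node reachable by following `edges` (iterative DFS)."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges.get(node, []))
    return seen


def move_phase(phases: Dict[str, Phase], parents: Dict[str, List[str]],
               node: str, target: Phase, force: bool = False) -> int:
    """Move `node` to `target`, keeping phases consistent; return how many changed."""
    if target <= phases[node]:
        # Lowering: no ancestor may stay in a higher phase than `target`.
        affected = reachable(node, parents)
        changed = [n for n in affected if phases[n] > target]
    elif not force:
        return 0  # corresponds to "no phases changed"
    else:
        # Forced raise: no descendant may stay in a lower phase than `target`.
        children: Dict[str, List[str]] = {}
        for child, ps in parents.items():
            for p in ps:
                children.setdefault(p, []).append(child)
        affected = reachable(node, children)
        changed = [n for n in affected if phases[n] < target]
    for n in changed:
        phases[n] = target
    return len(changed)


# History from the session above (child -> parents).
parents = {
    "0d1feb1bca54": [],
    "139ead8a540f": ["0d1feb1bca54"],
    "3c1b19d5d3f5": ["139ead8a540f"],                  # add a carpet
    "3389164e92a1": ["139ead8a540f"],                  # Decorate the wall.
    "f728ef4eba9f": ["3c1b19d5d3f5", "3389164e92a1"],  # merge with Celeste works
}
phases = {"0d1feb1bca54": Phase.PUBLIC, "139ead8a540f": Phase.PUBLIC,
          "3c1b19d5d3f5": Phase.DRAFT, "3389164e92a1": Phase.PUBLIC,
          "f728ef4eba9f": Phase.DRAFT}

counts = [
    move_phase(phases, parents, "3c1b19d5d3f5", Phase.PUBLIC),
    move_phase(phases, parents, "3c1b19d5d3f5", Phase.DRAFT),
    move_phase(phases, parents, "3c1b19d5d3f5", Phase.DRAFT, force=True),
    move_phase(phases, parents, "f728ef4eba9f", Phase.PUBLIC),
    move_phase(phases, parents, "3c1b19d5d3f5", Phase.DRAFT, force=True),
]
print(counts)  # [1, 0, 1, 2, 2]
```

    Walking ancestors for a lowering move and descendants for a raising move is what keeps the invariant from the previous paragraph intact after every step.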

    The next and final post will explain how older Mercurial versions interact with newer versions that support phases.

    [Images by Jimmy Smith (cc-by-nd) and Cory Doctorow (cc-by-sa)]

    This is the final part of a series of posts about the new phases feature we implemented for Mercurial 2.1. The first part talks about how phases will help Mercurial users, the second part explains how to control them. This one explains what people should take care of when upgrading.

    Important upgrade note and backward compatibility

    Phases do not require any conversion of your repos. Phase information is not stored in changesets. Everybody using a new client will take advantage of phases on any repository they touch.

    However, there are some points you need to be aware of regarding the interaction between the old world without phases and the new world with phases:

    Talking over the wire to a phaseless server using a phased client

    As ever, the Mercurial wire protocol (used to communicate through http and ssh) is fully backward compatible [1]. But as old Mercurial versions are not aware of phases, old servers will always be treated as publishing.

    Direct file system access to a phaseless repository using a phased client

    A new client has no way to determine which parts of the history should be immutable and which should not. In order to fail safely, a new client will mark everything as public when no phase data is available. For example, in the scenario described in part I, if an old version of Mercurial were used to clone and commit, a new version of Mercurial would see those commits as public and refuse to rebase them.

    Note

    Some extensions (like mq) may provide smarter logic to set some changesets to the draft or even secret phases.

    The phased client will write phase data to the old repo on its first write operation.

    Direct file system access to a phased repository using a phaseless client

    Everything works fine except that the old client is unable to see or manipulate phases:

    • Changesets added to the repo inherit the phase of their parents, whatever the parents' phase. This could result in new commits being seen as public or pulled content seen as draft or even secret when a newer client uses the repo again!
    • Changesets pushed to a publishing server won't be set public.
    • Secret changesets are exchanged.
    • Old clients are willing to rewrite immutable changesets (as they don't know that they shouldn't).

    So, if you actively rewrite your history or use secret changesets, you should ensure that only new clients touch those repositories where the phase matters.

    Fixing phase errors

    Several situations can result in bad phases in a repository:

    • When upgrading from phaseless to phased Mercurial, the default phases picked may be too restrictive.
    • When you let an old client touch your repository.
    • When you push to a publishing server that should not actually be publishing.

    The easiest way to restore a consistent state is to use the phase command. In most cases, changesets marked as public but absent from your real public server should be moved to draft:

    hg phase --force --draft 'public() and outgoing()'
    

    If you have multiple public servers, you can pull from the others to retrieve their phase data too.

    Conclusion

    Mercurial's phases are a simple concept that adds always-on, transparent safety for most users without preventing advanced users from doing whatever they want.

    Behind this useful, safety-enabling feature, phases introduce into Mercurial's code the concept of sharing mutable parts of history. This paves the way for advanced history-rewriting solutions while keeping the sharing of mutable history safe and easy. I'll post about those future features shortly.


    [1] You can expect Mercurial 0.9.0 to interoperate cleanly with a version released 5 years later.

    [Images by Crystian Cruz (cc-nd) and C.J. Peters (cc-by-sa)]

    17:09


    I'm hoping to have a few weeks to work on my own projects before I dive into working on other people's projects again (that might not pan out, but I'm hoping), so, here's a brain-dump of what I'm considering playing with:

    • write a simple, generic shader-based legacy-free scenegraph engine (basically transplant the modern parts from OpenGLContext and leave behind the old crud, then translate the core into C/C++)
    • turn Sillescope into an Android app (that shouldn't take too long, I just got annoyed at the GLES limitations last time and stopped 1/2 way)
    • learn Haskell (though the "Haskell for Python Programmers" article honestly left me thinking "this is dumb" much of the time)
    • contribute to a game engine (maybe Ogre, maybe 0AD)
    • add a GLES binding to PyOpenGL
    • play with PyPy now that I have a machine that can compile it
    • build a basic HTML5 Canvas or WebGL game engine
    • update and modernize StarPy (I think not, as I have spent the last 18 months on VoIP and Django)
    • update Django-jqm with the latest jQuery Mobile, provide a JQM admin interface (again, not likely, I just spent the last 18 months in Django)

    Any other suggestions? I'm not currently concerned about utility or practicality, just fun things with which to spend a few weeks to recharge my programming-enjoyment batteries.

    16:54

    Earlier today, Apple announced v 1.1 of the Security update 2012-001 ...(more)...

    14:36

    Last week Sophos released its 2012 Security Threat Report, which highlighted some key findings from 201 ...(more)...

    12:54

    As noted in the previous post - we had a minor blip regarding the PyCon 2012 hotel - by minor blip, I mean we completely booked the Hyatt (our main hotel).

    This issue has been resolved without needing me to resort to going to Home Depot and buying "The Dummies Guide to Hotel Building".

    We now have plenty of rooms at:

    Update 2/3/12:

    Hilton Santa Clara - NOW FULL. Within walking distance of the venue. These rooms are listed at $159/night - however, as we want to do what's right for attendees, we have asked the Hilton to credit each room night booked under our block $10, while the PyCon master account will absorb the additional $10 cost. This means that the base room rate for attendees will be $149/night, matching the cost for the Hyatt.

    The Avatar Hotel, Santa Clara - STILL AVAILABLE. This one is not within walking distance; however, the cost is $149/night, matching our other rates, and we have negotiated a free shuttle for attendees to and from the Santa Clara Convention Center.

    The Marriott Santa Clara - STILL AVAILABLE. Again, the room-night cost is the same, and while it too is not within walking distance, we will have a free shuttle to and from the convention center!

    I must note: All of these agreements include room minimums - this means that PyCon will get charged a lot of money if we do not book the blocks we have contracted for.

    In order for us to get credit for your hotel bookings, you must book through our registration and housing system at: https://us.pycon.org/2012/registration/register/ - or by contacting our housing bureau at pycon5-reg@cteusa.com or by phone at 847-759-4277.

    Please book your rooms through us, and please book as soon as you can! You'll not only get a room for the conference - you'll help us out, and be a part of what is already the biggest PyCon on record. If you haven't registered, you need to - registration is capped at 1500 attendees, and by all estimates, we are going to hit that number, and soon. (All financial aid recipients are accounted for in attendance and hotel, by the way.)

    Jesse Noller, PyCon Chair.

    It looks like this year is nearing its end. Thanks for tagging along! I thought it might be fun to write a post that highlights some of the nicer posts I wrote this year. So far I've been blogging for around two and a half years.

    I think blogging is slowly starting to show its advantages. A while ago I needed to solve a certain Django-specific problem. After googling around, I happened to find the solution on my own blog. In essence, this blog serves as a kind of auxiliary memory for me. As a side benefit, other people might find my ramblings useful too, which in turn might lead to new opportunities. Blogging is definitely a good way to market yourself if you're into that sort of thing.

    There has been some talk about whether or not blogging is dying. The basic premise is that social media such as Facebook and Twitter are eating into its popularity. That's probably partially true, but I believe blogs will continue to have some influence. After all, you'll need something to discuss and tweet about. Most importantly, blogs are more permanent by nature. It's easier to refer back to a concrete blog post than to some obscure Twitter conversation from ages ago. Different mediums serve different purposes.

    Now that I've got the intro out of the way, let's take a look at the year. Quite a few things happened. While I'm at it, I'll try to outline some possible ideas for the next one. It's not like I'm running out of ideas. On the contrary, there's still plenty of material I need to get out there sooner or later.



    It's official - for the first time in the 10 year history of PyCon, we have had to close main conference registrations. This is amazing.

    Registration

    As a reminder: you can still register on the site for your hotel needs (more on this in a moment) or for tutorial-only registrations. Sponsor registrations are accounted for and slots are held, as are all financial aid recipients and those with valid promotional codes. If you have issues with registering, please contact pycon5-reg@cteusa.com / 847-759-4277 or email pycon-reg@python.org.

    Hotel

    We still have hotel space! Currently, the Hyatt and the Hilton are sold out, but we still have rooms at the Avatar and Marriott - as the latter two are not within walking distance to the venue, we will be providing free shuttle service to and from the conference.

    In order for us to get credit for your hotel bookings, you must book through our registration and housing system at: https://us.pycon.org/2012/registration/register/ - or by contacting our housing bureau at pycon5-reg@cteusa.com or by phone at 847-759-4277.

    In Closing

    PyCon 2012 will be the biggest PyCon yet by all measurements and dimensions. Amazing talks, tutorials, posters - robots - we are going to have it all for you. We're working on fleshing out the open spaces, the sprints and the PyCon 5k (please sign up!). The volunteer team is working on welcoming committees, social events and many other things.

    As we all know, PyCon is much more than just a conference - it's community. It is the Python community. And given how much we've grown, it's imperative that we go out of our way to do everything in our power to maintain that community feel. We need to maintain the welcomeness, the closeness and the feeling of belonging. If you are interested in volunteering - please sign up to pycon-organizers and watch out for everything coming down the pike.

    If you have any ideas on how we can make things better, shoot an email to jnoller@python.org - favorite things from past years, things you loved and would like to see again, it's all fair game.

    Thank You. Without you, none of this would matter, or would happen.

    About a year and a half ago my dream job of doing nothing but Jython all day and night came to an end. To re-integrate myself into society I had to go cold turkey on Jython for a while so I could learn how to have a regular job again. I've contributed to Jython here and there by coding some of this and that, but I've failed to take care of the most important part: helping new people that want to get involved in Jython. I've let that go on for too long and I need to turn things around and get back to doing that. Recently a frustrated patch author sent an email about how hard it is to become a Jython contributor. He has some patches that have been sitting around for a long time and I'm pretty ashamed that that is the normal course of things lately. So, as I start giving Jython a bit more of my spare time again, I plan to make it a priority to review patches and try to figure out how to grow the Jython developer community again. So send patches and I promise to look at them. In particular, if anyone wants to put together patches that fix failing tests in the default Jython branch that targets 2.6 compatibility, I'll be right on them. I'll put together another post on contributing to Jython soon.

    11:54

    Sketching Interfaces Workshop
    Jason & Sam are over at the Interaction12 conference in Dublin this year and have been running a Sketching Interfaces Workshop, for which they released some slides. The slides are loaded with handy tips and rants (e.g. about what sketches are not). They also look at sketching as a number of activities: warming up, capturing, focusing, writing, styling, and adding interaction. Looking through slides probably isn't as awesome as meeting these two in person at an event, but at least it's something for those who weren't able to make it to Ireland. :) Cheers and thanks for sharing!

    Credits: Jason Mesut & Sam Smith

    11:09

    We just released LFS 0.6.5. This is yet another bugfix release.

    Changes

    • Bugfix: added csrftoken for rating mails (Maciej Wiśniowski)
    • Bugfix: fixed ImageWithThumbsField (Maciej Wiśniowski)
    • Updated Romanian translations (olimpiu)
    • Updated Polish translations (Maciej Wiśniowski)


    Information

    You can find more information and help at the following locations:

     

    08:09

    Last week I received an email from a user of pycparser that mentioned the strange AST that results when pycparser parses a switch statement.

    Let’s take the following snippet of C code for example. Don’t look for semantic sense in it – it’s just used to test the parser:

    switch (myvar) {
        case 10:
            k = 10;
            p = k + 1;
            return 10;
        case 20:
        case 30:
            return 20;
        default:
            break;
    }
    

    And the AST pycparser was generating for this code:

    Switch:
      ID: myvar
      Compound:
        Case:
          Constant: int, 10
          Assignment: =
            ID: k
            Constant: int, 10
        Assignment: =
          ID: p
          BinaryOp: +
            ID: k
            Constant: int, 1
        Return:
          Constant: int, 10
        Case:
          Constant: int, 20
          Case:
            Constant: int, 30
            Return:
              Constant: int, 20
        Default:
          Break:
    

    There are two problems here:

    1. Only the first statement inside each case is made a child of that case – the other statements are siblings.
    2. Two consecutive case statements without any other statements in between (fall-through) cause the second case to become the child of the first one. If additional consecutive case statements follow, they nest even further.

    Since the parser follows the C grammar pretty closely, I immediately went to look into the C99 standard, and indeed, this is exactly the parse tree that it mandates. Here’s the relevant portion of the language grammar (from section A.2.3):

    (6.8) statement:
                  labeled-statement
                  compound-statement
                  expression-statement
                  selection-statement
                  iteration-statement
                  jump-statement
    (6.8.1) labeled-statement:
                  identifier : statement
                  case constant-expression : statement
                  default : statement
    

    Note that a case (and default, which is equivalent to case in this whole discussion) must be followed by one, and only one other statement. This explains why pycparser parses the code above the way it does.

    However, the goal of pycparser is not to generate a parse tree. It is to generate an abstract syntax tree (AST), which follows the language semantics rather than its grammar. Hey, I already wrote about this stuff!

    So today I fixed this part of pycparser by adding a dedicated AST transformation that runs after a switch statement is parsed. The transformation isn't really complicated, and the AST pycparser generates now is much friendlier. Here it is, for the same code:

    Switch:
      ID: myvar
      Compound:
        Case:
          Constant: int, 10
          Assignment: =
            ID: k
            Constant: int, 10
          Assignment: =
            ID: p
            BinaryOp: +
              ID: k
              Constant: int, 1
          Return:
            Constant: int, 10
        Case:
          Constant: int, 20
        Case:
          Constant: int, 30
          Return:
            Constant: int, 20
        Default:
          Break:
    

    As you can see, the problems mentioned above were fixed. This fix is available in the pycparser Mercurial repository and will be part of the next release.
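    For readers curious about what such a transformation involves, here is a toy Python sketch of the flattening step. It does not use pycparser's node classes and is not necessarily how pycparser implements the fix: `Case`, `Stmt` and `flatten_switch_body` are hypothetical, a default label is modelled as a `Case` labelled "default" (equivalent for this discussion, as noted above), and each parsed label is assumed to hold exactly one child statement, as the C99 grammar requires.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Stmt:
    text: str  # any non-label statement, kept as source text for brevity


@dataclass
class Case:
    label: str  # "10", "20", ... or "default"
    stmts: List["Node"] = field(default_factory=list)


Node = Union[Case, Stmt]


def _unnest(case: Case, flat: List[Node]) -> Case:
    """Append `case`, and any label nested as its first child, to `flat` as siblings.

    Returns the innermost label, which owns the statements that follow it.
    """
    copy = Case(case.label)
    flat.append(copy)
    if case.stmts and isinstance(case.stmts[0], Case):
        innermost = _unnest(case.stmts[0], flat)  # fall-through chain
        innermost.stmts.extend(case.stmts[1:])
        return innermost
    copy.stmts.extend(case.stmts)
    return copy


def flatten_switch_body(items: List[Node]) -> List[Node]:
    """Rewrite a grammar-shaped switch body into the AST shape shown above."""
    flat: List[Node] = []
    current: Optional[Case] = None
    for item in items:
        if isinstance(item, Case):
            current = _unnest(item, flat)
        elif current is None:
            flat.append(item)  # statement before the first label
        else:
            current.stmts.append(item)  # belongs to the most recent label
    return flat


# The parse tree from the example: only the first statement sits under
# "case 10", and "case 30" is nested inside "case 20".
parsed = [
    Case("10", [Stmt("k = 10;")]),
    Stmt("p = k + 1;"),
    Stmt("return 10;"),
    Case("20", [Case("30", [Stmt("return 20;")])]),
    Case("default", [Stmt("break;")]),
]
for case in flatten_switch_body(parsed):
    print(case.label, [s.text for s in case.stmts])
```

    The sketch builds new `Case` nodes rather than mutating the parse tree, so the original input stays available for comparison.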

    Related posts:

    1. Parsing C++ in Python with Clang People that need to parse and analyze C code in...
    2. pycparser now supports C99 Today I released pycparser version 2.00, with support for C99...
    3. On parsing the C standard library headers Introduction I’m now in the process of writing a...

    times - Handle time and timezones simply in Python:

    Update: Check out Vincent’s blog post for background.

    Vincent Driessen, author of gitflow, has released Times, a time and timezone library for Python. Times builds on pytz and aims to simplify time and timezone handling. When working with Times, convert times to universal time first:

    >>> times.to_universal(local_time, 'Europe/Amsterdam')
    datetime.datetime(2012, 2, 1, 10, 31, 45, 781262)
    

    When displaying time for a user, format it with their local time zone:

    >>> print times.format(now, 'CET')
    2012-02-01 21:32:10+0100
    

    See the README for advanced usage. Oh, and why aren’t you using git flow?
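    As a rough illustration of the same store-in-UTC, format-on-display pattern, here is a sketch using only the standard library's datetime module. It is not the Times API: `to_universal` and `format_local` are hypothetical helpers that merely echo its naming, and a fixed UTC+1 offset stands in for CET so the sketch runs without a time zone database.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def to_universal(local_time: datetime, tz: tzinfo) -> datetime:
    """Attach `tz` to a naive wall-clock time and return the naive UTC equivalent."""
    aware = local_time.replace(tzinfo=tz)
    return aware.astimezone(timezone.utc).replace(tzinfo=None)


def format_local(utc_time: datetime, tz: tzinfo) -> str:
    """Render a naive UTC time in the viewer's zone, e.g. '2012-02-01 21:32:10+0100'."""
    local = utc_time.replace(tzinfo=timezone.utc).astimezone(tz)
    return local.strftime("%Y-%m-%d %H:%M:%S%z")


# Fixed offset only: real code would want a DST-aware zone such as
# zoneinfo.ZoneInfo("Europe/Amsterdam") (Python 3.9+) instead.
cet = timezone(timedelta(hours=1), "CET")
stored = to_universal(datetime(2012, 2, 1, 11, 31, 45), cet)
shown = format_local(datetime(2012, 2, 1, 20, 32, 10), cet)
print(stored)  # 2012-02-01 10:31:45
print(shown)   # 2012-02-01 21:32:10+0100
```

    Note that `replace(tzinfo=...)` is only safe with fixed offsets or zoneinfo zones; with pytz, which Times builds on, the usual advice is to call the zone's `localize()` method instead.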

    07:27

    Well, the dream of posting a weekly update is not working out.  Things are very busy here and it is a challenge to find the time to write this post.  I am committed to writing an update periodically so I'm going to modify my plan to make this bi-weekly.

    CCAP

    • The Physics rendering in Prince XML is complete except for some math issues.  The code is on the development server and should move to production soon with Sociology.
    • Our current focus is on getting the Sociology rendering tested and released.  The current release date is Feb. 8th or 9th.
    • There are several Math issues in the Physics book.  Some of the problems were caused by the Word importer, some by the original structure of the math in the Word documents, and others are font issues.  When the Physics book is migrated to production, we will run a script to clean up some of the import problems to minimize the need for human intervention.  Some of the font issues cannot be resolved because the STIX fonts lack the needed glyph; an example is an italicized delta.
    CNX Conference Preparation
    • We have over 50 people signed up for the Sprints!  We are very excited about this.  As best we can tell, about 30 of these will be developers/coders/designers.  With this many people involved, we are going to make a huge effort to have clear easy install instructions for the various options.  Next week, we will be testing everything and finalizing the instructions.
    • Once the instructions are complete, we will post the link on the Rhaptos list so you can install Rhaptos or anything else you might need prior to arriving at the conference.
    • Many thanks to all that are planning to Sprint with us!
    OERPub API
    • We released the latest version of the OERPub API last week.  It will be used during the Conference Sprints by the sprinters working on content.
    HTML Editor Discussion
    • We have had a little time to continue our discussion in house regarding a new editor for Connexions.
    • Most of the discussion has been Microdata vs Microformats.  Neither has much traction in the wild.  
    • CSS3 seems to have support for Microdata which would allow us to use it to decorate semantic elements.  Microformats also have CSS support.
    • There has also been discussion on which editor to select.  TinyMCE and Aloha have been our focus.
    • TinyMCE has a nice UI, but does not support all of HTML5.
    • Aloha supports all of HTML5, but has an odd UI.  The UI is not bad, it just takes a little getting used to.  The demos on Aloha's site are very specific, so we need to install it and play with the UI configuration.
    • I'm sure the editor will be discussed at the conference so let us know your thoughts if you are attending. 

    06:18

    As of now, Four Digits has started the initial preparations for the upcoming Ploneconf in October. We have the venue and we have formed a team to make it happen. If you have suggestions, ideas or questions, don't hesitate to contact us.

    While writing the bid for the conference, we also formed a task force that will handle the organisation. It is a lot of work, so we need all the help we can get. Down the road we expect more people to assist, but Four Digits will handle the main tasks. Here is who does what.

    [Photo: the Four Digits conference team]

     

    From left to right

    Maarten Kling: Attendee manager

    Maarten was responsible for writing the bid, and during the conference he is in charge of registration, the budget, the information desk and the welcome. For questions about where to stay, your fee or anything related to money and sponsoring, contact him.

    Ralph Jacobs: Social Activity manager

    Ralph is the man to contact for information about Arnhem: what to see and what to do. He makes sure you will visit the best bars, the best restaurants and the sights you must see when visiting the city. It is also his responsibility to come up with a great better-half (partner) program.

    Sjoerd van Elferen: Venue & Sprint manager

    Sjoerd will handle everything that happens at the venue. He is in contact with the people there and is in charge of everything we need on site: projectors, tables, chairs and things like coffee and lunch. Sjoerd also coordinates the lightning talks, sprints and stand-ups.

    Rob Gietema: Program manager

    Rob is responsible for scheduling and programming. He will put together a day-to-day program that contains all the talks, sprints and open spaces. At the venue we have multiple rooms and a great auditorium; Rob keeps the overview of what is happening where and when. Apart from that, Rob takes care of the keynote speaker and other VIPs who will speak at the conference.

    Yadi Dragtsma: Marketing manager

    Our marketing and PR guy. He handles contact with the press, the website, and social platforms like Twitter, this blog and other media. If all goes well, everybody in Arnhem is going to know that Plone is in town. During the conference, Yadi will post reports, stories and updates for the people back home.

    Martijn Jacobs: Party manager

    Martijn is not in the photo, but he nevertheless plays a vital part in the organisation. Martijn is a developer, a DJ and part of a successful dance music act. As party manager he takes care of the conference party: the music, the equipment, the acts... Martijn is on top of things.

    For information or questions regarding the conference:

    info@ploneconf.org