Wednesday, 03 September


Video: How to Design Components for Mobile First

 ∗ LukeW | Digital Product Design + Strategy

As more online time shifts to mobile devices and networks, designing experiences only for large-screen monitors and high-speed connections limits the number of people your service can reach. Designing mobile first lets you reach that audience without compromising your product. In this short 4-minute video, I look at how, focusing on a few common components.

This video is part of my User Experience How-To series sponsored by the Intel Developer Zone.

Git: The Safety Net for Your Projects

 ∗ A List Apart: The Full Feed

I remember January 10, 2010, rather well: it was the day we lost a project’s complete history. We were using Subversion as our version control system, which kept the project’s history in a central repository on a server. And we were backing up this server on a regular basis—at least, we thought we were. The server broke down, and then the backup failed. Our project wasn’t completely lost, but all the historic versions were gone.

Shortly after the server broke down, we switched to Git. I had always seen version control as torturous; it was too complex and not useful enough for me to see its value, though I used it as a matter of duty. But once we’d spent some time on the new system, I began to understand just how helpful Git could be. Since then, it has saved my neck in many situations.

During the course of this article, I’ll walk through how Git can help you avoid mistakes—and how to recover if they’ve already happened.

Every teammate is a backup

Since Git is a distributed version control system, every member of our team that has a project cloned (or “checked out,” if you’re coming from Subversion) automatically has a backup on his or her disk. This backup contains the latest version of the project, as well as its complete history.

This means that should a developer’s local machine or even our central server ever break down again (and the backup not work for any reason), we’re up and running again in minutes: any local repository from a teammate’s disk is all we need to get a fully functional replacement.
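As a sketch of what that recovery looks like in practice (the teammate’s hostname and path below are made up, not from the article):

```shell
# Recovering after a server loss: clone from a teammate's machine.
# Every clone carries the full history, so this copy can become the
# new central repository. Host and path are hypothetical.
git clone alice@alices-laptop:projects/website.git website
cd website
git log --oneline   # the complete commit history is intact
```

From there, a new central server just needs this repository pushed to it.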

Branches keep separate things separate

When my more technical colleagues told me about how “cool” branching in Git was, I wasn’t bursting with joy right away. First, I have to admit that I didn’t really understand the advantages of branching. And second, coming from Subversion, I vividly remembered it being a complex and error-prone procedure. With those bad memories in mind, I was anxious about working with branches and tried to avoid them whenever I could.

It took me quite a while to understand that branching and merging work completely differently in Git than in most other systems—especially regarding its ease of use! So if you learned the concept of branches from another version control system (like Subversion), I recommend you forget your prior knowledge and start fresh. Let’s start by understanding why branches are so important in the first place.

Why branches are essential

Back in the days when I didn’t use branches, working on a new feature was a mess. Essentially, I had the choice between two equally bad workflows:

(a) I already knew that creating small, granular commits with only a few changes was a good version control habit. However, if I did this while developing a new feature, every commit would mingle my half-done feature with the main code base until I was done. It wasn’t very pleasant for my teammates to have my unfinished feature introduce bugs into the project.

(b) To avoid getting my work-in-progress mixed up with other topics (from colleagues or myself), I’d work on a feature in my separate space. I would create a copy of the project folder that I could work with quietly—and only commit my feature once it was complete. But committing my changes only at the end produced a single, giant, bloated commit that contained all the changes. Neither my teammates nor I could understand what exactly had happened in this commit when looking at it later.

I slowly understood that I had to make myself familiar with branches if I wanted to improve my coding.

Working in contexts

Any project has multiple contexts where work happens; each feature, bug fix, experiment, or alternative of your product is actually a context of its own. It can be seen as its own “topic,” clearly separated from other topics.

If you don’t separate these topics from each other with branching, you will inevitably increase the risk of problems. Mixing different topics in the same context:

  • makes it hard to keep an overview—and with a lot of topics, it becomes almost impossible;
  • makes it hard to undo something that proved to contain a bug, because it’s already mingled with so much other stuff;
  • doesn’t encourage people to experiment and try things out, because they’ll have a hard time getting experimental code out of the repository once it’s mixed with stable code.

Using branches gave me the confidence that I couldn’t mess up. In case things went wrong, I could always go back, undo, start fresh, or switch contexts.

Branching basics

Branching in Git actually only involves a handful of commands. Let’s look at a basic workflow to get you started.

To create a new branch based on your current state, all you have to do is pick a name and execute a single command on your command line. We’ll assume we want to start working on a new version of our contact form, and therefore create a new branch called “contact-form”:

$ git branch contact-form

Using the git branch command without a name specified will list all of the branches we currently have (and the “-v” flag provides us with a little more data than usual):

$ git branch -v
Git screen showing the current branches of contact-form.

You might notice the little asterisk on the branch named “master.” This means it’s the currently active branch. So, before we start working on our contact form, we need to make this our active context:

$ git checkout contact-form

Git has now made this branch our current working context. (In Git lingo, this is called the “HEAD branch”). All the changes and every commit that we make from now on will only affect this single context—other contexts will remain untouched. If we want to switch the context to a different branch, we’ll simply use the git checkout command again.

In case we want to integrate changes from one branch into another, we can “merge” them into the current working context. Imagine we’ve worked on our “contact-form” feature for a while, and now want to integrate these changes into our “master” branch. All we have to do is switch back to this branch and call git merge:

$ git checkout master
$ git merge contact-form
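Once the merge has gone through, the feature branch has done its job. Since branches are just lightweight pointers, a common follow-up step (not shown in the article) is to delete the branch; the merged commits themselves are not affected:

```shell
# Delete the merged branch; its commits stay reachable from master.
git branch -d contact-form
```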

Using branches

I would strongly suggest that you use branches extensively in your day-to-day workflow. Branches are one of the core concepts that Git was built around. They are extremely cheap and easy to create, and simple to manage—and there are plenty of resources out there if you’re ready to learn more about using them.
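Because branches are so cheap, you end up creating them constantly; a common shortcut (the branch name here is made up) creates a branch and switches to it in one step:

```shell
# Create "signup-form" and make it the active branch in one command,
# equivalent to "git branch signup-form" then "git checkout signup-form".
git checkout -b signup-form
```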

Undoing things

There’s one thing that I’ve learned as a programmer over the years: mistakes happen, no matter how experienced people are. You can’t avoid them, but you can have tools at hand that help you recover from them.

One of Git’s greatest features is that you can undo almost anything. This gives me the confidence to try out things without fear—because, so far, I haven’t managed to really break something beyond recovery.

Amending the last commit

Even if you craft your commits very carefully, it’s all too easy to forget to add a change or to mistype the message. With the --amend flag of the git commit command, Git allows you to change the very last commit, and it’s a very simple fix to execute. For example, if you forgot to add a certain change and also made a typo in the commit subject, you can easily correct this:

$ git add some/changed/files
$ git commit --amend -m "The message, this time without typos"

There’s only one thing you should keep in mind: you should never amend a commit that has already been pushed to a remote repository. Respecting this rule, the “amend” option is a great little helper to fix the last commit.
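One way to check is to list the commits that exist only in your local repository; if the commit you’re about to amend still shows up there, it hasn’t been pushed. This sketch assumes the current branch tracks an upstream branch, which Git abbreviates as @{u}:

```shell
# List commits on the current branch that its upstream doesn't have yet.
# Amending is only safe for commits that appear in this list.
git log @{u}..HEAD --oneline
```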

(For more detail about the amend option, I recommend Nick Quaranto’s excellent walkthrough.)

Undoing local changes

Changes that haven’t been committed are called “local.” All the modifications that are currently present in your working directory are “local” uncommitted changes.

Discarding these changes can make sense when your current work is… well… worse than what you had before. With Git, you can easily undo local changes and start over with the last committed version of your project.

If it’s only a single file that you want to restore, you can use the git checkout command:

$ git checkout -- file/to/restore

Don’t confuse this use of the checkout command with switching branches (see above). If you use it with two dashes and (separated with a space!) the path to a file, it will discard the uncommitted changes in a given file.

On a bad day, however, you might even want to discard all your local changes and restore the complete project:

$ git reset --hard HEAD

This will replace all of the files in your working directory with the last committed revision. Just as with using the checkout command above, this will discard the local changes.

Be careful with these operations: since local changes haven’t been checked into the repository, there is no way to get them back once they are discarded!

Undoing committed changes

Of course, undoing things is not limited to local changes. You can also undo certain commits when necessary—for example, if you’ve introduced a bug.

Basically, there are two main commands to undo a commit:

(a) git reset

Illustration showing how the `git reset` command works.

The git reset command really turns back time. You tell it which version you want to return to and it restores exactly this state—undoing all the changes that happened after this point in time. Just provide it with the hash ID of the commit you want to return to:

$ git reset --hard 2be18d9

The --hard option is the easiest and cleanest approach, but it also wipes away all local changes that you might still have in your working directory. So, before doing this, make sure there aren’t any local changes you’ve set your heart on.

(b) git revert

Illustration showing how the `git revert` command works.

The git revert command is used in a different scenario. Imagine you have a commit that you don’t want anymore—but the commits that came afterwards still make sense to you. In that case, you wouldn’t use the git reset command because it would undo all those later commits, too!

The revert command, however, only reverts the effects of a certain commit. It doesn’t remove any commits, like git reset does. Instead, it even creates a new commit; this new commit introduces changes that are just the opposite of the commit to be reverted. For example, if you deleted a certain line of code, revert will create a new commit that introduces exactly this line again.

To use it, simply provide it with the hash ID of the commit you want reverted:

$ git revert 2be18d9

Finding bugs

When it comes to finding bugs, I must admit that I’ve wasted quite some time stumbling in the dark. I often knew that it used to work a couple of days ago—but I had no idea where exactly things went wrong. It was only when I found out about git bisect that I could speed up this process a bit. With the bisect command, Git provides a tool that helps you find the commit that introduced a problem.

Imagine the following situation: we know that our current version (tagged “2.0”) is broken. We also know that a couple of commits ago (our version “1.9”), everything was fine. The problem must have occurred somewhere in between.

Illustration showing the commits between working and broken versions.

This is already enough information to start our bug hunt with git bisect:

$ git bisect start
$ git bisect bad
$ git bisect good v1.9

After starting the process, we told Git that our current commit contains the bug and therefore is “bad.” We then also informed Git which previous commit is definitely working (as a parameter to git bisect good).

Git then restores our project in the middle between the known good and known bad conditions:

Illustration showing that the bisect begins between the versions.

We now test this version (for example, by running unit tests, building the app, deploying it to a test system, etc.) to find out if this state works—or already contains the bug. As soon as we know, we tell Git again—either with git bisect bad or git bisect good.

Let’s assume we said that this commit was still “bad.” This effectively means that the bug must have been introduced even earlier—and Git will again narrow down the commits in question:

Illustration showing how additional bisects will narrow the commits further.

This way, you’ll find out very quickly where exactly the problem occurred. Once you know this, you need to call git bisect reset to finish your bug hunt and restore the project’s original state.
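If your check can be run as a script, the test-and-report loop can even be automated with git bisect run. The script name below is an assumption; any command works as long as it exits with 0 when the checkout is good and non-zero when it is broken:

```shell
# Let Git drive the bisect: it checks out each candidate commit, runs
# the script, and interprets the exit code as "good" (0) or "bad" (non-zero).
git bisect start
git bisect bad             # the current version (2.0) is broken
git bisect good v1.9       # the last version known to work
git bisect run ./run-tests.sh
git bisect reset           # restore the original state when finished
```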

A tool that can save your neck

I must confess that my first encounter with Git wasn’t love at first sight. In the beginning, it felt just like my other experiences with version control: tedious and unhelpful. But with time, the practice became intuitive, and gained my trust and confidence.

After all, mistakes happen, no matter how much experience we have or how hard we try to avoid them. What separates the pro from the beginner is preparation: having a system in place that you can trust in case of problems. It helps you stay on top of things, especially in complex projects. And, ultimately, it helps you become a better professional.


Running Code Reviews with Confidence

 ∗ A List Apart: The Full Feed

Growing up, I learned there were two kinds of reviews I could seek out from my parents. One parent gave reviews in the form of a shower of praise. The other parent, the one with a degree from the Royal College of Art, would put me through a design crit. Today the reviews I seek are for my code, not my horse drawings, but it continues to be a process I both dread and crave.

In this article, I’ll describe my battle-tested process for conducting code reviews, highlighting the questions you should ask during the review process as well as the necessary version control commands to download and review someone’s work. I’ll assume your team uses Git to store its code, but the process works much the same if you’re using any other source control system.

Completing a peer review is time-consuming. In the last project where I introduced mandatory peer reviews, the senior developer and I estimated that it doubled the time to complete each ticket. The reviews introduced more context-switching for the developers, and were a source of increased frustration when it came to keeping the branches up to date while waiting for a code review.

The benefits, however, were huge. Coders gained a greater understanding of the whole project through their reviews, reducing silos and making onboarding easier for new people. Senior developers had better opportunities to ask why decisions were being made in the codebase that could potentially affect future work. And by adopting an ongoing peer review process, we reduced the amount of time needed for human quality assurance testing at the end of each sprint.

Let’s walk through the process. Our first step is to figure out exactly what we’re looking for.

Determine the purpose of the proposed change

Our code review should always begin in a ticketing system, such as Jira or GitHub. It doesn’t matter if the proposed change is a new feature, a bug fix, a security fix, or a typo: every change should start with a description of why the change is necessary, and what the desired outcome will be once the change has been applied. This allows us to accurately assess when the proposed change is complete.

The ticketing system is where you’ll track the discussion about the changes that need to be made after reviewing the proposed work. From the ticketing system, you’ll determine which branch contains the proposed code. Let’s pretend the ticket we’re reviewing today is 61524—it was created to fix a broken link in our website. It could just as equally be a refactoring, or a new feature, but I’ve chosen a bug fix for the example. No matter what the nature of the proposed change is, having each ticket correspond to only one branch in the repository will make it easier to review, and close, tickets.

Set up your local environment and ensure that you can reproduce what is currently the live site—complete with the broken link that needs fixing. When you apply the new code locally, you want to catch any regressions or problems it might introduce. You can only do this if you know, for sure, the difference between what is old and what is new.

Review the proposed changes

At this point you’re ready to dive into the code. I’m going to assume you’re working with Git repositories, on a branch-per-issue setup, and that the proposed change is part of a remote team repository. Working directly from the command line is a good universal approach, and allows me to create copy-paste instructions for teams regardless of platform.

To begin, update your local list of branches.

git fetch

Then list all available branches.

git branch -a

A list of branches will be displayed to your terminal window. It may appear something like this:

* master
  remotes/origin/61524-broken-link
  remotes/origin/HEAD -> origin/master
  remotes/origin/master

The * denotes the name of the branch you are currently viewing (or have “checked out”). Lines beginning with remotes/origin are references to branches we’ve downloaded. We are going to work with a new, local copy of branch 61524-broken-link.

When you clone your project, you’ll have a connection to the remote repository as a whole, but you won’t have a read-write relationship with each of the individual branches in the remote repository. You’ll make an explicit connection as you switch to the branch. This means if you need to run the command git push to upload your changes, Git will know which remote repository you want to publish your changes to.

git checkout --track origin/61524-broken-link

Ta-da! You now have your own copy of the branch for ticket 61524, which is connected (“tracked”) to the origin copy in the remote repository. You can now begin your review!

First, let’s take a look at the commit history for this branch with the log command. The two-dot master.. range limits the output to commits that aren’t already on master.

git log master..

Sample output:

Author: emmajane 
Date: Mon Jun 30 17:23:09 2014 -0400

Link to resources page was incorrectly spelled. Fixed.

Resolves #61524.

This gives you the full log message of all the commits that are in the branch 61524-broken-link, but are not also in the master branch. Skim through the messages to get a sense of what’s happening.
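For a first pass, a more compact view often helps: combining the two-dot master.. range with --oneline lists just the commits that aren’t on master yet, one per line:

```shell
# Compact view: one line per commit that isn't on master yet.
git log master.. --oneline
```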

Next, take a brief gander through the commit itself using the diff command. This command shows the difference between two snapshots in your repository. You want to compare the code on your checked-out branch to the branch you’ll be merging “to”—which conventionally is the master branch.

git diff master

How to read patch files

When you run the command to output the difference, the information will be presented as a patch file. Patch files are ugly to read. You’re looking for lines beginning with + or -. These are lines that have been added or removed, respectively. Scroll through the changes using the up and down arrows, and press q to quit when you’ve finished reviewing. If you need an even more concise comparison of what’s happened in the patch, consider modifying the diff command to list the changed files, and then look at the changed files one at a time:

git diff master --name-only
git diff master <filename>

Let’s take a look at the format of a patch file.

diff --git a/about.html b/about.html
index a3aa100..a660181 100644
--- a/about.html
+++ b/about.html
@@ -48,5 +48,5 @@

- A full list of <a href="">public 
+ A full list of <a href="">public 
presentations and workshops</a> Emma has given is available

I tend to skim past the metadata when reading patches and just focus on the lines that start with - or +. This means I start reading at the line immediately following @@. The numbers in that header (here, @@ -48,5 +48,5 @@) say where the hunk starts: five lines beginning at line 48 of the old file, and five lines beginning at line 48 of the new one. There are a few lines of context provided leading up to the changes. These lines are indented by one space each. The changed lines of code are then displayed with a preceding - (line removed) or + (line added).

Going beyond the command line

Using a Git repository browser, such as gitk, allows you to get a slightly better visual summary of the information we’ve looked at to date. The version of Git that Apple ships with does not include gitk—I used Homebrew to re-install Git and get this utility. Any repository browser will suffice, though, and there are many GUI clients available on the Git website.


When you run the command gitk, a graphical tool will launch from the command line. An example of the output is given in the following screenshot. Click on each of the commits to get more information about it. Many ticket systems will also allow you to look at the changes in a merge proposal side-by-side, so if you’re finding this cumbersome, click around in your ticketing system to find the comparison tools they might have—I know for sure GitHub offers this feature.

Screenshot of the gitk repository browser.

Now that you’ve had a good look at the code, jot down your answers to the following questions:

  1. Does the code comply with your project’s identified coding standards?
  2. Does the code limit itself to the scope identified in the ticket?
  3. Does the code follow industry best practices in the most efficient way possible?
  4. Has the code been implemented in the best possible way according to all of your internal specifications? It’s important to separate your preferences and stylistic differences from actual problems with the code.

Apply the proposed changes

Now is the time to start up your testing environment and view the proposed change in context. How does it look? Does your solution match what the coder thinks they’ve built? If it doesn’t look right, do you need to clear the cache, or perhaps rebuild the Sass output to update the CSS for the project?

Now is the time to also test the code against whatever test suite you use.

  1. Does the code introduce any regressions?
  2. Does the new code perform as well as the old code? Does it still fall within your project’s performance budget for download and page rendering times?
  3. Are the words all spelled correctly, and do they follow any brand-specific guidelines you have?

Depending on the context for this particular code change, there may be other obvious questions you need to address as part of your code review.

Do your best to create the most comprehensive list of everything you can find wrong (and right) with the code. It’s annoying to get dribbles of feedback from someone as part of the review process, so we’ll try to avoid “just one more thing” wherever we can.

Prepare your feedback

Let’s assume you’ve now got a big juicy list of feedback. Maybe you have no feedback, but I doubt it. If you’ve made it this far in the article, it’s because you love to comb through code as much as I do. Let your freak flag fly and let’s get your review structured in a usable manner for your teammates.

For all the notes you’ve assembled to date, sort them into the following categories:

  1. The code is broken. It doesn’t compile, introduces a regression, it doesn’t pass the testing suite, or in some way actually fails demonstrably. These are problems which absolutely must be fixed.
  2. The code does not follow best practices. You have some conventions, the web industry has some guidelines. These fixes are pretty important to make, but they may have some nuances which the developer might not be aware of.
  3. The code isn’t how you would have written it. You’re a developer with battle-tested opinions, and you know you’re right, you just haven’t had the chance to update the Wikipedia page yet to prove it.

Submit your evaluation

Based on this new categorization, you are ready to engage in passive-aggressive coding. If the problem is clearly a typo and falls into one of the first two categories, go ahead and fix it. Obvious typos don’t really need to go back to the original author, do they? Sure, your teammate will be a little embarrassed, but they’ll appreciate you having saved them a bit of time, and you’ll increase the efficiency of the team by reducing the number of round trips the code needs to take between the developer and the reviewer.

If the change you are itching to make falls into the third category: stop. Do not touch the code. Instead, go back to your colleague and get them to describe their approach. Asking “why” might lead to a really interesting conversation about the merits of the approach taken. It may also reveal limitations of the approach to the original developer. By starting the conversation, you open yourself to the possibility that just maybe your way of doing things isn’t the only viable solution.

If you needed to make any changes to the code, they should be absolutely tiny and minor. You should not be making substantive edits in a peer review process. Make the tiny edits, and then add the changes to your local repository as follows:

git add .
git commit -m "[#61524] Correcting <list problem> identified in peer review."

You can keep the message brief, as your changes should be minor. At this point you should push the reviewed code back up to the server for the original developer to double-check and review. Assuming you’ve set up the branch as a tracking branch, it should just be a matter of running the command as follows:

git push

Update the issue in your ticketing system as is appropriate for your review. Perhaps the code needs more work, or perhaps it was good as written and it is now time to close the issue.

Repeat the steps in this section until the proposed change is complete, and ready to be merged into the main branch.

Merge the approved change into the trunk

Up to this point you’ve been comparing a ticket branch to the master branch in the repository. This main branch is referred to as the “trunk” of your project. (It’s a tree thing, not an elephant thing.) The final step in the review process will be to merge the ticket branch into the trunk, and clean up the corresponding ticket branches.

Begin by updating your master branch to ensure you can publish your changes after the merge.

git checkout master
git pull origin master

Take a deep breath, and merge your ticket branch back into the main repository. As written, the following command will not create a new commit in your repository history. The commits will simply shuffle into line on the master branch, making git log --graph appear as though a separate branch has never existed. If you would like to maintain the illusion of a past branch, simply add the parameter --no-ff to the merge command, which will make it clear, via the graph history and a new commit message, that you have merged a branch at this point. Check with your team to see what’s preferred.

git merge 61524-broken-link

The merge will either fail, or it will succeed. If there are no merge errors, you are ready to share the revised master branch by uploading it to the central repository.

git push

If there are merge errors, the original coders are often better equipped to figure out how to fix them, so you may need to ask them to resolve the conflicts for you.
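If you find yourself mid-merge with conflicts and decide to hand the resolution back, you can first return your working directory to its pre-merge state:

```shell
# Back out of a conflicted merge without committing anything.
git merge --abort
```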

Once the new commits have been successfully integrated into the master branch, you can delete the old copies of the ticket branches both from your local repository and on the central repository. It’s just basic housekeeping at this point.

git branch -d 61524-broken-link
git push origin --delete 61524-broken-link


This is the process that has worked for the teams I’ve been a part of. Without a peer review process, it can be difficult to address problems in a codebase without blame. With it, the code becomes much more collaborative; when a mistake gets in, it’s because we both missed it. And when a mistake is found before it’s committed, we both breathe a sigh of relief that it was found when it was.

Regardless of whether you’re using Git or another source control system, the peer review process can help your team. Peer-reviewed code might take more time to develop, but it contains fewer mistakes, and has a strong, more diverse team supporting it. And, yes, I’ve been known to learn the habits of my reviewers and choose the most appropriate review style for my work, just like I did as a kid.




Graham Dumpleton: Hosting PHP web applications in conjunction with mod_wsgi.

 ∗ Planet Python

Yes, yes, I know. You can stop shaking your head now. It is a sad fact of life though that the need to mix both PHP web application code with Python web application code on the same Apache instance is something that some people need to do. One instance is where those PHP developers have seen the light and want to migrate their existing legacy PHP web application to Python, but are not able to do

Tuesday, 02 September


PyCon: PyCon 2015 Registration is Open!

 ∗ Planet Python

It's that time again: registration for PyCon is now open!

Here are some quick details:

  • The last three PyCons have sold out. We expect the same for 2015. 
  • The same low rates we've had for years are in effect, including the 50% discounted student tickets.
  • Early bird rates are in effect for the first 800 tickets sold.
  • Financial Aid is available! We accept applications through January 1, 2015.
  • While the event takes place in Canada, ticket prices are in USD.

Click here to register!

PyCon is simply a tremendous value. The conference takes place over three days and includes a total of 95 talks, a set of great keynotes, and includes breakfast and lunch. Add in all of the Open Spaces and Birds of a Feather sessions and the fact that over 2,000 Python users are in the same place and the deal gets even sweeter. If you buy your tickets early, you can save 15-25%!

On top of that, it's preceded by two days of tutorials, 32 in all, at a cost of $150 each, with morning and afternoon sessions both days. You get three hours of instruction by some of the best in this community, along with lunch, at a great rate. Many of our tutorial instructors are professional trainers who bring their expertise and materials to the conference at a discount.


Buying tickets is part one; having a place to stay in Montréal is part two. Unlike previous locations, the conference center is a standalone building, but it is surrounded by many hotels. We've negotiated conference rates with several of them, and you can book rooms with them through our registration page.

Financial Aid

While PyCon is among the best values for a software conference, especially of its size, it still requires some amount of travel and lodging expenses for a significant portion of our attendees. Thankfully the Python Software Foundation and our generous sponsors give us the ability to run a financial aid program to help more people make it to PyCon! The application period is now open and runs through January 1, 2015 - apply today!

Call for Proposals

Our CFP is open for two more weeks, through September 15. We need talks of all types to fill out our schedule, and we want you to help us with that. Everyone who uses Python brings something different to the table, and we want to cover both a breadth of topics that interest our community, and depths that help level everyone up. We encourage everyone to submit proposals whether you're a first timer or a speaking veteran.


PyCon wouldn't be possible without the help of our generous sponsors, to all of whom we owe many thanks. If you or your organization are interested in helping PyCon, please email Diana Clarke, conference chair, at

Spike ekipS: Embedding Python In Golang

 ∗ Planet Python

These days, I am stuck into a new private project: finding a way to run Python code inside the Golang runtime. This is the story of my struggle with Python and Golang.

As you may know, I have some experience in Python programming; to be exact, I love Python. It is easy to learn, and Python code largely explains itself without any special comments. For many years now, Python has been my first choice for new projects.

Python is a great language, but it's not the right tool for everything at this time, because its performance is not good enough for some areas, e.g.

  • heavily loaded network servers, like web servers
  • long-running tasks
  • handling massive data
  • memory-critical tasks
  • etc.

I thought that if I could combine Golang and Python, it would bring me the best of both worlds: the productivity of Python and the performance of Golang, simultaneously.

Why golang, Instead Of C

Using a C extension is the well-known and most reasonable way to extend Python. But if we can extend Python with another language, it will benefit both sides.

The reason I selected Golang among the many other languages: I love Golang and its goroutines. Golang may be the best /* alternative */ language for Python programmers who miss static typing and performance.

Golang is growing fast in the language world, and its brilliant features like goroutines and channels will stimulate other languages to focus on their own performance improvements.

The Bottom Line First.

After all, I dumped my latest proof-of-concepts into embedding-python-in-golang on GitHub, where I introduce several methods to embed Python in Golang.

The final goal is to run Django and Flask applications through the WSGI interface inside Golang's net/http.

Story to find the way.

At first I tried to find similar attempts, and there are already similar projects.

gopy seems to be the most prominent; it can build a Python extension in Golang. go-python implements the Python C API in Golang, but it does not provide some key APIs like Py_InitModule.

I wanted to get result as soon as possible, so I started with gopy.

2 days with gopy

At first, I tested gopy. gopy can turn Golang code into a Python extension, so my plan was:

  1. Write Golang code for a Python extension
  2. Build the Python extension with gopy
  3. Import and call the Golang functions

At this time, I wanted to run Golang code in Python, because I just wanted to glue Golang and Python together; which language hosted the main runtime did not matter.

But one thing I missed: the gopy project has not been maintained recently, and it has some bugs, e.g. NewString does not return a Golang string. Several other bugs made me hesitate to use this library.

I needed another way, and that was go-python.

And I met go-python

.. To be continued.

Jeff Knupp: What is a NoSQL Database? Learn By Writing One In Python

 ∗ Planet Python

NoSQL is a term that has become ubiquitous in recent years. But what does "NoSQL" actually mean? How and why is it useful? In this article, we'll answer these questions by creating a toy NoSQL database in pure Python (or, as I like to call it, "slightly structured pseudo-code").


To most, SQL is synonymous with "database". SQL, an acronym for Structured Query Language, is not a database technology itself, however. Rather, it describes the language by which one retrieves data from an RDBMS, or Relational Database Management System. MySQL, PostgreSQL, MS SQL Server, and Oracle are all examples of RDBMSs.

The word "Relational" in the acronym RDBMS is the most informative. Data is organized into tables, each with a set series of columns with an associated type. The description of all tables, their columns, and the columns' types are referred to as the database's schema. The schema completely describes the structure of the database, with a description of each table. For example, a Car table may have the following columns:

  • Make: a string
  • Model: a string
  • Year: a four-digit number; alternatively, a date
  • Color: a string
  • VIN (Vehicle Identification Number): a string

A single entry in a table is called a row, or record. To distinguish one record from another, a primary key is usually defined. The primary key for a table is one of its columns (or a combination thereof) that uniquely identifies each row. In the Car table, VIN is a natural choice to be the table's primary key as it is guaranteed to be unique between cars. Two rows may share the exact same values for Make, Model, Year, and Color but refer to different cars, meaning they would have different VINs. If two rows have the same VIN, we don't even have to check the other columns; they must refer to the same car.


SQL lets us query this database to gain useful information. To query simply means to ask questions of the RDBMS in a structured language and interpret the rows it returns as the answer. Imagine the database represents all vehicles registered in the US. To get all records, we could write the following SQL query against the database:

SELECT Make, Model FROM Car;

A translation of the SQL into plain English might be:

  • "SELECT": "Show me"
  • "Make, Model": "the value of Make and Model"
  • "FROM Car": "for each row in the Car table"

Or, "Show me the value of Make and Model for each row in the Car table". We'd get back a list of results, each with Make and Model. If we cared only about the color of cars from 1994, we could say:

SELECT Color FROM Car WHERE Year = 1994;

In this case, we'd get back a list like


Lastly, using the table's primary key, we could look up a specific car by looking up a VIN:

SELECT * FROM Car where VIN = '2134AFGER245267';

That would give us the specific properties of that car.
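All three queries can be tried end-to-end with Python's built-in sqlite3 module. This is a sketch with made-up data rows; the schema mirrors the Car table described above:

```python
import sqlite3

# In-memory database with the Car schema from the article.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE Car (Make TEXT, Model TEXT, Year INTEGER, '
             'Color TEXT, VIN TEXT PRIMARY KEY)')
conn.executemany('INSERT INTO Car VALUES (?, ?, ?, ?, ?)', [
    ('Lexus', 'RX350', 2013, 'Black', '2134AFGER245267'),
    ('Honda', 'Civic', 1994, 'Red', 'JH4KA3270LC000000'),
])

# All Make/Model pairs.
print(conn.execute('SELECT Make, Model FROM Car').fetchall())

# Colors of cars from 1994.
print(conn.execute('SELECT Color FROM Car WHERE Year = 1994').fetchall())

# Look up one specific car by its primary key.
print(conn.execute(
    "SELECT * FROM Car WHERE VIN = '2134AFGER245267'").fetchone())
```

Because VIN is declared PRIMARY KEY, attempting to insert a second row with the same VIN raises an IntegrityError.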

Primary keys are defined to be unique. That is, a specific car with a specific VIN must only appear in the table at most once. Why is that important? Let's look at an example:


Imagine we are running an auto repair business. Among other things, we need to keep track of a vehicle's service history: the record of all repairs and tune ups we've performed on that car. We might create a ServiceHistory table with the following columns:

  • VIN
  • Make
  • Model
  • Year
  • Color
  • Service Performed
  • Mechanic
  • Price
  • Date

Thus, each time a car comes in to get serviced, we add a new row to the table with all of the car's information along with what we did to it, who the mechanic was, how much it cost, and when the service was performed.

But wait. All of the columns related to the car itself are always the same for the same car. That is, if I bring in my Black 2014 Lexus RX 350 10 times for service, I'll need to record the Make, Model, Year, and Color each time, even though they won't change. Rather than repeat all of that information, it makes more sense to store it once and look it up when necessary.

How would we do this? We'd create a second table: Vehicle, with the following columns:

  • VIN
  • Make
  • Model
  • Year
  • Color

For the ServiceHistory table, we now want to trim down to the following columns:

  • VIN
  • Service Performed
  • Mechanic
  • Price
  • Date

Why does VIN appear in both tables? Because we need a way to specify that this vehicle in the ServiceHistory table refers to that vehicle in the Vehicle table. This way, we only have to store information about a particular car once. Each time it comes in for repair, we create a new row in the ServiceHistory table but not the Vehicle table; it's the same car, after all.

We can also issue queries that span the implicit relationship between Vehicle and ServiceHistory:

SELECT Vehicle.Model, Vehicle.Year FROM Vehicle, ServiceHistory WHERE Vehicle.VIN = ServiceHistory.VIN AND ServiceHistory.Price > 75.00;

This query seeks to determine the Model and Year for all cars where the repair costs were greater than $75.00. Notice that we specify that the way to match rows from the Vehicle table to rows in the ServiceHistory table is to match up the VIN values. What it gives us back is a set of rows with the columns of both tables. We refine this by saying we only want the "Model" and "Year" columns of the "Vehicle" table.
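The same two-table query can be reproduced with sqlite3 (the rows inserted here are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE Vehicle (VIN TEXT PRIMARY KEY, Make TEXT, '
             'Model TEXT, Year INTEGER, Color TEXT)')
conn.execute('CREATE TABLE ServiceHistory (VIN TEXT, ServicePerformed TEXT, '
             'Mechanic TEXT, Price REAL, Date TEXT)')
conn.execute("INSERT INTO Vehicle VALUES "
             "('2134AFGER245267', 'Lexus', 'RX350', 2014, 'Black')")
conn.execute("INSERT INTO ServiceHistory VALUES "
             "('2134AFGER245267', 'Oil change', 'Pat', 90.00, '2014-08-01')")

# Match rows across the tables on VIN, keep only repairs over $75.00.
rows = conn.execute(
    'SELECT Vehicle.Model, Vehicle.Year FROM Vehicle, ServiceHistory '
    'WHERE Vehicle.VIN = ServiceHistory.VIN '
    'AND ServiceHistory.Price > 75.00').fetchall()
print(rows)  # [('RX350', 2014)]
```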

If our database has no indexes (or, more correctly, indices), the query above would need to perform a table scan to locate rows that match our query. Table scans are an inspection of each row in the table in sequence and are notoriously slow. Indeed, they represent the slowest possible method of query execution.

Table scans can be avoided through the use of an index on a column or set of columns. Think of indices as data structures that allow us to find a particular value (or range of values) in the indexed column very quickly by pre-sorting the values. That is, if we had an index on the Price column, instead of looking through all rows one-at-a-time to determine if the price was greater than 75.00, we could simply use the information contained in the index to "jump" to the first row with a price greater than 75.00 and return every subsequent row (which would have a price at least as high as 75.00, since the index is ordered).
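The "jump" an index enables can be mimicked with Python's bisect module: keep (price, row) pairs pre-sorted, then binary-search to the first qualifying entry. This is a toy model of the idea, not how real database indices are implemented:

```python
import bisect

# A pretend index on Price: (price, row_id) pairs kept sorted by price.
price_index = sorted([(35.0, 1), (60.0, 2), (80.0, 3), (120.0, 4)])

def rows_with_price_above(threshold):
    """Jump to the first entry above *threshold*; every later entry matches."""
    pos = bisect.bisect_right(price_index, (threshold, float('inf')))
    return [row_id for _, row_id in price_index[pos:]]

print(rows_with_price_above(75.0))  # [3, 4]
```

Instead of inspecting all four entries, the search lands directly on the first price above 75.0 and returns the ordered tail.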

When dealing with non-trivial amounts of data, indices become an indispensable tool for improving query speed. Like all things, however, they come at a cost: the index's data structure consumes memory that could otherwise be used to store more data in the database. It's a trade off that one must examine in each individual case, but it's very common to index frequently queried columns.

The Clear Box

Advanced features like indices are possible due to the database's ability to inspect a table's schema (the description of what type of data each column holds) and make rational decisions based on the data. That is, to a database, a table is the opposite of a "black box" (a clear box?).

Keep this fact in mind when we talk about NoSQL databases. It becomes an important part of the discussion regarding the ability to query different types of database engines.


A table's schema, we've learned, is a description of the names of the columns and the type of data they contain. It also contains information like which columns can be blank, which must be unique, and all other constraints on column values. A table may only have one schema at any given time and all rows in the table must conform to the schema.

This is an important restriction. Imagine you have a database table with millions of rows of customer information. Your sales team would like to begin capturing an additional piece of data (say, the user's age) to increase the precision of their email marketing algorithm. This requires you to alter the table by adding a column. You'll also need to decide if each row in the table needs a value for this column. Often times, it makes sense to make a column required, but doing so would require information we simply don't have access to (like the age of every user already in the database). Therefore, trade offs are often made in this regard.

In addition, making schema changes to very large database tables is rarely a simple matter. Having a rollback plan in case something goes wrong is important, but it's not always possible to undo a schema change once it's been made. Schema maintenance is probably one of the most difficult parts of the job of a DBA.

Key/Value Stores

Far before the term "NoSQL" existed, Key/Value Data Stores like memcached provided storage for data without the overhead of a table schema. Indeed, in K/V stores, there are no "tables" at all. There are simply keys and values. If a key/value store sounds familiar, that's because it's built upon the same principles as Python's dict and set classes: using hash tables to provide quick key-based access to data. The most primitive Python-based NoSQL database would simply be a big dictionary.
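That "big dictionary" really is the whole idea; everything our server adds is a wire protocol around operations like these:

```python
# The most primitive key/value store: a plain dict.
store = {}

store['foo'] = 1              # PUT
store['bar'] = ['a', 'b']     # PUTLIST
store['bar'].append('c')      # APPEND
value = store.get('foo')      # GET
del store['foo']              # DELETE

print(value, store)
```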

To understand how they work, let's write one ourselves! We'll start with a very simple design:

  • A Python dict as the primary data store
  • Only support strings as keys
  • Support for storing integers, strings, and lists
  • A simple TCP/IP server that uses ASCII strings for messaging
  • Slightly advanced commands like INCREMENT, DELETE, APPEND, and STATS

The benefit of building the data store with an ASCII-based TCP/IP interface is that we can use the simple telnet program to interact with our server; no special client is needed (though writing one would be a good exercise and can be done in about 15 lines).
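As noted, a client is a short exercise. Here is one possible sketch of about 15 lines, assuming the server from this article is listening on localhost:50505 (the function name is my own):

```python
import socket

def send_command(command, host='localhost', port=50505):
    """Open a connection, send one wire-format command, return the reply."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((host, port))
    sock.sendall(command.encode())
    reply = sock.recv(4096).decode()
    sock.close()
    return reply

# Example (requires a running server):
# send_command('PUT;foo;1;INT')
```

Because the server accepts one message per connection, the client simply opens a fresh socket for each command.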

We need a "wire format" for the messages we send to the server and for the responses it sends back. Here's a loose specification:

Commands Supported

  • PUT
    • Arguments: Key, Value
    • Purpose: Insert a new entry into the data store
  • GET
    • Arguments: Key
    • Purpose: Retrieve a stored value from the data store
  • PUTLIST
    • Arguments: Key, Value
    • Purpose: Insert a new list entry into the data store
  • GETLIST
    • Arguments: Key
    • Purpose: Retrieve a stored list from the data store
  • APPEND
    • Arguments: Key, Value
    • Purpose: Add an element to an existing list in the data store
  • INCREMENT
    • Arguments: Key
    • Purpose: Increment the value of an integer value in the data store
  • DELETE
    • Arguments: Key
    • Purpose: Delete an entry from the data store
  • STATS
    • Arguments: N/A
    • Purpose: Request statistics on how many successful/unsuccessful executions of each command were executed

Now let's define the message structure itself.

Message Structure

Request Messages

A Request Message consists of a command, a key, a value, and a value type. The last three are optional depending on the message. A ; is used as a delimiter. There must always be three ; characters in the message, even if some optional fields are not included:

COMMAND;KEY;VALUE;VALUE TYPE

  • COMMAND is a command from the list above
  • KEY is a string to be used as a data store key (optional)
  • VALUE is a integer, list, or string to be stored in the data store (optional)
    • Lists are represented as a comma-separated series of strings, e.g. "red,green,blue"
  • VALUE TYPE describes what type VALUE should be interpreted as
    • Possible values: INT, STRING, LIST
Some example request messages:

  • "PUT;foo;1;INT"

  • "GET;foo;;"

  • "PUTLIST;bar;a,b,c;LIST"

  • "APPEND;bar;d;STRING"

  • "GETLIST;bar;;"

  • "STATS;;;"

  • "INCREMENT;foo;;"

  • "DELETE;foo;;"

Response Messages

A Response Message consists of two parts, separated by a ;. The first part is always True|False based on whether the command was successful. The second part is the command message. On errors, this will describe the error. On successful commands that don't expect a value to be returned (like PUT), this will be a success message. For commands that expect a value to be returned (like GET), this will be the value itself.

Some example response messages:

  • "True;Key [foo] set to [1]"

  • "True;1"

  • "True;Key [bar] set to [['a', 'b', 'c']]"

  • "True;Key [bar] had value [d] appended"

  • "True;['a', 'b', 'c', 'd']"

  • "True;{'PUTLIST': {'success': 1, 'error': 0}, 'STATS': {'success': 0, 'error': 0}, 'INCREMENT': {'success': 0, 'error': 0}, 'GET': {'success': 0, 'error': 0}, 'PUT': {'success': 0, 'error': 0}, 'GETLIST': {'success': 1, 'error': 0}, 'APPEND': {'success': 1, 'error': 0}, 'DELETE': {'success': 0, 'error': 0}}"
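Since the message part of a response may itself contain ; characters, a client should split only on the first delimiter. A small helper sketch (the function name is mine):

```python
def parse_response(response):
    """Split 'True;message' into (bool, message), splitting only once."""
    success, _, message = response.partition(';')
    return success == 'True', message

print(parse_response('True;Key [foo] set to [1]'))
print(parse_response('False;ERROR: Key [baz] not found'))
```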

Show Me The Code!

I'll present the code in digestible chunks. The entire server clocks in at just under 180 lines of code, so it's a quick read.

Set Up

Below is a lot of the boilerplate setup code required for our server:

"""NoSQL database written in Python."""

# Standard library imports
import socket

HOST = 'localhost'
PORT = 50505
SOCKET = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
STATS = {
    'PUT': {'success': 0, 'error': 0},
    'GET': {'success': 0, 'error': 0},
    'GETLIST': {'success': 0, 'error': 0},
    'PUTLIST': {'success': 0, 'error': 0},
    'INCREMENT': {'success': 0, 'error': 0},
    'APPEND': {'success': 0, 'error': 0},
    'DELETE': {'success': 0, 'error': 0},
    'STATS': {'success': 0, 'error': 0},
    }
Not much to see here, just an import and some initialization of data.

Set Up (Cont'd)

I'll now skip a bit of code so that I can show the rest of the setup code. Note that it refers to functions that don't exist yet. That's fine, since I'm jumping around. In the full version (presented at the end), everything is in the proper order. Here's the rest of the setup code:

COMMAND_HANDLERS = {
    'PUT': handle_put,
    'GET': handle_get,
    'GETLIST': handle_getlist,
    'PUTLIST': handle_putlist,
    'INCREMENT': handle_increment,
    'APPEND': handle_append,
    'DELETE': handle_delete,
    'STATS': handle_stats,
    }
DATA = {}

def main():
    """Main entry point for script."""
    SOCKET.bind((HOST, PORT))
    SOCKET.listen(1)
    while 1:
        connection, address = SOCKET.accept()
        print 'New connection from [{}]'.format(address)
        data = connection.recv(4096).decode()
        command, key, value = parse_message(data)
        if command == 'STATS':
            response = handle_stats()
        elif command in ('GET', 'GETLIST', 'INCREMENT', 'DELETE'):
            response = COMMAND_HANDLERS[command](key)
        elif command in ('PUT', 'PUTLIST', 'APPEND'):
            response = COMMAND_HANDLERS[command](key, value)
        else:
            response = (False, 'Unknown command type [{}]'.format(command))
        update_stats(command, response[0])
        connection.sendall('{};{}'.format(response[0], response[1]))

if __name__ == '__main__':
    main()

We've created what's commonly referred to as a look-up table called COMMAND_HANDLERS. It works by associating the name of the command with the function used to handle commands of that type. So, for example, if we get a GET command, saying COMMAND_HANDLERS[command](key) is the same as saying handle_get(key). Remember, functions can be treated as values and can be stored in a dict like any other value.

In the code above, I decided to handle each group of commands requiring the same number of arguments separately. I could have simply forced all handle_ functions to accept both a key and a value, but I decided this approach made the handler functions clearer, easier to test, and less error prone.

Note that the socket code is minimal. Though our entire server is based on TCP/IP communication, there's really not much interaction with low-level networking code.

The last thing to note is so innocuous you might have missed it: the DATA dictionary. This is where we'll actually store the key-value pairs that make up our database.

Command Parser

Let's take a look at the command parser, responsible for making sense of incoming messages:

def parse_message(data):
    """Return a tuple containing the command, the key, and (optionally) the
    value cast to the appropriate type."""
    command, key, value, value_type = data.strip().split(';')
    if value_type:
        if value_type == 'LIST':
            value = value.split(',')
        elif value_type == 'INT':
            value = int(value)
        else:
            value = str(value)
    else:
        value = None
    return command, key, value

Here we can see type conversion occurring. If the value is meant to be a list, we know we can create the proper value by calling str.split(',') on the string. For an int, we simply make use of the fact that int() can take strings. Ditto for strings and str().

Command Handlers

Below is the code for the command handlers. They are all quite straight-forward and (hopefully) look as you would expect. Notice there's a good deal of error checking, but it's certainly not exhaustive. As you're reading, try to find error cases the code misses and post them in the discussion.

def update_stats(command, success):
    """Update the STATS dict with info about if executing
    *command* was a *success*."""
    if success:
        STATS[command]['success'] += 1
    else:
        STATS[command]['error'] += 1

def handle_put(key, value):
    """Return a tuple containing True and the message
    to send back to the client."""
    DATA[key] = value
    return (True, 'Key [{}] set to [{}]'.format(key, value))

def handle_get(key):
    """Return a tuple containing True if the key exists and the message
    to send back to the client."""
    if key not in DATA:
        return (False, 'ERROR: Key [{}] not found'.format(key))
    else:
        return (True, DATA[key])

def handle_putlist(key, value):
    """Return a tuple containing True if the command succeeded and the message
    to send back to the client."""
    return handle_put(key, value)

def handle_getlist(key):
    """Return a tuple containing True if the key contained a list and
    the message to send back to the client."""
    return_value = exists, value = handle_get(key)
    if not exists:
        return return_value
    elif not isinstance(value, list):
        return (
            False,
            'ERROR: Key [{}] contains non-list value ([{}])'.format(key, value))
    else:
        return return_value

def handle_increment(key):
    """Return a tuple containing True if the key's value could be incremented
    and the message to send back to the client."""
    return_value = exists, value = handle_get(key)
    if not exists:
        return return_value
    elif not isinstance(value, int):
        return (
            False,
            'ERROR: Key [{}] contains non-int value ([{}])'.format(key, value))
    else:
        DATA[key] = value + 1
        return (True, 'Key [{}] incremented'.format(key))

def handle_append(key, value):
    """Return a tuple containing True if the key's value could be appended to
    and the message to send back to the client."""
    return_value = exists, list_value = handle_get(key)
    if not exists:
        return return_value
    elif not isinstance(list_value, list):
        return (
            False,
            'ERROR: Key [{}] contains non-list value ([{}])'.format(key, value))
    else:
        DATA[key].append(value)
        return (True, 'Key [{}] had value [{}] appended'.format(key, value))

def handle_delete(key):
    """Return a tuple containing True if the key could be deleted and
    the message to send back to the client."""
    if key not in DATA:
        return (
            False,
            'ERROR: Key [{}] not found and could not be deleted'.format(key))
    else:
        del DATA[key]
        return (True, 'Key [{}] deleted'.format(key))

def handle_stats():
    """Return a tuple containing True and the contents of the STATS dict."""
    return (True, str(STATS))

Two things to take note of: the use of multiple assignment and code re-use. A number of functions are simply wrappers around existing functions with a bit more logic, like handle_get and handle_getlist for example. Since we are occasionally just sending back the results of an existing function and other times inspecting what that function returned, multiple assignment is used.

Look at handle_append. If we try to call handle_get and the key doesn't exist, we can simply return exactly what handle_get returned. Thus, we'd like to be able to refer to the tuple returned by handle_get as a single return value. That lets us simply say return return_value if the key does not exist.

If it does exist, we need to inspect the value that was returned. Thus, we'd also like to refer to the return value of handle_get as separate variables. To handle both the case above and the case where we need to handle the results separately, we use multiple assignment. This gives us the best of both worlds without requiring multiple lines where our purpose is unclear. return_value = exists, list_value = handle_get(key) makes it explicit that we're going to be referring to the value returned by handle_get in at least two different ways.
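A stripped-down illustration of that chained multiple assignment:

```python
def lookup():
    # Stand-in for handle_get: returns an (exists, value) tuple.
    return (True, 42)

# One call, two views of the same tuple.
return_value = exists, value = lookup()

print(return_value)   # the whole tuple, ready to pass along unchanged
print(exists, value)  # the unpacked pieces, ready to inspect individually
```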

How Is This a Database?

The program above is certainly not an RDBMS, but it definitely qualifies as a NoSQL database. The reason it was so easy to create is because we don't have any real interaction with the data. We do minimal type checking, but otherwise just store whatever the user sends. If we needed to store more structured data, we'd likely need to create a schema for the database and refer to it while storing and retrieving data.

So if NoSQL databases are easier to write, easier to maintain, and easier to reason about, why don't we all just run mongoDB instances and be done with it? There is, of course, a trade off for all this data flexibility that NoSQL databases afford us: searchability.

Querying Data

Imagine we used our NoSQL database above to store the car data from earlier. We might store them using the VIN as the key and a list of values as each column value, i.e. 2134AFGER245267 = ['Lexus', 'RX350', 2013, 'Black']. Of course, we've lost the meaning of each index in the list. We just have to remember somewhere that index zero stores the Make of the car and index two stores the Year.

Worse, what happens when we want to run some of the queries from earlier? To find the colors of all cars from Year 1994 becomes a nightmare. We have to go through every value in DATA, somehow determine if the value is storing car data or something else entirely, look at index two, then take the value of index three if index two is equal to 1994. That's much worse than a table scan, since it not only scans every row in the data store but needs to apply a somewhat sophisticated set of rules to answer the query.
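In our key/value store, that "much worse than a table scan" query would look something like this sketch, complete with the ad hoc guessing about which values are car records at all:

```python
DATA = {
    '2134AFGER245267': ['Lexus', 'RX350', 2013, 'Black'],
    'JH4KA3270LC000000': ['Honda', 'Civic', 1994, 'Red'],
    'visit_count': 17,  # unrelated data living in the same store
}

def colors_of_cars_from(year):
    """Scan every value, guess which ones are car records, check the year."""
    colors = []
    for value in DATA.values():
        # Heuristic: a car record is a 4-element list with the year at index 2.
        if isinstance(value, list) and len(value) == 4 and value[2] == year:
            colors.append(value[3])
    return colors

print(colors_of_cars_from(1994))  # ['Red']
```

The heuristic is fragile by construction: nothing stops another 4-element list from being misclassified as a car, which is exactly the searchability cost the article describes.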

The authors of NoSQL databases are aware of these issues, of course, and (since querying is generally a useful feature) have come up with a number of ways to make queries possible. One way is to structure the data using, say, JSON and allow for references to other rows to represent relationships. Also, most NoSQL databases have some concept of namespaces, where data of a single type can be stored in its own "section" of the database, allowing a query engine to make use of the fact that it knows the "shape" of the data being queried.

Of course, far more sophisticated approaches exist (and are implemented) to increase queryability, but there will always exist a trade off between storing schema-less data and queryability. Our database, for example, only supports querying by key. Things would get a lot more complex if we needed to support a richer set of queries.


Hopefully, by now it's clear what "NoSQL" means. We learned a bit of SQL and how RDBMSs work. We saw how data is retrieved from an RDBMS (using SQL queries). We built a toy NoSQL database to examine the trade offs between queryability and simplicity. We also discussed a few ways database authors deal with this.

The topic of databases, even of simple key-value stores, is incredibly deep. We've merely scratched the surface here. Hopefully however, you learned a bit about what NoSQL means, how it works, and when it's a good idea to use. You can continue the conversation at Chat For Smart People, the discussion board for this site.



Machinalis: Decision tree classifier

 ∗ Planet Python


This post presents a simple but still fully functional Python3 implementation of a decision tree classifier. It is not aimed to be a tutorial on machine learning, classifications or even decision trees: there are a lot of resources on the web already. The main idea is to provide a Python example implementation for those who are familiar or comfortable with this language.

There are several decision tree algorithms that have been developed over time, each one improving or optimizing something over the predecessor. In this post, the implementation presented corresponds to the first well-known algorithm on the subject: the Iterative Dichotomiser 3 (ID3), developed in 1986 by Ross Quinlan.

For those familiar with the scikit-learn library, its documentation includes a specific section devoted to decision trees. This API provides a production-ready, fully parametric implementation of an optimized version of the CART algorithm.


The code is here:

Basically, a tree is represented using Python dicts. A couple of very simple classes that extend dict were created to distinguish between tree and leaf nodes. Also, a namedtuple was defined to match each training sample with its corresponding class. The necessary information_gain and entropy functions were created; their implementations are really simple thanks to Python's standard collections lib.
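For reference, the entropy helper mentioned can indeed be written very concisely with collections.Counter. This is a sketch; the repository's actual implementation may differ:

```python
import math
from collections import Counter

def entropy(klasses):
    """Shannon entropy of a list of class labels."""
    total = len(klasses)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(klasses).values())

print(entropy(['yes', 'yes', 'no', 'no']))  # 1.0 (a perfect 50/50 split)
```

Note the reliance on Python 3 true division (count / total), which the post calls out as a portability caveat for Python 2.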

Finally, the main piece of code is the tree-creation method: this is where all the magic happens.

def create_decision_tree(self, training_samples, predicting_features):
    """Recursively create a decision tree and return the parent node."""

    if not predicting_features:
        # No more predicting features
        default_klass = self.get_most_common_class(training_samples)
        root_node = DecisionTreeLeaf(default_klass)
    else:
        klasses = [sample.klass for sample in training_samples]
        if len(set(klasses)) == 1:
            # Every sample belongs to the same class
            target_klass = training_samples[0].klass
            root_node = DecisionTreeLeaf(target_klass)
        else:
            best_feature = self.select_best_feature(training_samples,
                                                    predicting_features)
            # Create the node to return and create the sub-tree.
            root_node = DecisionTreeNode(best_feature)
            best_feature_values = {s.sample[best_feature]
                                   for s in training_samples}
            remaining_features = [f for f in predicting_features
                                  if f != best_feature]
            for value in best_feature_values:
                samples = [s for s in training_samples
                           if s.sample[best_feature] == value]
                # Recursively, create a child node.
                child = self.create_decision_tree(samples,
                                                  remaining_features)
                root_node[value] = child
    return root_node

Motivated by the already mentioned scikit-learn library, the algorithm is developed within a class with the following methods:

  • fit(training_samples, known_labels) : Creates the decision tree using the training data.
  • predict(samples) : given a fitted model, predict the label of a new set of data. It returns the learned label for each sample in the given array.
  • score(samples, known_labels) : predicts the labels for the given data samples and contrasts with the truth provided in the known_labels. Returns a score which is a number between 0 (no matches) and 1 (perfect match).

Other than that, the code is pretty much self-explanatory. Using the standard Python module collections, the auxiliary methods (select_best_feature, information_gain, entropy) are very concise. The tree is easily implemented using dict:

  • Each node is either a leaf or a branch: If it is a leaf then it represents a class. If it is a branch, then it represents a feature.
  • Each branch has as many children as the represented feature has possible values.

Then, to classify a given vector X = [f0, ..., fn], starting with the root of the generated tree:

  1. Take the root node (usually a branch, unless X has only one feature, which is not really useful).
  2. Such a node has a related feature, fi, so we check the value of X for that feature: v = X[fi].
  3. If node[v] is a leaf, then we assign the leaf’s related class to X.
  4. If v is not a key in the node, then we can’t assign a class with the existing tree, and we assign a default class (the most probable one).
  5. If node[v] is another branch, we repeat this procedure using the new node as root.
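The classification walk in the steps above can be sketched as a small function. The DecisionTreeNode and DecisionTreeLeaf classes below are minimal stand-ins of my own (the post's real classes extend dict but their exact attributes may differ), and the default-class handling follows step 4:

```python
class DecisionTreeNode(dict):
    """Branch node: maps feature values to child nodes (sketch version)."""
    def __init__(self, feature):
        super().__init__()
        self.feature = feature

class DecisionTreeLeaf(dict):
    """Leaf node: holds a single class (sketch version)."""
    def __init__(self, klass):
        super().__init__()
        self.klass = klass

def classify(node, x, default_klass):
    """Walk the dict-based tree for sample *x*, a feature -> value mapping."""
    while isinstance(node, DecisionTreeNode):
        v = x.get(node.feature)       # step 2: value of X for this feature
        if v not in node:
            return default_klass      # step 4: unseen value, use the default
        node = node[v]                # step 5: descend into the child
    return node.klass                 # step 3: reached a leaf

# Tiny hand-built tree splitting on a single feature.
root = DecisionTreeNode('outlook')
root['sunny'] = DecisionTreeLeaf('play')
root['rainy'] = DecisionTreeLeaf('stay')
print(classify(root, {'outlook': 'sunny'}, 'play'))  # play
print(classify(root, {'outlook': 'foggy'}, 'stay'))  # stay (default)
```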

It works in Python 3.4; some minor modifications are needed for it to work properly in Python 2 (for example, the entropy function relies on the fact that dividing two integers in Python 3 returns a float).

I’ll not dig further in the details as this is not supposed to be a tutorial or course on decision trees. Some minimal previous knowledge should be enough to understand the code. In any case, don’t hesitate to post your questions or comments.

To keep updated about Machine Learning, Data Processing and Complex Web Development follow us on @machinalis.



Graham Dumpleton: Python module search path and mod_wsgi.

 ∗ Planet Python

When you run the Python interpreter on the command line as an interactive console, if a module to be imported resides in the same directory as you ran the 'python' executable, then it will be found no problems. How can this be though, as we haven't done anything to add the current working directory into 'sys.path', nor has the Python interpreter itself done so?

>>> import os, sys
>>> os.getcwd()



Graham Dumpleton: Debugging with pdb when using mod_wsgi.

 ∗ Planet Python

In the early days of mod_wsgi I made a decision to impose a restriction on the use of stdin and stdout by Python WSGI web applications. My reasoning around this was that if you want to make a WSGI application portable to any WSGI deployment mechanism, then you should not be attempting to use stdin/stdout. This includes either reading or writing to these file objects, or even performing a check on

Monday, 01 September


Graham Dumpleton: What is the current version of mod_wsgi?

 ∗ Planet Python

If you pick up any Linux distribution, you will most likely come to the conclusion that the newest version of mod_wsgi available is 3.3 or 3.4. Check when those versions were released and you will find: mod_wsgi version 3.3 - released 25th July 2010 mod_wsgi version 3.4 - released 22nd August 2012 Problem is that people look at that and seeing that there are only infrequent releases and nothing

Graham Dumpleton: Using Python virtual environments with mod_wsgi.

 ∗ Planet Python

You should be using Python virtual environments and if you don't know why you should, maybe you should find out. That said, the use of Python virtual environments was the next topic that came up in my hallway track discussions at DjangoCon US 2014. The pain point here is in part actually of my own creation. This is because although there are better ways of using Python virtual environments with

PyPy Development: Python Software Foundation Matching Donations this Month

 ∗ Planet Python

We're extremely excited to announce that for the month of September, any amount
you donate to PyPy will be matched (up to $10,000) by the Python Software Foundation.

This includes any of our ongoing fundraisers: NumPyPy, STM, Python3, or our
general fundraising.

Here are some of the things your previous donations have helped accomplish:

  • Getting PyPy3 completed (currently 3.2, with 3.3 work underway)
  • New research and production engineering on STM for PyPy
  • Lots of progress on NumPy for PyPy
  • Significant performance improvements

You can see a preview of what's coming in our next 2.4 release in the draft
release notes.

Thank you to all the individuals and companies which have donated so far.

So please, donate today:

(Please be aware that the donation progress bars are not live updating, so
don't be afraid if your donation doesn't show up immediately).

Carl Trachte: PDF - Removing Pages and Inserting Nested Bookmarks

 ∗ Planet Python

I blogged before about PyPDF2 and some initial work I had done in response to a request to get a report from Microsoft SQL Server Reporting Services into PDF format.  Since then I've had better luck with PyPDF2 using it with Python 3.4.  Seldom do I need to make any adjustments to either the PDF file or my Python code to get things to work.

Presented below is the code that is working for me now.  The basic gist of it is to strip the blank pages (conveniently SSRS dumps the report with a blank page every other page) from the SSRS PDF dump and reinsert the bookmarks in the right places in a new final document.  The report I'm doing is about 30 pages, so having bookmarks is pretty critical for presentation and usability.

The approach I took was to get the bookmarks out of the PDF object model and into a nested dictionary that I could understand and work with easily.  To keep the bookmarks in the right order for presentation I used collections.OrderedDict instead of just a regular Python dictionary structure.  The code should work for any depth level of nested parent-child PDF bookmarks.  My report only goes three or four levels deep, but things can get fairly complex even at that level.

There are a couple artifacts of the actual report I'm doing - the name "comparisonreader" refers to the subject of the report, a comparison of accounting methods' results.  I've tried to sanitize the code where appropriate, but missed a thing or two.

It may be a bit overwrought (too much code), but it gets the job done.  Thanks for having a look.


"""
Strip out blank pages and keep bookmarks for
SQL Server SSRS dump of model comparison report (pdf).
"""

import PyPDF2 as pdf
import math
from collections import OrderedDict

INPUTFILE = 'SSRSdump.pdf'
OUTPUTFILE = 'Finalreport.pdf'


# Adobe PDF document element keys.
PAGE = '/Page'
PAGES = '/Pages'
ROOT = '/Root'
KIDS = '/Kids'
TITLE = '/Title'
# Keys/values for switching a bookmark destination to full page
# display (standard PDF action/destination names).
OBJECTKEY = '/A'
LISTKEY = '/D'
FULLPAGE = '/Fit'

# Label used as the parent name for top level bookmarks.
TOPLEVEL = 'TOP LEVEL'

# Python/PDF library types.
NODE = pdf.generic.Destination
CHILD = list

ADDPAGE = 'Adding page {0:d} from SSRS dump to page {1:d} of new document . . .'

# dictionary keys
NAME = 'name'
CHILDREN = 'children'

INDENT = 4 * ' '

ADDEDBOOKMARK = 'Added bookmark {0:s} to parent bookmark {1:s} at depthlevel {2:d}.'


def getpages(comparisonreader):
    """From a PDF reader object, gets the
    page numbers of the odd numbered pages
    in the old document (SSRS dump) and
    the corresponding page in the final
    document.

    Returns a generator of two tuples."""
    # get number of pages then get odd numbered pages
    # (even numbered indices)
    numpages = comparisonreader.getNumPages()
    return ((x, int(x/2)) for x in range(numpages) if x % 2 == 0)

def fixbookmark(bookmark):
    """bookmark is a PyPDF2 bookmark object.

    Side effect function that changes bookmark
    page display mode to full page."""
    # getObject yields a dictionary
    props = bookmark.getObject()[OBJECTKEY][LISTKEY][1] = pdf.generic.NameObject(FULLPAGE)
    return 0

def matchpage(page, pages):
    """Find index of page match.

    page is a PyPDF2 page object.
    pages is the list (PyPDF2 array) of page objects.
    Returns integer page index in new (smaller) doc."""
    originalpageidx = pages.index(page)
    return math.floor((originalpageidx + 1)/2)

def pagedict(bookmark, pages):
    """Creates page dictionary for PyPDF2 bookmark object.

    bookmark is a PDF object (dictionary).
    pages is a list of PDF page objects (dictionary).
    Returns two tuple of a dictionary and
    integer page number."""
    page = matchpage(bookmark[PAGE].getObject(), pages)
    title = bookmark[TITLE]
    # One bookmark per page per level.
    lookupdict = OrderedDict()
    lookupdict[page] = {NAME: title, CHILDREN: OrderedDict()}
    return lookupdict, page

def recursivepopulater(bookmark, pages):
    """Fills in child nodes of bookmarks
    recursively and returns dictionary."""
    dictx = OrderedDict()
    for pagex in bookmark:
        if type(pagex) is NODE:
            # get page info and update dictionary with it
            lookupdict, page = pagedict(pagex, pages)
            dictx.update(lookupdict)
        elif type(pagex) is CHILD:
            newdict = OrderedDict()
            newdict.update(recursivepopulater(pagex, pages))
            dictx[page][CHILDREN] = newdict
    return dictx

def makenewbookmarks(pages, bookmarks):
    """Main function to generate bookmark dictionary:

    {page number: {name:<name>,
                   children:[<more bookmarks>]},
     and so on}

    Returns dictionary."""
    dictx = OrderedDict()
    # top level bookmarks
    # it's going to go bookmark, list, bookmark, list, etc.
    for bookmark in bookmarks:
        if type(bookmark) is NODE:
            # get page info and update dictionary with it
            lookupdict, page = pagedict(bookmark, pages)
            dictx.update(lookupdict)
        elif type(bookmark) is CHILD:
            dictx[page][CHILDREN] = recursivepopulater(bookmark, pages)
    return dictx

def printbookmarkaddition(name, parentname, depthlevel):
    """Print notification of bookmark addition.

    Indentation based on integer depthlevel.
    name is the string name of the bookmark.
    parentname is the string name of the parent.

    Side effect function."""
    args = name, parentname, depthlevel
    indent = depthlevel * INDENT
    print(indent + ADDEDBOOKMARK.format(*args))

def dealwithbookmarks(comparisonreader, output, bookmarkdict, depthlevel, levelparent=None, parentname=None):
    """Fix bookmarks so that they are properly
    placed in the new document with the blank
    pages removed. Recursive side effect function.

    comparisonreader is the PDF reader object
    for the original document.

    output is the PDF writer object for the
    final document.

    bookmarkdict is a dictionary of bookmarks.

    depthlevel is the depth inside the nested
    dictionary-list structure (0 is the top).

    levelparent is the parent bookmark.

    parentname is the name of the parent bookmark."""
    depthlevel += 1
    for pagekeylevel in bookmarkdict:
        namelevel = bookmarkdict[pagekeylevel][NAME]
        levelparentii = output.addBookmark(namelevel, pagekeylevel, levelparent)
        if depthlevel == 0:
            parentname = TOPLEVEL
        printbookmarkaddition(namelevel, parentname, depthlevel)
        # dictionary
        secondlevel = bookmarkdict[pagekeylevel][CHILDREN]
        argsx = comparisonreader, output, secondlevel, depthlevel, levelparentii, namelevel
        # Recursive call.

def cullpages():
    """Fix SSRS PDF dump by removing blank pages."""
    ssrsdump = open(INPUTFILE, 'rb')
    finalreport = open(OUTPUTFILE, 'wb')
    comparisonreader = pdf.PdfFileReader(ssrsdump)
    pageindices = getpages(comparisonreader)
    output = pdf.PdfFileWriter()
    # add pages from SSRS dump to new pdf doc
    for (old, new) in pageindices:
        print(ADDPAGE.format(old, new))
        pagex = comparisonreader.getPage(old)
        output.addPage(pagex)

    # Attempt to add bookmarks from original doc
    # getOutlines yields a list of nested dictionaries and lists:
    #    outermost list - starts with parent bookmark (dictionary)
    #        inner list - starts with child bookmark (dictionary)       
    #                     and so on
    # The SSRS dump and this list have bookmarks in correct order.
    bookmarks = comparisonreader.getOutlines()
    # Get page numbers using this methodology (indirect object references)
    # list of IndirectObject's of pages in order
    pages = [pagen.getObject() for pagen in
             comparisonreader.trailer[ROOT][PAGES][KIDS]]
    # Bookmarks.
    # Top level is list of bookmarks.
    # List goes parent bookmark (Destination object)
    #               child bookmarks (list)
    #                   and so on.
    bookmarkdict = makenewbookmarks(pages, bookmarks)
    # Initial level of -1 allows increment to 0 at start.
    dealwithbookmarks(comparisonreader, output, bookmarkdict, -1)

    print('\n\nWriting final report . . .')
    output.write(finalreport)
    ssrsdump.close()
    finalreport.close()

if __name__ == '__main__':
    cullpages()

Python Software Foundation: Matching Donations to PyPy in September!

 ∗ Planet Python

We're thrilled to announce that we will be matching donations made to the PyPy project for the month of September. For every dollar donated this month, the PSF will also give a dollar, up to a $10,000 total contribution. Head to the PyPy site and view the donation options on the right side of the page, including general funding or a donation targeted to their STM, Py3k, or NumPy efforts.

We've previously given a $10,000 donation to PyPy, and more recently seeded the STM efforts with $5,000. The PyPy project works with the Software Freedom Conservancy to manage fundraising efforts and the usage of the funds, and they'll be the ones notifying us of how you all made your donations. At the end of the month, we'll do our part and chip in to make PyPy even better.

The matching period runs today through the end of September.


Dodging Browser Zero Days - Changing your Org's Default Browser Centrally, (Mon, Sep 1st)

 ∗ SANS Internet Storm Center, InfoCON: green

In a recent sto ...(more)...


Leonardo Giordani: Python 3 OOP Part 5 - Metaclasses

 ∗ Planet Python

Previous post

Python 3 OOP Part 4 - Polymorphism

The Type Brothers

The first step into the most intimate secrets of Python objects comes from two components we already met in the first post: class and object. These two things are the very fundamental elements of the Python OOP system, so it is worth spending some time to understand how they work and relate to each other.

First of all recall that in Python everything is an object, that is everything inherits from object. Thus, object seems to be the deepest thing you can find digging into Python variables. Let's check this

``` python
>>> a = 5
>>> type(a)
<class 'int'>
>>> a.__class__
<class 'int'>
>>> a.__class__.__bases__
(<class 'object'>,)
>>> object.__bases__
()
```

The variable a is an instance of the int class, and this latter inherits from object, which inherits from nothing. This demonstrates that object is at the top of the class hierarchy. However, as you can see, both int and object are called classes (<class 'int'>, <class 'object'>). Indeed, while a is an instance of the int class, int itself is an instance of another class, a class that is instanced to build classes.

``` python
>>> type(a)
<class 'int'>
>>> type(int)
<class 'type'>
>>> type(float)
<class 'type'>
>>> type(dict)
<class 'type'>
```

Since in Python everything is an object, everything is the instance of a class, even classes. Well, type is the class that is instanced to get classes. So remember this: object is the base of every object, type is the class of every type. Sounds puzzling? It is not your fault, don't worry. However, just to strike you with the finishing move, this is what Python is built on

``` python
>>> type(object)
<class 'type'>
>>> type.__bases__
(<class 'object'>,)
```

If you are not about to faint at this point, chances are that you are Guido van Rossum or one of his friends down at the Python core development team (in this case, let me thank you for your beautiful creation). You may get a cup of tea, if you need it.

Jokes apart, at the very base of Python type system there are two things, object and type, which are inseparable. The previous code shows that object is an instance of type, and type inherits from object. Take your time to understand this subtle concept, as it is very important for the upcoming discussion about metaclasses.

When you think you grasped the type/object matter read this and start thinking again

``` python
>>> type(type)
<class 'type'>
```

The Metaclasses Take Python

You are now familiar with Python classes. You know that a class is used to create an instance, and that the structure of this latter is ruled by the source class and all its parent classes (until you reach object).

Since classes are objects too, you know that a class itself is an instance of a (super)class, and this class is type. That is, as already stated, type is the class that is used to build classes.

So for example you know that a class may be instanced, i.e. it can be called and by calling it you obtain another object that is linked with the class. What prepares the class for being called? What gives the class all its methods? In Python the class in charge of performing such tasks is called metaclass, and type is the default metaclass of all classes.

The point of exposing this structure of Python objects is that you may change the way classes are built. As you know, type is an object, so it can be subclassed just like any other class. Once you get a subclass of type you need to instruct your class to use it as the metaclass instead of type, and you can do this by passing it as the metaclass keyword argument in the class definition.

``` python
>>> class MyType(type):
...     pass
...
>>> class MySpecialClass(metaclass=MyType):
...     pass
...
>>> msp = MySpecialClass()
>>> type(msp)
<class '__main__.MySpecialClass'>
>>> type(MySpecialClass)
<class '__main__.MyType'>
>>> type(MyType)
<class 'type'>
```

Metaclasses 2: Singleton Day

Metaclasses are a very advanced topic in Python, but they have many practical uses. For example, by means of a custom metaclass you may log any time a class is instanced, which can be important for applications that shall keep a low memory usage or have to monitor it.

I am going to show here a very simple example of metaclass, the Singleton. Singleton is a well-known design pattern, and many descriptions of it may be found on the Internet. It has also been heavily criticized, mostly because of its bad behaviour when subclassed, but here I do not want to introduce it for its technological value, but for its simplicity (so please do not question the choice, it is just an example).

Singleton has one purpose: to return the same instance every time it is instanced, like a sort of object-oriented global variable. So we need to build a class that does not work like standard classes, which return a new instance every time they are called.

"Build a class"? This is a task for metaclasses. The following implementation comes from Python 3 Patterns, Recipes and Idioms.

``` python
class Singleton(type):
    instance = None
    def __call__(cls, *args, **kw):
        if not cls.instance:
            cls.instance = super(Singleton, cls).__call__(*args, **kw)
        return cls.instance
```


We are defining a new type, which inherits from type to provide all bells and whistles of Python classes. We override the __call__ method, that is a special method invoked when we call the class, i.e. when we instance it. The new method wraps the original method of type by calling it only when the instance attribute is not set, i.e. the first time the class is instanced, otherwise it just returns the recorded instance. As you can see this is a very basic cache class, the only trick is that it is applied to the creation of instances.

To test the new type we need to define a new class that uses it as its metaclass

``` python
>>> class ASingleton(metaclass=Singleton):
...     pass
...
>>> a = ASingleton()
>>> b = ASingleton()
>>> a is b
True
>>> hex(id(a))
'0xb68030ec'
>>> hex(id(b))
'0xb68030ec'
```

By using the is operator we test that the two objects are the very same structure in memory, that is their ids are the same, as explicitly shown. What actually happens is that when you issue a = ASingleton() the ASingleton class runs its __call__() method, which is taken from the Singleton type behind the class. That method recognizes that no instance has been created (Singleton.instance is None) and acts just like any standard class does. When you issue b = ASingleton() the very same things happen, but since Singleton.instance is now different from None its value (the previous instance) is directly returned.

Metaclasses are a very powerful programming tool and leveraging them you can achieve very complex behaviours with a small effort. Their use is a must every time you are actually metaprogramming, that is you are writing code that has to drive the way your code works. Good examples are creational patterns (injecting custom class attributes depending on some configuration), testing, debugging, and performance monitoring.

Coming to Instance

Before introducing you to a very smart use of metaclasses by talking about Abstract Base Classes (read: to save some topics for the next part of this series), I want to dive into the object creation procedure in Python, that is what happens when you instance a class. In the first post this procedure was described only partially, by looking at the __init__() method.

In the first post I recalled the object-oriented concept of constructor, which is a special method of the class that is automatically called when the instance is created. The class may also define a destructor, which is called when the object is destroyed. In languages without a garbage collection mechanism such as C++ the destructor shall be carefully designed. In Python the destructor may be defined through the __del__() method, but it is hardly used.

The constructor mechanism in Python is on the contrary very important, and it is implemented by two methods, instead of just one: __new__() and __init__(). The tasks of the two methods are very clear and distinct: __new__() shall perform actions needed when creating a new instance while __init__ deals with object initialization.

Since in Python you do not need to declare attributes due to its dynamic nature, __new__() is rarely defined by programmers, who may rely on __init__ to perform the majority of the usual tasks. Typical uses of __new__() are very similar to those listed in the previous section, since it allows to trigger some code whenever your class is instanced.

The standard way to override __new__() is

``` python
class MyClass():
    def __new__(cls, *args, **kwds):
        obj = super().__new__(cls, *args, **kwds)
        # [put your code here]
        return obj
```


just like you usually do with __init__(). When your class inherits from object you do not need to call the parent method (object.__init__()), because it is empty, but you need to do it when overriding __new__().

Remember that __new__() is not forced to return an instance of the class in which it is defined, though you should have very good reasons to break this behaviour. In any case, __init__() will be called only if you return an instance of the container class. Please also note that __new__(), unlike __init__(), accepts the class as its first parameter. The name is not important in Python, and you can also call it self, but it is worth using cls to remember that it is not an instance.
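
A tiny illustration of that last point (the class name Weird is made up for the example): when __new__() returns something that is not an instance of the class, __init__() is skipped entirely.

``` python
class Weird:
    def __new__(cls, *args, **kwds):
        # Return something that is not a Weird instance...
        return 42

    def __init__(self, *args, **kwds):
        # ...so this initializer is never called.
        raise RuntimeError("unreachable")

w = Weird()
print(w, type(w))  # 42 <class 'int'>
```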

Movie Trivia

Section titles come from the following movies: The Blues Brothers (1980), The Muppets Take Manhattan (1984), Terminator 2: Judgement Day (1991), Coming to America (1988).


You will find a lot of documentation in this Reddit post. Most of the information contained in this series come from those sources.


Feel free to use the blog Google+ page to comment the post. The GitHub issues page is the best place to submit corrections.


Ian Ozsvald: Slides for High Performance Python tutorial at EuroSciPy2014 + Book signing!

 ∗ Planet Python

Yesterday I taught an excerpt of my 2 day High Performance Python tutorial as a 1.5 hour hands-on lesson at EuroSciPy 2014 in Cambridge with 70 students.


We covered profiling (down to line-by-line CPU & memory usage), Cython (pure-py and OpenMP with numpy), Pythran, PyPy and Numba. This is an abridged set of slides from my 2 day tutorial, take a look at those details for the upcoming courses (including an intro to data science) we’re running in October.

I’ll add the video in here once it is released, the slides are below.

I also got to do a book-signing for our High Performance Python book (co-authored with Micha Gorelick), O’Reilly sent us 20 galley copies to give away. The finished printed book will be available via O’Reilly and Amazon in the next few weeks.

Book signing at EuroSciPy 2014

If you want to hear about our future courses then join our low-volume training announce list. I have a short (no-signup) survey about training needs for Pythonistas in data science, please fill that in to help me figure out what we should be teaching.

I also have a further survey on how companies are using (or not using!) data science, I’ll be using the results of this when I keynote at PyConIreland in October, your input will be very useful.

Here are the slides (License: CC By NonCommercial), there’s also source on github:

Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.



Graham Dumpleton: Setting LANG and LC_ALL when using mod_wsgi.

 ∗ Planet Python

So I am at DjangoCon US 2014 and one of the first pain points for using mod_wsgi that came up in discussion at DjangoCon US was the lang and locale settings. These settings influence what the default encoding is for Python when implicitly converting Unicode to byte strings. In other words, they dictate what is going on at the Unicode/bytes boundary. Now this should not really be an issue with
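
For reference, one common fix (assuming a Debian/Ubuntu-style Apache layout; the file location differs per distribution) is to force a UTF-8 locale for the whole Apache process so mod_wsgi daemon processes inherit it:

``` shell
# /etc/apache2/envvars (assumption: Debian/Ubuntu layout).
# Force a UTF-8 locale so Python's implicit Unicode/bytes
# conversions don't fall back to ASCII.
export LANG='en_US.UTF-8'
export LC_ALL='en_US.UTF-8'
```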

Graham Dumpleton: Reporting on the DjangoCon US 2014 hallway track.

 ∗ Planet Python

I have only been in Portland for a few hours for DjangoCon, and despite some lack of sleep, I already feel that being here is recharging my enthusiasm for working on Open Source, something that has still been sagging a bit lately. I don't wish to return to that dark abyss I was in, so definitely what I need. Now lots of people write up reports on conferences including live noting them, but I

Carl Trachte: Internet Explorer 9 Save Dialog - SendKeys Last Resort

 ∗ Planet Python

At work we use Internet Explorer 9 on Windows 7 Enterprise.  SharePoint is the favored software for filesharing inside organizational groups.  Our mine planning office is in the States; the mine operation whose data I work with is in a remote, poorly connected location of the world.

Recently SharePoint was updated to a new version at the mine.  The SharePoint server configuration there no longer allows Windows Explorer view or mapping of the site to a Windows drive letter.  I've put in a trouble ticket to regain this functionality, but that may take a while, if it's possible at all.  Without it, it is difficult to automate file retrieval or get more than one file at a time.

In the meantime I've been able to get the text based files over using win32com automation in Python to run Internet Explorer and grab the innerHTML object.  innerHTML is essentially the text of the files with tags around it.  I rip out the tags, write the text to a file on my harddrive and I'm good to go.
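
The tag-ripping step can be as simple as a regex, under the assumption the post describes (text-only report content with no literal '<' characters of its own); `strip_tags` here is my name for the helper, not the author's:

``` python
import re

def strip_tags(html):
    # Remove anything that looks like an HTML/XML tag; crude, but
    # adequate for text-only innerHTML content with no literal '<'.
    return re.sub(r'<[^>]+>', '', html)

print(strip_tags('<pre>1.0 2.0 3.0</pre>'))  # 1.0 2.0 3.0
```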

Binary files proved to be more difficult to download.  Shown below is a screenshot of the Internet Explorer 9 dialog box that goes by the generic name Notification Bar:

I googled and could not find anywhere how this thing fits into the Internet Explorer 9 Document object hierarchy.  Then I came upon this colorful exchange between Microsoft Certified MVPs from 2012 that made things a little more clear.
It turns out you can't access the Notification Bar programmatically per se.  What you can do is activate the specific Internet Explorer window and tab you're interested in, then send keystrokes to get where you want to, click, and download your file.
I'm not a web programmer nor am I a dedicated Windows programmer (I'm actually a geologist).  IEC is a small module that wraps some useful functionality - in my case identifying and clicking on the link on the SharePoint page by its text identifier:
# CPython 2.7
# Internet Explorer module.
import IEC as iec
import time
ie = iec.IEController()
ie.Navigate(<URL of SharePoint page>)
# Give the page time to load (7 seconds).
time.sleep(7)
# I want to download file 11.msr.
# Give 5 seconds for the Notification Bar to show up.
time.sleep(5)
I'm fortunate in that our mine planning vendor, MineSight, ships Python 2.7 and associated win32com packages along with their software (their API's are written for Python).  If you don't have win32com and friends installed, they are necessary for this solution.
At this point I've just got to deal with that pesky Internet Explorer 9 Notification Bar.  As it turns out, SendKeys makes it doable (although neither elegant nor robust :-(   ):
# Activate the SharePoint page.
from win32com.client import Dispatch as dispx
shell = dispx('WScript.Shell')
shell.AppActivate(<name of IE9 tab>)
# Little pause.
# Keyboard combination for the Notification Bar selection
# is ALT-N or '%n'
shell.SendKeys('%n', True)
# The Notification Bar goes to "Open" by default.
# You need to tab over to the "Save" button.
shell.SendKeys('{TAB}', True)
# Another little pause.
# Space bar clicks on this control.
shell.SendKeys(' ', True)
The key combinations for accessing the Notification Bar are in Microsoft's documentation here
One link showing use of SendKeys is a German site (mostly English text) here.
And that's pretty much it.  There's another dialog that pops up in Internet Explorer 9 after the file is downloaded.  I've been able to blow that off so far and it hasn't gotten in the way as I move to the next download.  I give these files (about 300 kb) 15 seconds to download over a slow connection.  I may have to adjust that.
This solution is an abomination by any coding/architecture/durability standard.  Still, it's the abomination that is getting the job done for the time being.
Thanks for stopping by.

Sunday, 31 August


Varun Nischal: code4Py | Style Context Differences

 ∗ Planet Python

As per a recently created page, the following diff command output, representing context differences, needed to be styled:

[vagrant@localhost python]$ diff -c A B
*** A 2014-08-20 20:13:30.315009258 +0000
--- B 2014-08-20 20:13:39.021009349 +0000
***************
*** 1,6 ****
--- 1,9 ----
+ typeset -i sum=0
+ while read num
  do printf "%d " ${num}
+ sum=sum+${num}
  done

… Continue reading



Europython: EuroPython 2014 Feedback Form

 ∗ Planet Python

EuroPython 2014 was a great event and we’d like to learn from you how to make EuroPython 2015 even better. If you attended EuroPython 2014, please take a few moments and fill out our feedback form:

EuroPython 2014 Feedback Form

We will leave the feedback form online for another two weeks and then use the information as basis for the work on EuroPython 2015 and also post a summary of the multiple choice questions (not the comments to protect your privacy) on our website. Many thanks in advance.

Helping with EuroPython 2015

If you would like to help with EuroPython 2015, we invite you to join the EuroPython Society. Membership is free. Just go to our application page and enter your details.

In the coming months, we will start the discussions about the new work group model we’ve announced at the conference.


EuroPython Society


Ian Ozsvald: Python Training courses: Data Science and High Performance Python coming in October

 ∗ Planet Python

I’m pleased to say that via our ModelInsight we’ll be running two Python-focused training courses in October. The goal is to give you strong new research & development skills; the courses are aimed at folks in companies but would suit folks in academia too. UPDATE: training courses are ready to buy (1 Day Data Science, 2 Day High Performance).

UPDATE we have a <5min anonymous survey which helps us learn your needs for Data Science training in London, please click through and answer the few questions so we know what training you need.

“Highly recommended – I attended in Aalborg in May … upcoming Python DataSci/HighPerf training courses” – @ThomasArildsen

These and future courses will be announced on our London Python Data Science Training mailing list, sign-up for occasional announces about our upcoming courses (no spam, just occasional updates, you can unsubscribe at any time).

Intro to Data science with Python (1 day) on Friday 24th October

Students: Basic to Intermediate Pythonistas (you can already write scripts and you have some basic matrix experience)

Goal: Solve a complete data science problem (building a working and deployable recommendation engine) by working through the entire process – using numpy and pandas, applying test driven development, visualising the problem, deploying a tiny web application that serves the results (great for when you’re back with your team!)

  • Learn basic numpy, pandas and data cleaning
  • Be confident with Test Driven Development and debugging strategies
  • Create a recommender system and understand its strengths and limitations
  • Use a Flask API to serve results
  • Learn Anaconda and conda environments
  • Take home a working recommender system that you can confidently customise to your data
  • £300 including lunch, central London (24th October)
  • Additional announces will come via our London Python Data Science Training mailing list
  • Buy your ticket here

High Performance Python (2 day) on Thursday+Friday 30th+31st October

Students: Intermediate Pythonistas (you need higher performance for your Python code)

Goal: learn high performance techniques for performant computing, a mix of background theory and lots of hands-on pragmatic exercises

  • Profiling (CPU, RAM) to understand bottlenecks
  • Compilers and JITs (Cython, Numba, Pythran, PyPy) to pragmatically run code faster
  • Learn r&d and engineering approaches to efficient development
  • Multicore and clusters (multiprocessing, IPython parallel) for scaling
  • Debugging strategies, numpy techniques, lowering memory usage, storage engines
  • Learn Anaconda and conda environments
  • Take home years of hard-won experience so you can develop performant Python code
  • Cost: £600 including lunch, central London (30th & 31st October)
  • Additional announcements will come via our London Python Data Science Training mailing list
  • Buy your ticket here
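As a taste of the profiling topic above, the CPU side can be explored with nothing but the standard library. This is a minimal sketch, not course material; the slow function is just an illustrative stand-in:

```python
import cProfile
import io
import pstats

# A deliberately slow function to act as the profiling target.
def slow_sum(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100_000)
profiler.disable()

# Print the five most expensive calls by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

Once the bottleneck is identified this way, that is where a compiler or JIT such as Cython or Numba is worth applying.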

The High Performance course is built on many years of teaching and talking at conferences (including PyDataLondon 2013, PyCon 2013, EuroSciPy 2012) and in companies, along with my High Performance Python book (O’Reilly). The data science course is built on techniques we’ve used over the last few years to help clients solve data science problems. Both courses are very pragmatic and hands-on, and will leave you with new skills that have been battle-tested by us (we use these approaches to quickly deliver correct and valuable data science solutions for our clients via ModelInsight). At PyCon 2012 my students rated me 4.64/5.0 for overall happiness with my High Performance teaching.

“@ianozsvald [..] Best tutorial of the 4 I attended was yours. Thanks for your time and preparation!” @cgoering

We’d also like to know which other courses you’d like to learn; we can partner with trainers as needed to deliver new courses in London. We’re focused around Python, data science, high performance and pragmatic engineering. Drop me an email (via ModelInsight) and let me know if we can help.

Do please join our London Python Data Science Training mailing list to be kept informed about upcoming training courses.

Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.


Graeme Cross: Notes from MPUG, August 2014

 ∗ Planet Python

A strong line-up of speakers and the post-PyCon AU buzz meant that we had about 35 people at the August Melbourne Python Users Group meeting. There were five presentations for the night and here are my random notes.

1. Highlights from PyCon AU 2014

Various people contributed their highlights of PyCon AU 2014, which was in Brisbane in early August (and the reason why the MPUG meeting was moved back a week). The conference videos are now up on the YouTube channel.

2. Javier Candeira: “Don’t monkeypatch None!”

Javier was inspired by discussion at PyCon AU about PEP 336 (Make None Callable) and decided to implement it, with some inspiration from previous PyCon AU talks by Ryan Kelly, Richard Jones (“Don’t do this!”), Nick Coghlan (“Elegant and ugly hacks in CPython”) and Larry Hastings.

He has implemented quiet_None, an analog to quiet_NaN. He walked us through the prototype code, with a deviation into Haskell’s Maybe monad along the way.

A few random notes:

  • Uses the ContextDecorator class
  • Module: “forbiddenfruit” and “curse” function
  • Won’t work on Stackless or IronPython (as they have no working implementation of sys._getframe())
  • Problems with exceptions, so the code needs to monkeypatch Exception!
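Javier’s actual quiet_None monkeypatches the interpreter as the notes above describe; the propagation idea itself can be sketched without any cursing, using a plain wrapper object (the names here are illustrative, not his code):

```python
class QuietNone:
    """A None-like value that absorbs attribute access and calls,
    propagating itself the way a quiet NaN propagates through arithmetic."""

    def __getattr__(self, name):
        return self

    def __call__(self, *args, **kwargs):
        return self

    def __repr__(self):
        return "quiet_None"

quiet_None = QuietNone()

def safe(value):
    """Wrap a possibly-None value so chained access cannot raise."""
    return quiet_None if value is None else value

# Chained calls on None no longer raise AttributeError;
# the quiet value just flows through.
result = safe(None).upper().strip()
print(result)              # quiet_None
print(safe("hi").upper())  # HI
```

The real trick in the talk is making None itself behave this way, which is exactly why the forbiddenfruit/curse machinery is needed.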

3. Juan Nunez-Iglesias: SciPy 2014, a summary

Juan works at VLSCI and is a scikit-image contributor.

  • Great quote: Greg Wilson (keynote) on Software Carpentry: “we save researchers a day a week for the rest of their careers”
  • Lots of core devs from biggest names in Python scientific stack
  • Excellent tutorials; massive geoscience track, massive education track
  • EuroSciPy seems to be more academic
  • PyCon AU – very web development focus

SciPy 2014 highlights:

  • Michael Droettboom – Airspeed Velocity: code benchmarking tool with web interface and plots
  • Paul Ivanov – IPython vimception: vim bindings for IPython notebook
  • Damian Avila – RISE: live presentations from IPython notebooks
  • MinRK (from core IPython team) – IPython Parallel tutorial
  • Brian Granger & ????
  • Aaron Meurer: Conda – good package manager for C extensions as well as Python packages
  • Matt Rocklin, Mark Wiebe: Blaze – abstracting away tabular data

Recommended lightning talks:

  • Project Jupyter
  • Waffles

On Juan’s “to watch” list:

  • Chris Fonnesbeck PyMC tutorial (see Jake Vanderplas’s Bayesian vs frequentist posts)
  • David Sanders: Julia tutorial
  • Reproducible science tutorial (lots of buzz on this tutorial)
  • Geoscience and education tracks
  • Diversity: 1.5% women in 2013, 15% in 2014 – indicative of a big problem in the industry

4. Rory Hart: Microservices in Python

This was a repeat of Rory’s PyCon AU 2014 talk (here’s the video).

5. Nick Farrell: sux

Nick gave a repeat of his PyCon AU 2014 lightning talk on the sux library, which allows you to transparently use Python 2 packages in Python 3.


Dave Behnke: Fun with SQLAlchemy

 ∗ Planet Python

This is a little experiment I created with SQLAlchemy. In this notebook, I'm using sqlite to create a table, and doing some operations such as deleting all the rows in the table and inserting a list of items.

In [2]:
# connection is a connection to the database from a pool of connections
connection = engine.connect()
# meta will be used to reflect the table later
meta = MetaData()
2014-08-10 21:10:16,410 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
2014-08-10 21:10:16,411 INFO sqlalchemy.engine.base.Engine ()
2014-08-10 21:10:16,413 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
2014-08-10 21:10:16,414 INFO sqlalchemy.engine.base.Engine ()


In [3]:
# create the table if it doesn't exist already
connection.execute("create table if not exists test ( id integer primary key autoincrement, name text )")
2014-08-10 21:10:17,212 INFO sqlalchemy.engine.base.Engine create table if not exists test ( id integer primary key autoincrement, name text )
2014-08-10 21:10:17,214 INFO sqlalchemy.engine.base.Engine ()
2014-08-10 21:10:17,217 INFO sqlalchemy.engine.base.Engine COMMIT

<sqlalchemy.engine.result.ResultProxy at 0x105083e48>
In [4]:
# reflect all the tables in the current connection into meta
meta.reflect(bind=engine)
2014-08-10 21:10:17,986 INFO sqlalchemy.engine.base.Engine SELECT name FROM  (SELECT * FROM sqlite_master UNION ALL   SELECT * FROM sqlite_temp_master) WHERE type='table' ORDER BY name
2014-08-10 21:10:17,987 INFO sqlalchemy.engine.base.Engine ()
2014-08-10 21:10:17,989 INFO sqlalchemy.engine.base.Engine PRAGMA table_info("sqlite_sequence")
2014-08-10 21:10:17,990 INFO sqlalchemy.engine.base.Engine ()
2014-08-10 21:10:17,993 INFO sqlalchemy.engine.base.Engine PRAGMA foreign_key_list("sqlite_sequence")
2014-08-10 21:10:17,995 INFO sqlalchemy.engine.base.Engine ()
2014-08-10 21:10:17,997 INFO sqlalchemy.engine.base.Engine PRAGMA index_list("sqlite_sequence")
2014-08-10 21:10:17,997 INFO sqlalchemy.engine.base.Engine ()
2014-08-10 21:10:17,999 INFO sqlalchemy.engine.base.Engine PRAGMA table_info("test")
2014-08-10 21:10:18,000 INFO sqlalchemy.engine.base.Engine ()
2014-08-10 21:10:18,001 INFO sqlalchemy.engine.base.Engine PRAGMA foreign_key_list("test")
2014-08-10 21:10:18,002 INFO sqlalchemy.engine.base.Engine ()
2014-08-10 21:10:18,004 INFO sqlalchemy.engine.base.Engine PRAGMA index_list("test")
2014-08-10 21:10:18,005 INFO sqlalchemy.engine.base.Engine ()
In [5]:
# grab the Table object from meta
test = meta.tables['test']
test
Table('test', MetaData(bind=None), Column('id', INTEGER(), table=<test>, primary_key=True, nullable=False), Column('name', TEXT(), table=<test>), schema=None)
In [6]:
# clean out all the rows in the test table
result = connection.execute(test.delete())
print("Deleted %d row(s)" % result.rowcount)
2014-08-10 21:10:22,659 INFO sqlalchemy.engine.base.Engine DELETE FROM test
2014-08-10 21:10:22,661 INFO sqlalchemy.engine.base.Engine ()
2014-08-10 21:10:22,662 INFO sqlalchemy.engine.base.Engine COMMIT

Deleted 11 row(s)

In [7]:
# create a list of names to be inserted into the test table
names = ['alpha', 'bravo', 'charlie', 'delta', 'epsilon', 'foxtrot', 'golf', 'hotel', 'india', 'juliet', 'lima']
In [8]:
# perform multiple inserts; the list is converted on the fly into dictionaries with the name field
result = connection.execute(test.insert(), [{'name': name} for name in names])
print("Inserted %d row(s)" % result.rowcount)
2014-08-10 21:10:26,580 INFO sqlalchemy.engine.base.Engine INSERT INTO test (name) VALUES (?)
2014-08-10 21:10:26,582 INFO sqlalchemy.engine.base.Engine (('alpha',), ('bravo',), ('charlie',), ('delta',), ('epsilon',), ('foxtrot',), ('golf',), ('hotel',)  ... displaying 10 of 11 total bound parameter sets ...  ('juliet',), ('lima',))
2014-08-10 21:10:26,583 INFO sqlalchemy.engine.base.Engine COMMIT

Inserted 11 row(s)

In [9]:
# query the rows with select; the where clause is included for demonstration
# and can be omitted
result = connection.execute(select([test]).where( > 0))
2014-08-10 21:10:28,528 INFO sqlalchemy.engine.base.Engine SELECT, test.name 
FROM test 
WHERE > ?
2014-08-10 21:10:28,529 INFO sqlalchemy.engine.base.Engine (0,)


In [10]:
# show the results
for row in result:
    print("id=%d, name=%s" % (row['id'], row['name']))
id=56, name=alpha
id=57, name=bravo
id=58, name=charlie
id=59, name=delta
id=60, name=epsilon
id=61, name=foxtrot
id=62, name=golf
id=63, name=hotel
id=64, name=india
id=65, name=juliet
id=66, name=lima
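The notebook depends on an engine created in an earlier cell; the same create/clear/insert/select flow can be reproduced self-contained with only the standard library's sqlite3 module (same table and names; on a fresh in-memory database the ids start at 1 rather than 56):

```python
import sqlite3

# Fresh in-memory database standing in for the notebook's engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table if not exists test "
    "( id integer primary key autoincrement, name text )")
conn.execute("delete from test")  # clear any existing rows

names = ['alpha', 'bravo', 'charlie', 'delta', 'epsilon',
         'foxtrot', 'golf', 'hotel', 'india', 'juliet', 'lima']
# executemany mirrors the bulk insert of dictionaries above.
conn.executemany("insert into test (name) values (?)",
                 [(name,) for name in names])
conn.commit()

rows = conn.execute("select id, name from test where id > 0").fetchall()
for row_id, name in rows:
    print("id=%d, name=%s" % (row_id, name))
```

What SQLAlchemy adds on top of this is the reflection, the expression language, and the connection pooling seen in the cells above.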



Dave Behnke: Back to Python

 ∗ Planet Python

After some "soul searching" and investigation of Go and Python over the last few months, I've decided to come back to Python.

My Experience

I spent a couple of months researching and developing with Go. I even bought a pre-released book (Go in Action), but the concurrency chapter wasn't written quite yet, so I ended up looking elsewhere. I eventually found an excellent book explaining the concurrency concepts of Go through my Safari account. The book is entitled Mastering Concurrency in Go.

After going through a couple of programming exercises using Go, I started to ask myself: how would I do this in Python? It started to click in my brain that, on a conceptual level, a goroutine is similar to an async coroutine in Python. The main difference is that Go was designed from the beginning to be concurrent; in Python it requires a little more work.
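That analogy can be sketched with the standard library's asyncio (a toy example of mine, not from the original exercises):

```python
import asyncio

async def worker(name, delay):
    # Stands in for a concurrent task, like a goroutine body.
    await asyncio.sleep(delay)
    return "%s done" % name

async def main():
    # gather schedules both coroutines concurrently and waits for
    # both results, roughly like launching goroutines and joining them.
    return await asyncio.gather(worker("a", 0.01), worker("b", 0.01))

print(asyncio.run(main()))  # → ['a done', 'b done']
```

The "little more work" is visible here: the event loop and the async/await keywords are explicit, where Go bakes the scheduler into the runtime.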

To make a long story short: Go is a good language, and I will probably use it for specific problems. But I'm more familiar with Python, and the passionate Python community makes me proud to be a part of it. I like having access to many interesting modules and packages: IPython, Flask, Django, SQLAlchemy, just to name a few.

I look forward to continuing to work with Python and share code examples where I can. Stay tuned!

Saturday, 30 August



Alec Munro: It's yer data! - how Google secured its future, and everyone else's

 ∗ Planet Python

Dear Google,

This is a love letter and a call to action.

I believe we stand at a place where there is a unique opportunity in managing personal data.

There is a limited range of data types in the universe, and practically speaking, the vast majority of software works with a particularly tiny fraction of them.

People, for example. We know things about them.

Names, pictures of, people known, statements made, etc.

Tons of web applications conceive of these objects. Maybe not all, but probably most have some crossover. For many of the most trafficked apps, this personal data represents a very central currency. But unfortunately, up until now we've more or less been content with each app having its own currency that is not recognized elsewhere.

You can change that. You can establish a central, independent bank of data, owned by users and lent to applications in exchange for functionality. The format of the data itself will be defined and evolved by an independent agency of some sort.

There are two core things this will accomplish.

1) It will open up a whole new world of application development free from ties to you, Facebook, Twitter, etc.

2) It will give people back ownership of their data. They will be able to establish and evolve an online identity that carries forward as they change what applications they use.

Both of these have a dramatic impact on Google, as they allow you to do what you do best, building applications that work with large datasets, while at the same time freeing you from concerns that you are monopolizing people's data.

A new application world

When developing a new application, you start with an idea, and then you spend a lot of time defining a data model and the logic required to implement that idea on that data model. If you have any success with your application, you will need to invest further in your data model, fleshing it out, and implementing search, caching, and other optimizations.

In this new world, all you would do is include a library and point it at an existing data model. For the small fraction of data that was unique to your application, you could extend the existing model. For example:
from new_world import Model, Field

BaseUser = Model("")

class OurUser(BaseUser):
    our_field = Field("our_field", type=String)

That's it. No persistence (though you could set args somewhere to define how to synchronize), no search, no caching. Now you can get to actually building what makes your application great.

Conceivably, you can do it all in Javascript, other than identifying the application uniquely to the data store.

And you can be guaranteed data interoperability with Facebook, Google, etc. So if you make a photo editing app, you can edit photos uploaded with any of those, and they can display the photos that are edited.

Securing our future

People have good reason to be suspicious of Google, Facebook, or any other organization that is able to derive value through the "ownership" of their data. Regardless of the intent of the organization today, history has shown that profit is a very powerful motivator for bad behaviour, and these caches of personal data represent a store of potential profit that we all expect will at some point prove too tempting to avoid abusing.

Providing explicit ownership and license of said data via a third-party won't take away the temptation to abuse the data, but will make it more difficult in a number of ways:

  • Clear ownership will make unfair use claims much more cut-and-dried
  • A common data format will make it much easier to abandon rogue applications
  • Reduced application development overhead will increase the competitive pressure, lowering the chance of a single application monopolizing a market and needing to grow through exploitation of its users’ data

A gooder, more-productive Google

By putting people's data back in their hands, and merely borrowing it from them for specific applications, the opportunities for evil are dramatically reduced.

But what I think is even more compelling for Google here is that it will make you more productive. Internally, I believe you already operate similar to how I've described here, but you constantly bump up against limitations imposed by trying not to be evil. Without having to worry about the perceptions of how you are using people's data, what could you accomplish?


Google wants to do no evil. Facebook is perhaps less explicit, but from what I know of its culture, I believe it aspires to be competent enough that there's no need to exploit users' data. The future will bring new leadership and changes in culture to both companies, but if they act soon, they can secure their moral aspirations and provide a great gift to the world.

(Interesting aside, Amazon's recently announced Cognito appears to be in some ways a relative of this idea, at least as a developer looking to build things. Check it out.)

Friday, 29 August


The new Go qml OpenGL API

 ∗ Labix Blog

As detailed in the preliminary release of qml.v1 for Go a couple of weeks ago, my next task was to finish the improvements in its OpenGL API. Good progress has happened since then, and the new API is mostly done and available for experimentation. At the same time, there’s still work to do on polishing edges and on documenting the extensive API. This blog post aims to present the improvements made, their internal design, and also to invite help for finishing the pending details.

Before diving into the new, let’s first have a quick look at how a Go application using OpenGL might look with qml.v0. This is an excerpt from the old painting example:

import (
        "gopkg.in/qml.v0"
        "gopkg.in/qml.v0/gl"
)

func (r *GoRect) Paint(p *qml.Painter) {
        width := gl.Float(r.Int("width"))
        height := gl.Float(r.Int("height"))
        // ...
}
The example imports both the qml and the gl packages, and then defines a Paint method that makes use of the GL types, functions, and constants from the gl package. It looks quite reasonable, but there are a few relevant shortcomings.

One major issue in the current API is that it offers no means to tell even at a basic level what version of the OpenGL API is being coded against, because the available functions and constants are the complete set extracted from the gl.h header. For example, OpenGL 2.0 has GL_ALPHA and GL_ALPHA4/8/12/16 constants, but OpenGL ES 2.0 has only GL_ALPHA. This simplistic choice was a good start, but comes with a number of undesired side effects:

  • Many trivial errors that should be compile errors fail at runtime instead
  • When the code does work, the developer is not sure about which API version it is targeting
  • Symbols for unsupported API versions may not be available for linking, even if unused

That last point also provides a hint of another general issue: portability. Every system has particularities for how to load the proper OpenGL API entry points. For example, which libraries should be linked with, where they are in the local system, which entry points they support, etc.

So this is the stage for the improvements that are happening. Before detailing the solution, let’s have a look at the new painting example in qml.v1, that makes use of the improved API:

import (
        "gopkg.in/qml.v1"
        GL "gopkg.in/qml.v1/gl/2.0"
)

func (r *GoRect) Paint(p *qml.Painter) {
        gl := GL.API(p)
        width := float32(r.Int("width"))
        height := float32(r.Int("height"))
        // ...
}

With the new API, rather than importing a generic gl package, a version-specific gl/2.0 package is imported under the name GL. That choice of package name allows preserving familiar OpenGL terms for both the functions and the constants (gl.Enable and GL.BLEND, for example). Inside the new Paint method, the gl value obtained from GL.API holds only the functions that are defined for the specific OpenGL API version imported, and the constants in the GL package are also constrained to those available in the given version. Any improper references become build time errors.

To support all the various OpenGL versions and profiles, there are 23 independent packages right now. These packages are of course not being hand-built. Instead, they are generated all at once by a tool that gathers information from various sources. The process can be tersely described as:

  1. A ragel-based parser processes Qt’s qopenglfunctions_*.h header files to collect version-specific functions
  2. The Khronos OpenGL Registry XML is parsed to collect version-specific constants
  3. A number of tweaks defined in the tool’s code are applied to the state
  4. Packages are generated by feeding the state to text templates

Version-specific functions might also be extracted from the Khronos Registry, but there’s a good reason to use information from the Qt headers instead: Qt already solved the portability issue. It works in several platforms, and if somebody is using QML successfully, it means Qt is already using that system’s OpenGL capabilities. So rather than designing a new mechanism to solve the same problem, the qml package now leverages Qt for resolving all the GL function entry points and the linking against available libraries.

Going back to the example, it also demonstrates another improvement that comes with the new API: plain types that do not carry further meaning such as gl.Float and gl.Int were replaced by their native counterparts, float32 and int32. Richer types such as Enum were preserved, and as suggested by David Crawshaw some new types were also introduced to represent entities such as programs, shaders, and buffers. The custom types are all available under the common gl/glbase package that all version-specific packages make use of.

So this is all working and available for experimentation right now. What is left to do is almost exclusively improving the list of function tweaks with two goals in mind, which will be highlighted below as those are areas where help would be appreciated, mainly due to the footprint of the API.

Documentation importing

There are a few hundred functions to document, but a large number of these are variations of the same function. The previous approach was to simply link to the upstream documentation, but it would be much better to have polished documentation attached to the functions themselves. This is the new documentation for MultMatrixd, for example. For now the documentation is being imported manually, but the final process will likely consist of some automation and some manual polishing.

Function polishing

The standard C OpenGL API can often be translated automatically (see BindBuffer or BlendColor), but in other cases the function prototype has to be tweaked to become friendly to Go. The translation tool already has good support for defining most of these tweaks independently from the rest of the machinery. For example, the following logic changes the ShaderSource function from its standard form into something convenient in Go:

name: "ShaderSource",
params: paramTweaks{
        "glstring": {rename: "source", retype: "...string"},
        "length":   {remove: true},
        "count":    {remove: true},
},
before: `
        count := len(source)
        length := make([]int32, count)
        glstring := make([]unsafe.Pointer, count)
        for i, src := range source {
                length[i] = int32(len(src))
                if len(src) > 0 {
                        glstring[i] = *(*unsafe.Pointer)(unsafe.Pointer(&src))
                } else {
                        glstring[i] = unsafe.Pointer(uintptr(0))
                }
        }
`,
Other cases may be much simpler. The MultMatrixd tweak, for instance, simply ensures that the parameter has the proper length, and injects the documentation:

name: "MultMatrixd",
before: `
        if len(m) != 16 {
                panic("parameter m must have length 16 for the 4x4 matrix")
        }
`,
doc: `
        multiplies the current matrix with the provided matrix.
`,

and as an even simpler example, CreateProgram is tweaked so that it returns a glbase.Program instead of the default uint32.

name:   "CreateProgram",
result: "glbase.Program",

That kind of polishing is where contributions would be most appreciated right now. One valid way of doing this is picking a range of functions and importing and polishing their documentation manually, and while doing that keeping an eye on required tweaks that should be performed on the function based on its documentation and prototype.

If you’d like to help somehow, or just ask questions or report your experience with the new API, please join us in the project mailing list.


False Positive or Not? Difficult to Analyze Javascript, (Fri, Aug 29th)

 ∗ SANS Internet Storm Center, InfoCON: green

Our reader Travis sent us the following message:


Video: How to Design for Cross Device Use

 ∗ LukeW | Digital Product Design + Strategy

Today, people's online lives aren't limited to one device with one screen. Instead they use multiple devices both in sequence and at the same time. To account for this new reality, we need to think about cross device interactions and how our products work in a multi-device world. In this quick 4 minute video, I discuss how.

This video is part of my User Experience How-To series sponsored by the Intel Developer Zone.

Rachel Andrew on the Business of Web Dev: Getting to the Action

 ∗ A List Apart: The Full Feed

Freelancers and self-employed business owners can choose from a huge number of conferences to attend in any given year. There are hundreds of industry podcasts, a constant stream of published books, and a never-ending supply of sites all giving advice. It is very easy to spend a lot of valuable time and money just attending, watching, reading, listening and hoping that somehow all of this good advice will take root and make our business a success.

However, all the good advice in the world won’t help you if you don’t act on it. While you might leave that expensive conference feeling great, did your attendance create a lasting change to your business? I was thinking about this subject while listening to episode 14 of the Working Out podcast, hosted by Ashley Baxter and Paddy Donnelly. They were talking about following through, and how it is possible to “nod along” to good advice but never do anything with it.

If you have ever been sent to a conference by an employer, you may have been expected to report back. You might even have been asked to present to your team on the takeaway points from the event. As freelancers and business owners, we don’t have anyone making us consolidate our thoughts in that way. It turns out that the way I work gives me a fairly good method of knowing which things are bringing me value.

Tracking actionable advice

I’m a fan of the Getting Things Done technique, and live by my to-do lists. I maintain a Someday/Maybe list in OmniFocus into which I add items that I want to do or at least investigate, but that aren’t a project yet.

If a podcast is worth keeping on my playlist, there will be items entered linking back to certain episodes. Conference takeaways might be a link to a site with information that I want to read. It might be an idea for an article to write, or instructions on something very practical such as setting up an analytics dashboard to better understand some data. The first indicator of a valuable conference is how many items I add during or just after the event.

Having a big list of things to do is all well and good, but it’s only one half of the story. The real value comes when I do the things on that list, and can see whether they were useful to my business. Once again, my GTD lists can be mined for that information.

When tickets go on sale for that conference again, do I have most of those to-do items still sat in Someday/Maybe? Is that because, while they sounded like good ideas, they weren’t all that relevant? Or, have I written a number of blog posts or had several articles published on themes that I started considering off the back of that conference? Did I create that dashboard, and find it useful every day? Did that speaker I was introduced to go on to become a friend or mentor, or someone I’ve exchanged emails with to clarify a topic I’ve been thinking about?

By looking back over my lists and completed items, I can start to make decisions about the real value to my business and life of the things I attend, read, and listen to. I’m able to justify the ticket price, time, and travel costs by making that assessment. I can feel confident that I’m not spending time and money just to feel as if I’m moving forward, yet gaining nothing tangible to show for it.

A final thought on value

As entrepreneurs, we have to make sure we are spending our time and money on things that will give us the best return. All that said, it is important to make time in our schedules for those things that we just enjoy, and in particular those things that do motivate and inspire us. I don’t think that every book you read or event you attend needs to result in a to-do list of actionable items.

What we need as business owners, and as people, is balance. We need to be able to see that the things we are doing are moving our businesses forward, while also making time to be inspired and refreshed to get that actionable work done.




Yann Larrivée: ConFoo is looking for speakers

 ∗ Planet Python

ConFoo is currently looking for web professionals with deep understanding of PHP, Java, Ruby, Python, DotNet, HTML5, Databases, Cloud Computing, Security and Mobile development to share their skills and experience at the next ConFoo. Submit your proposals between August 25th and September 22nd.

ConFoo is a conference for developers that has built a reputation as a prime destination for exploring new technologies, diving deeper into familiar topics, and experiencing the best of community and culture.

  • ConFoo 2015 will be hosted on February 18-20 in Montreal, at the Hilton Bonaventure Hotel.
  • We take good care of our speakers by covering most expenses including travel, accommodation, lunch, full conference ticket, etc.
  • Presentations are 35min + 10min for questions, and may be delivered in English or French.
  • ConFoo is an open environment where everyone is welcome to submit. We are simply looking for quality proposals by skilled and friendly people.

If you would simply prefer to attend the conference, we have a $290 discount until October 13th.

Thursday, 28 August


Martijn Faassen: Morepath 0.5(.1) and friends released!

 ∗ Planet Python

I've just released a whole slew of things; the most important is Morepath 0.5, your friendly neighborhood Python web framework with superpowers!

What's new?

There are a bunch of new things in the documentation, in particular:

Also available is @reg.classgeneric. This depends on a new feature in the Reg library.

There are a few bug fixes as well.

For more details, see the full changelog.

Morepath mailing list

I've documented how to get in touch with the Morepath community. In particular, there's a new Morepath mailing list!

Please do get in touch!

Other releases

I've also released:

  • Reg 0.8. This is the generic function library behind some of Morepath's flexibility and power.
  • BowerStatic 0.3. This is a WSGI framework for including static resources in HTML pages automatically, using components installed with Bower.
  • more.static 0.2. This is a little library integrating BowerStatic with Morepath.

Morepath videos!

You may have noticed I linked to Morepath 0.5.1 before, not Morepath 0.5. This is because I had to: I was using a new youtube extension that gave me a bit too much trouble on readthedocs. I replaced it with raw HTML, which works better. The Morepath docs now include two videos.

  • On the homepage is my talk about Morepath at EuroPython 2014 in July. It's a relatively short talk, and gives a good idea on what makes Morepath different.
  • If you're interested in the genesis and history behind Morepath, and general ideas on what it means to be a creative developer, you can find another, longer, video on the Morepath history page. This was taken last year at PyCon DE, where I had the privilege to be invited to give a keynote speech.


Ian Ozsvald: High Performance Python Training at EuroSciPy this afternoon

 ∗ Planet Python

I’m training on High Performance Python this afternoon at EuroSciPy; my github source is here (there's also a shortlink). There are prerequisites for the course.

This training is actually a tiny part of what I’ll teach on my 2 day High Performance Python course in London in October (along with a Data Science course). If you’re at EuroSciPy, please say Hi :-)

Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.


Richard Jones: When testing goes bad

 ∗ Planet Python

I've recently started working on a large, mature code base (some 65,000 lines of Python code). It has 1048 unit tests implemented in the standard unittest.TestCase fashion using the mox framework for mocking support (I'm not surprised you've not heard of it).

Recently I fixed a bug which was causing a user interface panel to display when it shouldn't have been. The fix basically amounts to a couple of lines of code added to the panel in question:

+    def can_access(self, context):
+        # extend basic permission-based check with a check to see whether 
+        # the Aggregates extension is even enabled in nova 
+        if not nova.extension_supported('Aggregates', context['request']):
+            return False
+        return super(Aggregates, self).can_access(context)

When I ran the unit test suite I discovered to my horror that 498 of the 1048 tests now failed. The reason for this is that the can_access() method here is called as a side-effect of those 498 tests and the nova.extension_supported (which is a REST call under the hood) needed to be mocked correctly to support it being called.

I quickly discovered that given the size of the test suite, and the testing tools used, each of those 498 tests must be fixed by hand, one at a time (if I'm lucky, some of them can be knocked off two at a time).

The main cause is mox's mocking of callables like the one above which enforces the order that those callables are invoked. It also enforces that the calls are made at all (uncalled mocks are treated as test failures).

This means there is no way to provide a blanket mock for "nova.extension_supported". Tests with existing calls to that API need careful attention to ensure the ordering is correct, and tests which don't result in the side-effect call to the above method will raise an error, so even adding a mock setup in a TestCase.setUp() doesn't work in most cases.
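For contrast, here's roughly what a blanket stub looks like with the standard library's unittest.mock (not mox), which neither enforces call order nor treats uncalled stubs as failures. The names mirror the example above but are purely illustrative:

```python
from unittest import mock

# a blanket stub: extension_supported() always reports the extension enabled
nova = mock.Mock()
nova.extension_supported.return_value = True

# tests that never hit can_access() still pass (uncalled stubs aren't failures),
# and tests that do hit it get a sensible answer regardless of call order
assert nova.extension_supported('Aggregates', {'request': None}) is True
nova.extension_supported.assert_called_once_with('Aggregates', {'request': None})
```

With this style, a single stub configured once in setUp() would have covered all 498 failing tests.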

It doesn't help that the codebase is so large, and has been developed by so many people over years. Mocking isn't consistently implemented; even the basic structure of tests in TestCases is inconsistent.

It's worth noting that the ordering check that mox provides is never used as far as I can tell in this codebase. I haven't sighted an example of multiple calls to the same mocked API without the additional use of the mox InAnyOrder() modifier. mox does not provide a mechanism to turn the ordering check off completely.

The pretend library (my go-to for stubbing) splits out the mocking step and the verification of calls so the ordering will only be enforced if you deem it absolutely necessary.

The choice to use unittest-style TestCase classes makes managing fixtures much more difficult (it becomes a nightmare of classes and mixins and setUp() super() calls or alternatively a nightmare of mixing classes and multiple explicit setup calls in test bodies). This is exacerbated by the test suite in question introducing its own mock-generating decorator which will generate a mock, but again leaves the implementation of the mocking to the test cases. py.test's fixtures are a far superior mechanism for managing mocking fixtures, allowing far simpler centralisation of the mocks and overriding of them through fixture dependencies.
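A sketch of that centralised-fixture approach (assuming py.test and the stdlib's unittest.mock; the names are illustrative, not from the codebase in question):

```python
import pytest
from unittest import mock

@pytest.fixture
def nova():
    # one central blanket stub, shared by every test that requests it
    stub = mock.Mock()
    stub.extension_supported.return_value = True
    return stub

def test_panel_hidden_when_extension_missing(nova):
    # an individual test overrides the default only where it matters
    nova.extension_supported.return_value = False
    assert nova.extension_supported('Aggregates', {}) is False
```

Overriding happens through ordinary fixture dependencies rather than class hierarchies and setUp() super() chains.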

The result is that I spent some time working through some of the test suite and discovered that in an afternoon I could fix about 10% of the failing tests. I have decided that spending a week fixing the tests for my 5 line bug fix is just not worth it, and I've withdrawn the patch.


Montreal Python User Group: Montréal-Python 48: Incorrect jiujitsu - Call for speakers

 ∗ Planet Python

Pythonistas of Montreal, it's time for our back-to-school special. We're back from our summer vacation and we are hosting our next meetup at the offices of our friends from Shopify on St-Laurent street on Tuesday, September 23rd at 6:30 pm.

We especially love to hear from new speakers. If you haven't given a talk at Montréal-Python before, a 5 or 10 minute lightning talk would be a great start, but we also have slots for 10 to 40 minute talks!

It's a perfect opportunity if you would like to show us what you've discovered and created, especially if you are planning to present your talk at PyCon.

Don't forget: the call for speakers for PyCon 2015 ends on Sept 15th.

Some topic suggestions:

  • Give a beginner's introduction to a Python library you've been using!
  • Talk about a project you're working on!
  • Show us unit testing, continuous integration or Python documentation tools!
  • Tell us about a Python performance problem you've run into and how you solved it!
  • The standard Python library is full of amazing things. Have you learned how multiprocessing or threading or GUI programming works recently? Tell us about it!
  • Explain how to get started with Django in 5 minutes!

We're always looking for 10 to 40 minute talks, or a quick 5 minute flash presentation. If you discovered or learned something that you find interesting, we'd love to help you let others learn about it! Send your proposals to



Wednesday, 27 August


Mike Driscoll: wxPython: Converting wx.DateTime to Python datetime

 ∗ Planet Python

The wxPython GUI toolkit includes its own date / time capabilities. Most of the time, you can just use Python’s datetime and time modules and you’ll be fine. But occasionally you’ll find yourself needing to convert from wxPython’s wx.DateTime objects to Python’s datetime objects. You may encounter this when you use the wx.DatePickerCtrl widget.

Fortunately, wxPython’s calendar module has some helper functions that can help you convert datetime objects back and forth between wxPython and Python. Let’s take a look:

def _pydate2wxdate(date):
     import datetime
     assert isinstance(date, (datetime.datetime, datetime.date))
     tt = date.timetuple()
     dmy = (tt[2], tt[1]-1, tt[0])
     return wx.DateTimeFromDMY(*dmy)
def _wxdate2pydate(date):
     import datetime
     assert isinstance(date, wx.DateTime)
     if date.IsValid():
          ymd = map(int, date.FormatISODate().split('-'))
          return datetime.date(*ymd)
     else:
          return None

You can use these handy functions in your own code to help with your conversions. I would probably put these into a controller or utilities script. I would also rewrite it slightly so I wouldn’t import Python’s datetime module inside the functions. Here’s an example:

import datetime
import wx
def pydate2wxdate(date):
     assert isinstance(date, (datetime.datetime, datetime.date))
     tt = date.timetuple()
     dmy = (tt[2], tt[1]-1, tt[0])
     return wx.DateTimeFromDMY(*dmy)
def wxdate2pydate(date):
     assert isinstance(date, wx.DateTime)
     if date.IsValid():
          ymd = map(int, date.FormatISODate().split('-'))
          return datetime.date(*ymd)
     else:
          return None

You can read more about this topic on this old wxPython mailing thread. Have fun and happy coding!
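One detail worth calling out in those helpers: wx.DateTime months are 0-based while Python's are 1-based, which is why the tt[1]-1 is there. The tuple juggling itself needs no wx at all, so here's a small standalone sketch of just that step:

```python
import datetime

# build the (day, 0-based month, year) tuple that wx.DateTimeFromDMY expects
tt = datetime.date(2014, 8, 27).timetuple()
dmy = (tt[2], tt[1] - 1, tt[0])
assert dmy == (27, 7, 2014)  # August becomes month 7 in wx terms
```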


Nick Coghlan: The transition to multilingual programming

 ∗ Planet Python

A recent thread on python-dev prompted me to summarise the current state of the ongoing industry wide transition from bilingual to multilingual programming as it relates to Python's cross-platform support. It also relates to the reasons why Python 3 turned out to be more disruptive than the core development team initially expected.

A good starting point for anyone interested in exploring this topic further is the "Origin and development" section of the Wikipedia article on Unicode, but I'll hit the key points below.

Monolingual computing

At their core, computers only understand single bits. Everything above that is based on conventions that ascribe higher level meanings to particular sequences of bits. One particularly important set of conventions for communicating between humans and computers are "text encodings": conventions that map particular sequences of bits to text in the actual languages humans read and write.

One of the oldest encodings still in common use is ASCII (which stands for "American Standard Code for Information Interchange"), developed during the 1960's (it just had its 50th birthday in 2013). This encoding maps the letters of the English alphabet (in both upper and lower case), the decimal digits, various punctuation characters and some additional "control codes" to the 128 numbers that can be encoded as a 7-bit sequence.

Many computer systems today still only work correctly with English - when you encounter such a system, it's a fairly good bet that either the system itself, or something it depends on, is limited to working with ASCII text. (If you're really unlucky, you might even get to work with modal 5-bit encodings like ITA-2, as I have. The legacy of the telegraph lives on!)

Working with local languages

The first attempts at dealing with this limitation of ASCII simply assigned meanings to the full range of 8-bit sequences. Known collectively as "Extended ASCII", each of these systems allowed for an additional 128 characters, which was enough to handle many European and Cyrillic scripts. Even 256 characters was nowhere near sufficient to deal with Indic or East Asian languages, however, so this time also saw a proliferation of ASCII incompatible encodings like ShiftJIS, ISO-2022 and Big5. This is why Python ships with support for dozens of codecs from around the world.

This proliferation of encodings required a way to tell software which encoding should be used to read the data. For protocols that were originally designed for communication between computers, agreeing on a common text encoding is usually handled as part of the protocol. In cases where no encoding information is supplied (or to handle cases where there is a mismatch between the claimed encoding and the actual encoding), then applications may make use of "encoding detection" algorithms, like those provided by the chardet package for Python. These algorithms aren't perfect, but can give good answers when given a sufficient amount of data to work with.

Local operating system interfaces, however, are a different story. Not only don't they inherently convey encoding information, but the nature of the problem is such that trying to use encoding detection isn't practical. Two key systems arose in an attempt to deal with this problem:

  • Windows code pages
  • POSIX locale encodings

With both of these systems, a program would pick a code page or locale, and use the corresponding text encoding to decide how to interpret text for display to the user or combination with other text. This may include deciding how to display information about the contents of the computer itself (like listing the files in a directory).

The fundamental premise of these two systems is that the computer only needs to speak the language of its immediate users. So, while the computer is theoretically capable of communicating in any language, it can effectively only communicate with humans in one language at a time. All of the data a given application was working with would need to be in a consistent encoding, or the result would be uninterpretable nonsense, something the Japanese (and eventually everyone else) came to call mojibake.

It isn't a coincidence that the name for this concept came from an Asian country: the encoding problems encountered there make the issues encountered with European and Cyrillic languages look trivial by comparison.

Unfortunately, this "bilingual computing" approach (so called because the computer could generally handle English in addition to the local language) causes some serious problems once you consider communicating between computers. While some of those problems were specific to network protocols, there are some more serious ones that arise when dealing with nominally "local" interfaces:

  • networked computing meant one username might be used across multiple systems, including different operating systems
  • network drives allow a single file server to be accessed from multiple clients, including different operating systems
  • portable media (like DVDs and USB keys) allow the same filesystem to be accessed from multiple devices at different points in time
  • data synchronisation services like Dropbox need to faithfully replicate a filesystem hierarchy not only across different desktop environments, but also to mobile devices

For these interfaces, which were originally designed only for local interoperability, communicating encoding information is generally difficult, and the encoding used doesn't necessarily match the claimed encoding of the platform you're running on.

Unicode and the rise of multilingual computing

The path to addressing the fundamental limitations of bilingual computing actually started more than 25 years ago, back in the late 1980's. An initial draft proposal for a 16-bit "universal encoding" was released in 1988, the Unicode Consortium was formed in early 1991 and the first volume of the first version of Unicode was published later that same year.

Microsoft added new text handling and operating system APIs to Windows based on the 16-bit C level wchar_t type, and Sun also adopted Unicode as part of the core design of Java's approach to handling text.

However, there was a problem. The original Unicode design had decided that "16 bits ought to be enough for anybody" by restricting their target to only modern scripts, and only frequently used characters within those scripts. However, when you look at the "rarely used" Kanji and Han characters for Japanese and Chinese, you find that they include many characters that are regularly used for the names of people and places - they're just largely restricted to proper nouns, and so won't show up in a normal vocabulary search. So Unicode 2.0 was defined in 1996, expanding the system out to a maximum of 21 bits per code point (using up to 32 bits per code point for storage).

As a result, Windows (including the CLR) and Java now use the little-endian variant of UTF-16 to allow their text APIs to handle arbitrary Unicode code points. The original 16-bit code space is now referred to as the Basic Multilingual Plane.

While all that was going on, the POSIX world ended up adopting a different strategy for migrating to full Unicode support: attempting to standardise on the ASCII compatible UTF-8 text encoding.

The choice between using UTF-8 and UTF-16-LE as the preferred local text encoding involves some complicated trade-offs, and that's reflected in the fact that they have ended up being at the heart of two competing approaches to multilingual computing.

Choosing UTF-8 aims to treat formatting text for communication with the user as "just a display issue". It's a low impact design that will "just work" for a lot of software, but it comes at a price:

  • because encoding consistency checks are mostly avoided, data in different encodings may be freely concatenated and passed on to other applications. Such data is typically not usable by the receiving application.
  • for interfaces without encoding information available, it is often necessary to assume an appropriate encoding in order to display information to the user, or to transform it to a different encoding for communication with another system that may not share the local system's encoding assumptions. These assumptions may not be correct, but won't necessarily cause an error - the data may just be silently misinterpreted as something other than what was originally intended.
  • because data is generally decoded far from where it was introduced, it can be difficult to discover the origin of encoding errors.
  • as a variable width encoding, it is more difficult to develop efficient string manipulation algorithms for UTF-8. Algorithms originally designed for fixed width encodings will no longer work.
  • as a specific instance of the previous point, it isn't possible to split UTF-8 encoded text at arbitrary locations. Care needs to be taken to ensure splits only occur at code point boundaries.

UTF-16-LE shares the last two problems, but to a lesser degree (simply due to the fact most commonly used code points are in the 16-bit Basic Multilingual Plane). However, because it isn't generally suitable for use in network protocols and file formats (without significant additional encoding markers), the explicit decoding and encoding required encourages designs with a clear separation between binary data (including encoded text) and decoded text data.
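The splitting caveat for UTF-8 is easy to demonstrate in a few lines of Python 3 (a sketch, not from the original post):

```python
data = 'naïve'.encode('utf-8')  # b'na\xc3\xafve': 'ï' occupies two bytes
assert len(data) == 6

# slicing at byte 3 lands in the middle of the two-byte sequence for 'ï'
try:
    data[:3].decode('utf-8')
except UnicodeDecodeError:
    pass  # splitting at a non-boundary produces undecodable bytes
else:
    raise AssertionError('expected a UnicodeDecodeError')
```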

Through the lens of Python

Python and Unicode were born on opposite sides of the Atlantic ocean at roughly the same time (1991). The growing adoption of Unicode within the computing industry has had a profound impact on the evolution of the language.

Python 1.x was purely a product of the bilingual computing era - it had no support for Unicode based text handling at all, and was hence largely limited to 8-bit ASCII compatible encodings for text processing.

Python 2.x was still primarily a product of the bilingual era, but added multilingual support as an optional addon, in the form of the unicode type and support for a wide variety of text encodings. PEP 100 goes into the many technical details that needed to be covered in order to incorporate that feature. With Python 2, you can make multilingual programming work, but it requires an active decision on the part of the application developer, or at least that they follow the guidelines of a framework that handles the problem on their behalf.

By contrast, Python 3.x is designed to be a native denizen of the multilingual computing world. Support for multiple languages extends as far as the variable naming system, such that languages other than English become almost as well supported as English already was in Python 2. While the English inspired keywords and the English naming in the standard library and on the Python Package Index mean that Python's "native" language and the preferred language for global collaboration will always be English, the new design allows a lot more flexibility when working with data in other languages.

Consider processing a data table where the headings are names of Japanese individuals, and we'd like to use collections.namedtuple to process each row. Python 2 simply can't handle this task:

>>> from collections import namedtuple
>>> People = namedtuple("People", u"陽斗 慶子 七海")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/", line 310, in namedtuple
    field_names = map(str, field_names)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

Users need to either restrict themselves to dictionary style lookups rather than attribute access, or else use romanised versions of their names (Haruto, Keiko, Nanami for the example). However, the case of "Haruto" is an interesting one, as there are at least 3 different ways of writing it as Kanji (陽斗, 陽翔, 大翔), but they are all romanised as the same string (Haruto). If you try to use romaji to handle a data set that contains more than one variant of that name, you're going to get spurious collisions.

Python 3 takes a very different perspective on this problem. It says it should just work, and it makes sure it does:

>>> from collections import namedtuple
>>> People = namedtuple("People", u"陽斗 慶子 七海")
>>> d = People(1, 2, 3)
>>> d.陽斗
1
>>> d.慶子
2
>>> d.七海
3

This change greatly expands the kinds of "data driven" use cases Python can support in areas where the ASCII based assumptions of Python 2 would cause serious problems.

Python 3 still needs to deal with improperly encoded data however, so it provides a mechanism for arbitrary binary data to be "smuggled" through text strings in the Unicode Private Use Area. This feature was added by PEP 383 and is managed through the surrogateescape error handler, which is used by default on most operating system interfaces. This recreates the old Python 2 behaviour of passing improperly encoded data through unchanged when dealing solely with local operating system interfaces, but complaining when such improperly encoded data is injected into another interface. The codec error handling system provides several tools to deal with these files, and we're looking at adding a few more relevant convenience functions for Python 3.5.
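A small sketch of what surrogateescape does (illustrative, not from the original post): an undecodable byte is smuggled into the string as a lone surrogate, and encoding with the same handler restores the original bytes exactly:

```python
raw = b'caf\xe9'  # latin-1 for 'café'; the \xe9 byte is invalid as UTF-8

# decoding with surrogateescape maps the bad byte to a lone surrogate...
text = raw.decode('utf-8', 'surrogateescape')
assert text == 'caf\udce9'

# ...and encoding with the same handler round-trips the data unchanged
assert text.encode('utf-8', 'surrogateescape') == raw
```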

The underlying Unicode changes in Python 3 also made PEP 393 possible, which changed the way the CPython interpreter stores text internally. In Python 2, even pure ASCII strings would consume four bytes per code point on Linux systems. Using the "narrow build" option (as the Python 2 Windows builds do) reduced that to only two bytes per code point when operating within the Basic Multilingual Plane, but at the cost of potentially producing wrong answers when asked to operate on code points outside the Basic Multilingual Plane. By contrast, starting with Python 3.3, CPython now stores text internally using the smallest fixed width data unit possible. That is, latin-1 text uses 8 bits per code point, UCS-2 (Basic Multilingual Plane) text uses 16 bits per code point, and only text containing code points outside the Basic Multilingual Plane will expand to needing the full 32 bits per code point. This can not only significantly reduce the amount of memory needed for multilingual applications, but may also increase their speed as well (as reducing memory usage also reduces the time spent copying data around).
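The effect of that flexible storage scheme is observable on CPython 3.3+ (a sketch; exact sizes vary by build, so this only checks the relative ordering):

```python
import sys

ascii_text = 'a' * 1000             # stored at 1 byte per code point
bmp_text = '€' * 1000               # Basic Multilingual Plane: 2 bytes each
astral_text = '\U0001F600' * 1000   # outside the BMP: 4 bytes each

# same number of code points, increasingly wide storage units
assert sys.getsizeof(ascii_text) < sys.getsizeof(bmp_text) < sys.getsizeof(astral_text)
```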

Are we there yet?

In a word, no. Not for Python 3.4, and not for the computing industry at large. We're much closer than we ever have been before, though. Most POSIX systems now use UTF-8 as their default encoding, and many systems offer a C.UTF-8 locale as an alternative to the traditional ASCII based C locale. When dealing solely with properly encoded data and metadata, and properly configured systems, Python 3 should "just work", even when exchanging data between different platforms.

For Python 3, the remaining challenges fall into a few areas:

  • helping existing Python 2 users adopt the optional multilingual features that will prepare them for eventual migration to Python 3 (as well as reassuring those users that don't wish to migrate that Python 2 is still fully supported, and will remain so for at least the next several years, and potentially longer for customers of commercial redistributors)
  • adding back some features for working entirely in the binary domain that were removed in the original Python 3 transition due to an initial assessment that they were operations that only made sense on text data (PEP 361 summary: bytes.__mod__ is coming back in Python 3.5 as a valid binary domain operation, bytes.format stays gone as an operation that only makes sense when working with actual text data)
  • better handling of improperly decoded data, including poor encoding recommendations from the operating system (for example, Python 3.5 will be more sceptical when the operating system tells it the preferred encoding is ASCII and will enable the surrogateescape error handler on sys.stdout when it occurs)
  • eliminating most remaining usage of the legacy code page and locale encoding systems in the CPython interpreter (this most notably affects the Windows console interface and argument decoding on POSIX. While these aren't easy problems to solve, it will still hopefully be possible to address them for Python 3.5)

More broadly, each major platform has its own significant challenges to address:

  • for POSIX systems, there are still a lot of systems that don't use UTF-8 as the preferred encoding, and the assumption of ASCII as the preferred encoding in the default C locale is positively archaic. There is also a lot of POSIX software that believes in the "text is just encoded bytes" assumption, and will happily produce mojibake that makes no sense to other applications or systems.
  • for Windows, keeping the old 8-bit APIs around was deemed necessary for backwards compatibility, but this also means that there is still a lot of Windows software that simply doesn't handle multilingual computing correctly.
  • for both Windows and the JVM, a fair amount of nominally multilingual software actually only works correctly with data in the basic multilingual plane. This is a smaller problem than not supporting multilingual computing at all, but was quite a noticeable problem in Python 2's own Windows support.

Mac OS X is the platform most tightly controlled by any one entity (Apple), and they're actually in the best position out of all of the current major platforms when it comes to handling multilingual computing correctly. They've been one of the major drivers of Unicode since the beginning (two of the authors of the initial Unicode proposal were Apple engineers), and were able to force the necessary configuration changes on all their systems, rather than having to work with an extensive network of OEM partners (Windows, commercial Linux vendors) or relatively loose collaborations of individuals and organisations (community Linux distributions).

Modern mobile platforms are generally in a better position than desktop operating systems, mostly by virtue of being newer, and hence defined after Unicode was better understood. However, the UTF-8 vs UTF-16-LE distinction for text handling exists even there, thanks to the Java inspired Dalvik VM in Android (plus the cloud-backed nature of modern smartphones means you're even more likely to encounter files from multiple machines when working on a mobile device).

eGenix PyRun - One file Python Runtime 2.0.1 GA

 ∗ Planet Python


eGenix PyRun is our open source, one file, no installation version of Python, making the distribution of a Python interpreter to run Python-based scripts and applications on Unix based systems as simple as copying a single file.

eGenix PyRun's executable only needs 11MB for Python 2 and 13MB for Python 3, but still supports most Python applications and scripts - and it can be compressed to just 3-4MB using upx, if needed.

Compared to a regular Python installation of typically 100MB on disk, eGenix PyRun is ideal for applications and scripts that need to be distributed to several target machines, client installations or customers.

It makes "installing" Python on a Unix based system as simple as copying a single file.

eGenix has been using the product internally in the mxODBC Connect Server since 2008 with great success and decided to make it available as a stand-alone open-source product.

We provide both the source archive to build your own eGenix PyRun, as well as pre-compiled binaries for Linux, FreeBSD and Mac OS X, as 32- and 64-bit versions. The binaries can be downloaded manually, or you can let our automatic install script install-pyrun take care of the installation: ./install-pyrun dir and you're done.

Please see the product page for more details:

    >>> eGenix PyRun - One file Python Runtime


This is a patch level release of eGenix PyRun 2.0. The major new feature in 2.0 is the added Python 3.4 support.

New Features

  • Upgraded eGenix PyRun to work with and use Python 2.7.8 per default.

Enhancements / Changes

  • Fixed a bug in the license printer to show the correct license URL.

install-pyrun Quick Install Enhancements

eGenix PyRun includes a shell script called install-pyrun, which greatly simplifies installation of PyRun. It works much like the virtualenv shell script used for creating new virtual environments (except that there's nothing virtual about PyRun environments).

With the script, an eGenix PyRun installation is as simple as running:

./install-pyrun targetdir

This will automatically detect the platform, download and install the right pyrun version into targetdir.

We have updated this script since the last release:

  • Updated install-pyrun to default to eGenix PyRun 2.0.1 and its feature set.

For a complete list of changes, please see the eGenix PyRun Changelog.

Please see the eGenix PyRun 2.0.0 announcement for more details about eGenix PyRun 2.0.


Please visit the eGenix PyRun product page for downloads, instructions on installation and documentation of the product.

More Information

For more information on eGenix PyRun, licensing and download instructions, please write to

Enjoy !

Marc-Andre Lemburg,

Anatoly Techtonik: How to make RAM disk in Linux

 ∗ Planet Python

UPDATE (2014-08-27): Exactly three years later I discovered that Linux already comes with RAM disk enabled by default, mounted as `/dev/shm` (which points to `/run/shm` on Debian/Ubuntu):
$ df -h /dev/shm
Filesystem Size Used Avail Use% Mounted on
tmpfs 75M 4.0K 75M 1% /run/shm
See detailed info here.

*RAM disk* is a term from the past, when DOS was alive and information was stored on disks instead of the internet. If you created an image of some disk, it was possible to load it into memory. Memory disks were useful for loading software from Live CDs. Usually software needs some space to write data during the boot sequence, and RAM is the fastest way to set one up.

Filesystem space in memory can be extremely useful today too, for example to run tests without wearing out an SSD. While the idea is not new, there was no incentive to explore it until I ran across a tmpfs reference in the Ubuntu Wiki.

For example, to get 2Gb of space for files in RAM, edit /etc/fstab to add the following line:
tmpfs     /var/ramspace       tmpfs     defaults,size=2048M     0     0
/var/ramspace is now the place to store your files in memory.
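Tying the two halves of the post together (a sketch; the presence of /dev/shm is an assumption about the host, and it may be /run/shm on Debian/Ubuntu): any file written under a tmpfs mount lives in RAM, so for quick scratch space you don't even need the fstab edit:

```python
import os
import tempfile

# /dev/shm is a ready-made tmpfs mount on most Linux systems
if os.path.isdir('/dev/shm'):
    with tempfile.NamedTemporaryFile(dir='/dev/shm') as f:
        f.write(b'scratch data held in RAM')
        f.flush()
        assert os.path.getsize(f.name) == 24
```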


Fabio Zadrozny: PyDev 3.7.0, PyDev/PyCharm Debugger merge, Crowdfunding

 ∗ Planet Python

PyDev 3.7.0 was just released.

There are some interesting things to talk about in this release...

The first is that the PyDev debugger was merged with the fork which was used in PyCharm. The final code for the debugger (and the interactive console) now lives at: This effort was backed by IntelliJ, and from now on, work on the debugger from either front (PyDev or PyCharm) should benefit both -- pull requests are also very welcome :)

With this merge, PyDev users will gain GEvent debugging and breakpoints at Django templates (but note that the breakpoints can only be added through the LiClipse HTML/Django Templates editor), and in the interactive console front (which was also part of this merge), the asynchronous output and console interrupt are new.

This release also changed the default UI for the PyDev editor (and for LiClipse editors too), so, the minimap (which had a bunch of enhancements) is now turned on by default and the scrollbars are hidden by default -- those that prefer the old behavior must change the settings on the minimap preferences to match the old style.

Also noteworthy is that the code-completion for all letter chars is turned on by default (again, users that want the old behavior have to uncheck that setting from the code completion preferences page), and this release also has a bunch of bugfixes.

Now, I haven't talked about the crowdfunding for keeping up support on PyDev and building a new profiler UI after it finished... well, it didn't reach its full goal. In practice that means the profiler UI will still be done, and users who supported it will receive a license to use it, but it won't be open source. All in all, it wasn't that bad either: it got halfway to its target, and many people seemed to like the idea. In the end, I'll only know whether keeping it going without reaching the full target was a good idea once it's commercially available (the idea is that new licenses will cover its development expenses and keep it going afterwards).

A note for profiler contributors is that I still haven't released an early-release version, but I'm working on it :)

As for PyDev, the outcome of the funding also means I have fewer resources to support it than I'd like. But given that LiClipse provides a share of its earnings to support PyDev, and latecomers can still contribute, I still hope that I won't need to lower its support (which would mean taking on other projects in the time I currently have for PyDev), and I think it'll still be possible to do the things outlined in the crowdfunding regarding it.


An Event Apart: The Role of Visual Design

 ∗ LukeW | Digital Product Design + Strategy

At An Event Apart in Chicago IL 2014, Jenny Lam talked about the value of visual design in digital products and shared some tips for evaluating aesthetics. Here's my notes from her talk Hit it With a Pretty Stick:

  • User experience designers need to understand a lot of things in addition to visual design. But as the discipline has matured, our understanding and evaluation of aesthetics has not. How do we champion aesthetics in our work and in our organizations?
  • Most of us believe in the value of visual design but in the real world we often have to convince others as well.
  • Visual design's impact on the bottom line is real. For example, Mint licensed a technology from someone else and added a user experience on top; that created $170 million in value. Every dollar spent on aesthetics yielded Gillette $415+ vs. only a $7 return from advertising. Design-driven companies outperform the S&P by 228%.
  • Companies that invest in design have better customer satisfaction, increased loyalty, employee retention, and more.
  • Aesthetics also communicate credibility and trust. In Stanford research, look & feel was the primary driver of credibility.

Visual Design & Teams

  • For creative projects, we need creative leaders. If there's a leader at the top with a creative vision, great. If not, the creative leadership can come from the hands-on design team.
  • Interaction designers and visual designers have different skills. Interaction: HCI trained, Product Definition, User Flows. Visual: Graphic Design, Sensory-minded, Brand-centric. Together they're a powerful combination.
  • Give your visual designers accountability. Empower them. Carry through on aesthetics internally to create a design culture.
  • Dotted line relationships to the marketing team can help design teams get the resources they need to create great experiences. Marketing tends to have big budgets and cares about the visual aspect of products.
  • Visual designers can take ownership of in-house creative. Shirts, posters, etc are very visible and can show off visual design quality.

Aesthetic Principles

  • Aesthetics are about three components: integrity (how true & cohesive is the design), harmony (how the parts relate to the whole), and radiance (how we feel when we experience a product).
  • Integrity puts us out there, allows our brand to be memorable. The visual interface has become as important as a brand logo.
  • We remember only really good experiences and really bad ones. Not average experiences.
  • Harmony: all our elements need to support the central story. Use patterns, textures, and color sets to unify designs into a cohesive whole.
  • Look to nature for ideas of color harmony.
  • Radiance: light, shadow, and material allow you to create a sense of environment.
  • Make sure you tweak/edit default settings in your drawing apps. Don't use standard drop shadows, design them. Keep dimensions "human relatable": how would things look in real life?
  • Details matter. When we're delighted, the interface feels like fun and easier to use. Look for opportunities to delight.

Tools & Techniques

  • Methods to create a design language: futurecasting, moodboards, positioning matrices.
  • Futurecasting: imagine the end state & how people will feel. What will the press release be, how can the visual design support that?
  • Start with words: talk to stakeholders to figure out the visual direction that's right for a project. Ask people why they chose specific adjectives.
  • Positioning matrices (where a brand fits on a spectrum), moodboards, and more can help set the right visual direction.
  • Everyone can have an opinion but critique is not art direction.
  • Rules of critique: visual designer is the owner & gets a veto, write down agreed upon goals, focus on feedback not on solutions, don't come up with solutions as a group.
  • Say: “I don’t know what to focus on first.” Not: “It’s too cluttered.”
  • Say: “I’m having a hard time reading the text.” Not: “Make the font bigger!”
  • Remind people you are a professional.

An Event Apart: Icon Design Process

 ∗ LukeW | Digital Product Design + Strategy

At An Event Apart in Chicago IL 2014, Jon Hicks discussed the modern icon design process and shared useful design and development tips for icons. Here's my notes from his talk Icon Design Process:

  • Icons can be used to support navigation, action, and messaging in Web sites and applications. They can also reinforce status by providing more information than just color.
  • We had a visual language before we had written language: symbols, hieroglyphics, etc. So symbols were around for a long time before they made their way to computers in 1974.
  • Today there are lots of royalty-free icon sets available for use in sites and apps, so why make your own? Icon sets might not be the right size or style for your usage. You may need more or fewer icons than exist in a ready-made set. In these cases and more, you may need custom icons.
  • The icon design process: research, drawing, and deployment (which changes frequently).
  • Research: a client brief and icon audit can reveal areas of inconsistency, gaps, or duplicates in the icon design of a site. Compile a list of the icons you'll need and what they represent.
  • How do you go from a word to a finished icon? You have two options: iconic (literal) or symbolic (needs to be learned).
  • When possible, follow conventions for your icon designs. The Noun Project is a great resource for common visual symbols. But be aware of local considerations. Symbols like piggy banks, owls, and thumbs up may have inappropriate meanings in other cultures.
  • Truly symbolic icons are more easily understood. The difference between outlined and solid icons is not the determining factor for comprehension.
  • Don't get too fancy with your icons just to make them different. Make your icon as simple as possible but no simpler.
  • Drawing: use whatever tools you are comfortable with. Start with a pixel grid to align gaps and weights within an icon. Your grid does not need to be even.
  • Automatic resizing of icons to create larger images might not provide the right ratio/balance between elements. You may need to tweak line weights or sizes to make things look right at different sizes.
  • With an icon set, you may need to adjust sizing and alignment to make things appear optically the same. Different shapes will appear bigger/smaller when displayed as part of a set.
  • Think about where shadows will fall within an icon to create the right balance of space.
  • Just when people started embracing SVG, Adobe began removing SVG features from their apps, since few people had used them before.
  • Sketch is starting to mature enough to be a viable alternative to Illustrator for drawing icons.
  • Svgo-gui is a simple drag and drop tool for optimizing SVG images. You can further compress SVG by GZIP-ing them on your server.
  • Deployment: icon fonts or SVG? Which one to use? Both can be right for your project.
  • Why use Icon fonts? One small file, accessible & scalable, easily styled with CSS, no sprites needed, supported in IE4+
  • Why use SVG? Less hassle, good browser support (three versions back), avoids sprites, can use multiple colours, and is still styleable with CSS animations.
  • Grumpicon is a tool that can help you create SVG art for your sites.

An Event Apart: How To Champion Ideas Back At Work

 ∗ LukeW | Digital Product Design + Strategy

At An Event Apart in Chicago IL 2014, Scott Berkun shared how to champion the ideas you learn at events once you return to work. Here's my notes from his talk How To Champion Ideas Back At Work:

  • The real designer is the person with the power to make decisions. It doesn't matter what title they have or their background.
  • In most situations, the final decision maker is not trained in creative disciplines.
  • If you really want to make an impact, you may need to remove the word design or engineering from your title.
  • Today designers can be founders and bring their ideas to life.

Meeting Others

  • You take in information at an event, internalize it, then make use of that information afterward.
  • This requires you to pay attention to the information you're hearing. At first you can take in lots of info but over time, you can retain less information.
  • Staying connected helps you champion ideas. Design requires working with other people.
  • Networking: ask everyone for a business card; saying thank you starts a conversation; post your notes during an event and people will find you; if you use LinkedIn, write a personal message.
  • Start introductions with a simple authentic point: "I met you at ___."
  • Casual professional events allow you to re-connect. Find your local UX happy hour and invite others.

What to Champion

  • Events are abstractions: they need to apply to a variety of people and their needs.
  • Our lives are specific: we need to deal with specific contexts on a regular basis.
  • To remember what you've learned, try min/max note taking. Take 5 bullets per talk, note some links & reflections, post a summary on your blog and tweet it out, post it at work, share it with your boss.
  • Make a chart of lessons learned and map them to a specific problem at work where you'd like to apply the ideas you heard. Include the people that need to be involved.
  • We like to imagine successes were perfect. We romanticize the role of the creator. But in reality there's always lots of frustration, dead ends, and adjustments.
  • When you start working on a project, you don't know what the outcome will be. That's the role of the creator.

How to Champion

  • The real process is not get idea, build, and ship. Instead there's a lot of convincing in between.
  • Being outspoken makes you a target.
  • Language is manipulation. Every bit of writing, design, or code has intent.
  • Charm and persuasion are emotional, not logical; they are designed. There is no abstraction: being charming depends on who you are trying to charm.
  • Instead of "here's what you should be doing", focus on "here's what will solve your problem".
  • The people with power are often the ones most resistant to change. They benefit from the status quo.
  • How to convince your boss: be awesome at your job. The best people on the team are more likely to get heard.
  • Get support from an influential coworker. Plan a trial, including how to evaluate it.
  • Pitch, repeat, your reputation will grow over time.

An Event Apart: Content for Sensitive Situations

 ∗ LukeW | Digital Product Design + Strategy

At An Event Apart in Chicago IL 2014, Kate Kiefer Lee talked about writing legal, help, error, and other forms of sensitive content. Here's my notes from her talk on Touchy Subjects: Creating Content for Sensitive Situations:

  • When face to face, we get immediate feedback from people because we can see them and understand their feelings. That's empathy, and it's often missing in our online content.
  • We need to take the empathy we already use everyday and translate that to our online software.
  • There are a lot of topics that are sensitive by nature: health, medicine, money, religion, politics, fundraising, private information.
  • Urgent messages: we need to tell people bad news quickly. Errors, downtime, warnings, rejections, and apologies are all examples of urgent messages that are time-sensitive.
  • Not all touchy subjects are urgent. Think 311 vs. 911: help documents, customer service emails, forms, contact pages, legal policies, etc.
  • Make a list of all your content types. Pull out any you think are urgent, bad news, or touchy subjects. Then map them to people's emotions.
  • People have all kinds of feelings when interacting with your content. When someone's needs are being met, they may feel very different from when their needs are not being met. How can you meet people's needs?
  • Match your reader's feelings to the tone you use in your content. Examples: error messages map to frustration and need gentle, calm, and serious messages. In help documents, we want to be helpful and friendly.
  • Put yourself in people's shoes to decide how to write your content for them. We're not writing for writing's sake; we're communicators trying to help people do certain things.


  • Be clear: all content needs to be concise and focused.
  • Get to the point: don't try to soften bad news, just get it out.
  • Stay calm: don't use exclamation points or all caps.
  • Be serious: you don't need to be funny all the time.
  • Accept responsibility.
  • Be nice: you don't always have to be interesting or clever, but you can always be nice.
  • When you adopt these principles, you help people become more effective, reduce customer service issues, and improve word-of-mouth marketing.
  • Read all your messages out loud. This helps you catch errors and typos, improves flow and makes you sound human. It also makes you more empathetic and naturally puts you in a conversational frame of mind.

Content Types

  • Errors: we want to be gentle, calm, direct, and serious. Example: "We regret to inform you that we are unable to process your request as your credit card has expired." Instead try: "Your credit card has expired. Please try another card."
  • Say exactly what you mean and say it nicely.
  • Customer Service: make sure you are not being repetitive. Treat people like people. They are stressed out, frustrated, or confused. Don't repeat canned messages.
  • Help documents: people may be there to troubleshoot. Don't let your personality get in the way. Use extremely specific titles. Titles are very important in help documents; they help guide people to what they need.
  • After clarity, consistency is the most important thing in help documents. Keep your interface terms consistent in your service and your help.
  • Feedback/Contact content: there's not a lot of room for personality here. What works on a product page is less likely to work on a contact page. Reduce the amount of information you collect up front. You can always collect more later with follow-on questions.
  • Don't keep your voice and tone the same across different content types. Adapt to different situations appropriately.
  • Unsubscribe pages: people may be annoyed or frustrated. Validate their feelings and offer a solution (less email).
  • Social media: people are interested and curious, but you still need to be courteous, sensitive, and direct. Oftentimes, you should listen more than you talk on social media.
  • Don't become delusional about your importance online. There are many times when you're better off not saying anything.
  • Legal policies: people may be confused and apprehensive. Be calm, thorough, and clear. You don't want to look like you're hiding something.
  • Terms of service can include summaries to help people understand the big picture. But people are agreeing to the full terms, so work on making all of your text readable.
  • Editorially and Automattic make their legal and terms of service content freely available for others to reuse and update.
  • Apologies: when you apologize you need to own it. Show you understand the seriousness of your issue. Don't say "our apologies for any inconvenience this may have caused." Take responsibility. Be specific about what you did wrong and say what you'll do to prevent it in the future.
  • When we are in a hurry, sometimes we forget to be nice. Create some templates of possible content types before you need them. Know who needs to sign-off so you can apologize quickly. Create an emergency contact list of who needs to be involved when sensitive situations arise.

Teach These Concepts

  • We're all content people. The more people on your team know about how to manage sensitive content, the more cohesive your messaging will be.
  • A voice and tone guide can be a resource for your team members. Give them the tools they need. Example from Mailchimp: Voice & Tone Guide.
  • Focus on making a communication guide not a style & grammar guide.
  • When we're talking about content, we often focus too much on ourselves and what we want to say. Our goal is not for people to compliment our content; it's for people to get things done quickly and easily.


Mike C. Fletcher: Python-dbus needs some non-trivial examples

 ∗ Planet Python

So I got tired of paying work this afternoon and decided I would work on getting a dbus service started for Listener. The idea here is that there will be a DBus service which does all the context management, microphone setup, playback, etc., and which client software (such as the main GUI and apps that want to allow voice coding without going through low-level-grotty simulated typing) can use to interact with it.

But how does one go about exposing objects on DBus in the DBus-ian way? It *seems* that object-paths should produce a REST-like hierarchy where each object I want to expose is presented at /com/vrplumber/listener/context/... but should that be done on demand? If I have 20 contexts, should I expose them all at start-up, or should the user "request" them one at a time (get_context( key ) -> path?). Should I use an ObjectTree? How do I handle deletion/de-registration in such a way that clients are notified of the removed objects? I can hack these things in, but it would be nice to know the *right* way to do this kind of work. Should I expose functions that process directories (import this directory), or only those which process in-memory data-sets (add these words to the dictionary)? Can (python) DBus handle many MBs of data? What does a proper "real" DBus service look like?

So, anyone know of some good examples of python-dbus services exposing non-trivial services? Many objects, many methods, object life-cycle operations, many signals, yada, yada?

(BTW, my hacks are up on github if anyone cares to hit me with a clue-stick).

Ian Ozsvald: Why are technical companies not using data science?

 ∗ Planet Python

Here’s a quick question. How come more technical companies aren’t making use of data science? By “technical” I mean any company with data and the smarts to spot that it has value, by “data science” I mean any technical means to exploit this data for financial gain (e.g. visualisation to guide decisions, machine learning, prediction).

I’m guessing that it comes down to an economic question – either it isn’t as valuable as some other activity (making mobile apps? improving UX on the website? paid marketing? expanding sales to new territories?) or it is perceived as being valuable but cannot be exploited (maybe due to lack of skills and training or data problems).

I’m thinking about this for my upcoming keynote at PyConIreland, would you please give me some feedback in the survey below (no sign-up required)?

To be clear – this is an anonymous survey, I’ll have no idea who gives the answers.



If the above is interesting, then note that we've got a data science training list where we make occasional announcements about our upcoming training, and we have two upcoming training courses. We also discuss these topics at our PyDataLondon meetups. I also have a slightly longer survey (it'll take you 2 minutes, no sign-up required); I'll be discussing the results at the next PyDataLondon, so please share your thoughts.

Ian applies Data Science as an AI/Data Scientist for companies in ModelInsight, sign-up for Data Science tutorials in London. Historically Ian ran Mor Consulting. He also founded the image and text annotation API, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

Tuesday, 26 August



Constants

 ∗ The Go Programming Language Blog


Go is a statically typed language that does not permit operations that mix numeric types. You can't add a float64 to an int, or even an int32 to an int. Yet it is legal to write 1e6*time.Second or math.Exp(1) or even 1<<('\t'+2.0). In Go, constants, unlike variables, behave pretty much like regular numbers. This post explains why that is and what it means.

Background: C

In the early days of thinking about Go, we talked about a number of problems caused by the way C and its descendants let you mix and match numeric types. Many mysterious bugs, crashes, and portability problems are caused by expressions that combine integers of different sizes and "signedness". Although to a seasoned C programmer the result of a calculation like

unsigned int u = 1e9;
long signed int i = -1;
... i + u ...

may be familiar, it isn't a priori obvious. How big is the result? What is its value? Is it signed or unsigned?

Nasty bugs lurk here.

C has a set of rules called "the usual arithmetic conversions" and it is an indicator of their subtlety that they have changed over the years (introducing yet more bugs, retroactively).

When designing Go, we decided to avoid this minefield by mandating that there is no mixing of numeric types. If you want to add i and u, you must be explicit about what you want the result to be. Given

var u uint
var i int

you can write either uint(i)+u or i+int(u), with both the meaning and type of the addition clearly expressed, but unlike in C you cannot write i+u. You can't even mix int and int32, even when int is a 32-bit type.

This strictness eliminates a common cause of bugs and other failures. It is a vital property of Go. But it has a cost: it sometimes requires programmers to decorate their code with clumsy numeric conversions to express their meaning clearly.
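As a concrete sketch of such explicit conversions (my example, not the post's), note that converting a negative int to uint wraps around modulo 2^N:

```go
package main

import "fmt"

func main() {
	var u uint = 7
	var i int = -2

	// fmt.Println(i + u) // does not compile: mismatched types int and uint

	// Explicit conversions state both the intent and the result type:
	fmt.Println(i + int(u))  // int arithmetic: prints 5
	fmt.Println(uint(i) + u) // uint arithmetic: -2 wraps to 2^N-2; adding 7 overflows back to 5
}
```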

And what about constants? Given the declarations above, what would make it legal to write i = 0 or u = 0? What is the type of 0? It would be unreasonable to require constants to have type conversions in simple contexts such as i = int(0).

We soon realized the answer lay in making numeric constants work differently from how they behave in other C-like languages. After much thinking and experimentation, we came up with a design that we believe feels right almost always, freeing the programmer from converting constants all the time yet being able to write things like math.Sqrt(2) without being chided by the compiler.

In short, constants in Go just work, most of the time anyway. Let's see how that happens.


Terminology

First, a quick definition. In Go, const is a keyword introducing a name for a scalar value such as 2 or 3.14159 or "scrumptious". Such values, named or otherwise, are called constants in Go. Constants can also be created by expressions built from constants, such as 2+3 or 2+3i or math.Pi/2 or ("go"+"pher").

Some languages don't have constants, and others have a more general definition of constant or application of the word const. In C and C++, for instance, const is a type qualifier that can codify more intricate properties of more intricate values.

But in Go, a constant is just a simple, unchanging value, and from here on we're talking only about Go.
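A quick illustrative program (mine, not from the post) with one constant of each kind mentioned above, all evaluated at compile time:

```go
package main

import (
	"fmt"
	"math"
)

const (
	sum      = 2 + 3         // untyped integer constant
	spiral   = 2 + 3i        // untyped complex constant
	halfPi   = math.Pi / 2   // untyped floating-point constant
	greeting = "go" + "pher" // untyped string constant
)

func main() {
	fmt.Println(sum, spiral, halfPi, greeting)
}
```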

String constants

There are many kinds of numeric constants—integers, floats, runes, signed, unsigned, imaginary, complex—so let's start with a simpler form of constant: strings. String constants are easy to understand and provide a smaller space in which to explore the type issues of constants in Go.

A string constant encloses some text between double quotes. (Go also has raw string literals, enclosed by backquotes ``, but for the purpose of this discussion they have all the same properties.) Here is a string constant:

"Hello, 世界"

(For much more detail about the representation and interpretation of strings, see this blog post.)

What type does this string constant have? The obvious answer is string, but that is wrong.

This is an untyped string constant, which is to say it is a constant textual value that does not yet have a fixed type. Yes, it's a string, but it's not a Go value of type string. It remains an untyped string constant even when given a name:

const hello = "Hello, 世界"

After this declaration, hello is also an untyped string constant. An untyped constant is just a value, one not yet given a defined type that would force it to obey the strict rules that prevent combining differently typed values.

It is this notion of an untyped constant that makes it possible for us to use constants in Go with great freedom.

So what, then, is a typed string constant? It's one that's been given a type, like this:

const typedHello string = "Hello, 世界"

Notice that the declaration of typedHello has an explicit string type before the equals sign. This means that typedHello has Go type string, and cannot be assigned to a Go variable of a different type. That is to say, this code works:

package main

import "fmt"

const typedHello string = "Hello, 世界"

func main() {
    var s string
    s = typedHello
    fmt.Println(s)
}

but this does not:

package main

import "fmt"

const typedHello string = "Hello, 世界"

func main() {
    type MyString string
    var m MyString
    m = typedHello // Type error
    fmt.Println(m)
}

The variable m has type MyString and cannot be assigned a value of a different type. It can only be assigned values of type MyString, like this:

package main

import "fmt"

const typedHello string = "Hello, 世界"

func main() {
    type MyString string
    var m MyString
    const myStringHello MyString = "Hello, 世界"
    m = myStringHello // OK
    fmt.Println(m)
}

or by forcing the issue with a conversion, like this:

package main

import "fmt"

const typedHello string = "Hello, 世界"

func main() {
    type MyString string
    var m MyString
    m = MyString(typedHello)
    fmt.Println(m)
}

Returning to our untyped string constant, it has the helpful property that, since it has no type, assigning it to a typed variable does not cause a type error. That is, we can write

m = "Hello, 世界"

and

m = hello

because, unlike the typed constants typedHello and myStringHello, the untyped constants "Hello, 世界" and hello have no type. Assigning them to a variable of any type compatible with strings works without error.

These untyped string constants are strings, of course, so they can only be used where a string is allowed, but they do not have type string.

Default type

As a Go programmer, you have certainly seen many declarations like

str := "Hello, 世界"

and by now you might be asking, "if the constant is untyped, how does str get a type in this variable declaration?" The answer is that an untyped constant has a default type, an implicit type that it transfers to a value if a type is needed where none is provided. For untyped string constants, that default type is obviously string, so

str := "Hello, 世界"

or

var str = "Hello, 世界"

means exactly the same as

var str string = "Hello, 世界"

One way to think about untyped constants is that they live in a kind of ideal space of values, a space less restrictive than Go's full type system. But to do anything with them, we need to assign them to variables, and when that happens the variable (not the constant itself) needs a type, and the constant can tell the variable what type it should have. In this example, str becomes a value of type string because the untyped string constant gives the declaration its default type, string.

In such a declaration, a variable is declared with a type and initial value. Sometimes when we use a constant, however, the destination of the value is not so clear. For instance consider this statement:

package main

import "fmt"

func main() {
    fmt.Printf("%s", "Hello, 世界")
}

The signature of fmt.Printf is

func Printf(format string, a ...interface{}) (n int, err error)

which is to say its arguments (after the format string) are interface values. What happens when fmt.Printf is called with an untyped constant is that an interface value is created to pass as an argument, and the concrete type stored for that argument is the default type of the constant. This process is analogous to what we saw earlier when declaring an initialized value using an untyped string constant.

You can see the result in this example, which uses the format %v to print the value and %T to print the type of the value being passed to fmt.Printf:

package main

import "fmt"

const hello = "Hello, 世界"

func main() {
    fmt.Printf("%T: %v\n", "Hello, 世界", "Hello, 世界")
    fmt.Printf("%T: %v\n", hello, hello)
}

If the constant has a type, that goes into the interface, as this example shows:

package main

import "fmt"

type MyString string

const myStringHello MyString = "Hello, 世界"

func main() {
    fmt.Printf("%T: %v\n", myStringHello, myStringHello)
}

(For more information about how interface values work, see the first sections of this blog post.)

In summary, a typed constant obeys all the rules of typed values in Go. On the other hand, an untyped constant does not carry a Go type in the same way and can be mixed and matched more freely. It does, however, have a default type that is exposed when, and only when, no other type information is available.

Default type determined by syntax

The default type of an untyped constant is determined by its syntax. For string constants, the only possible implicit type is string. For numeric constants, the implicit type has more variety. Integer constants default to int, floating-point constants to float64, rune constants to rune (an alias for int32), and imaginary constants to complex128. Here's our canonical print statement used repeatedly to show the default types in action:

package main

import "fmt"

func main() {
    fmt.Printf("%T %v\n", 0, 0)
    fmt.Printf("%T %v\n", 0.0, 0.0)
    fmt.Printf("%T %v\n", 'x', 'x')
    fmt.Printf("%T %v\n", 0i, 0i)
}

(Exercise: Explain the result for 'x'.)


Booleans

Everything we said about untyped string constants can be said for untyped boolean constants. The values true and false are untyped boolean constants that can be assigned to any boolean variable, but once given a type, boolean variables cannot be mixed:

package main

import "fmt"

func main() {
    type MyBool bool
    const True = true
    const TypedTrue bool = true
    var mb MyBool
    mb = true      // OK
    mb = True      // OK
    mb = TypedTrue // Bad
    fmt.Println(mb)
}

Run the example and see what happens, then comment out the "Bad" line and run it again. The pattern here follows exactly that of string constants.


Floats

Floating-point constants are just like boolean constants in most respects. Our standard example works as expected in translation:

package main

import "fmt"

func main() {
    type MyFloat64 float64
    const Zero = 0.0
    const TypedZero float64 = 0.0
    var mf MyFloat64
    mf = 0.0       // OK
    mf = Zero      // OK
    mf = TypedZero // Bad
    fmt.Println(mf)
}

One wrinkle is that there are two floating-point types in Go: float32 and float64. The default type for a floating-point constant is float64, although an untyped floating-point constant can be assigned to a float32 value just fine:

package main

import "fmt"

func main() {
    const Zero = 0.0
    const TypedZero float64 = 0.0
    var f32 float32
    f32 = 0.0
    f32 = Zero      // OK: Zero is untyped
    f32 = TypedZero // Bad: TypedZero is float64 not float32.
    fmt.Println(f32)
}

Floating-point values are a good place to introduce the concept of overflow, or the range of values.

Numeric constants live in an arbitrary-precision numeric space; they are just regular numbers. But when they are assigned to a variable, the value must fit in the destination. We can declare a constant with a very large value:

    const Huge = 1e1000

—that's just a number, after all—but we can't assign it or even print it. This statement won't even compile:

package main

import "fmt"

func main() {
    const Huge = 1e1000
    fmt.Println(Huge) // Error: constant overflows float64
}

The error is, "constant 1.00000e+1000 overflows float64", which is true. But Huge might be useful: we can use it in expressions with other constants and use the value of those expressions if the result can be represented in the range of a float64. The statement,

package main

import "fmt"

func main() {
    const Huge = 1e1000
    fmt.Println(Huge / 1e999)
}

prints 10, as one would expect.

In a related way, floating-point constants may have very high precision, so that arithmetic involving them is more accurate. The constants defined in the math package are given with many more digits than are available in a float64. Here is the definition of math.Pi:

Pi    = 3.14159265358979323846264338327950288419716939937510582097494459

When that value is assigned to a variable, some of the precision will be lost; the assignment will create the float64 (or float32) value closest to the high-precision value. This snippet

package main

import (
    "fmt"
    "math"
)

func main() {
    pi := math.Pi
    fmt.Println(pi)
}

prints 3.141592653589793.

Having so many digits available means that calculations like Pi/2 or other more intricate evaluations can carry more precision until the result is assigned, making calculations involving constants easier to write without losing precision. It also means that there is no occasion in which the floating-point corner cases like infinities, soft underflows, and NaNs arise in constant expressions. (Division by a constant zero is a compile-time error, and when everything is a number there's no such thing as "not a number".)

Complex numbers

Complex constants behave a lot like floating-point constants. Here's a version of our now-familiar litany translated into complex numbers:

package main

import "fmt"

func main() {
    type MyComplex128 complex128
    const I = (0.0 + 1.0i)
    const TypedI complex128 = (0.0 + 1.0i)
    var mc MyComplex128
    mc = (0.0 + 1.0i) // OK
    mc = I            // OK
    mc = TypedI       // Bad
    fmt.Println(mc)
}

The default type of a complex number is complex128, the larger-precision version composed of two float64 values.

For clarity in our example, we wrote out the full expression (0.0+1.0i), but this value can be shortened to 0.0+1.0i, 1.0i or even 1i.

Let's play a trick. We know that in Go, a numeric constant is just a number. What if that number is a complex number with no imaginary part, that is, a real? Here's one:

    const Two = 2.0 + 0i

That's an untyped complex constant. Even though it has no imaginary part, the syntax of the expression defines it to have default type complex128. Therefore, if we use it to declare a variable, the default type will be complex128. The snippet

package main

import "fmt"

func main() {
    const Two = 2.0 + 0i
    s := Two
    fmt.Printf("%T: %v\n", s, s)
}

prints complex128: (2+0i). But numerically, Two can be stored in a scalar floating-point number, a float64 or float32, with no loss of information. Thus we can assign Two to a float64, either in an initialization or an assignment, without problems:

package main

import "fmt"

func main() {
    const Two = 2.0 + 0i
    var f float64
    var g float64 = Two
    f = Two
    fmt.Println(f, "and", g)
}

The output is 2 and 2. Even though Two is a complex constant, it can be assigned to scalar floating-point variables. This ability for a constant to "cross" types like this will prove useful.


At last we come to integers. They have more moving parts—many sizes, signed or unsigned, and more—but they play by the same rules. For the last time, here is our familiar example, using just int this time:

package main

import "fmt"

func main() {
    type MyInt int
    const Three = 3
    const TypedThree int = 3
    var mi MyInt
    mi = 3          // OK
    mi = Three      // OK
    mi = TypedThree // Bad
    fmt.Println(mi)
}

The same example could be built for any of the integer types, which are:

int int8 int16 int32 int64
uint uint8 uint16 uint32 uint64

(plus the aliases byte for uint8 and rune for int32). That's a lot, but the pattern in the way constants work should be familiar enough by now that you can see how things will play out.

As mentioned above, integers come in a couple of forms and each form has its own default type: int for simple constants like 123 or 0xFF or -14 and rune for quoted characters like 'a', '世' or '\r'.

No constant form has as its default type an unsigned integer type. However, the flexibility of untyped constants means we can initialize unsigned integer variables using simple constants as long as we are clear about the type. It's analogous to how we can initialize a float64 using a complex number with zero imaginary part. Here are several different ways to initialize a uint; all are equivalent, but all must mention the type explicitly for the result to be unsigned.

var u uint = 17
var u = uint(17)
u := uint(17)

Similarly to the range issue mentioned in the section on floating-point values, not all integer values can fit in all integer types. There are two problems that might arise: the value might be too large, or it might be a negative value being assigned to an unsigned integer type. For instance, int8 has range -128 through 127, so constants outside of that range can never be assigned to a variable of type int8:

package main

func main() {
    var i8 int8 = 128 // Error: too large.
    _ = i8
}

Similarly, uint8, also known as byte, has range 0 through 255, so a large or negative constant cannot be assigned to a uint8:

package main

func main() {
    var u8 uint8 = -1 // Error: negative value.
    _ = u8
}

This type-checking can catch mistakes like this one:

package main

func main() {
    type Char byte
    var c Char = '世' // Error: '世' has value 0x4e16, too large.
    _ = c
}

If the compiler complains about your use of a constant, it's likely a real bug like this.

An exercise: The largest unsigned int

Here is an informative little exercise. How do we express a constant representing the largest value that fits in a uint? If we were talking about uint32 rather than uint, we could write

const MaxUint32 = 1<<32 - 1

but we want uint, not uint32. The int and uint types have equal unspecified numbers of bits, either 32 or 64. Since the number of bits available depends on the architecture, we can't just write down a single value.

Fans of two's-complement arithmetic, which Go's integers are defined to use, know that the representation of -1 has all its bits set to 1, so the bit pattern of -1 is internally the same as that of the largest unsigned integer. We therefore might think we could write

package main

func main() {
    const MaxUint uint = -1 // Error: negative value
}

but that is illegal because -1 cannot be represented by an unsigned variable; -1 is not in the range of unsigned values. A conversion won't help either, for the same reason:

package main

func main() {
    const MaxUint uint = uint(-1) // Error: negative value
}

Even though at run-time a value of -1 can be converted to an unsigned integer, the rules for constant conversions forbid this kind of coercion at compile time. That is to say, this works:

package main

func main() {
    var u uint
    var v = -1
    u = uint(v)
    _ = u
}

but only because v is a variable; if we made v a constant, even an untyped constant, we'd be back in forbidden territory:

package main

func main() {
    var u uint
    const v = -1
    u = uint(v) // Error: negative value
    _ = u
}

We return to our previous approach, but instead of -1 we try ^0, the bitwise negation of an arbitrary number of zero bits. But that fails too, for a similar reason: In the space of numeric values, ^0 represents an infinite number of ones, so we lose information if we assign that to any fixed-size integer:

package main

func main() {
    const MaxUint uint = ^0 // Error: overflow
}

How then do we represent the largest unsigned integer as a constant?

The key is to constrain the operation to the number of bits in a uint and to avoid values, such as negative numbers, that are not representable in a uint. The simplest uint value is the typed constant uint(0). If uints have 32 or 64 bits, uint(0) has 32 or 64 zero bits accordingly. If we invert each of those bits, we'll get the correct number of one bits, which is the largest uint value.

Therefore we don't flip the bits of the untyped constant 0, we flip the bits of the typed constant uint(0). Here, then, is our constant:

package main

import "fmt"

func main() {
    const MaxUint = ^uint(0)
    fmt.Printf("%x\n", MaxUint)
}

Whatever the number of bits it takes to represent a uint in the current execution environment (on the playground, it's 32), this constant correctly represents the largest value a variable of type uint can hold.
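As a footnote to the exercise, the same trick extends to the signed types: shifting the all-ones pattern right by one clears what would be the sign bit, yielding the largest int. This is a sketch building on the result above, not part of the original article:

```go
package main

import "fmt"

func main() {
    const MaxUint = ^uint(0)
    const MaxInt = int(MaxUint >> 1) // drop the top bit: largest int
    const MinInt = -MaxInt - 1       // two's-complement lower bound
    fmt.Println(MaxInt, MinInt)
}
```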

If you understand the analysis that got us to this result, you understand all the important points about constants in Go.


The concept of untyped constants in Go means that all the numeric constants, whether integer, floating-point, complex, or even character values, live in a kind of unified space. It's when we bring them to the computational world of variables, assignments, and operations that the actual types matter. But as long as we stay in the world of numeric constants, we can mix and match values as we like. All these constants have numeric value 1:

1
1.000
1e3 - 99.0*10.0 - 9
'\x01'
'\u0001'
'b' - 'a'
1.0 + 3i - 3.0i

Therefore, although they have different implicit default types, written as untyped constants they can be assigned to a variable of any integer type:

package main

import "fmt"

func main() {
    var f float32 = 1
    var i int = 1.000
    var u uint32 = 1e3 - 99.0*10.0 - 9
    var c float64 = '\x01'
    var p uintptr = '\u0001'
    var r complex64 = 'b' - 'a'
    var b byte = 1.0 + 3i - 3.0i

    fmt.Println(f, i, u, c, p, r, b)
}

The output from this snippet is: 1 1 1 1 1 (1+0i) 1.

You can even do nutty stuff like

package main

import "fmt"

func main() {
    var f = 'a' * 1.5
    fmt.Println(f)
}

which yields 145.5 and is pointless except to prove a point.

But the real point of these rules is flexibility. That flexibility means that, despite the fact that in Go it is illegal in the same expression to mix floating-point and integer variables, or even int and int32 variables, it is fine to write

sqrt2 := math.Sqrt(2)

or

const millisecond = time.Second / 1e3

or

bigBufferWithHeader := make([]byte, 512+1e6)

and have the results mean what you expect.

Because in Go, numeric constants work as you expect: like numbers.


Catherine Devlin: %sql: To Pandas and Back

 ∗ Planet Python

A Pandas DataFrame has a nice to_sql(table_name, sqlalchemy_engine) method that saves itself to a database.

The only trouble is that coming up with the SQLAlchemy Engine object is a little bit of a pain, and if you're using the IPython %sql magic, your %sql session already has an SQLAlchemy engine anyway. So I created a bogus PERSIST pseudo-SQL command that simply calls to_sql with the open database connection:

%sql PERSIST mydataframe

The result is that your data can make a very convenient round-trip from your database, to Pandas and whatever transformations you want to apply there, and back to your database:

In [1]: %load_ext sql

In [2]: %sql postgresql://@localhost/
Out[2]: u'Connected: @'

In [3]: ohio = %sql select * from cities_of_ohio;
246 rows affected.

In [4]: df = ohio.DataFrame()

In [5]: montgomery = df[df['county']=='Montgomery County']

In [6]: %sql PERSIST montgomery
Out[6]: u'Persisted montgomery'

In [7]: %sql SELECT * FROM montgomery
11 rows affected.
[(27L, u'Brookville', u'5,884', u'Montgomery County'),
(54L, u'Dayton', u'141,527', u'Montgomery County'),
(66L, u'Englewood', u'13,465', u'Montgomery County'),
(81L, u'Germantown', u'6,215', u'Montgomery County'),
(130L, u'Miamisburg', u'20,181', u'Montgomery County'),
(136L, u'Moraine', u'6,307', u'Montgomery County'),
(157L, u'Oakwood', u'9,202', u'Montgomery County'),
(180L, u'Riverside', u'25,201', u'Montgomery County'),
(210L, u'Trotwood', u'24,431', u'Montgomery County'),
(220L, u'Vandalia', u'15,246', u'Montgomery County'),
(230L, u'West Carrollton', u'13,143', u'Montgomery County')]


Brendan Scott: Permission problems with Kivy/Android

 ∗ Planet Python

Kivy is not very helpful when you're tracking down permission problems. If everything else seems to be working fine, make sure that you have the right Android permissions to access the relevant Android services/hardware. This is the log output for not having the CAMERA permission:

I/python  (11009):    File "[]/android_zbar_qrcode_master/.buildozer/android/app/", line 170, in start
I/python  (11009):    File "jnius_export_class.pxi", […]



Go at OSCON

 ∗ The Go Programming Language Blog


What happens in Portland in July? OSCON! At this year's conference, Go was more present than ever before, with five talks, two workshops, a Birds of a Feather session, and a meetup.


Matt Stine talked about his experience switching from Java to Go in A recovering Java developer learns Go, while Steve Francia presented Painless Data Storage with MongoDB and Go. Steve also presented Go for Object Oriented Programmers, where he explained how some object-oriented concepts can be implemented in Go.

Finally, Josh Bleecher Snyder talked about his experience writing tools to work with Go source code in Gophers with hammers, and Francesc Campoy talked about all the things that could have gone wrong and what the Go team did to prevent them in Inside the Go playground.


At the beginning of OSCON's workshop day, Steve Francia presented how to build a web application and a CLI tool during Getting started with Go to a big room full of Gophers.

In the afternoon, Chris McEniry gave his Quick introduction to system tools programming with Go where he went over some useful skills to write system tools using Go and its standard library.

Additional events

To take advantage of the increased Gopher population in Portland during OSCON, we organized two extra events: the first PDXGolang meetup and a Birds of a Feather session.

At the meetup Francesc Campoy talked about Go Best Practices and Kelsey Hightower gave a great introduction to Kubernetes, a container management system for clusters written in Go by Google. If you live in Portland, make sure you join the group and come along to the next meeting.

The "Birds of a Feather" (or, more aptly, "Gophers of a Feather") was a lot of fun for everyone involved. We hope to see more of you there next year.

In conclusion

Thanks to all the gophers that participated in OSCON. After the successes of this year we look forward to more Go fun at OSCON 2015.


An Event Apart: Atomic Design

 ∗ LukeW | Digital Product Design + Strategy

In his Atomic Design talk at An Event Apart in Chicago IL, Brad Frost talked about the benefits of design systems on the Web and shared some tools he's created to help people work quickly with responsive design. Here are my notes from his talk:

  • We've focused on designing Web pages for a long time. The idea of the printed page doesn't make sense anymore.
  • Pixel perfection & designing the same experience for all devices is not possible. Phones, tablets, and laptops are not the same.
  • What are the building blocks of design? Things that go beyond typography and color choices. What interaction components are important?
  • There are a lot of frameworks available for responsive design, but they come with issues: one-size-fits-all requirements, lookalike issues, potential for bloat and unneeded stuff, they might not do everything you need, and you have to subscribe to someone else's code structures.
  • Responsive deliverables should look a lot like fully-functioning Twitter Bootstrap-style systems custom tailored for your clients' needs. -Dave Rupert
  • Instead of page-based designs, we need design systems. Lots of people have been exploring design systems and the processes needed to design like this.
  • Front-end style guides make things easier to test, provide you a better workflow and shared vocabulary. Examples exist from MailChimp, Starbucks, Yelp, and more.
  • More patterns create more problems. You need to dedicate people to create and manage a style library. Over time this becomes an auxiliary project which may be seen only as a designer/developer tool. These libraries are often incomplete and only serve present cases. Avoid developing a "spray of modules".

Atomic Design

  • At the end of the day, we're working with atoms, with atomic units. Atoms combine to form elements, elements create molecules, molecules create organisms, and so on. All matter is composed of atoms.
  • Atoms on the Web are HTML elements like: labels, inputs, buttons, headers, etc. More abstract atoms include colors and fonts. These can't be broken down any further without losing their meaning. Atoms are often not too useful on their own but they do allow you to see your global styles laid out at once.
  • The real power comes from combining atoms into molecules. An input field and button can be combined into things like a search box. Molecules are groups of atoms bonded together. They are more concrete than atoms and encourage a “do one thing and do it well” approach.
  • Organisms are groups of molecules joined together to form a distinct section. They can consist of similar and/or different molecule types.
  • Templates allow you to place organisms inside of Web pages. They begin life as wireframes and increase in fidelity over time. Templates are client facing and eventually become the deliverable/production code.
  • Pages are a specific instance of a template. They are high fidelity when representational content is replaced with real content. Pages test the effectiveness of a template: how does it scale and stretch to different kinds of content?
  • Atomic design allows us to traverse from abstract to concrete. Creators can focus on the atoms and molecules, and clients can focus on pages and templates.

Pattern Lab

  • Pattern Lab is a comprehensive custom component library, a pattern starter kit, a design system builder, a practical viewport resizer, and a design annotation tool.
  • Pattern Lab is not a UI framework.
  • Pattern Lab allows you to style up individual components as you build pages quickly using includes built with Mustache (logic-less templates). The page can be assembled fast through the pre-built components (organisms) in Pattern Lab. So right away you can test the effectiveness of templates. At the same time, people can be working on refining designs and more.
  • Templates allow you to see the structure of content without filling it with real content (up front). Includes within each template allow you to stitch components together quickly and see how things fit together. A local JSON file can be used to fill in these includes with real data/content.
  • Pattern Lab includes Ish, a viewport resizer that gives you different-sized screens in small, medium, and large ranges so you don't get stuck on specific device sizes (480, 768, 1024, etc.) like other tools suggest.
  • Annotations allow you to make specific notes on interface components. This explains some of the design decisions made in an interface.
  • Lineage gives you a list of where components are used in your site.
  • Why use Pattern Lab? It fills the post-PSD void and serves as a hub for the entire design process for everyone: information architects, designers, developers, and clients. It allows you to easily traverse from abstract to concrete. You can write whatever HTML/CSS/JS you please. Pattern Lab encourages flexibility and documenting as you go.


  • What's the hardest part of responsive Web design? Most people say people and process.
  • Set the right expectations. Our reality still consists of design review meetings where people look over static mock-ups. But we can't sell Web sites like paintings. They are living things that fill a variety of containers.
  • The waterfall process doesn't allow designers and developers to collaborate closely. This model is broken, we need to work together along the way.
  • Gather information through interviews, analytics, style inventories, and more to collect everything you need.
  • Do an interface inventory: document your interface, promote consistency, establish scope, and lay the groundwork for a future style guide or pattern library.
  • Establish direction: define site-wide patterns and start pulling these components into a Pattern Lab environment. This allows everyone to keep working on their various components: type, IA, colors, etc.
  • Typecast is a tool that allows you to try out a number of different typefaces on common interface elements. This helps isolate decisions.
  • This kind of iterative approach lets you keep iterating and making decisions as you go. Design, build, test, repeat. When you're finished changing, you're finished.
  • Collaboration and communication trumps deliverables.