Brett Cannon: Security of cash vs. bank accounts vs. Bitcoin

 ∗ Planet Python

A co-worker recently gave a talk that explained how Bitcoin works, and it was interesting to hear how you should protect your bitcoins. Cash is basically a physical good which you own while it is in your possession and lose when it isn’t. It’s very straightforward and easy to comprehend. The biggest downside is that there is no backup if you screw up, e.g. if you leave a $20 bill lying around and someone takes it, there is no way to get it back. But this does mean you have complete control over your money.

With a bank account, you have certain guarantees to protect your money. If someone steals your bank card information, you have federal guarantees to get a refund for the fraudulent charges made against your account. This does mean, though, that you have to entrust your money to the bank and trust that it won’t go under, lose your money, etc. While the federal government makes certain guarantees about losing money from a bank, this assumes you follow the requirements and it isn’t some crazy systemic failure (e.g. your bank balance doesn’t go over a certain amount and your government isn’t collapsing).

Bitcoin is a lot like cash. You generate a public key and a private key for your bitcoins. The public key is like an account number that people can send money to. The private key is like a password that unlocks the bitcoins sent to that public key. If someone gets hold of your private key they can easily transfer your bitcoins to another public key, and since anyone can generate a key pair without revealing who controls it, theft essentially becomes robbery by anonymous robbers. There is also no governmental backup in the case of theft like there is with your bank account. So just like cash, the security of your bitcoins is entirely up to you.
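
For the curious, the keys themselves are just an ECDSA key pair on the secp256k1 curve. Here is a minimal illustration using the third-party ecdsa package (a real Bitcoin address additionally hashes and Base58-encodes the public key, which is omitted here):

import binascii
import ecdsa

# generate a fresh private key on secp256k1, the curve Bitcoin uses
private_key = ecdsa.SigningKey.generate(curve=ecdsa.SECP256k1)
# derive the matching public key from it
public_key = private_key.get_verifying_key()

print(binascii.hexlify(private_key.to_string()))  # keep this secret
print(binascii.hexlify(public_key.to_string()))   # safe to share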

Unlike cash, though, you can keep your keys on your computer, which exposes you to more potential theft than cash, which is entirely offline. This is why the Bitcoin community has two pieces of advice to help mitigate the loss of bitcoins when someone breaks into the computer storing your private keys. The first is to keep moving your bitcoins to different keys and to avoid keeping all of your bitcoins under a single private key. The idea is that when you use the funds held by a private key you transfer all of the funds out: send the payment to the public key that instigated the transfer, then move the remaining balance to one or more other public keys you control.

The second piece of advice is that for long-term storage you should keep your private key entirely offline. The idea here is that if your private key never goes near the internet, there is no chance that digital thieves can steal it (it ends up just like cash, susceptible only to physical theft). Generating such a private key in a way that guarantees it is never accessible to the internet until you want to withdraw from it becomes an interesting challenge in varying levels of paranoia. Probably the simplest approach is to visit something like https://bitcoinpaperwallet.com in an incognito window in Chrome, take your computer offline, use the website to create a private/public key pair, print the keys using a printer that has no internet connection and no buffer that saves what it prints, quit Chrome, and then give your computer an internet connection again. This is probably good enough for most people.

But what if you want a stronger guarantee that there is no chance your private key will ever touch the internet? For that you will want to take a Raspberry Pi that has only a wired connection to the internet, update its software with everything you need to print along with what I mentioned previously, disconnect the Raspberry Pi from the internet, generate the private key, print it, turn off the Raspberry Pi, and then destroy the SD card you used to run it. By never connecting your Pi to the internet once you begin the process of generating your private key, you know it won’t leak online, since there is no WiFi that might accidentally be turned on without you knowing. And by destroying the SD card immediately after you are done, you guarantee that the private key will never accidentally make it onto the internet by reading the SD card on a computer with an internet connection. With that you can then do things like use tamper-resistant stickers to make sure no one sneaked a peek at your private key, and make copies that you distribute to trusted friends to protect against accidental destruction in e.g. a fire. This ability to have a single piece of paper represent any amount of money, and to physically make copies of it for safekeeping, is what differentiates bitcoins from cash.

Easy Color Contrast Testing

 ∗ A List Apart: The Full Feed

We have plenty of considerations to design for when crafting websites. Web accessibility is not a new design consideration, but it is still very important, no matter the size or speed of the device we’re testing on. The Web Content Accessibility Guidelines (WCAG) tell us our content should be distinguishable and require that we “[m]ake it easier for users to see and hear content including separating foreground from background.”

We know that our color contrast ratio should be at least 3:1 for non-decorative text sized 18 point or larger, or 14 point or larger if bold. Text smaller than that should meet a contrast ratio of at least 4.5:1.
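
If you ever want to double-check a tool’s math, the WCAG ratio is easy to compute yourself. Here is a minimal sketch in Python (the example colors are mine, not from the article):

def relative_luminance(rgb):
    # WCAG relative luminance from an (R, G, B) tuple of 0-255 values
    def channel(c):
        c /= 255.0
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    lighter, darker = sorted((relative_luminance(fg), relative_luminance(bg)), reverse=True)
    return (lighter + 0.05) / (darker + 0.05)

# white text on a #767676 background sits right at the 4.5:1 AA boundary
print(round(contrast_ratio((255, 255, 255), (118, 118, 118)), 2))  # ~4.5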

Maybe you have amazing eyeballs that can help you recognize contrast levels. If, like me, you do not have magical corneal calculators, then you probably have utilized one of the tools out there to check contrast, such as: WebAIM’s color contrast checker, Snook’s contrast slider, Check my colors URL input check, or a WCAG checker add-on for Firefox.

I recently switched to using the built-in contrast checker in Chrome’s Accessibility Developer Tools and I love it. Let’s take a look at the audits the tool runs and how to begin using it once it’s installed.

Animation showing a progression through step one

Load up the website you’d like to check and bring up the Developer Tools. I’ll pick on myself and use my own site for this example. Once open, click over to the “Audits” tab and make sure “Accessibility” is checked. Click “Run.”

Animation showing a progression through step two

Expand the “Text elements should have a reasonable contrast ratio” section. This will show you the HTML of the elements that don’t have sufficient contrast. Identify one to examine further.

Animation showing a progression through step three

Select the chosen offender in the browser and inspect it. If you can’t see the contrast values, use the menu to pull up the “Accessibility Properties.” You’ll see the current contrast ratio of your element. You’ll also see a suggested color value pair to match the WCAG AA or AAA recommendation. Select the swatch to the right of those values to see the preview of that change. In this case, we’ll see what grey we’d have to adjust our background to in order to keep the white text.

Animation showing a progression through step four

As you can see in this second example, I could make the date text darker to meet the guidelines, which is very helpful in making a fast change.

When it’s this quick and simple to check contrast, there’s no reason not to add this accessibility test into our workflow.

Network Performance Testing

 ∗ A List Apart: The Full Feed

It’s extremely likely that sometime in 2014, the number of internet users will pass 3 billion. Not surprisingly, the largest areas of growth are developing markets—predominantly Africa and the Asia-Pacific region. These markets are being flooded with mobile devices small and large, fast and slow, smart or otherwise.

Connectivity in these regions is of great interest to large tech companies scrambling for control. Today, however, bandwidth is limited, reliability is questionable, and data plans are small. Even in markets saturated with mobile usage, like the US and much of Europe, connections are often flaky and unreliable.

For all those reasons and more, now is the time to test what you build in sub-optimal situations. Thankfully, there are a handful of tools that can help you do just that from the comfort of your high-bandwidth connection and favorite chair, rather than trekking out to a remote field with a Faraday cage.

Slow your roll

If you’re using Grunt or Node.js, there’s a fantastic plugin and module, respectively, that can slow your local server’s connection down to a configurable speed. It’s a great start to network performance testing, but it’s fairly one-dimensional.

Charles is a more robust throttler that exposes a lot more control. In addition to amazing tools allowing complete insight into all network requests, Charles can throttle your entire connection, so when it’s enabled, all traffic in and out of your machine is affected. Throttling isn’t the only factor of network performance, however. Latency is a major contributor, and Charles provides control over that aspect as well.

Unfortunately, these tools don’t expose control over the final, and potentially most important aspect of network performance—packet loss. It has always been the toughest aspect to simulate, but if you’re a Mac and/or iOS user, you have access to the Network Link Conditioner. With control over upstream and downstream transfer speeds, latency, packet loss, and even DNS delay, Network Link Conditioner is a super-powered system-level tool that will fundamentally change the way you build and test things.

Apple provides the Network Link Conditioner through their developer platform, and luckily, it’s accessible through the free developer program, so you don’t have to pay to use it.

The Network Link Conditioner comes with some built-in presets to match common connections, such as EDGE, 3G, and DSL. You can even create and save your own presets, allowing you to easily switch between connection levels for fast testing.

All of these tools open up a new realm of testing and optimization available to us, and as the world changes, network performance testing becomes more and more important. Have you used any other tools or techniques for testing? Share them below in the comments!

Save Your Eyes with f.lux

 ∗ A List Apart: The Full Feed

I never thought I felt eye strain from looking at big, bright screens all day—I thought my young eyes were invincible. Then I started getting sharp headaches at the end of every day, and I realized I needed to change something.

I decided to finally take the jump and start using f.lux. f.lux is an app that changes the color temperature of your display, adapting the light you see to the time of day, which helps to reduce eye strain. There’s a new beta out for Mac that brings some really fantastic improvements and enhancements (don’t worry, there’s a Windows version too!).

In the morning and afternoon, you’ll see the blue-ish colored light that your screen normally pushes out. As the sun sets, the light will shift to a more reddish color, and when night falls, it’ll become an even deeper red. Every color step is customizable, so you decide how red-shifted you’d like each phase to be—I like mine on the deeper end of the scale.

It’s normal to see blue light during the day, but as it gets darker, that light is harsh on our eyes. Red light is easier on your eyes, especially at night—it’s why red lights are used to preserve vision at night.

When I tell people in our industry about f.lux, I often hear something like, "But what if I’m doing color-sensitive work?" The newest f.lux beta has a feature that allows you to disable f.lux in certain applications. As you switch into an application where you’ve disabled f.lux, your screen will slowly transition to normal colors. The smooth transition will help prepare your eyes for the blue wave of light you’re about to get hit with, so it’s not too jarring.

For anyone who spends hours a day looking at a screen, f.lux is a must-have. We spend a lot of time and effort making sure we use ergonomically correct keyboards, chairs, and desks, so it’s time we gave our eyes a similar level of treatment.

Brett Cannon: Should my family get tablets or laptops?

 ∗ Planet Python

With the assumption that everyone in my family has a smartphone of some kind, the question becomes whether members of my family should buy a laptop or a tablet as their primary computing device while at home. I think my general answer is choose one or the other depending on whether you need a keyboard regularly, but you will only want either a laptop or tablet and not both.

I did a blog post once about why I thought the tablet craze had died down. I basically said that the differentiator between a tablet and a small laptop like a Chromebook was the lack of a keyboard. If you type a lot, then the lack of a keyboard on a tablet can be a hindrance. While you can get keyboards for tablets, they are typically structured such that you must have a flat surface to place the tablet and keyboard on, unlike a laptop, which will work as-is in your lap.

I have continued to agree with this assessment. The point of this blog post, though, is to say a good-sized tablet – that will depend on you, so try to play with various tablet sizes to see which ones seem reasonable – can replace a laptop if you don’t need a keyboard regularly. If you write emails on a regular basis, then get a laptop. But if you can live without a keyboard and you still have access to a laptop for those times when you need one then a tablet can work. It’s worked for my father-in-law quite well and I don’t see why it couldn’t work for other family members. And if you so choose to buy a tablet, the recommendations I made in my mobile phone post hold for which tablet to buy.

Yasoob Khalid: Misconceptions about Skype local database

 ∗ Planet Python

Hi there guys. Recently I wrote an article titled “I didn’t know Skype stores your data in a local database without a password!”. After publishing that article I got a lot of responses from people like you, and I came to know that it is not a vulnerability. That is because the database is stored in the “appdata” directory, which can only be accessed by the administrator, which means that only an administrator account can open it. If you want someone else to use your computer, just make a guest account, which will restrict their level of access to the main directories only (this excludes the appdata directory). If you want to see your Skype logs, simply log in to your Skype account rather than going the complex way of accessing the local database.

However, the tool (SkypeFreak) which I posted about in the previous post can be used as a post-reconnaissance tool, which means that if you hack into a computer you can use the tool to access the Skype data without knowing the password.

Finally, I would like to apologize to all of you for any misconceptions which my previous post might have created. You can safely discard those misconceptions as my mistake.

source: Previous post


Eko S. Wibowo: Reviving Microsoft Agent using PyWin32

 ∗ Planet Python

 

I just found some MS Agent characters I hadn't seen before, and thought I'd share how to control them from within Python.

I think this article was driven by the fact that this April, Microsoft discontinued Windows XP. And why is that related? Because if you remember Windows XP, I think you will somehow remember (maybe with annoyance) this cute little dog called Rover from the Windows XP Find Files bar:

Howdy, Rover? Not only were you abandoned, now your master is dead too. So sad.

Long story short, Microsoft did not have much success with this MS Agent stuff. Without even realizing it, one of the first things I do when re-installing Windows XP is... get rid of Rover from Windows Explorer! And yeah, unfortunately, I am not the only one.

But still, in this article we are going to use it as a case study in how to properly use PyWin32, an amazing Python package from Mark Hammond that lets your Python application integrate tightly with Windows. For example, did you know that the Dropbox client application was built with Python?

Great, let's dig into this!

Three Different Routes for Windows Integration

There are three different routes you can take to integrate a Python application seamlessly into the Windows environment:

  1. Using PyWin32 package from Mark Hammond.
    For casual applications, this well-maintained and well-supported package is your first bet for successful Windows integration. Download its latest build here (build 218 as of this writing), and choose the correct version for your Python distribution. When I say casual, I mean that your application is satisfied with the ability to call the Win32 API and automate IDispatch-based COM objects (COM/ActiveX objects that can be automated using a scripting environment such as JavaScript/VBScript). A clear example would be this blog post, which shows you how to use Python in Excel.
    PS: I love seeing people use Python in many diverse computing environment like that!
  2. Using comtypes package from Thomas Heller
    When you want to automate particular COM objects that are derived from IUnknown instead of IDispatch, that is the time to use comtypes. It really is amazing what you can do with comtypes. For example, did you know that the Dropbox client application is a Python application? This article from the Dropbox Tech Blog may shed some light on that! Or this CodeProject article, which demonstrates how to work with a custom COM interface, will help you in your next step of Windows integration.
  3. Using ctypes package, also authored by Thomas Heller
    When your Windows integration needs go beyond custom COM interfaces and you have to call specific DLLs, either those that ship with Windows (such as the functions in kernel32.dll) or your own wrapped C functions, you are going to need ctypes (a tiny example follows this list).
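
As a quick illustration of route 3 (my own sketch, assuming you are on Windows), here is ctypes calling a kernel32 function directly:

import ctypes

# GetTickCount lives in kernel32.dll and returns the number of
# milliseconds elapsed since the system was started
ticks = ctypes.windll.kernel32.GetTickCount()
print('milliseconds since boot: %d' % ticks)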

A quick note regarding COM/ActiveX: although it's not dead, COM (Component Object Model) has technically been superseded by the .NET Framework. In essence, COM is a software component architecture that lets diverse applications reuse components from each other to build a working application. Just as Windows exposes functionality through the Windows API (e.g. creating a system tray application using the Shell_NotifyIcon function), it also exposes other parts of its system using COM objects (e.g. INetCfg is an IUnknown-based COM object that lets you configure Windows networking). Later the .NET Framework was introduced, and at the current moment a new API has come to life: Windows RT. I love hearing all this jargon!

In this article, we are going to look at how to create a Python application that integrates well with Windows, using the PyWin32 package, specifically its win32com package, to automate the Microsoft Agent component. We are going to make a talking application that will obediently sit low and still in the system tray and speak the time every hour. Let's get down to it!

Installing Microsoft Agent

If you are still using Windows XP (whoops), then there is nothing else you need to do: MS Agent is already there. But if you happen to use Windows 7, go to this page from Microsoft and install a hotfix to bring back MS Agent. Users of Windows 8/8.1 may find this support page or this page useful. Another interesting collection of MS Agent characters can be found here.

Throughout the rest of the article I am using the James character, as depicted in the figure above. You may want to use another character of your liking. Be sure to check its supported animations though! For example, James's supported animations can be found here.

PS: I just found out that this open source replacement for MS Agent also supports Windows 7 (not sure about Windows 8 though). You may try it, although currently I am using the MS Agent hotfix from Microsoft.

It's alive!

The next question is, "This MS Agent thing is a COM object, right? But how do I use it in my Python application?". Thanks to the great work of Mark Hammond, we have the awesome PyWin32 package that lets us use the native Windows API and COM services from within a Python environment. Download and install an appropriate distribution for your Python environment from this SourceForge page.

Let's test our newly installed James agent and see if we can truly bring it to life:

import win32com.client

# dispatch the MS Agent ActiveX control and connect to it
ag = win32com.client.Dispatch('Agent.Control')
ag.Connected = 1

# load the 'James' character, show it, and make it speak
ag.Characters.Load('James')
ag.Characters('James').Show()
ag.Characters('James').Speak('Hi there Pythonist. I see that you brought me back to this world. Thank you!')

A warning though: don't try to save the above code into a *.py file and run it with either the python command line or your favorite IDE. Why? Because the main thread that starts your application will exit immediately and the character won't have time to show itself. Paste the above code into an interactive Python REPL session (or IPython if you like), and you'll see a character pop up and speak using a digital speech synthesizer. Pretty cool for a thrown-away product, right?
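
If you really do want to run it as a plain script rather than from a REPL, a crude workaround (my own sketch, not from the article) is simply to keep the main thread alive long enough for the character to show up and finish speaking:

import time
import win32com.client

ag = win32com.client.Dispatch('Agent.Control')
ag.Connected = 1
ag.Characters.Load('James')
ag.Characters('James').Show()
ag.Characters('James').Speak('Hello from a plain script!')

# crude: block the main thread so the character is not torn down immediately
time.sleep(10)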

Make it always run: stick it in a System Tray application

This is another example of how awesome the PyWin32 package is: creating a system tray icon for a Python application. As the above code needs a main thread to keep the character showing, we are going to create a Python application that puts an icon in the system tray. For the system tray functionality I am going to use Simon Brunning's SysTrayIcon class, which is adapted from Mark Hammond's win32gui_taskbar.py and win32gui_menu.py demos from PyWin32. In this article I use it without changes, so credit goes to those guys.

Below is a new Python class that wraps a single MS Agent character of your liking. Observe that the code is pretty straightforward to understand.

__author__ = 'Eko Wibowo'

import systray
import win32com.client
import datetime


class MsAgent(object):
    '''
    Construct an MS Agent object, display it and let it speak the time
    '''
    def __init__(self):
        # dispatch the MS Agent control and load the chosen character
        self.agent = win32com.client.Dispatch('Agent.Control')
        self.charId = 'James'
        self.agent.Connected = 1
        self.agent.Characters.Load(self.charId)

    def say_the_time(self, sysTrayIcon):
        '''
        Speak up the time!
        '''
        now = datetime.datetime.now()
        str_now = '%s:%s:%s' % (now.hour, now.minute, now.second)
        self.agent.Characters(self.charId).Show()
        self.agent.Characters(self.charId).Speak('The time is %s' % str_now)
        self.agent.Characters(self.charId).Hide()

    def bye(self, sysTrayIcon):
        '''
        Unload the MS Agent object from memory
        '''
        self.agent.Characters.Unload(self.charId)


if __name__ == '__main__':
    import itertools, glob

    # cycle through any .ico files that sit next to the script
    icons = itertools.cycle(glob.glob('*.ico'))
    hover_text = "What can I do for you Sir?"

    agent = MsAgent()
    menu_options = (('Say the time', icons.next(), agent.say_the_time),)

    systray.SysTrayIcon(icons.next(), hover_text, menu_options, on_quit=agent.bye, default_menu_index=1)

Put an *.ico file in the same folder as the application (together with the systray.py module, of course). Run it, and you will see a new icon in the Windows system tray. Right-click it and choose "Say the time". James will obediently follow your command :)

How to Run This Application?

I saved the main application in a file named oclockreminder.pyw. With this extension, when the file is double-clicked it will be executed by pythonw.exe, making it a non-console application (similar to javaw.exe). You can later create a shortcut for this file in the Windows Start Menu and have it run automatically. Actually, the best way would be to prepare an *.exe installer for this application; we are going to explore that option later on this blog.

Conclusion

Realizing that Python can integrate well with a particular operating system opens a window to a whole new level of possibilities. The topic discussed in this article only touches the surface of what you can do with Python, but I hope it gives you enough of a foundation to get started.

Download the application source directly from this link, or browse through Pythonthusiast public dropbox folder here.

Or, follow its Github Repository: pythonthusiast/oclockreminder.

Stay tuned for my next article!

Hot Links!

 ∗ Jeffrey Zeldman Presents The Daily Report: Web Design News & Insights Since 1995

Kristina Halvorson at An Event Apart

AS AN EVENT APART Seattle closes, and we prepare for a sold-out Boston show, we want to share some of the helpful reviews, summaries, notes, and web links that An Event Apart Seattle inspired. Please enjoy: Hot Links From An Event Apart Seattle.

The Practice

 ∗ Jeffrey Zeldman Presents The Daily Report: Web Design News & Insights Since 1995

Typekit Practice is a fine new typography resource. Congrats & thanks @nicewebtype @typekit ! http://t.co/LncTpApUku pic.twitter.com/AVntKX6Ntn

— Jeffrey Zeldman (@zeldman) April 18, 2014

Invent with Python: Decimal, Binary, and Hexadecimal Odometers

 ∗ Planet Python

It can be difficult to see how other number systems (such as binary and hexadecimal) work, since they have a different number of numerals than the ten numerals of decimal. But imagine that you are counting in these number systems using an old-fashioned analog odometer that has a different number of numerals for each digit.

The following three odometers always show the same number, but they are written out differently in different number systems:

Decimal (Normal, base-10 with digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9):

Binary (base-2 with digits 0, 1):

Hexadecimal (base-16 with digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F):
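
Since the animated odometers can't spin on this page, a one-line Python check shows the same value written in all three systems (2014 is just an example value):

n = 2014  # an arbitrary example value
print("decimal: %d  binary: %s  hexadecimal: %s" % (n, format(n, 'b'), format(n, 'X')))
# decimal: 2014  binary: 11111011110  hexadecimal: 7DE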

UPDATE: Source code for Gavin Brock’s JavaScript odometers. Source code for this binary/decimal/hexadecimal demo all on a single page.


Jaime Buelta: Compendium of Wondrous Links vol V

 ∗ Planet Python

Seven habits of effective text editing. A great essay by Bram Moolenaar (of Vim fame). It is applicable to any editor, but, of course, shows why Vim can be such a good choice (once you know how to use it, obviously) A useful collection of recipes in Python. Thirty python language features and tricks you […]

Bruno Rocha: Using Flask Cache

 ∗ Planet Python

As a micro framework, Flask does not have built-in cache functionality. However, there is the werkzeug cache API, and an excellent extension that provides this caching functionality for your Flask apps. That extension was created by @thadeusb and is very easy to implement and use.

Installing

In your env install it via PyPI (recommended)

pip install Flask-Cache  

You can also install it directly from source code if you need recent changes or bugfixes

pip install https://github.com/thadeusb/flask-cache/tarball/master

Configuring

There is a set of configuration keys you can put in your app settings, but the most important is the cache backend defined by the CACHE_TYPE key.

The cache type resolves to an import string which needs to point to an object implementing the werkzeug cache API, but there are some aliases for the werkzeug.contrib.cache implementations.

By default the CACHE_TYPE is Null, which means that your app will have no cache, so you need to choose one of the other cache backends:


Full options and config variables are in http://pythonhosted.org/Flask-Cache/#configuring-flask-cache

A Simple Flask app

a file named app.py

import time
from flask import Flask

app = Flask(__name__)

@app.route("/")
def view():
    return time.ctime()

if __name__ == "__main__":
    app.run(port=5000, debug=True, host='0.0.0.0')

Run the above with python app.py and open http://localhost:5000 in your browser and hit F5 (refresh) to see the current date and time.

Enabling Flask-Cache for views

Now we want to enable caching on that small application to avoid refreshing the current time on every request (for this example we are returning the current time, but imagine it could be a large dataset or a huge calculation).

a file named cached_app.py

import time
from flask import Flask

# import the flask extension
from flask.ext.cache import Cache   

app = Flask(__name__)

# define the cache config keys, remember that it can be done in a settings file
app.config['CACHE_TYPE'] = 'simple'

# register the cache instance and binds it on to your app 
app.cache = Cache(app)   

@app.route("/")
@app.cache.cached(timeout=300)  # cache this view for 5 minutes
def cached_view():
    return time.ctime()

if __name__ == "__main__":
    app.run(port=5000, debug=True, host='0.0.0.0')

Run the above with python cached_app.py and open http://localhost:5000 in your browser and hit F5 (refresh) to see that the current date and time is now cached for 5 minutes.

The @cache.cached decorator takes the request.path for that view and uses it as the cache key. If for any reason you need a different key, you can pass a key_prefix argument to the decorator. If you pass a key_prefix containing the %s placeholder, it will be replaced by the current request.path.
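
For instance, here is a small sketch of a custom key_prefix (the route and view name are made up for illustration):

import time
from flask import Flask
from flask.ext.cache import Cache

app = Flask(__name__)
app.config['CACHE_TYPE'] = 'simple'
app.cache = Cache(app)

@app.route("/products")
@app.cache.cached(timeout=300, key_prefix="listing/%s")
def products_view():
    # stored under the cache key "listing//products": the %s placeholder
    # is replaced with request.path ("/products")
    return time.ctime()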

The above is the simplest, most common example of a Flask app using the cache, but if your app is designed using application factories, blueprints, class-based views, or views located in different modules, you will need a more advanced approach.

Caching regular functions

The same cached decorator can be used to cache regular functions, but in this case you will need to specify the key_prefix argument, otherwise it will use the request.path which can lead to conflicts if you have many cached functions.

For this example we are going to use the this module and extract a random quote from the Zen of Python.

A file named cached_function_app.py

import time
import random

from this import s, d
from string import translate, maketrans

from flask.ext.cache import Cache
from flask import Flask

app = Flask(__name__)
app.config['CACHE_TYPE'] = 'simple'
app.cache = Cache(app)

@app.cache.cached(timeout=10, key_prefix="current_time")
def get_current_time():
    return time.ctime()

def random_zen_quote():
    """Pick a random quote from the Zen of Python""" 
    transtable = maketrans("".join(d.keys()), "".join(d.values()))
    return random.choice(translate(s, transtable).split("\n")[2:])

@app.route("/")
def zen():
    return """
    <ul>
        <li><strong>It is cached:</strong> {cached}</li>
        <li><strong>It is not cached:</strong> {not_cached}</li>
    </ul>
    """.format(
        cached=get_current_time(),
        not_cached=random_zen_quote()
    )

if __name__ == "__main__":
    app.run(debug=True, port=5000, host='0.0.0.0')

Now run python cached_function_app.py and open http://localhost:5000. When hitting F5 to refresh you will see the current time cached for 10 seconds while the random quote is updated on every request; you can move the cache decorator to the other function just to see the effect:

def get_current_time():
    return time.ctime()

@app.cache.cached(timeout=10, key_prefix="zen_quote")
def random_zen_quote():
    transtable = maketrans("".join(d.keys()), "".join(d.values()))
    return random.choice(translate(s, transtable).split("\n")[2:])

@app.route("/")
def zen():
    return """
    <ul>
        <li><strong>It is not cached:</strong> {cached}</li>
        <li><strong>It is cached:</strong> {not_cached}</li>
    </ul>
    """.format(
        cached=get_current_time(),
        not_cached=random_zen_quote()
    )

NOTE: Because we are importing the this module for the example, you will see the Zen quotes in your flask terminal, but there is no problem with this.

Caching modular views

Now, an example for when you have your app split into two or more files for better organization.

In a folder called app, put 3 files: __init__.py, app.py and views.py.

app/__init__.py is an empty file

app/views.py

import time
import random
from this import s, d
from string import translate, maketrans

def get_current_time():
    return time.ctime()

def random_zen_quote():
    transtable = maketrans("".join(d.keys()), "".join(d.values()))
    return random.choice(translate(s, transtable).split("\n")[2:])

def zen_view():
    return """
    <h1>Cached for 10 seconds!</h1>
    <ul>
        <li>{time}</li>
        <li>{quote}</li>
    </ul>
    """.format(
        time=get_current_time(),
        quote=random_zen_quote()
    )

As you can see, the above file defines the view functions. Since it is a separate file, to avoid circular imports we should not use @app.route or @app.cache here, so these views will be app-agnostic, and we are going to register their URL rules and caching in the main app file.

That kind of structure is needed when your app has many views and you want better organization.

NOTE: For better organization, the most recommended pattern is Blueprints, which I will explain further on.

app/app.py

Now in the main app we need to import our views, explicitly decorate them for caching, and also register their URLs.

from flask import Flask
from flask.ext.cache import Cache
from views import zen_view

app = Flask(__name__)
app.config['CACHE_TYPE'] = 'simple'
app.cache = Cache(app)

# explicitly apply the cache in the old-style decoration way
cached_zen_view = app.cache.cached(timeout=10)(zen_view)

# explicitly register the cached view url mapping
app.add_url_rule("/", view_func=cached_zen_view)

if __name__ == "__main__":
    app.run(debug=True, port=5000, host='0.0.0.0')

NOTE: You can also separate the cache instance in a different file for lazy initialization as we are going to see in the next example

Caching Blueprint views

As mentioned before, the best pattern to follow in Flask applications is the Blueprint pattern, which is a way to create separate 'meta-apps' that will be connected to your main application at initialization time. The problem here is that Blueprints are meant to be reusable by many different applications, so the delegation of cache control needs to be made dynamic.

In order to avoid circular imports you will want to create your cache instance separately from your application instance (you may want to consider switching to the app factory pattern if you are building something more complex).

Create a folder called cached_blueprint_app with the following structure:

cached_blueprint_app/
├── app.py
├── cache.py
├── blueprints
│   ├── __init__.py
│   └── zen_blueprint.py
└── __init__.py

The cache.py

from flask.ext.cache import Cache    
cache = Cache()

We create a dummy, lazy cache instance that will be initialized later, when the views are called. For that, in the app we are going to re-import the same cache instance and call its init_app method.

The basic blueprints/zen_blueprint.py

import time
import random
from this import s, d
from string import translate, maketrans
from flask import Blueprint
from cache import cache

zen = Blueprint('zen', __name__)

def get_current_time():
    return time.ctime()

def random_zen_quote():
    transtable = maketrans("".join(d.keys()), "".join(d.values()))
    return random.choice(translate(s, transtable).split("\n")[2:])

@zen.route("/")
@cache.cached(timeout=20)
def zen_view():
    return """
    <h1>Cached for 20 seconds!</h1>
    <ul>
        <li>{time}</li>
        <li>{quote}</li>
    </ul>
    """.format(
        time=get_current_time(),
        quote=random_zen_quote()
    )

NOTE: In a real application you will want to modularize it separating the views, helpers etc and promoting your blueprint to a Python package.

The main app.py

from flask import Flask

from blueprints.zen_blueprint import zen
from cache import cache

app = Flask(__name__)
app.config['CACHE_TYPE'] = 'simple'
cache.init_app(app)

app.register_blueprint(zen)

if __name__ == "__main__":
    app.run(debug=True, port=5000, host='0.0.0.0')

Notice that we created a dummy instance of cache in cache.py and then used that instance to decorate the blueprint's views; the cache was then initialized in app.py with the init_app method. That is possible because of the Flask initialization cycle and the excellent implementation in the Flask-Cache extension that takes care of this case. If you plan to write your own Flask extension, take a look at the Flask-Cache source code.

Run the application by calling python cached_blueprint_app/app.py and open http://localhost:5000 to see the blueprint view cached for 20 seconds.

Caching MethodView

Let's use the same cached_blueprint_app example but turn the zen_view into a MethodView.

Change your zen_blueprint.py to:

import time
import random
from this import s, d
from string import translate, maketrans
from flask import Blueprint
from flask.views import MethodView
from cache import cache

zen = Blueprint('zen', __name__)

class ZenView(MethodView):

    @cache.cached(30)
    def get(self):
        return """
        <h1>Cached for 30 seconds!</h1>
        <ul>
            <li>{time}</li>
            <li>{quote}</li>
        </ul>
        """.format(
            time=self.get_current_time(),
            quote=self.random_zen_quote()
        )

    @staticmethod
    def get_current_time():
        return time.ctime()

    @staticmethod
    def random_zen_quote():
        transtable = maketrans("".join(d.keys()), "".join(d.values()))
        return random.choice(translate(s, transtable).split("\n")[2:])


zen.add_url_rule("/", view_func=ZenView.as_view('zen'))

Method views map HTTP method names such as GET, POST and DELETE to view methods named get, post, delete, etc. So all we needed to do is create a method called get and decorate it with the @cache.cached decorator.

NOTE: Due to the implicit self, from the caller’s perspective you cannot use regular view decorators on the individual methods of the view. However, Flask-Cache is one exception, because its implementation allows the use of the cached decorator on individual methods. Keep this in mind.

Alternatively, you may want to cache all the methods in a view; for that you can cache the dispatch_request method, or even better, you can decorate the whole view.

Caching the dispatcher

class ZenView(MethodView):
    @cache.cached(timeout=30)
    def dispatch_request(self):
        return super(ZenView, self).dispatch_request()

    ...
Caching the whole view (recommended)

zen = Blueprint('zen', __name__)

class ZenView(MethodView):
    ...

cached_zen_view = cache.cached(timeout=50)(ZenView.as_view('zen'))
zen.add_url_rule("/", view_func=cached_zen_view)

Caching template blocks

Flask-Cache comes with a template tag able to cache template blocks. Let's change our ZenView to use a Jinja2 template.

in zen_blueprint.py

import time
import random
from this import s, d
from string import translate, maketrans
from flask import Blueprint, render_template
from flask.views import MethodView

zen = Blueprint('zen', __name__)

class ZenView(MethodView):

    def get(self):
        return render_template(
            'zen.html',
            get_random_quote=self.random_zen_quote
        )

    @staticmethod
    def get_current_time():
        return time.ctime()

    @staticmethod
    def random_zen_quote():
        transtable = maketrans("".join(d.keys()), "".join(d.values()))
        return random.choice(translate(s, transtable).split("\n")[2:])

zen.add_url_rule("/", view_func=ZenView.as_view('zen'))

Now we need to create a template file in cached_blueprint_app/templates/zen.html

<h3> Random Zen of Python </h3>
<strong>{{get_random_quote()}}</strong>

Run the application with python cached_blueprint_app/app.py and open http://localhost:5000; you will see a random quote refreshed every time you press F5. Let's cache it for 30 seconds.

Change the zen.html template

{% cache 30 %}
<h3> Random Zen of Python </h3>
<strong>{{get_random_quote()}}</strong>
{% endcache %}

Now save the file and refresh to see the content cached for 30 seconds.

Caching functions and views with variant arguments using memoize decorator

Sometimes your views and functions receive arguments, which can come from the URL mapping or be passed directly to the function call. You may want to cache the view or function and use those arguments as keys to cache its different results. Flask-Cache has a different decorator for doing that.

NOTE: With functions that do not receive arguments, cached() and memoize() are effectively the same.

Now with a simple application memoize_app.py

import time
from flask.ext.cache import Cache
from flask import Flask

app = Flask(__name__)
app.config['CACHE_TYPE'] = 'simple'
app.cache = Cache(app)

@app.cache.memoize(timeout=5)
def get_current_time_and_name(name):
    return "%s - %s" % (name, time.ctime())

@app.route("/<name>")
def view(name):
    return get_current_time_and_name(name)

if __name__ == "__main__":
    app.run(debug=True, port=5000, host='0.0.0.0')

Now run python memoize_app.py and open http://localhost:5000/yourname and note that the function will be cached for each different name you pass as argument in the url.

Caching arbitrary objects

There are times when decorators cannot be used and you need to explicitly set or get something from the cache.

Inside a view or a blueprint you can use current_app

from flask import current_app

def some_function():
    cached = current_app.cache.get('a_key')
    if cached:
        return cached
    result = do_some_stuff()
    current_app.cache.set('a_key', result, timeout=300)
    return result

Or, if you're using a separate cache instance, you can do this directly:

from cache import cache

def function():
    cached = cache.get('a_key')
    if cached:
        return cached
    result = do_some_stuff()
    cache.set('a_key', result, timeout=300)
    return result

Clearing the cache

You can create a script to clear the cache, or a function to use it when needed

from flask.ext.cache import Cache    
from yourapp import app
cache = Cache()

def main():
    cache.init_app(app)

    with app.app_context():
        cache.clear()

if __name__ == '__main__':
    main()

WARNING: Some backend implementations do not support completely clearing the cache. Also, if you’re not using a key prefix, some implementations (e.g. Redis) will flush the whole database. Make sure you’re not storing any other data in your caching database.

There are a lot of examples and a well-documented API on the Flask-Cache website http://pythonhosted.org/Flask-Cache/ and you can also create your own cache backend by following the examples in the Flask-Cache docs.

Peter Bengtsson: Grymt - because I didn't invent Grunt here

 ∗ Planet Python

grymt is a Python tool that takes a directory full of .html, .css and .js and prepares the HTML for optimal production use.

For a teaser:

  1. Look at the "input"

  2. Look at the "output" (Note! You have to right-click and view source)

So why did I write my own tool and not use Grunt?!

Glad you asked! The reason is simple: I couldn't get Grunt to work.

Grunt is a framework. It's a place where you say which "recipes" to execute and how. It's effectively a common config framework. Like make.
However, I tried to set up a bunch of recipes in my Gruntfile.js and most of them worked well individually, but it was a hellish nightmare to get it all to work together just the way I wanted it.

For example, the grunt-contrib-uglify is fine for doing the minification but it doesn't work with concatenation and it doesn't deal with taking one input file and outputting to a different file.
Basically, I spent two evenings getting things to work but I could never get exactly what I wanted. So I wrote my own, and because I'm quite familiar with this kind of stuff, I did it in Python. Not because it's better than Node, but just because I had it nearby and was able to build something more quickly.

So what sweet features do you get out of grymt?

  1. You can easily make an output file have a hash in the filename. E.g. vendor-$hash.min.js becomes vendor-64f7425.min.js, so the filename is always unique but doesn't change between deployments unless you change the files (see the sketch after this list).

  2. It automatically notices which files already have been minified. E.g. no need to minify somelib.min.js but do minify otherlib.js.

  3. You can put $git_revision anywhere in your HTML and this gets expanded automatically. For example, view the source of buggy.peterbe.com and look at the first 20 lines.

  4. Images inside CSS get rewritten to have unique names (based on the files' modified time) so they can be far-future cached aggressively too.

  5. You never have to write down any lists of file names in some Gruntfile.js equivalent file.

  6. It copies ALL files from a source directory. This is important in case you have something like this inside your javascript code: $('<img>').attr('src', 'picture.jpg') for example.

  7. You can choose to inline all the minified and concatenated CSS or JavaScript. Inlining CSS is neat for single page apps where you have a majority of primed cache hits. Instead of one .html and one .css you get just one .html and the number of bytes is the same. Not having to do another HTTP request can save a lot of time on web performance.

  8. The generated (aka. "dist" directory) contains everything you need. It does not refer back to the source directory in any way. This means you can set up your apache/nginx to point directly at the root of your "dist" directory.
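
To make point 1 concrete, here is a minimal sketch (not grymt's actual implementation) of how a content hash could be derived for a filename like vendor-$hash.min.js:

import hashlib

def hashed_filename(path, template='vendor-$hash.min.js', length=7):
    # hash the file's contents so the name only changes when the file does
    with open(path, 'rb') as f:
        digest = hashlib.md5(f.read()).hexdigest()
    return template.replace('$hash', digest[:length])

# e.g. hashed_filename('vendor.min.js') -> something like 'vendor-64f7425.min.js'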

So what's the catch?

  1. It's not Grunt. It's not a framework. It does only what it does and if you want it to do more you have to work on grymt itself.

  2. The files you want to analyze, process and output all have to be in a sub directory.
    Look at how I've laid out the files in this project, for example. ALL the files you need are in one sub-directory called app. So, to run grymt I simply run: grymt app.

  3. The HTML files you throw into it have to be plain HTML files. No templates for server-side code.

How do you use it?

pip install grymt

Then you need a directory it can process, e.g. ./client/ (assumed to contain one or more .html files).

grymt ./client

For more options, check out

grymt --help

What's in the future of grymt?

If people like it and want to add features, I'm more than happy to accept pull requests. Some future potential feature work:

Yasoob Khalid: I didn’t know Skype stores your data in a local database without a password!

 ∗ Planet Python

Hi guys! How are you? I hope you are doing great. Recently I came to know that Skype (the video conferencing software) stores a local database with almost all the information of a user who has logged on to Skype from that computer. You might be thinking “So what? A lot of apps do that, right?”. Yes, you are right. This is mostly done to increase speed. It’s like caching the content so that whenever you log in to your account again you don’t have to wait to see your contacts. That is fine, but only up to a point.

I came to know that anyone can take a look at the local database and extract data from it. Is that scary for you? No? Listen to this. Suppose you have some guests at your house and one of them is a computer freak who asks you to let him use your computer. What will you do? You will probably say OK.

Now comes the scary part. That freak can use a simple program called SkypeFreak to connect to the local Skype database and get the info regarding your friends, the messages you have sent, the calls you have made and their durations, etc., without knowing your password! He can even read the secret messages you send to your girlfriend. I guess now that seems scary, right? Let’s move on and see how SkypeFreak works.

SkypeFreak is a simple Python program written by Osanda Malith for info-sec purposes. He is a security person, not a professional programmer. I recently stumbled on his program and ended up doing a complete rewrite of the source code to make it more readable, shorter and compatible with Python 3. This program contains some carefully crafted database queries which return the data from the database. Some example queries include:

SELECT fullname, skypename, city, country,
       datetime(profile_timestamp,'unixepoch') FROM Accounts;

SELECT displayname, skypename, country, city, about,
       phone_mobile, homepage, birthday,
       datetime(lastonline_timestamp,'unixepoch') FROM Contacts;

The database can be opened from a Python script using sqlite3, and then we can execute these queries. The only gotcha is that the freak has to know your Skype username, but we all know that the auto-complete option in the Skype client can help him get that. Let’s understand the main workings of this program.
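
Here is a bare-bones illustration of that idea (the username is a placeholder, and the path matches the Windows location given below):

import os
import sqlite3

skype_user = 'someusername'  # placeholder: the victim's Skype username
db_path = os.path.join(os.environ['APPDATA'], 'Skype', skype_user, 'main.db')

conn = sqlite3.connect(db_path)
query = ("SELECT fullname, skypename, city, country, "
         "datetime(profile_timestamp,'unixepoch') FROM Accounts")
for row in conn.execute(query):
    print(row)
conn.close()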

On all major operating systems, Skype stores the database in a known location without any encryption or password (not even a simple one). For example, on Windows it is stored in

<$appdata>\Skype\<skype username>\main.db

First you tell SkypeFreak the Skype username of the victim. After that, SkypeFreak searches the local directories for a folder with that name and finally lays its hands on the database. After connecting to that database, SkypeFreak gives you some options like get calls data, get messages data, etc. When you use any of these commands, SkypeFreak prompts you to save the info in a separate text file. That’s it! Now you are hacked! The freak cannot do much with your Skype account; he only gets the data out of it, not your password, which means that you do not have to change your password.

I was shocked myself when I learned that it is that simple to get at Skype data. Microsoft should take some steps to ensure the privacy of users and prevent this type of data from falling into the wrong hands. They should at least password-protect the database so that it is not this simple to access. The password could be hard-coded into the application or something like that. I can no longer trust Microsoft with my sensitive data. If you have any questions, comments or suggestions, feel free to comment below.

Last but not least, follow my blog to stay up to date with my new articles. See you later!

Source: SkypeFreak


Damián Avila: 48 themes for your IPython notebook

 ∗ Planet Python

OK, a short post to give you some material to play with over the weekend ;-).

Today I woke up early, and while I was drinking a mate (a traditional drink here in Argentina) for breakfast, I remembered a tweet from Nikhil Sonnad where I was mentioned:

Read more…

Will Kahn-Greene: Django Eadred v0.3 released! Django app for generating sample data.

 ∗ Planet Python

Django Eadred gives you some scaffolding for generating sample data to make it easier for new contributors to get up and running quickly, bootstrapping required database data, and generating large amounts of random data for testing graphs and things like that.

The v0.3 release is a small one, but good:

There are no backwards-compatibility problems with previous versions.

To update, do:

pip install -U eadred

Infocon: green

 ∗ SANS Internet Storm Center, InfoCON: green

Testing your website for the heartbleed vulnerability with nmap

Testing your website for the heartbleed vulnerability with nmap, (Fri, Apr 18th)

 ∗ SANS Internet Storm Center, InfoCON: green

We have received reports by many readers about buggy tools to test for the heartbleed vulnerabili ...(more)...

Europython: Announcing Emily Bache as keynote speaker

 ∗ Planet Python

We are pleased to announce Emily Bache as another EuroPython 2014 keynote speaker. Emily will talk about “Will I still be able to get a job in 2024 if I don’t do TDD?”.

Geoffrey Moore’s book “Crossing the Chasm” outlines the difficulties faced by a new, disruptive technology when adoption moves from innovators and visionaries into the mainstream. Test Driven Development is clearly a disruptive technology that changes the way you approach software design and testing. It hasn’t yet been embraced by everyone, but is it just a matter of time? Ten years from now, will a non-TDD-practicing developer experience the horror of being labelled a technology adoption ‘laggard’, and be left working exclusively on dreadfully boring legacy systems?

It could be a smart move to get down to your nearest Coding Dojo and practice TDD on some Code Katas. On the other hand, the thing with disruptive technologies is that even they can become disrupted when something better comes along. What about Property-Based Testing? Approval Testing? Outside-In Development?

In this talk, I’d like to look at the chasm-crossing potential of TDD and some related technologies. My aim is that both you and I will still be able to get a good job in 2024.

About Emily Bache

Emily Bache is a software developer and test automation specialist. Currently an employee of a Swedish company, Pagero, she works on their electronic invoicing product. Together with her team, she regularly delivers working software. Emily has previously worked as a developer in organizations as diverse as small startups and large corporations, using Python as well as other languages such as Java, Scala and Ruby. For several years she worked as an independent consultant, facilitating many Coding Dojos and developer training events. Emily is a well-known conference speaker, and author of “The Coding Dojo Handbook”. She is originally from the U.K. but now lives in Göteborg, Sweden.

Quintagroup: collective.contact.core

 ∗ Planet Plone

collective.contact.core is a Plone add-on that helps you manage organizations and staff in Plone (the main developers are Vincent Fretin and Cedric Messiant). This product provides a directory that can contain contact information for different content types: organizations/sub-organizations, persons, and positions. Which contact info is available depends on the content types you enable the IContactDetails behavior for, so it can cover many different uses.

Easy in use:

  1. Add a directory to your website and insert all the additional information required. You’ll need to specify the types of positions and organizations that will be used (e.g. Faculty/Staff/Students for universities). Don’t worry about filling out the form, since it can be edited at any time later.

    [Screenshot: collective.contact.core edit form]

  2. Create organization(s) in the directory. Depending on the hierarchy, add other organizations (they may correspond to units, divisions, departments, etc.). An organization can contain positions (e.g. dean, secretary, CEO) that will be connected with persons (physical people). Choose Organization/Position from the Add new drop-down menu or click on Create contact to branch out your directory.

     [Screenshot: collective.contact.core navigation]

A person content type can hold one or more positions or be a member of one or more organizations. All contact types have optional fields with a variety of contact information, including phone, cell phone, fax, email, address, zip code, etc. Such data management is very suitable for universities.

[Screenshot: collective.contact.core directory view]

collective.contact.core can be useful for all kinds of organizations, regardless of their size, number of employees, or subdivisions. The created directory is easy to manipulate and can be branched or edited at any time.

collective.contact.core adds new content types but preserves Plone functionality, especially concerning users’ rights. Every ‘organization’ content type is similar to a folder, so you can specify in the Sharing tab what rights users have. Moreover, the default Plone search is very efficient when you want to look for a specific person or position anywhere on the website.

Use collective.contact.core to arrange your organization and contact information.

Contributors:

More information:

The Death of the Web Design Agency?

 ∗ A List Apart: The Full Feed

Others have gone as far as to say that the very concept of a user experience-focused agency simply isn’t a long-term play, largely because of what the big folks are up to. Facebook and Google went on a design/product buying spree specifically because they needed to figure out how to own this thinking themselves, and other tech companies have followed. And more traditional industries, like insurance, media, and retail? They’ll develop robust in-house capabilities soon, if they haven’t already.

Ready to pack up your things and start a landscaping business? Not so fast.

Greg Hoy, Differentiate or Die?

In The Pastry Box Project today, Greg Hoy of Happy Cog talks honestly about why the first quarter of this year sucked for most web design agencies (including ours), assesses the new and growing long-term threats to the agency business model, and shares his thinking on what we in the client services design business can do to survive, and maybe even thrive.

Cennydd Bowles on UX & Design: Letter to a Junior Designer

 ∗ A List Apart: The Full Feed

I admit it: you intimidate me. Your work is vivid and imaginative, far superior to my woeful scratchings at a similar age. The things I struggle to learn barely make you sweat. One day, you’ll be a better designer than me.

But for now, I can cling to my sole advantage, the one thing that makes me more valuable: I get results. I can put a dent in cast-iron CEO arguments. I can spot risks and complications months in advance. In the wager that is design, I usually bet on the right color. People trust me with their stake.

So, if you’ll humor me, maybe I can offer a few suggestions to speed you toward the inevitable.

Slow down

You’re damn talented. But in your eagerness to prove it, you sometimes rush toward a solution. You pluck an idea from the branch and throw it onto the plate before it has time to ripen. Don’t mistake speed for precocity: the world doesn’t need wrong answers in record time.

Perhaps your teachers exalted The Idea as the gem of creative work; taught you The Idea is the hard part. I disagree. Ideas aren’t to be trusted. They need to be wrung dry, ripped apart. We have the rare luxury that our professional diligence often equates to playfulness: to do our job properly, we must disassemble our promising ideas and make them into something better.

The process feels mechanical and awkward initially. In time, the distinction between idea and iteration will blur. Eventually, the two become one.

So go deeper. Squander loose time on expanding your ideas, even if you’re sure they’re perfect or useless. Look closely at decisions you think are trivial. I guarantee you’ll find better solutions around the corner.

Think it through

We’d love to believe design speaks for itself, but a large part of the job is helping others hear its voice. Persuasive rationale—the why to your work—is what turns a great document into a great product.

If you haven’t already, sometime in your career you’ll meet an awkward sonofabitch who wants to know why every pixel is where you put it. You should be able to articulate an answer for that person—yes, for every pixel. What does this line do? Well, it defines. It distinguishes. But why here? Why that color? Why that thickness? “It looks better” won’t suffice. You’ll need a rationale that explains hierarchy, balance, gestalt—in other words, esoteric ways to say “it looks better,” but ways that reassure stakeholders that you understand the foundations of your craft. Similarly, be sure you can explain which alternatives you rejected, and why. (Working this through will also help you see if you have been diligent or if you’ve been clinging to a pet idea.) This might sound political. It is. Politics is just the complex art of navigating teams and people, and the more senior you get, the more time you’ll spend with people.

Temper your passion

Your words matter: be careful not to get carried away. Passion is useful, but you’ll be more effective when you demonstrate the evidence behind your beliefs, rather than the strength of those beliefs. Softer language earns fewer retweets but better results. If you have a hunch, call it a hunch; it shows honesty, and it leaves you headroom to be unequivocal about the things you’re sure of.

Similarly, your approach to your work will change. Right now design is an ache. You see all the brokenness in the world: stupid products, trivial mistakes, bad designs propped up with scribbled corrections. That stupidity never goes away, but in time you learn how to live with it. What matters is your ability to change things. Anyone can complain about the world, but only a good few can fix it.

That fury, that energy, fades with time, until the question becomes one of choosing which battles to arm yourself for, and which to surrender. Often this means gravitating toward the biggest problems. As you progress in the field, your attention may turn from tools and techniques to values and ethics. The history of the industry is instructive: give it proper attention. After all, all our futures shrink with time, until finally the past becomes all we have.

You’ll come to appreciate that it can be better to help others reach the right outcomes themselves than do it yourself. That, of course, is what we call leadership.

Finally, there may come a point when you realize you’re better served by thinking less about design. Work and life should always be partially separate, but there’s no doubt that the experiences you have in your life shape your work too. So please remember to be a broad, wise human being. Travel (thoughtfully) as much as you can. Read literature: a good novel will sometimes teach you more than another design book can. Remind yourself the sea exists. You’ll notice the empathy, sensitivity, cunning, and understanding you develop make your working life better too.

But you’re smart, and of course you realize this is really a letter to the younger me. And, alongside, it’s a lament at my nagging sense of obsolescence; the angst of a few grey hairs and the emerging trends I don’t quite understand. Which is mildly ridiculous at my age—but this is a mildly ridiculous industry. And you’ll inherit it all, in time. Good luck.

Yours,
Cennydd

Free Speech

 ∗ xkcd.com

I can't remember where I heard this, but someone once said that defending a position by citing free speech is sort of the ultimate concession; you're saying that the most compelling thing you can say for your position is that it's not literally illegal to express.

Alex Gaynor: Best of PyCon 2014

 ∗ Planet Python

This year was my 7th PyCon, I've been to every one since 2008. The most consistent trend in my attendance has been that over the years, I've gone to fewer and fewer talks, and spent more and more time volunteering. As a result, I can't tell you what the best talks to watch are (though I recommend watching absolutely anything that sounds interesting online). Nonetheless, I wanted to write down the two defining events at PyCon for me.

The first is the swag bag stuffing. This event occurs every year on the Thursday before the conference. Dozens of companies provide swag for PyCon to distribute to our attendees, and we need to get it into over 2,000 bags. This is one of the things that defines the Python community for me. By all rights, this should be terribly boring and monotonous work, but PyCon has turned it into an incredibly fun, and social event. Starting at 11AM, half a dozen of us unpacked box after box from our sponsors, and set the area up. At 3PM, over one hundred volunteers showed up to help us operate the human assembly line, and in less than two and a half hours, we'd filled the bags.

The second event I wanted to highlight was an open space session, on Composition. For over two hours, a few dozen people discussed the problems with inheritance, the need for explicit interface definition, what the most idiomatic ways to use decorators are, and other big picture software engineering topics. We talked about design mistakes we'd all made in our past, and discussed refactoring strategies to improve code.

These events are what make PyCon special for me: community, and technical excellence, in one place.

PS: You should totally watch my two talks. One is about pickle and the other is about performance.

ISC StormCast for Friday, April 18th 2014 http://isc.sans.edu/podcastdetail.html?id=3941, (Fri, Apr 18th)

 ∗ SANS Internet Storm Center, InfoCON: green

...(more)...

Heartbleed CRL Activity Spike Found, (Wed, Apr 16th)

 ∗ SANS Internet Storm Center, InfoCON: green

Update: CloudFlare

The Yip Sang Correspondence Project 葉生信件翻譯工程

 ∗ AuthentiCity

The Project
葉生信件翻譯工程

This project sought to make available Chinese-language documents which are held in a predominantly English-language archives. A selection of correspondence from the Yip family and Yip Sang Ltd. fonds (AM1108) was used. One of the difficulties with making these materials available is that there are so few local people who can read the old-style Chinese writing. We decided to digitize the letters so that they are available to readers of the old script throughout the world, and to invite them to contribute their translations and interpretations.

This work, completed in 2008, was done in cooperation with the Department of History at the University of British Columbia. W. Wang translated some of the letters under the supervision of Dr. Henry Yu. We are grateful for the financial assistance of the Government of Canada for the digitization of photographs and letters.

See the result of the joint digitization project with UBC Library: http://www.library.ubc.ca/chineseinbc/search.html

Search the Yip Sang materials: http://digitalcollections.library.ubc.ca/cdm4/search.php?CISOROOT=/yipsang

envelope

這個翻譯工作,目的在協助一個以英語為主的檔案館 ;例如温哥華檔案館; 找出最可行的方法,令公眾能夠使用館內的中文資料。工作主要是將部份葉氏家族及其公司的信件(館蔵編號: AM1108),翻譯成英文。 其中最困難的地方,是書信的手寫字體較難辨認,以及解讀信中的舊式文體。為求得到世界各地人仕的幫助, 温哥華檔案館決定將信件製成數碼影像,然後將影像透過互聯網發放到世界各地,好讓有識之士,協助完成翻譯工作。

翻譯工作在温哥華檔案館和卑詩大學歷史系合作下,於2008年完成。而份信件的翻譯是在余全毅博士的指導下,由王小姐完成。

查閱翻譯和數碼化工作的背景資料及成果,請瀏覽以下網址:
http://www.library.ubc.ca/chineseinbc/search.html

查閱葉氏家族及其公司信件的數碼檔案,請瀏覽以下網址:
http://digitalcollections.library.ubc.ca/cdm4/search.php?CISOROOT=/yipsang

Yip Sang
葉生

The Yip family in Vancouver began with Yip Sang’s arrival in B.C. in 1881. Yip Sang, whose real name was Yip Chun Tien (along with two other Chinese names, Yip Loy Yiu and Yip Lin Sang), was born in China in 1845. In 1864, he left his home village, Shengtang Cun, Duhu County in Guangdong province, to travel to San Francisco, where he worked as a dishwasher, cook, cigar maker, and labourer in the goldfields.

Eventually he left for B.C., and in 1881, after first looking for gold in the north, settled in Vancouver and found work as a pedlar, selling sacks of coal door to door. In 1882, he was employed by the Canadian Pacific Railroad Supply Company, where he worked as a bookkeeper, timekeeper, paymaster and then as the Chinese superintendent. In 1885, Yip Sang left the company and returned to China. In 1888, he returned to Vancouver and established the import and export firm of Wing Sang Company.

During his lifetime, Yip Sang had four wives and a total of twenty-three children. He became a naturalized British subject in 1891. Yip Sang was one of the driving forces in the establishment of the Chinese Benevolent Association, the Chinese School and the Chinese Hospital (now Mount St. Joseph’s) in Vancouver. He was a lifetime governor of Vancouver General Hospital, and was also a benefactor of the Public Hospital in Guangdong province in China. He died in 1927.

Yip Sang at his 80th birthday celebration October 22, 1925. Photographer Cecil B. Wand. Detail from City of Vancouver Archives CVA 749
葉生80大壽,攝於1925年10月22日。温哥華檔案館相片編號﹕CVA 749

温哥華葉氏家族是由第一代移民葉生於1881年建立,葉生原名葉春田,又名葉來饒或葉連生。葉生在1845年出生於中國廣東省一個農村,他在1864年離鄉後,曾到美國三藩市, 當過洗碗、廚師、雪茄煙工人, 亦曾在金礦場工作。

葉生在 1881 年遷移到加拿大卑詩省, 曾在省北部淘金。 葉生定居於温哥華後, 曾做過賣煤炭小販。一年後葉生受僱於加拿大太平洋鉄路物料公司, 負責入數、 記錄工時及出納等工作, 及後葉生更獲委為華人監工。 葉生在1885年離職返回中國 , 他在1888年重返温哥華創立永生號 ,經營出入口生意。

葉生有4位妻子及23名子女,他在1891年歸化英籍加人。葉生一生致力推重社區發展,他曾參與建立温哥華中華總會、 温哥華中文學校及温哥華中醫院( 即現時的聖約瑟醫院 ),他亦是温哥華綜合醫院的終身理事。 除温哥華外,葉生也曾捐助廣東省公立醫院。葉生於1927年逝世。

The Wing Sang Company
永生號

Wing Sang Company was one of the wealthiest firms in the Chinatown area of Vancouver. It engaged in contracting Chinese workers for the Canadian Pacific Railway Company; the import and export of general merchandise from China and Japan; money remittance from Vancouver to Hong Kong; and the dry-salt herring business with China. It also functioned as a passenger agency with the Canadian Pacific Steamships Ltd. The Wing Sang Company was renamed Yip Sang Ltd. in 1950.

The Wing Sang Building, at what is now 51-69 East Pender Street (renamed and renumbered in 1907 from 29-35 Dupont Street), was built in 1889 and greatly extended in 1901, and is thought to be the oldest surviving building in Vancouver’s Chinatown.

Wing Sang building, ca. 1901-07. Photographer unknown.
City of Vancouver Archives CVA 689-54
永生號大樓, 攝於約1901-1907年間。温哥華檔案館相片編號﹕CVA 689-54

永生號曾經是華埠其中一間最興旺的商號,主要業務包括為加拿大太平洋鉄路公司輸入中國勞工、中國及日本的商品貿易、温哥華及香港兩地的往來滙款、以及中國的咸魚貿易,永生號亦是太平洋輪船公司的其中一個代理。永生號在1950年改名為葉生有限公司。

永生號大樓位於片打東街51至69號,大樓建於 1889年,及後在1901年作大幅擴建。永生號相信是温哥華華埠現存最古老的建築物。

Acquisition of the Materials
葉氏家族檔案的捐贈及存館過程

In June 1989, Randall Yip contacted the Archives on behalf of the Yip family regarding the Wing Sang Company building, which was still owned by the family but had not been in recent use. The building was to be renovated, and there were papers and artifacts within which might be of historical interest. Over three days that August, five staff members packed and retrieved over forty boxes of materials found in two levels of the building.

Additional materials were later donated by family members, but the majority of the fonds was salvaged from the building.

Materials after acquisition and freeze fumigation. Archives staff photo, 1991
經泠涷殺菌及除蟲處理後的部份文件,攝於1991年。

1989年6月, 葉氏家族的一名成員,代表家族聯絡温哥華檔案館,商討一批存放在永生號大樓內的文件及文物。當時葉氏家族仍然擁有該大樓,但大樓已空置多時。由於大樓將會 重建,葉氏家族希望該批有歷史價值的文獻,能夠得到妥善保存。 温哥華檔案館遂於同年8月在大樓展開工作,5位檔案館的職員在3日內,整理及包裝超過40多箱文件。

大樓內發現的文件,成為葉氏家族檔案的主要部份。而葉氏家族的成員,亦捐出個人珍藏的家族資料。

Opening the Safes
打開保險庫

Two safes, including a large walk-in, were opened by a professional safecracker. The walk-in safe was unlocked, but the outer set of doors was rusted shut, so the metal had to be cut away with a torch in order to gain access. The contents were protected from the shower of sparks by the inner doors.

The smaller safe was empty.

recovery-1

永生號大樓內有一個大型的保險庫,保險庫有內外两層門。 雖然保險庫沒有上鎖,但由於外門已生銹至不能開啟 ,要由技師用燒焊器,將外門燒開,才能取出保險庫內的文件。由於有保險庫的內門保護,文件未為燒焊火花所破壞。

大樓內亦有一個較小型的保險庫,但裡面空無一物。

Document Recovery
文件修復

Eleven of the boxes of documents were taken from the walk-in safe, but as many of them had been wet and moldy for a long time due to a leak in the ceiling, even after freeze-drying only four boxes could be salvaged. All materials, both wet and dry, were fumigated in a freezer to kill insects.

Rotted wooden shelf and moldy ledger books, walk-in safe, 1989. Staff photograph.
大型保險庫內,已腐爛的木書架及已發霉的帳簿,攝於1989年。

檔案館職員在保險庫內取出11箱文件,但由於大樓的天花板長年漏水,大部份保險庫內的文件,因長期受潮而發霉。縱使經泠涷殺菌及抽濕處理,只有其中4 箱文件能保存下來。所有保存的文件,無論乾或受潮,都要存放泠藏庫內,進行殺菌及除蟲。

The Correspondence
關於這些信件

Yip Sang acted as an unofficial postmaster for his own employees and other local Chinese workers for correspondence to and from China. Letters addressed using Chinese characters would not be delivered by the Canadian postal system. Yip Sang had the means to transport mail to China using his import/export business. In addition, his building served as a poste restante for incoming mail to be delivered to itinerant workers.

While the characters are no different than those which have been used for thousands of years, the writing style of the time employed fewer characters than are used today to express the same idea, making interpretation a challenge. In addition, many of the handwritten characters are difficult to read. We chose only legible letters for the project.

Letter #454 信件編號 454
Add. MSS 1108-454 envelope, undated
Add.MSS.1108-454 信封,年代不詳

除了是一位商人外,葉生亦充當其員工的非正式郵政局長,負責收發員工往來温哥華與中國两地的書信。原因是當時 的加拿大郵政局,看不懂信封上的中文書地址。恰巧葉生經營两地貨物的出入口生意,故此員工的家書便連同葉生的貨物,一起往來温哥華與中國,而永生號大樓亦 是員工郵件的代收及待領中心。

雖然中文字已沿用了數千年,但由於當時所用的文體跟現代的有所不同,加上當時的人比現代人用較少的字,來表達意思,令翻譯工作遇到困難。由於大部份信件的字體十分潦草,我們只揀選了一些字體容易辨認的信件,來進行翻譯。

Sample Translations
一些翻譯樣本

Letter 352
信件編號 352
Add. MSS 1108-352, undated
Add.MSS.1108-352, 年代不詳
Wang Kuopang notified Wang Kuoyue that he has remitted five hundred and ninety dollars to Wing Sang Co. The letter has a literal translation of “being taxed while entering Vancouver (or Canada ).”
王擴胖匯至永生寶號與王擴月五百九十六元。內有“打稅入埠”字句。

Letter 398
信件編號 398
Add. MSS 1108-398, undated
Add.MSS.1108-398, 年代不詳
Cheng Wenzong thanked Mr. Ye for hiring a doctor for the wife of Jiang Boding. She has telegraphed Cheng that “she is recovering [from illness].”
陳文宗來信謝葉公代為為江伯定之妻 ” 請醫 ,” 其妻已來電“云好轉。”

Letter 430
信件編號 430
Add. MSS 1108-430, undated
Add.MSS.1108-430, 年代不詳
Kuang Shulin informed how he spent the 1,000 dollars that Kuang Maiju sent to him in purchasing properties and taking care of underprivileged family members.
鄺樹林告知去年收到叔父寄與之一千元是如何用以置產與照顧家中弱勢者。

Letter #454 信件編號 454
Add. MSS 1108-454 page 1, 1913
Add.MSS.1108-454 頁1, 1913年
I. Liang Xianxi informed Liang Xianen that he has received the twenty dollars, but the medical expense for his grandson was over ten dollars. He also informed that “your parents are both over eighty years old, please return home early.”
信一:梁賢熙告知梁賢恩已收到二十元,但孫兒醫療費“即花十多元,”並叮囑“慈嚴已八十高壽,請儘早回家。”

Letter #454 信件編號 454
Add. MSS 1108-454 page 2, undated
Add.MSS.1108-454 頁2, 年代不詳
II. Liang Huanfu informed his father, Liang Xianen that “our family is fine. The annual family expenses are two hundred dollars and because there are some young ones, the estimate [expenses] is three to four hundred dollars.”
信二:梁換福告知父親梁賢恩“家中安好。每年家中花費約兩百元又有幼小者,估計約需三四百元,以應家用。”。

Letter #454 信件編號 454
Add. MSS 1108-454 page 3, undated
Add.MSS.1108-454 頁3,年代不詳
III. Liang Huanfu informed his father, Liang Xianen that “he has given up his studies in order to find a job to meet the family need.” His second uncle is already eighty years old, so please mail the money back home early for buying rice and food.
信三:梁換福告知父親,梁賢恩,為應家急已“棄學圖工,”並且“二伯已八十,請早寄銀兩回家,以應米糧之需 。”

Pyramid for Plone Developers: Training at Plone Symposium MW 2014

 ∗ Agendaless Blog

We are pleased to be offering a two day training session at the 2014 Plone Symposium Midwest this year. The two-day course will cover Pyramid development topics, aimed at Plone developers.

For details, please see the training page.

Looking for malicious traffic in electrical SCADA networks - part 2 - solving problems with DNP3 Secure Authentication Version 5, (Thu, Apr 17th)

 ∗ SANS Internet Storm Center, InfoCON: green

I received this week a very valuable e-mail from the DNP Technical Committee Chair, Mr. Andrew Wes ...(more)...

Continuum Analytics Blog: Bokeh 0.4.4 Released!

 ∗ Planet Python

We are pleased to announce the release of version 0.4.4 of Bokeh, an interactive web plotting library for Python!

This release includes improved Matplotlib, ggplot, and Seaborn support, PyPy compatibility, continuous integration testing, downsampling of remote data, and initial work on Bokeh “apps”.

Python Sweetness: Portable 9-22x compression of animated GIFs with JPEG+Javascript

 ∗ Planet Python

Warning: not Python

A friend recently built out a site which, amongst other things, in some cases features large pages of animated GIFs. There is perhaps nothing more wasteful of an Internet connection than such a page, especially when the “animation” is actually continuous-tone, real-colour video converted from some other format.

[Whoops, removed utterly wrong explanation of GIF compression. GIFs aren’t run-length encoded, they use LZW coding, so the description and example that previously appeared here were completely incorrect]

This is pretty much how photos and real-world videos rich in varied tones compress, and so using GIF to encode files like these is a horrible choice.

So why is it popular, then? Well, compatibility of course. GIF has been around since at least the 90s, if not earlier, and has been supported by all browsers for over a decade.

Web Video

Unless you’ve been living under a rock, you might know that in recent years modern web browsers grew a <video> tag. Great, portable standardized containers for video!

Except it doesn’t work like that at all, because politics and money, of course. As can be seen from Video Formats and Browser Support, there is no single video codec that satisfies all popular browsers.

So unless we encode our videos at least twice (doubling at least storage costs), we can’t portably support the HTML <video> element. Even if a single encoding was supported by all modern browsers, that still leaves those less fortunate people stuck with ancient browsers out in the cold.

JPEGs

Still, each time I click one of these GIF-heavy pages and wait 30 seconds for all 50MiB of it to load, I’m left wondering if there is a better way. And so comes a little head scratching, and an even littler proof of concept…

There is another format supported by almost every browser, one that excels at encoding continuously toned images: I am of course talking about JPEG. So how could we reuse JPEG compression to encode video files? With horrible, nasty Javascript/CSS hacks, of course!

PoC

My little proof of concept doesn’t quite work well for all GIFs yet, though that’s not surprising, since I only spent an hour or so on it. The general idea is:

* Figure out the maximum size of any GIF frame (since GIF frames may vary in size)

* Politely ask ImageMagick to render each GIF frame in a tiled composition as a single new JPEG image (example source - 8.4MiB, result - 377KiB)

* Politely ask gifparse to give us the inter-frame delays, then stuff this information alongside (width, height, filename, column count) into a new JSON file (example) for JavaScript to read.

* In JavaScript, create a new <DIV> element with absolute height+width set to the animation’s size. Set the DIV’s background-image CSS property to point to the JPEG file.

* Instantiate a class that uses the information stored in the JSON file to modify the DIV’s background-image-position CSS property at timed intervals, such that all but the image for the current frame is clipped by the DIV’s dimensions

* Success! 8.4MiB GIF is now a 377KiB “animated JPEG”. You can try out a final rendering here (and full page here). Note that many of the GIFs don’t quite render properly yet, and their timing is way off, but I’m certain the output size is representative.
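
If it helps to make the pipeline concrete, here is a rough Python/Pillow sketch of the tiling and timing-extraction steps (the PoC itself uses ImageMagick and gifparse; the file names, the single-row layout and the helper name below are purely illustrative, and it ignores GIFs that rely on partial-frame disposal):

import json
from PIL import Image

def gif_to_sprite(gif_path, jpeg_path, json_path, quality=80):
    gif = Image.open(gif_path)
    width, height = gif.size
    frames, delays = [], []
    for i in range(gif.n_frames):
        gif.seek(i)
        frames.append(gif.convert("RGB"))
        delays.append(gif.info.get("duration", 100))  # per-frame delay, in ms

    # Lay the frames out side by side in a single JPEG "film strip".
    sprite = Image.new("RGB", (width * len(frames), height))
    for i, frame in enumerate(frames):
        sprite.paste(frame, (i * width, 0))
    sprite.save(jpeg_path, "JPEG", quality=quality)

    # The JavaScript side needs the geometry and delays to step the
    # background-position property at the right times.
    with open(json_path, "w") as f:
        json.dump({"width": width, "height": height,
                   "frames": len(frames), "delays": delays}, f)

gif_to_sprite("input.gif", "sprite.jpg", "sprite.json")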

Note also the browser’s CPU usage. It seems at least comparable to the same page full of GIFs, which I was quite surprised by. With Firefox, when the page is running in a background tab, CPU time is minimal.

Problems?

No doubt there are issues with doing this in some browsers - for example, at the very least, the produced JPEGs are huge when they are decompressed. For our example GIF, this requires at least 40MiB RAM in the browser to decompress (and possibly 56MiB if the browser stores alpha information too).

In any case, I think there is room to improve on this technique and maybe produce something suitable for a live web site.

The original web page that caused me to think about this had 50MiB of GIF files. Recompressed, they come out as just 6.4MiB of JPEGs.

Martijn Faassen: Morepath Python 3 support

 ∗ Planet Python

Thanks to an awesome contribution by Alec Munro, Morepath, your friendly neighborhood Python micro framework with super powers, has just gained Python 3 support!

Developing something new while juggling the complexities of Python 2 and Python 3 in my head at the same time was not something I wanted to do -- I wanted to focus on my actual goals, which was to create a great web framework.

So then I had to pick one version of Python or the other. Since my direct customer use cases involve integrating it with Python 2 code, picking Python 2 was the obvious choice.

But now that Morepath has taken shape, taking on the extra complexity of supporting Python 3 is doable. The Morepath test coverage is quite comprehensive, and I had already configured tox (so I could test it with PyPy). Adding Python 3.4 meant patiently going through all the code and adjusting it, which is what Alec did. Thank you Alec, this is great!

Morepath's dependencies (such as WebOb) already had Python 3 support, so credit goes to their maintainers too (thanks Chris McDonough in particular!). This includes the Reg library, which I polyglotted to support Python 3 myself a few months ago.

All this doesn't take away from my opinion that we need to do more to support the large Python 2 application codebases. They are much harder to transition to Python 3 than well-tested libraries and frameworks, for which the path was cleared in the last 5 years or so.

[update: this is still in git; the Morepath 0.1 release is Python 2 only. But it will be included in the upcoming Morepath 0.2 release]

This week's sponsor: Harvest

 ∗ A List Apart: The Full Feed

Have you ever billed hourly? A List Apart is brought to you this week by Harvest, a beautifully crafted time tracking tool for creative shops.

Start a trial before the year slips away.

ISC StormCast for Thursday, April 17th 2014 http://isc.sans.edu/podcastdetail.html?id=3939, (Thu, Apr 17th)

 ∗ SANS Internet Storm Center, InfoCON: green

...(more)...

David "Pigeonflight" Bain: Install Plone in under 5 minutes on Codio.com

 ∗ Planet Plone

I was introduced to Codio.com by +Rok Garbas. It turns out to be a very nice platform for developing Plone projects. So far what I like is that every Codio box pretty much ships with all the Plone dependencies while at the same time having a full suite of Node based tools (important for modern Javascript development), this is a great time saver on new projects. These are still early days so I

Ian Ozsvald: 2nd Early Release of High Performance Python (we added a chapter)

 ∗ Planet Python

Here’s a quick book update – we just released a second Early Release of High Performance Python which adds a chapter on lists, tuples, dictionaries and sets. This is available to anyone who has bought it already (login into O’Reilly to get the update). Shortly we’ll follow with chapters on Matrices and the Multiprocessing module.

One bit of feedback we’ve had is that the images needed to be clearer for small-screen devices – we’ve increased the font sizes and removed the grey backgrounds, the updates will follow soon. If you’re curious about how much paper is involved in writing a book, here’s a clue:

We announce each update, along with requests for feedback, via our mailing list.

I’m also planning on running some private training in London later in the year, so please contact me if this is of interest. Both High Performance and Data Science are possible.

In related news – the PyDataLondon conference videos have just been released and you can see me talking on the High Performance Python landscape here.


Ian applies Data Science as an AI/Data Scientist for companies in Mor Consulting, founded the image and text annotation API Annotate.io, co-authored SocialTies, programs Python, authored The Screencasting Handbook, lives in London and is a consumer of fine coffees.

WinXP and/or Win2003 hanged systems because of SC Forefront Endpoint Protection faulty update, (Wed, Apr 16th)

 ∗ SANS Internet Storm Center, InfoCON: green

Reader Philipp reported today a bug affecting his remaining Windows XP machines and Windows 2003 ...(more)...

Alexandre Conrad: The painful process of submitting your first patch to OpenStack

 ∗ Planet Python

Recently, I built a Python tool for SurveyMonkey that hooks up our Github projects to our Jenkins instance by asking a few command-line questions (a wizard) and generating the Jenkins job automatically. It literally takes less than a minute to get a project hooked up so that the code builds on every change / pull request, runs tests, measures coverage, etc., and you don't even have to visit Jenkins' irritating UI.

A lot of the heavy lifting is actually done by Jenkins Job Builder (JJB), a great tool created by the OpenStack Infra team, on which I rely. During the development process I made small improvements to JJB, and submitting a patch back to OpenStack as a way to say thank you sounded like a no-brainer. Little did I know.

The 27 steps to OpenStack contribution

If I had an OpenStack instructor, this is what I would have been told:
Protip: the following steps illustrate how I eventually succeeded at submitting a patch, and I'm confident this is how most wannabe contributors would go about it.
  1. Fork the Github project
  2. Hack a patch and submit a pull request.
  3. See the pull request being automatically closed with the message:

    openstack-infra/jenkins-job-builder uses Gerrit for code review.

    Please visit http://wiki.openstack.org/GerritWorkflow and follow the instructions there to upload your change to Gerrit.


  4. Visit the GerritWorkflow page.
  5. Convince yourself that you don't want to read the novel entirely, CTRL+F for smart keywords.
  6. Run out of ideas, give up.
  7. Regret, grab a Red Bull, get back to it, read the novel.
  8. Create a Launchpad.net account (I had one, password recovered).
  9. Join the OpenStack foundation (wat?).
  10. Locate the free membership button and click Join Now!
  11. Skip the legal stuff and find the form fields, name, email...
  12. Wonder what "Add New Affiliation" means. Skip and submit your application.
  13. Oops, you need to add an affiliation. Add an affiliation.
  14. You also need to add an address. Address of what? Run the following Python code to find out:
    python -c 'import random
    print random.choice(["address-of-yourself", "address-of-affiliation"])'
  15. Finally submit your application form and wonder what they could possibly do with your address, it should work.
  16. Return to the GerritWorkflow page.
  17. Ah, upload your SSH key to Gerrit.
  18. pip install git-review
  19. Skip the instructions that don't apply to you but don't skip too much.
  20. Try something. Didn't work? That's because you didn't skip enough.
  21. Understand that you must run git commit --amend because you need a Change-Id line on your commit message which gets generated by git-review.
  22. Finally, run git review! (like git push, but pushes to Gerrit)
  23. Oh wait, now you have to figure out how Gerrit works. It's a code review tool whose UI seems to have been inspired by the Jenkins one. Curse.
  24. <squash the numerous understanding-Gerrit-steps into one>
  25. Tweet your experience.
  26. Hope that someone sees your patch.
  27. Iterate with friendly OpenStack developers.
Actually, step 27 is a lot of fun.

Dear OpenStack

Many contributors probably just gave up and you may have lost valuable contributions. This is not a rant about Gerrit itself (ahem); I do understand that it is a code-review tool that you prefer over Github pull requests and that each tool has its learning curve. But you must smooth out the first-commit-to-Gerrit process for newcomers, one way or another. Please consider these improvements:
Please make your on-boarding pleasant; everyone will be rewarded.

Love,
Alex

Oracle Critical Patch Update for April 2014, (Wed, Apr 16th)

 ∗ SANS Internet Storm Center, InfoCON: green

Oracle released its quarterly Criticical Patch Update (CPU) yesterday [1]. As usual, the number o ...(more)...

Orbital Mechanics

 ∗ xkcd.com

To be fair, my job at NASA was working on robots and didn't actually involve any orbital mechanics. The small positive slope over that period is because it turns out that if you hang around at NASA, you get in a lot of conversations about space.

Andy Todd: Generating Reasonable Passwords with Python

 ∗ Planet Python

Thanks to a certain recent OpenSSL bug there’s been a lot of attention paid to passwords in the media. I’ve been using KeePassX to manage my passwords for the last few years so it’s easy for me to find accounts that I should update. It’s also a good opportunity to use stronger passwords than ‘banana’.

My problem is that I have always resisted the generation function in KeePassX because the resulting strings are very hard to remember and transcribe. This isn’t an issue if you always use one machine but I tend to chop and change and don’t always have my password database on the machine I’m using. I usually have a copy on my phone but successfully typing ‘Gh46^f27EEGR1p{‘ is a hit and miss affair for me. So I prefer passwords that are long but easy to remember, not unlike the advice from XKCD.

Which leaves a problem. Given that I now have to change quite a lot of passwords, how can I create suitably random passwords that aren’t too difficult to remember or transcribe? Quite coincidentally I read an article titled “Using Vim as a password manager”. The advice within it is quite sound, and at the bottom there is a Python function to generate a password from word lists (in this case the system dictionary). This does a nice job, with the caveat that, as I understand it, the passwords it creates are not that strong from a cryptographic standpoint. But they are useful enough for sites which aren’t my bank or primary email; for those I’m using stupidly long values generated from KeePassX. When I tried the Python function on my machine there was one drawback: it doesn’t work in Python 3. This is because the use of ‘map’ is discouraged in Python 3. But that’s alright, because I can replace it with one of my favourite Python constructs – the list comprehension. Here is an updated version of invert’s function that works in Python 3. Use at your own risk.


def get_password():
    import random
    # Make a list of all of the words in our system dictionary
    f = open('/usr/share/dict/words')
    words = [x.strip() for x in f.readlines()]
    # Pick 2 random words from the list
    password = '-'.join(random.choice(words) for i in range(2)).capitalize()
    # Remove any apostrophes
    password = password.replace("'", "")
    # Add a random number to the end of our password
    password += str(random.randint(1, 9999))
    return password
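
A quick interactive check looks something like this (the exact words and number will of course vary with your dictionary and the random draw; this output is purely illustrative):

>>> get_password()
'Gardenia-plover4172'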

ISC StormCast for Wednesday, April 16th 2014 http://isc.sans.edu/podcastdetail.html?id=3937, (Wed, Apr 16th)

 ∗ SANS Internet Storm Center, InfoCON: green

...(more)...

New Feature: Monitoring Certification Revocation Lists https://isc.sans.edu/crls.html, (Wed, Apr 16th)

 ∗ SANS Internet Storm Center, InfoCON: green

------
Johannes B. Ullrich, Ph ...(more)...

Europython: Code of Conduct

 ∗ Planet Python

EuroPython 2014 is a community conference intended for networking and collaboration in the developer community.

We value the participation of each member of the Python community and want all attendees to have an enjoyable and fulfilling experience. Accordingly, all attendees are expected to show respect and courtesy to other attendees throughout the conference and at all conference events, whether officially sponsored by EuroPython 2014 or not.

To make clear what is expected, all delegates/attendees, speakers, exhibitors, organisers and volunteers at any EuroPython 2014 event are required to conform to the following Code of Conduct. Organisers will enforce this code throughout the event.

The Short Version

EuroPython 2014 is dedicated to providing a harassment-free conference experience for everyone, regardless of gender, sexual orientation, disability, physical appearance, body size, race, or religion. We do not tolerate harassment of conference participants in any form.

All communication should be appropriate for a professional audience including people of many different backgrounds. Sexual language and imagery is not appropriate for any conference venue, including talks.

Be kind to others. Do not insult or put down other attendees. Behave professionally. Remember that harassment and sexist, racist, or exclusionary jokes are not appropriate for EuroPython 2014.

Attendees violating these rules may be asked to leave the conference without a refund at the sole discretion of the conference organisers.

Thank you for helping make this a welcoming, friendly event for all.

The Long Version

Harassment includes offensive verbal comments related to gender, sexual orientation, disability, physical appearance, body size, race, religion, sexual images in public spaces, deliberate intimidation, stalking, following, harassing photography or recording, sustained disruption of talks or other events, inappropriate physical contact, and unwelcome sexual attention.

Participants asked to stop any harassing behavior are expected to comply immediately.

Exhibitors in the expo hall, sponsor or vendor booths, or similar activities are also subject to the anti-harassment policy. In particular, exhibitors should not use sexualized images, activities, or other material. Booth staff (including volunteers) should not use sexualized clothing/uniforms/costumes, or otherwise create a sexualized environment.

Be careful in the words that you choose. Remember that sexist, racist, and other exclusionary jokes can be offensive to those around you. Excessive swearing and offensive jokes are not appropriate for EuroPython 2014.

If a participant engages in behavior that violates this code of conduct, the conference organisers may take any action they deem appropriate, including warning the offender or expulsion from the conference with no refund.

The full Code of Conduct text including contact information can be found here.

This text is based on the Code Of Conduct text by PyCon IE which is based on the original PSF Code of Conduct.

Looking for malicious traffic in electrical SCADA networks - part 1, (Tue, Apr 15th)

 ∗ SANS Internet Storm Center, InfoCON: green

When infosec guys are performing intrusion detection, they usually look for attacks like portscan ...(more)...

PyPy Development: NumPy on PyPy - Status Update

 ∗ Planet Python

Work on NumPy on PyPy continued in March, though at a lighter pace than the previous few months. Progress was made on both compatibility and speed fronts. Several behavioral issues reported to the bug tracker were resolved. The most significant of these was probably the correction of casting to built-in Python types. Previously, int/long conversions of numpy scalars such as inf/nan/1e100 would return bogus results. Now, they raise or return values, as appropriate.
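
As a rough illustration of the casting behaviour being matched (this is just standard CPython + NumPy semantics, not PyPy-specific code):

import numpy as np

# int() on NumPy scalars should mirror int() on plain Python floats:
print(int(np.float64(1e100)))        # a huge Python int, not a bogus value

try:
    int(np.float64('inf'))
except OverflowError:
    print("infinity cannot be converted to an integer")

try:
    int(np.float64('nan'))
except ValueError:
    print("NaN cannot be converted to an integer")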

On the speed front, enhancements to the PyPy JIT were made to support virtualizing the raw_store/raw_load memory operations used in numpy arrays. Further work remains here in virtualizing the alloc_raw_storage when possible. This will allow scalars to have storages but still be virtualized when possible in loops.

Aside from continued work on compatibility/speed of existing code, we also hope to begin implementing the C-level components of other numpy modules such as mtrand, nditer, linalg, and so on. Several approaches could be taken to get C-level code in these modules working, ranging from reimplementing in RPython to interfacing with existing code with CFFI, if possible. The appropriate approach depends on many factors and will probably vary from module to module.

To try out PyPy + NumPy, grab a nightly PyPy and install our NumPy fork. Feel free to report comments/issues to IRC, our mailing list, or bug tracker. Thanks to the contributors to the NumPy on PyPy proposal for supporting this work.

Python Diary: Many great talks and swag to be had

 ∗ Planet Python

PyCon 2014 was a great experience. There were many fascinating talks and so much to see and do at the convention. I also gave a Lightning Talk on server security, sharing some tips and tricks I use on a daily basis, based on the Debian Diary article I wrote a couple of months back.

Some of the larger swag items I nabbed were a book titled Hacking: The Art of Exploitation, and another called Core Python Application Programming, which was signed by the author. I am really excited about reading both of these books.

I went to a couple very interesting talks as well.

Jeff Knupp: How 'DevOps' is Killing the Developer

 ∗ Planet Python

There are two recent trends I really hate: DevOps and the notion of the "full-stack" developer. The DevOps movement is so popular that I may as well say I hate the x86 architecture or monolithic kernels. But it's true: I can't stand it. The underlying cause of my pain? This fact: not every company is a start-up, though it appears that every company must act as though they were.

DevOps

"DevOps" is meant to denote a close collaboration and cross-pollination between what were previously purely development roles, purely operations roles, and purely QA roles. Because software needs to be released at an ever-increasing rate, the old "waterfall" develop-test-release cycle is seen as broken. Developers must also take responsibility for the quality of the testing and release environments.

The increasing scope of responsibility of the "developer" (whether or not that term is even appropriate anymore is debatable) has given rise to a chimera-like job candidate: the "full-stack" developer. Such a developer is capable of doing the job of developer, QA team member, operations analyst, sysadmin, and DBA. Before you accuse me of hyperbole, go back and read that list again. Is there any role in the list whose duties you wouldn't expect a "full-stack" developer to be well versed in?

Where did these concepts come from? Start-ups, of course (and the Agile methodology). Start-ups are a peculiar beast and need to function in a very lean way to survive their first few years. I don't deny this. Unfortunately, we've taken the multiple technical roles that engineers at start-ups were forced to play due to lack of resources into a set of minimum qualifications for the role of "developer".

Many Hats

Imagine you're at a start-up with a development team of seven. You're one year into development of a web application that X's all the Y's and things are going well, though it's always a frantic scramble to keep everything going. If there's a particularly nasty issue that seems to require deep database knowledge, you don't have the liberty of saying "that's not my specialty," and handing it off to a DBA team to investigate. Due to constrained resources, you're forced to take on the role of DBA and fix the issue yourself.

Now expand that scenario across all the roles listed earlier. At any one time, a developer at a start-up may be acting as a developer, QA tester, deployment/operations analyst, sysadmin, or DBA. That's just the nature of the business, and some people thrive in that type of environment. Somewhere along the way, however, we tricked ourselves into thinking that because, at any one time, a start-up developer had to take on different roles he or she should actually be all those things at once.

If such people even existed, "full-stack" developers still wouldn't be used as they should. Rather than temporarily taking on a single role for a short period of time, then transitioning into the next role, they are meant to be performing all the roles, all the time. And here's what really sucks: most good developers can almost pull this off.

The Totem Pole

Good developers are smart people. I know I'm going to get a ton of hate mail, but there is a hierarchy of usefulness of technology roles in an organization. Developer is at the top, followed by sysadmin and DBA. QA teams, "operations" people, release coordinators and the like are at the bottom of the totem pole. Why is it arranged like this?

Because each role can do the job of all roles below it if necessary.

Start-ups taught us this. Good developers can be passable DBAs if need be. They make decent testers, "deployment engineers", and whatever other ridiculous term you'd like to use. Their job requires them to know much of the domain of "lower" roles. There's one big problem with this, and hopefully by now you see it:

It doesn't work in the opposite direction.

A QA person can't just do the job of a developer in a pinch, nor can a build-engineer do the job of a DBA. They never acquired the specialized knowledge required to perform the role. And that's fine. Like it or not, there are hierarchies in every organization, and people have different skill sets and levels of ability. However, when you make developers take on other roles, you don't have anyone to take on the role of development!

An example will make this more clear. My dad is a dentist running his own practice. He employs a secretary, hygienist, and dental assistant. Under some sort of "DentOps" movement, my dad would be making appointments and cleaning people's teeth while trying to find time to drill cavities, perform root canals, etc. My dad can do all of the other jobs in his office, because he has all the specialized knowledge required to do so.

But no one, not even all of his employees combined, can do his job.

Such a movement does a disservice to everyone involved, except (of course) employers. What began as an experiment aimed at increasing software quality has become a farce, where the most talented employees are overworked (while doing less, less useful work) and lower-level positions simply don't exist.

And this is the crux of the issue. All of the positions previously held by people of various levels of ability are made redundant by the "full-stack" engineer. Large companies love this, as it means they can hire far fewer people to do the same amount of work. In the process, though, actual development becomes a vanishingly small part of a developer's job. This is why we see so many developers that can't pass FizzBuzz: they never really had to write any code. Can you imagine interviewing a chef and asking him what portion of the day he actually devotes to cooking? That question is all too applicable to developers now.

Jack of All Trades, Master of None

If you are a developer of moderately sized software, you need a deployment system in place. Quick, what are the benefits and drawbacks of the following such systems: Puppet, Chef, Salt, Ansible, Vagrant, Docker. Now implement your deployment solution! Did you even realize which systems had no business being in that list?

We specialize for a reason: human beings are only capable of retaining so much knowledge. Task-switching is cognitively expensive. Forcing developers to take on additional roles traditionally performed by specialists means that they:

What's more, by forcing developers to take on "full-stack" responsibilities, companies are paying far more than the market average for most of those tasks. If a developer makes 100K a year, you can pay four developers 100K per year each to spend half their time on development and half on release management. Or you can simply hire a release manager at, say, 75K and two developers who develop full-time. And notice the time wasted by developers who are part-time release managers but don't always have releases to manage.

Don't Kill the Developer

The effect of all of this is to destroy the role of "developer" and replace it with a sort of "technology utility-player". Every developer I know got into programming because they actually enjoyed doing it (at one point). You do a disservice to everyone involved when you force your brightest people to take on additional roles.

Not every company is a start-up. Start-ups don't make developers wear multiple hats by choice, they do so out of necessity. Your company likely has enough resource constraints without you inventing some. Please, don't confuse "being lean" with "running with the fewest possible employees". And for God's sake, let developers write code!

INFOCon Green: Heartbleed - on the mend, (Mon, Apr 14th)

 ∗ SANS Internet Storm Center, InfoCON: green

We are going back to INFOCon Green today.   Things have stabilized and the INFOCon is used t ...(more)...

VMWare Advisory VMSA-2014-0004 - Updates on OpenSSL HeartBleed http://www.vmware.com/security/advisories/VMSA-2014-0004.html, (Tue, Apr 15th)

 ∗ SANS Internet Storm Center, InfoCON: green

...(more)...

Andre Roberge: Reeborg knows multiple programming languages

 ∗ Planet Python

I wish I were in Montreal to visit my daughter, eat some delicious Saint-Sauveur bagels for breakfast, a good La Banquise poutine and some Montreal Smoked Meat for lunch... and, of course, attend Pycon.  Alas....

In the meantime, a quick update: Reeborg now knows Python, Javascript and CoffeeScript.  The old tutorials are gone as Reeborg's World has seen too many changes.  I am now in the process of writing the following tutorials, all using Reeborg's World as the test environment:

  1. A quick introduction to Python (for people that know programming in another language)
  2. A quick introduction to Javascript (same as above)
  3. A quick introduction to CoffeeScript (same as above)
  4. An introduction to programming using Python, for absolute beginners
  5. An introduction to programming using Javascript, for absolute beginners
  6. An introduction to Object-Oriented Programming concepts using Python
  7. An introduction to Object-Oriented Programming concepts using Javascript
Note that I have two "versions" of Javascript, one that uses JSHint to enforce good programming practices (and runs the code with "use strict"; option) and one that is the normal permissive Javascript.

If anyone knows of any other transpilers written in Javascript that can convert code client-side from language X into Javascript (like Brython does for Python, or CoffeeScript does naturally), I would be interested in adding them as additional options.

ISC StormCast for Tuesday, April 15th 2014 http://isc.sans.edu/podcastdetail.html?id=3935, (Tue, Apr 15th)

 ∗ SANS Internet Storm Center, InfoCON: green

...(more)...

Plone.org: Plone Website Accounts Safe from Heartbleed

 ∗ Planet Plone

The plone.org website is safe from the Heartbleed bug and, as such, plone.org passwords have not been disclosed.

Mike Driscoll: Miss PyCon 2014? Watch the Replay!

 ∗ Planet Python

If you’re like me, you missed PyCon North America 2014 this year. It happened last weekend. While the main conference days are over, the code sprints are still running. Anyway, for those of you who missed PyCon, they have released a bunch of videos on pyvideo! Every year, they seem to get the videos out faster than the last. I think that’s pretty awesome myself. I’m looking forward to watching a few of these so I can see what I missed.

Ned Batchelder: PyCon 2014

 ∗ Planet Python

PyCon 2014 is over, and as usual, I loved every minute. There are a huge number of people that I know there, and about 5 different sub-communities that I feel an irrationally strong attachment to.

Some highlights:

My head is still spinning from the high-energy four days I've had, I'm sure I'm leaving out an important high point. I just love every minute!

On the downside, I did not see as much of Montreal as I would have liked, but we'll be back for PyCon 2015, so I have a second chance!

Future Foundries: Crochet 1.2.0, now with a better API!

 ∗ Planet Python

Crochet is a library for using Twisted more easily from blocking programs and libraries. The latest version, released here at PyCon 2014, includes a much improved API for calling into Twisted from threads. In particular, a timeout is passed in - if it is hit the underlying operation is cancelled, and an exception is raised. Not all APIs in Twisted support cancellation, but for those that do (or APIs you implement) this is a really nice feature. You get high level timeouts (instead of blocking sockets' timeout-per-socket-operation) and automatic cleanup of resources if something takes too long.

#!/usr/bin/python
"""
Do a DNS lookup using Twisted's APIs.
"""
from __future__ import print_function

# The Twisted code we'll be using:
from twisted.names import client

from crochet import setup, wait_for
setup()


# Crochet layer, wrapping Twisted's DNS library in a blocking call.
@wait_for(timeout=5.0)
def gethostbyname(name):
    """Lookup the IP of a given hostname.

    Unlike socket.gethostbyname() which can take an arbitrary amount
    of time to finish, this function will raise crochet.TimeoutError
    if more than 5 seconds elapse without an answer being received.
    """
    d = client.lookupAddress(name)
    d.addCallback(lambda result: result[0][0].payload.dottedQuad())
    return d


if __name__ == '__main__':
    # Application code using the public API - notice it works in a normal
    # blocking manner, with no event loop visible:
    import sys
    name = sys.argv[1]
    ip = gethostbyname(name)
    print(name, "->", ip)

Machinalis: Migrating data into your Django project

 ∗ Planet Python

There are times when we have an existing legacy DB and need to migrate its data into our Django application. In this post I’ll share a technique that we successfully applied for this.

Working on a big project, our client had an existing application using a MySQL DB. Our objective was to develop a new, more modern, feature-rich, Django 1.5-based version of his tool. At a certain stage of the development our client requested that we migrate some of the current users’ data into the new system, so we could move to a beta-testing phase.

The method that we applied not only allowed us to effectively migrate dozens of users to the new system, but also let us keep doing migrations as the application continued its development.

General description

We based our work on two very powerful Django features:

  1. Multiple databases and
  2. Integrating Django with a legacy database

So, the general procedure would be:

  1. Add a new, legacy database to your project.

  2. Create a legacy app.
    • Automatically generate the models
    • Set up a DB router.
  3. Write your migration script.

Let’s describe each step a little bit more:

1. A legacy database

We assume here that you have access to the legacy DB. In our particular case, before each migration our client will give us a MySQL dump of the legacy DB. So we create a fresh legacydb in our own DB server and import the dump, every time.

However, it doesn’t matter how you access the legacy DB as long as you can do it from Django. So, following the Multiple databases approach, you must edit the project’s settings.py and add the legacy database. For example like this:

DATABASES = {
    'default': {
        'NAME': 'projectdb',
        'ENGINE': 'django.db.backends.mysql',
        'USER': 'some_user',
        'PASSWORD': '123'
    },
    'legacy': {
        'NAME': 'legacydb',
        'ENGINE': 'django.db.backends.mysql',
        'USER': 'other_user',
        'PASSWORD': '456'
    }
}

Depending on your objectives regarding the migration, these settings can be set either in your standard project's settings.py file or in a separate, special settings file to be used only during extraordinary migrations.

2. A legacy app

The general idea here is that you start a new app that will represent your legacy data. All the work (other than the settings) will be done within this app. Thus, you can keep it in a different branch (keeping the migration feature isolated) and continue the development process normally.

inspectdb

Now, the key for this step is to follow the Integrating Django with a legacy database document. By using the admin's inspectdb command, the models.py file can be generated automatically!

$ mkdir apps/legacy
$ python manage.py startapp legacy apps/legacy/
$ python manage.py inspectdb --database=legacy > apps/legacy/models.py

Anyway, as the documentation says:

This feature is meant as a shortcut, not as definitive model generation. After you run it, you’ll want to look over the generated models yourself to make customizations.

In our particular case, it worked like a charm and only cosmetic modifications were needed!

Database router

Next, a database router must be provided. It is Django’s mechanism to match objects with their original database.

Django’s default routing scheme ensures that if a database isn’t specified, all queries fall back to the default database. In our case, we will make sure that objects from the legacy app are taken from its corresponding DB (and make it read-only). An example router would be:

# Specific router to point all read-operations on legacy models to the
# 'legacy' DB.
# Forbid write-operations and syncdb.


class LegacyRouter(object):

    def db_for_read(self, model, **hints):
        """Point all operations on legacy models to the 'legacy' DB."""
        if model._meta.app_label == 'legacy':
            return 'legacy'
        return 'default'

    def db_for_write(self, model, **hints):
        """Our 'legacy' DB is read-only."""
        return False

    def allow_relation(self, obj1, obj2, **hints):
        """Forbid relations from/to Legacy to/from other app."""
        obj1_is_legacy = (obj1._meta.app_label == 'legacy')
        obj2_is_legacy = (obj2._meta.app_label == 'legacy')
        return obj1_is_legacy == obj2_is_legacy

    def allow_syncdb(self, db, model):
        return db != 'legacy' and model._meta.app_label != 'legacy'

Finally, to use the router you’ll need to add it to your settings.py file.

DATABASE_ROUTERS = ['apps.legacy.router.LegacyRouter']

Now you are ready to access your legacy data using Django’s ORM. Open the shell, import your legacy models and play around!
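
For example, a first session could look roughly like this (LegacyUser and its fields are hypothetical names standing in for whatever inspectdb generated from your schema):

# $ python manage.py shell
from apps.legacy.models import LegacyUser  # hypothetical generated model

# Reads on legacy models are routed to the 'legacy' DB by the LegacyRouter above:
print(LegacyUser.objects.count())
for user in LegacyUser.objects.all()[:5]:
    print("%s %s" % (user.pk, user.email))  # fields depend on the legacy schema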

For a more detailed example of this technique applied, check this other blog post. It is based on Django 1.3 but still useful.

3. Your migration script

At this point you have access to the legacy data using Django's ORM. Now it is time to write the actual migration script. There is no magic nor much automation here: you know your data model and (hopefully) the legacy DB structure. It is in your hands to create your system's model instances and their relations.

In our case, we wrote an export.py script that we manually run from the command line whenever we need.

It's a really good idea to perform the migration inside a single transaction. Otherwise, any error while running the migration script will leave you with a partial (and possibly inconsistent) migration and will force you to write complex logic to be able to resume it. The @transaction.commit_on_success decorator is a good way to achieve the desired effect. As a helpful side effect, it will also be faster to do a single commit.
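
A minimal sketch of what such a script could look like (LegacyUser and UserProfile are hypothetical model names and the field mapping is invented; commit_on_success is the decorator available in Django 1.5, later superseded by transaction.atomic):

# export.py -- run from a Django shell or a custom management command.
from django.db import transaction

from apps.legacy.models import LegacyUser   # hypothetical inspectdb model
from profiles.models import UserProfile     # hypothetical new-system model


@transaction.commit_on_success
def migrate_users():
    """Create new-system objects from legacy rows, all in one transaction."""
    for old in LegacyUser.objects.all():     # read from the 'legacy' DB
        UserProfile.objects.create(          # written to the 'default' DB
            username=old.login,              # the field mapping is up to you
            email=old.email,
        )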

Conclusions

As a general data-migration technique for Django applications, it has several advantages:

On the other hand, as usual, it is no silver bullet. One of the main problems here is that the complexity of the task is directly proportional to the difference between the DB models. Since the actual data manipulation must be programmed manually, very different data models potentially mean a lot of work.

So, as stated at the beginning of the post: the method allowed us to successfully migrate a considerable amount of data into our system, while allowing us to accommodate changes as the application continued its development.

Martijn Faassen: The Call of Python 2.8

 ∗ Planet Python

Introduction

Guido recently felt he needed to re-emphasize that there will be no Python 2.8. The Python developers have been very clear for years that there will never be a Python 2.8.

http://legacy.python.org/dev/peps/pep-0404/

At the Python language summit there were calls for a Python 2.8. Guido reports:

We (I) still don't want to do a 2.8 release, and I don't want to accelerate 3.5, but I do think we should make things better for people who have to straddle Python 2 and 3 in a single codebase, by developing more tools, and by security and possibly installer updates to 2.7 (PEP 466).

At his keynote at PyCon, he said it again:

/guido_no.jpg

A very good thing happened in recognition of the reality that Python 2.7 is still massively popular: Guido changed the end-of-life date for Python 2.7 to 2020 (it was 2015). In the same change he felt he should repeat that there will be no Python 2.8:

+There will be no Python 2.8.

The call for Python 2.8 is strong. Even Guido feels it!

People talk about a Python 2.8, and are for it, or, like Guido, against it, but rarely talk about what it should be. So let's actually have that conversation.

Why talk about something that will never be? Because we can't call for something, nor reject something if we don't know what it is.

What is Python 2.8 for?

Python 2.8 could be different things. It could be a Python 2.x release that reduces some pain points and adds features for Python 2 developers independent from what's going on in Python 3. It makes sense, really: we haven't had a new Python 2 feature release since 2010 now. Those of us with existing large Python 2 codebases haven't benefited from the work the language developers have done in those years. Even polyglot libraries that support both Python 2 and 3 can't use the new features, so they are also stuck with a 2010 Python. Before Python 2.7, the release cycle of Python had seen a new compatible release every 2 years or less. The reality of Python for many of its users is that there has been no feature update of the language for years now.

But I don't want to talk about that. I want to talk about Python 2.8 as an incremental upgrade path to Python 3. If we are going to add features to Python 2, let's take them from Python 3. I want to talk about bringing Python 2.x closer to Python 3. Python 2 might never quite reach Python 3 parity, but it could still help a lot if it can get closer incrementally.

Why an incremental upgrade?

In the discussion about Python 3 there is a lot of discussion about the need to port Python libraries to Python 3. This is indeed important if you want the ability to start new projects on Python 3. But many of us in the trenches are working on large Python 2 code bases. This isn't just maintenance. A large code base is alive, so we're building new features in Python 2.

Such a large Python codebase is:

You can argue that I'm overstating the risks of porting. But we need to face it: many codebases written in Python 2 have low automatic test coverage. We don't like to talk about it because we think everybody else is better at automated testing than we are, but it's the reality in the field.

We could say, fine, they can stay on Python 2 forever then! Well, at least until 2020. I think this would be unwise, as these organizations are paying a lot of developers money to work on Python code. This has an effect on the community as a whole. It contributes to the gravity of Python 2.

Those organizations, and thus the wider Python community, would be helped if there was an incremental way to upgrade their code bases to Python 3, with easy steps to follow. I think we can do much more to support such incremental upgrades than Python 2.7 offers right now.

Python 2.8 for polyglot developers

Besides helping Python 2 code bases go further step by step, Python 2.8 can also help those of us who are maintaining polyglot libraries, which work in both Python 2 and Python 3.

If a Python 2.8 backported Python 3 features, it means that polyglot authors can start using those features if they drop Python 2.7 support right there in their polyglot libraries, without giving up Python 2 compatibility. Python 2.8 would actually help encourage those on Python 2.7 codebases to move towards Python 3, so they can use the library upgrades.

Of course dropping Python 2.x support entirely for a polyglot library will also make that possible. But I think it'll be feasible to drop Python 2.7 support in favor of Python 2.8 much faster than it is possible to drop Python 2 support entirely.

But what do we want?

I've seen Python 3 developers say: but we've done all we could with Python 2.7 already! What do you want from a Python 2.8?

And that's a great question. It's gone unanswered for far too long. We should get a lot more concrete.

What follows are just ideas. I want to get them out there, so other people can start thinking about them. I don't intend to implement any of it myself; just blogging about it is already breaking my stress-reducing policy of not worrying about Python 3.

Anyway, I might have it all wrong. But at least I'm trying.

Breaking code

Here's a paradox: I think that in order to make an incremental upgrade possible for Python 2.x we should actually break existing Python 2.x code in Python 2.8! Some libraries will need minor adjustments to work in Python 2.8.

I want to do what the from __future__ pattern was introduced for in the first place: introduce a new incompatible feature in a release but make it optional, and then later make the incompatible feature the default.

The Future is Required

Python 2.7 lets you do from __future__ import something to get the interpreter to behave a bit more like Python 3. In Python 2.8, those behaviors should be the default.

In order to encourage this and make it really obvious, we may want to consider requiring these imports in Python 2.8. That means the interpreter raises an error unless a module has such a from __future__ import at the top.

If we go for that, it means you have to have this on the top of all your Python modules in Python 2.8:
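
from __future__ import absolute_import, division, print_function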

absolute_import appears to be uncontroversial, but I've seen people complain about both division and print_function. If people reject Python 3 for those reasons, I want to make clear I'm not in the same camp. I believe that is confusing at most a minor inconvenience with a dealbreaker. I think discussion about these is pretty pointless, and I'm not going to engage in it.

I've left out unicode_literals. This is because I've seen both Nick Coghlan and Armin Ronacher argue against them. I have a different proposal. More below.

What do we gain by this measure? It's ugly! Yes, but we've made the upgrade path a lot more obvious. If an organisation wants to upgrade to Python 2.8, they have to review their imports and divisions and change their print statements to function calls. That should be doable enough, even in large code bases, and is an upgrade path a developer can do incrementally, maybe even without having to convince their bosses first. Compare that to an upgrade to Python 3.

from __future3__ import new_classes

We can't do everything with the old future imports. We want to allow more incremental upgrading. So let's introduce a new future import.

New-style classes, that is classes that derive from object, were introduced in Python 2 many years ago, but old-style classes are still supported. Python 3 only has new-style classes. Python 2.8 can help here by making new style classes the default. If you import from __future3__ import new_classes at the top of your module, any class definition in that module that looks like this:

class Foo:
   pass

is interpreted as a new-style class.

This might break the contract of the module, as people may subclass from this class and expect an old-style class, and in some (rare) cases this can break code. But at least those problems can be dealt with incrementally. And the upgrade path is really obvious.

__future3__?

Why did I write __future3__ and not __future__? Because otherwise we can't write polyglot code that is compatible in Python 2 and Python 3.

Python 3.4 doesn't support from __future__ import new_classes. We don't want to wait for a Python 3.5 or Python 3.6 to support this, even if there is any interest in supporting it among the Python language developers at all. Because after all, there won't be a Python 2.8.

That problem doesn't exist for __future3__. We can easily fake a __future3__ module in Python 3 without being dependent on the language developers. So polyglot code can safely use this.
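
To illustrate what faking it could mean (everything here is hypothetical, since __future3__ only exists in this thought experiment), a plain do-nothing module on the import path would be enough to make the import statements succeed on Python 3, where these features are already the default:

# __future3__.py - a hypothetical no-op shim for Python 3.
# The names only need to exist so that "from __future3__ import ..." works;
# Python 3 already behaves this way, so importing them changes nothing.
new_classes = object()
explicit_literals = object()
str = str        # Python 3 str is already the text type
bytes = bytes    # Python 3 bytes is already the binary type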

from __future3__ import explicit_literals

Back to the magic moment of Nick Coghlan and Armin Ronacher agreeing.

Let's have a from __future3__ import explicit_literals.

This forces the author to be entirely explicit with string literals in the module that imports it. "foo" and 'foo' are now errors; the module won't import. Instead the module has to be explicit and use b'foo' and u'foo' everywhere.

What does that get us? It forces a developer to think about string literals everywhere, and that helps the codebase become incrementally more compatible with Python 3.

from __future3__ import str

This import line does two things:

I took this idea from the Python future module, which makes Python 3 style str and bytes (and much more) available in Python 2.7. I've modified the idea as I have the imaginary power to change the interpreter in Python 2.8. Of course anything I got wrong is my own fault, not the fault of Ed Schofield, the author of the future module.

from __past__ import bytes

To ensure you still have access to Python 2 bytes (really str) just in case you still need it, we need an additional import:

from __past__ import bytes as oldbytes

oldbytes can be called with Python 2 str, Python 2 bytes and Python 3 bytes. It rejects a Python 3 str. I'll talk about why it can be needed in a bit.

Yes, __past__ is another new namespace we can safely support in Python 3. It would get more involved in Python 3: it contains a forward port of the Python 2 bytes object. Python 3 bytes have fewer features than Python 2 bytes, and this has been a pain point for some developers who need to work with bytes a lot. Having a more capable bytes object in Python 3 would not hurt existing Python 3 code, as combining it with a Python 3 string would still result in an error. It's just an alternative implementation of bytes with more methods on it.

from __future3__ import bytes

This is the equivalent import for getting the Python 3 bytes object.

Combining Python 3 str/bytes with Python 2 unicode/str

So what happens when we somehow combine a Python 3 str/bytes with a Python 2 str/bytes/unicode? Let's think about it.

The future module by Ed Schofield forbids py3bytes + py2unicode, but supports other combinations and upcasts them to their Python 3 version. So, for instance, py3str + py2unicode -> py3str. This is a consequence of the way it tries to make Python 2 string literals work a bit like they're Python 3 unicode literals. There is a big drawback to this approach; a Python 3 bytes is not fully compatible with APIs that expect a Python 2 str, and a library that tried to use this approach would suffer API breakage. See this issue for more information on that.

I think since we have the magical power to change the interpreter, we can do better. We can make real Python 3 string literals exist in Python 2 using __future3__.

I think we need these rules:

So while we upcast existing Python 2 unicode strings to Python 3 str we refuse any other combination.

Why not let people combine Python 2 str/bytes with Python 3 bytes? Because the Python 3 bytes object is not compatible with the Python 2 bytes object, and we should refuse to guess and immediately bail out when someone tries to mix the two. We require an explicit Python 2 str call to convert a Python 3 bytes to a str.

This is assuming that the Python 3 str is compatible with Python 2 unicode. I think we should aim for making a Python 3 string behave like a subclass of a Python 2 unicode.

What have we gained?

We can now start using Python 3 str and Python 3 bytes in our Python 2 codebases, incrementally upgrading, module by module.

Libraries could upgrade their internals to use Python 3 str and bytes entirely, and start using Python 3 str objects in any public API that returns Python 2 unicode strings now. If you're wrong and the users of your API actually do expect str-as-bytes instead of unicode strings, you can go deal with these issues one by one, in an incremental fashion.

For compatibility you can't return Python 3 bytes where Python 2 str-as-bytes is used, so judicious use of __past__.str would be needed at the boundaries in these cases.

After Python 2.8

People who have ported their code to Python 2.8 and have turned on all the __future3__ imports incrementally will be in a better place to port their code to Python 3. But to offer a more incremental step, we can have a Python 2.9 that requires the __future3__ imports introduced by Python 2.8. And by then we might have thought of some other ways to smooth the upgrade path.

Summary

Airplane Message

 ∗ xkcd.com

PHARAOH IRY-HOR, FROM THE 3100s BC, IS THE FIRST HUMAN WHOSE NAME WE KNOW.

ISC StormCast for Monday, April 14th 2014 http://isc.sans.edu/podcastdetail.html?id=3933, (Sun, Apr 13th)

 ∗ SANS Internet Storm Center, InfoCON: green

...(more)...

Sylvain Hellegouarch: Having fun with WebSocket and Canvas

 ∗ Planet Python

Recently, I was advised that WebFaction had added support for WebSocket in their custom applications by enabling the corresponding nginx module in their frontend. Obviously, I had to try it out with my own websocket library: ws4py.

Setting up your WebFaction application

Well, as usual with WebFaction, setting up the application is dead simple. Only a few clicks from their control panel.

Create a Custom application and select websocket. This will provide you with a port that your backend will be bound to. And voilà.

Now, your application is created but you won’t yet be able to connect a websocket client. Indeed, you must associate a domain or subdomain with that application.

It is likely your application will be used from a javascript connector living in a browser, which means you will be bound by the browser's same-origin security model. I would therefore advise you to carefully consider your sub-domain and URL strategies. Probably something along these lines:

This is just a suggestion of course but this will make it easier for your deployment to follow a simple strategy like this one.

In the WebFaction control panel, create a website which associates your web application with the domain (your webapp can be anything you need it to be). Then associate your custom websocket application with the same domain but a different path segment. Again, by sharing the same domain, you'll avoid trouble working around the same-origin security model. I would probably advise as well that you enable SSL, but it's up to you to make that decision.

Once done, you will have now a configured endpoint for your websocket application.

The custom websocket application will forward all requests to you, so you can run your web and websocket apps from that single port. This is what the demo below does. I would still recommend that you run two different application processes, one for your webapp and another one for your websocket endpoint. Be antifragile.

Drawing stuff collaboratively

I set up a small demo (sorry, self-signed certificate) to demonstrate how you can use HTML5 canvas and websocket features to perform collaborative tasks across various connected clients.

That demo runs a small webapp that also enables a websocket endpoint. When you are on the drawing board, everything you draw is sent to the other connected clients so that their view reflects what's happening on yours. Obviously this goes both ways, from any client to any other client.

The demo is implemented using ws4py hosted within a CherryPy application. Drawing events are serialized into a JSON structure and sent over to the server, which dispatches them to all participants of that board, and only that board (note: this demo doesn't validate the content before dispatching it back, so please be conservative about whom you share your board with).
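
For readers who haven't used ws4py with CherryPy, here is a minimal sketch of the general shape of such an endpoint. This is not the demo's code: the single shared CLIENTS set and the JSON field names are simplifications assumed here, whereas the real demo scopes dispatching to individual boards.

import json

import cherrypy
from ws4py.server.cherrypyserver import WebSocketPlugin, WebSocketTool
from ws4py.websocket import WebSocket

CLIENTS = set()  # every socket connected to the (single, simplified) board


class BoardWebSocket(WebSocket):
    def opened(self):
        CLIENTS.add(self)

    def received_message(self, message):
        # A drawing event arrives as JSON, e.g. {"x": 12, "y": 34, "color": "#000"}
        event = json.loads(message.data.decode("utf-8"))
        for client in CLIENTS:
            if client is not self:
                client.send(json.dumps(event))

    def closed(self, code, reason=None):
        CLIENTS.discard(self)


class Root(object):
    @cherrypy.expose
    def ws(self):
        # The handshake is performed by the websocket tool; the live handler
        # is available afterwards as cherrypy.request.ws_handler.
        pass


if __name__ == '__main__':
    WebSocketPlugin(cherrypy.engine).subscribe()
    cherrypy.tools.websocket = WebSocketTool()
    cherrypy.quickstart(Root(), '/', config={
        '/ws': {'tools.websocket.on': True,
                'tools.websocket.handler_cls': BoardWebSocket}})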

Open the link found on the demo page and share it with as many browsers as you can (including your mobile device). Drawing from one device will make all other devices be drawn onto simultaneously and synchronously. Note that the board stays available for up to 5 minutes only and will then disconnect all participants.

The source code for the demo is located here.

Some feedback…

Let me share a bit of feedback about the whole process.

Next, I wouldn’t mind adding websocket to that fun demo from the mozilla developer network.

A. Jesse Jiryu Davis: PyCon 2014 Video: What Is Async, How Does It Work, & When Should I Use It?

 ∗ Planet Python

Here's my talk on Python's asyncio library, and async frameworks in general, from Montréal on April 11.

Direct link here.

Reverse Heartbleed Testing, (Sun, Apr 13th)

 ∗ SANS Internet Storm Center, InfoCON: green

I wanted to know if the tools/software I execute regularly are vulnerable to scraping my syste ...(more)...

Davy Mitchell: pyvideo.org - Introduction to game programming

 ∗ Planet Python

Great way to get ready for the next PyWeek!



pyvideo.org - Introduction to game programming

Jazkarta Blog: Plone 5 and the 2014 Emerald Sprint

 ∗ Planet Plone

2014 Emerald Sprint Attendees

Back row l to r: Alec Mitchell, Spanky Kapanka, Eric Steele, Ian Anderson, Ross Patterson, Luke Bannon, Cris Ewing, Andy Leeb, Cal Doval, Chris Calloway. Front row l to r: Elizabeth Leddy, David Glick, Steve McMahon, Fulvio Casali, Franco Pellegrini, Sally Kleinfeldt, Trish Ang. Photo by Trish Ang.

I recently returned from the Emerald Sprint, and I have to say that Plone 5 is starting to look pretty good. For developers, there is a solid core buildout that even I was able to run without a hitch. So if there’s a PLIP (Plone Improvement Proposal) or a feature that interests you, and you’ve been thinking about contributing – do it! The community awaits you with open arms.  And what a great community! You don’t need to be a Python developer – Plone 5 is a Javascript-friendly development environment. We would love to have more Javascript developers and designers join our ranks. You won’t be sorry.

OOTB Plone 5 with the new editing toolbar on the left. Still a work in progress, you will be able to choose top or side placement, and icons, text, or both.

But UI improvements and new features are the real cause of excitement. The first thing Plonistas will notice is the new theme – people new to Plone won’t find it remarkable, just clean and modern, but we Plone folks have been looking forward to replacing Sunburst for a long time. In fact we’ve been looking forward to Plone 5 for a long time. After the community gained consensus about what Plone 5 will be, things got a bit bogged down. The Javascript rewrite was extensive. The new content type framework (Dexterity) had to gain maturity as a Plone 4 add-on. There was – and still is – much discussion over how to improve and streamline content editing and page layouts; ideas are being implemented as add-on products such as collective.cover.

Over the last few months the community has really picked up the pace on Plone 5.  Supported by the Plone Foundation and numerous sponsors, there have been a series of productive sprints. The Emerald Sprint’s focus was on user management, registration, and login.  A robust system of user permissions, groups, and roles is one of Plone’s most notable (and oldest) features and the concepts and underpinnings are still solid. However the UI is overdue for an overhaul and the old implementation layers have gotten pretty crufty.

Cris Ewing shows a mockup of the new registration process

The sprinters, led by David Glick, took a UI-first approach, which was fantastic. Before cracking open the laptops and diving into the code, we developed mockups of the new registration and login process and user management screens. It really helped that 2 of the 17 sprinters were UI/UX designers. Always try to get designers to come to your sprints!

The sprint gathered together a fantastic set of Plone gurus who were able to have in depth discussions of some of the thornier technical problems associated with users. For example, should user objects work like content objects? This discussion resulted in a concrete list of pros and cons and a better understanding of how to ameliorate potential problems if and when we decide to move to “contentish” users.

And of course software got developed. Teams worked on the Users and Groups control panel, member profile design and implementation, registration, log in, and forgotten password dialogs, and more. Read the summary report on plone.org for more details.

Plone 5 log in design.


keul: "General Type": a new criterion for Plone Collections

 ∗ Planet Plone

A new 1.2 version of plone.app.querystring has been released.
There are some improvements and bugfixes, but I'm particularly interested in one new feature: the customizable parsed query. Why?

Some time ago I started developing a new product to provide some usability enhancements for type categorization using Collections, but it was a dead end: it wouldn't work without patching Plone. But the accepted pull request changed everything, so here it is: collective.typecriterion.

The product aims to replace the Collection's "Types" search term (adding a new search term called "General type").

The scope of the add-on is to fix some usability issues with Collections:

Plone type collection criteria

Also there are some missing features.

Some of the points above could be achieved by searching types using interfaces (through the object_provides index) instead of portal_type (the attribute that commonly stores the primitive type name of every content item), but this has its own drawbacks. The idea is to keep using portal_type but give administrators a way to group and organize types in a more friendly form.

After installation the new control panel entry "Type criterion settings" will be available.
Plone general type control panel

The target of the configuration panel is simple: it is possible to group a set of types under the cloak of a new descriptive type name. In the example given in the image we again take the definition of a "textual" content (a content that contains rich text data), grouping all the known types.

After the configuration you can start using the new search term.
Plone type collection criteria

Usability apart, there's also another advantage to this approach: integration with 3rd party products.

Let's say that you defined a new general type called "Multimedia" and configured it as a set that contains Image and Video, and let's say that Video came from the installation of the redturtle.video product.
After a while you plan a switch from redturtle.video to wildcard.media. All you need to do is change the configuration of the general type, not all the collections in the site.

Finally, an interesting note: the code inside collective.typecriterion is really small. All the magic (once again) comes from Plone.

Jeff Knupp: Great Products Seem Obvious in Retrospect

 ∗ Planet Python

Note, this page originally appeared on the sandman.io blog.

sandman automatically generates a REST API service and web admin from your existing database without requiring any code!

When you look at the most disruptive technology products of the last few years (or months, decades, etc), you may notice that the products themselves seem "obvious". It's almost impossible to believe that there was a time when a service like Dropbox didn't exist. Or when, to find out what friends and family were doing, we had to call them and ask. Or when a centralized place to share videos didn't exist on the Internet.

Dropbox, Facebook, and YouTube all share the same quality: in retrospect, they seem obvious. In fact, some would say that they didn't actually do all that much. Personal profile sites already existed. Wasn't it just a matter of time before someone made them pretty and easy to use? And posting videos on the Internet was never difficult, so it's inevitable that someone would eventually create a centralized place for it.

In a way, it's true. These services took an existing (or "near-existing") technology and productized it. The key, though, is that Dropbox, Facebook, and YouTube fulfilled desires we didn't even know we had. Each of these web giants evokes a "that's it"-style shoulder shrug today, but they noticed opportunities where no one else did. They grew big by seeing need where it didn't yet exist.

Enter Sandman

Sandman (on GitHub here) often evokes similar reactions when I describe it to people. "That's it?" they wonder aloud. "Doesn't something already exist to do something like that? Surely someone must have already done this. It seems so obvious!". Sandman, which builds a web admin and REST API service on top of your legacy database without requiring any code, seems like such an obvious product that most people assume it already exists. In fact, many people say that they had the same idea, but never followed through.

To be sure, Sandman is no technological marvel. It takes two well-established technologies, ORMs (Object-Relational Mappers) and code generation, and marries them in a simple, straightforward manner. The result, however, is nothing short of magic.

Your Database, In Your Browser

I love the look on people's faces when they first run Sandman. They enter the details of their existing database, hit enter, and bam!, Sandman has opened a browser tab pointed at their new admin site. There in front of them is all of their data, waiting to be manipulated.

For technical managers, other groups within the organization, and even external clients, the ability to add, edit, and delete information buried deep within an enterprise database is unparalleled. Forget about clunky GUI tools that connect to a single database and make you use SQL to add data. Just use your browser to fill a simple form, where most data is already auto-filled for you, to make the change to your data.

"It's stored in a database," is a phrase that probably evokes a shudder from most technology managers and programmers. With Sandman, hearing that something is "stored in a database" is the same as hearing "you access that through a beautiful, easy to use web tool that has been tested by hundreds of people". Sandman really does "free" your data.

Sandman Makes Things REST

When I'm showing Sandman to a developer, I ask them to curl a simple URL after they've connected Sandman. Without fail, their eyes light up when they realize the clunky, over-complicated legacy database (the kind that exists in every enterprise) now has a super-clean REST API.

"Imagine how easily we can run custom reports," they say. "Better yet, we can have Sandman generate them on-the-fly and simply give our users the URL of the results!" Interacting with legacy databases in the enterprise will never be the same.

Rather than having to find and install drivers and write different code for each type of database they connect to, developers can simply program against a single, RESTful service using battle-tested open source libraries. The amount of code that Sandman makes redundant is shocking. Sandman changes the way that developers create services, for the better.
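
Purely as an illustration (the host, port, and resource name below are placeholders rather than Sandman's documented defaults), consuming such a service needs nothing more than an ordinary HTTP client:

import requests

# Hypothetical endpoint exposed by the generated REST service.
response = requests.get("http://localhost:5000/posts")
response.raise_for_status()
print(response.json())  # rows from the underlying table, as JSON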

Surely This Already Exists!

By now, some readers are thinking, "Surely this technology already exists! It's so obvious!" It does now. Sandman represents the effort required to marry ORMs with code generation to automatically, with no coding required, create a REST API and web-based admin interface. Before Sandman, this "obvious" service really didn't exist. After Sandman, nothing will quite be the same.

Europython: Preliminary schedule available

 ∗ Planet Python


We are pleased to present a preliminary list of finally accepted talks. There are currently further talk proposals in the decision queue including trainings. We will announce a final list of talks and trainings shortly. Thank you for your patience.

Interested in a Heartbleed Challenge?, (Sat, Apr 12th)

 ∗ SANS Internet Storm Center, InfoCON: green

CloudFlare launched a challenge yesterday: Can You Get Private SSL Keys Using Heartbleed?

Starzel.de: Mastering Plone

 ∗ Planet Plone

tl;dr: We're giving our three-day "Mastering Plone" training in Munich (in English)

During the course of this three-day training we will teach how to

In the first part we'll teach the fundamentals needed to setup and manage your own website using the many build-in features of Plone.

The second part will focus on customizations that may be done through-the-web.

The third and longest part will be devoted to Plone-development. Here we focus on current best-practices like Dexterity and Diazo that make developing with Plone fun and accessible.

The training is for people who are new to Plone or used to work with older versions of Plone and want to learn the current best-practices. Basic Python and HTML/CSS skills are a requirement.

The course is an updated and expanded version of the trainings we held at the international Plone-Conferences in Arnhem and Brasilia. The documentation for these can be found at http://starzel.github.io/training

As always, we give the training as a team of two trainers. This way you'll receive 1-on-1 help as soon as something works differently than expected, which is not possible with a single trainer and which adds a lot of insight when something does not work as expected.

If you're interested call us at +49 (0)89 - 189 29 533 or send a mail to

Date:
26. - 28. May 2014

Time:
9:00 - 18:00

Location:
EineWeltHaus
Schwanthalerstr. 80
80336 München

Trainers:
Philip Bauer
Patrick Gerken

Language:
English

Cost:
EUR 1000.- per person plus 19% MwSt (VAT)

Photo: https://www.flickr.com/photos/mindonfire/4447448937

Steve Holden: Intermediate Python: An Open Source Documentation Project

 ∗ Planet Python

There is a huge demand for Python training materials, and there are many people who just don't have the spare cash to buy books or videos. That's one reason why, in conjunction with a new Intermediate Python video series I have just recorded for O'Reilly Media, I am launching a new, open source documentation project.

My intention in recording the videos was to produce a broad set of materials (the linked O'Reilly page contains a good summary of the contents). I figure that most Python programmers will receive excellent value for money even if they already know 75% of the content. Those to whom more is new will see a correspondingly greater benefit. But I was also keenly aware that many Python learners, particularly those in less-developed economies, would find the price of the videos prohibitive.

With O'Reilly's contractual approval the code that I used in the video modules, in IPython Notebooks, is going up on Github under a Creative Commons license. Some of it already contains markdown annotations among the code, other notebooks have little or no commentary. My intention is that ultimately the content will become more comprehensive than the videos, since I am using the video scripts as a starting point.

I hope that both learner programmers and experienced hands will help me turn it into a resource that groups and individuals all over the world can use to learn more about Python with no fees required. The current repository has to be brought up to date after a rapid spate of editing during the three-day recording session. It should go without saying that viewer input will be very welcome, since the most valuable opinions and information comes from those who have actually tried to use the videos to help them learn.

I hope this will also be a project that sees contributions from documentation professionals (and beginners they can help train), so I will be asking the WriteTheDocs NA team how we can lure some of those bright minds in.

Sadly it's unlikely I will be able to see their talented array of speakers as I will still be recovering from surgery. But a small party one evening or a brunch at the office might be possible. Knowing them it will likely involve sponsorship or beer. Or both. We shall see.

I think it's a worthwhile goal to have free intermediate-level Python sample code available, and I can't think of a better way for a relative beginner to get into an open source project. I also like the idea that two communities can come together over it and learn from each other. Suffice it to say, if there are enough people with a hundred bucks* in their pocket for a six-hour video training I am happy to use part of my share in the profits to support this project to some degree.

[DISCLOSURE: The author will receive a proportion of any profit from the O'Reilly Intermediate Python video series]

* This figure was plucked from the air before publication, and is still a good guideline, though as PyCon opened (Apr 11) a special deal was available on a package of both Jessica McKellar's Introduction to Python and my Intermediate Python.

Yasoob Khalid: The Heartbleed bug

 ∗ Planet Python

Hi guys! I haven’t been posting a lot recently. There are a couple of problems which have joined up and have kept me away from my computer. I will cover those reasons in the next post. So what this post is about?

Are you a sys-admin or a web master? If you are, then the chances are that you have already heard of the Heartbleed bug. But for those who are unaware of it, let me explain. On 7th April a bug was spotted in OpenSSL (yes, that is the same encryption library used by companies like Google, Facebook, Yahoo! etc. on their websites). This bug allowed any hacker to send some carefully crafted packets to a server using OpenSSL, and the server responded with more data than it should have. It is a very serious vulnerability.

The Heartbleed bug allows anyone on the Internet to read the memory of the systems protected by the vulnerable versions of the OpenSSL software. This compromises the secret keys used to identify the service providers and to encrypt the traffic, the names and passwords of the users and the actual content. This allows attackers to eavesdrop on communications, steal data directly from the services and users and to impersonate services and users.

So what does this post have to do with the bug? Well, I am going to share two Python scripts with you which will help you test whether a website is vulnerable to this bug or not.

The first script is the heartbleed mass test, which checks Alexa top sites for this bug so that you know on which websites you have to update your password. The second one is this scanner made by Jared Stafford, which I think was one of the first scanners. I could not find the original Gist so I created this new one with the same code. Lastly, I would also like to mention this online scanner written by a friend of mine, Filippo Valsorda. This scanner has the fewest false positives and is written in Go. The source code of this scanner is also available on GitHub.

There is also an unofficial website with a lot of information regarding this bug and how to fix it. If you have this vulnerability in your website then I urge you to fix it as soon as possible so that sensitive information about your viewers is not leaked. If you are using wrappers written in other languages then I urge you to update them as well as most of them have been patched by now.

If you use a website which is affected by this bug then do not update your password before this bug has been fixed! If you update your password before the bug is patched on that website then there is a chance that your information can be leaked due to this bug.

Do share your views about this bug in the comments below and follow my blog to get more updates. Stay tuned for the next post.


Style Guides On Parade

 ∗ A List Apart: The Full Feed

» Style Guides On Parade

If you loved this week’s “Creating Style Guides” piece by Susan Robertson, you’ll thrill to Susan’s follow-up posting, on her personal site, of style guide links galore!

Critical Security Update for JetPack WordPress Plugin. Bug has existed since Jetpack 1.9, released in October 2012. - http://jetpack.me/2014/04/10/jetpack-security-update/, (Sat, Apr 12th)

 ∗ SANS Internet Storm Center, InfoCON: green

----------- Guy Bruneau

Peter Bengtsson: COPYFILE_DISABLE and python distutils in python 2.6

 ∗ Planet Python

My friend and colleague Jannis (aka jezdez) Leidel saved my bacon today when I had gotten completely stuck.

So, I have this python2.6 virtualenv and whenever I ran python setup.py sdist upload it would upload a really nasty tarball to PyPI. What would happen is that when people did pip install premailer it would fail horribly and look something like this:

...
IOError: [Errno 2] No such file or directory: '/path/to/virtual-env/build/premailer/setup.py'

What?!?! If you download the tarball and unpack it you'll see that there definitely is a setup.py file in there.

Anyway. What happened, which I didn't realize, was that within the .tar.gz file there were these strange copies of files. For example, for every file.py there was a ._file.py, etc.

Here's what the file looked like after a tarball had been created:

(premailer26)peterbe@mpb:~/dev/PYTHON/premailer (master)$ tar -zvtf dist/premailer-2.0.2.tar.gz
-rwxr-xr-x  0 peterbe staff     311 Apr 11 15:51 ./._premailer-2.0.2
drwxr-xr-x  0 peterbe staff       0 Apr 11 15:51 premailer-2.0.2/
-rw-r--r--  0 peterbe staff     280 Mar 28 10:13 premailer-2.0.2/._LICENSE
-rw-r--r--  0 peterbe staff    1517 Mar 28 10:13 premailer-2.0.2/LICENSE
-rw-r--r--  0 peterbe staff     280 Apr  9 21:10 premailer-2.0.2/._MANIFEST.in
-rw-r--r--  0 peterbe staff      34 Apr  9 21:10 premailer-2.0.2/MANIFEST.in
-rw-r--r--  0 peterbe staff     280 Apr 11 15:51 premailer-2.0.2/._PKG-INFO
-rw-r--r--  0 peterbe staff    7226 Apr 11 15:51 premailer-2.0.2/PKG-INFO
-rwxr-xr-x  0 peterbe staff     311 Apr 11 15:51 premailer-2.0.2/._premailer
drwxr-xr-x  0 peterbe staff       0 Apr 11 15:51 premailer-2.0.2/premailer/
-rwxr-xr-x  0 peterbe staff     311 Apr 11 15:51 premailer-2.0.2/._premailer.egg-info
drwxr-xr-x  0 peterbe staff       0 Apr 11 15:51 premailer-2.0.2/premailer.egg-info/
-rw-r--r--  0 peterbe staff     280 Mar 28 10:13 premailer-2.0.2/._README.md
-rw-r--r--  0 peterbe staff    5185 Mar 28 10:13 premailer-2.0.2/README.md
-rw-r--r--  0 peterbe staff     280 Apr 11 15:51 premailer-2.0.2/._setup.cfg
-rw-r--r--  0 peterbe staff      59 Apr 11 15:51 premailer-2.0.2/setup.cfg
-rw-r--r--  0 peterbe staff     280 Apr  9 21:09 premailer-2.0.2/._setup.py
-rw-r--r--  0 peterbe staff    2079 Apr  9 21:09 premailer-2.0.2/setup.py
-rw-r--r--  0 peterbe staff     280 Apr 11 15:51 premailer-2.0.2/premailer.egg-info/._dependency_links.txt
-rw-r--r--  0 peterbe staff       1 Apr 11 15:51 premailer-2.0.2/premailer.egg-info/dependency_links.txt
-rw-r--r--  0 peterbe staff     280 Apr  9 21:04 premailer-2.0.2/premailer.egg-info/._not-zip-safe
-rw-r--r--  0 peterbe staff       1 Apr  9 21:04 premailer-2.0.2/premailer.egg-info/not-zip-safe
-rw-r--r--  0 peterbe staff     280 Apr 11 15:51 premailer-2.0.2/premailer.egg-info/._PKG-INFO
-rw-r--r--  0 peterbe staff    7226 Apr 11 15:51 premailer-2.0.2/premailer.egg-info/PKG-INFO
-rw-r--r--  0 peterbe staff     280 Apr 11 15:51 premailer-2.0.2/premailer.egg-info/._requires.txt
-rw-r--r--  0 peterbe staff      23 Apr 11 15:51 premailer-2.0.2/premailer.egg-info/requires.txt
-rw-r--r--  0 peterbe staff     280 Apr 11 15:51 premailer-2.0.2/premailer.egg-info/._SOURCES.txt
-rw-r--r--  0 peterbe staff     329 Apr 11 15:51 premailer-2.0.2/premailer.egg-info/SOURCES.txt
-rw-r--r--  0 peterbe staff     280 Apr 11 15:51 premailer-2.0.2/premailer.egg-info/._top_level.txt
-rw-r--r--  0 peterbe staff      10 Apr 11 15:51 premailer-2.0.2/premailer.egg-info/top_level.txt
-rw-r--r--  0 peterbe staff     280 Apr  9 21:21 premailer-2.0.2/premailer/.___init__.py
-rw-r--r--  0 peterbe staff      66 Apr  9 21:21 premailer-2.0.2/premailer/__init__.py
-rw-r--r--  0 peterbe staff     280 Apr  9 09:23 premailer-2.0.2/premailer/.___main__.py
-rw-r--r--  0 peterbe staff    3315 Apr  9 09:23 premailer-2.0.2/premailer/__main__.py
-rw-r--r--  0 peterbe staff     280 Apr  8 16:22 premailer-2.0.2/premailer/._premailer.py
-rw-r--r--  0 peterbe staff   15368 Apr  8 16:22 premailer-2.0.2/premailer/premailer.py
-rw-r--r--  0 peterbe staff     280 Apr  8 16:22 premailer-2.0.2/premailer/._test_premailer.py
-rw-r--r--  0 peterbe staff   37184 Apr  8 16:22 premailer-2.0.2/premailer/test_premailer.py

Strangely, this only happened in a Python 2.6 environment. The problem went away when I created a brand new Python 2.7 environment with the latest setuptools.

So basically, the fault lies with OSX and a strange interaction between OSX and tar.
This superuser.com answer does a much better job explaining this "flaw".

So, the solution to the problem is to create the distribution like this instead:

$ COPYFILE_DISABLE=true python setup.py sdist

If you do that, you get a healthy looking tarball that actually works with pip install. Thanks jezdez for pointing that out!

My Sketchbook Color Coding

 ∗ Wireframes Magazine


What, it's been two years already? That's how long it took me to fill in my dotted Leuchtturm notebook (German engineering at its finest) front to back. Since I'm starting a new one, I thought I'd devise a bit of a color coding system for my upcoming notes and share it here. I typically use the colors to underline the very first page title. Here are the colors:

Light Grey For Thoughts & Inspirations

Sometimes I’ll hear or read something of interest from a podcast, article, or book and it gets coded this way. My own free-form personal random thoughts across various disciplines get placed here as well.

Medium Grey For Project Ideas

For the more solid, practical and actionable thoughts or sketches. These are new project ideas or adjustments to existing ones, and are often accompanied by sketched-out screens.

Black For Business Strategies

These are the most strict, closest and firm action points which are tied to my business initiatives. They are very high level for the most part and act as strategic todo’s of sorts.

Red For Contacts

If I meet someone or a company of interest, they will get placed here. This section might also be some residue from a conversation with someone over a cup of coffee.

Blue For UI Patterns

Here come the user interface patterns – both good and dark. Be it existing patterns seen somewhere or envisioned ones, they both land here.

Yellow For Content

I write a lot and rely heavily on content marketing for much of my business. Specific content ideas for existing projects get placed here. Oh, and in the example listed you can actually see that I've started scribbling down some content points for the GoodUI Datastories promotions. :)

Was this helpful? How do you structure your notebooks or sketchbooks?

Credits: Jakub Linowski

Stefan Scherfke: SimPy: Process Interaction

 ∗ Planet Python

As promised, the delay between this and the last topical guide was rather short.

This one is about process interaction. This is what makes discrete event simulation interesting. Without it, you actually wouldn't even need SimPy.

So this guide is about:

Another possibility for processes to interact are resources. They will be discussed in the next guide.

Sleep until woken up

Imagine you want to model an electric vehicle with an intelligent battery-charging controller. While the vehicle is driving, the controller can be passive, but it needs to be reactivated once the vehicle is connected to the power grid in order to charge the battery.

In SimPy 2, this pattern was known as passivate / reactivate. In SimPy 3, you can accomplish that with a simple, shared Event:

>>> from random import seed, randint
>>> seed(23)
>>>
>>> import simpy
>>>
>>> class EV:
...     def __init__(self, env):
...         self.env = env
...         self.drive_proc = env.process(self.drive(env))
...         self.bat_ctrl_proc = env.process(self.bat_ctrl(env))
...         self.bat_ctrl_reactivate = env.event()
...
...     def drive(self, env):
...         while True:
...             # Drive for 20-40 min
...             yield env.timeout(randint(20, 40))
...
...             # Park for 1–6 hours
...             print('Start parking at', env.now)
...             self.bat_ctrl_reactivate.succeed()  # "reactivate"
...             self.bat_ctrl_reactivate = env.event()
...             yield env.timeout(randint(60, 360))
...             print('Stop parking at', env.now)
...
...     def bat_ctrl(self, env):
...         while True:
...             print('Bat. ctrl. passivating at', env.now)
...             yield self.bat_ctrl_reactivate  # "passivate"
...             print('Bat. ctrl. reactivated at', env.now)
...
...             # Intelligent charging behavior here …
...             yield env.timeout(randint(30, 90))
...
>>> env = simpy.Environment()
>>> ev = EV(env)
>>> env.run(until=150)
Bat. ctrl. passivating at 0
Start parking at 29
Bat. ctrl. reactivated at 29
Bat. ctrl. passivating at 60
Stop parking at 131

Since bat_ctrl() just waits for a normal event, we no longer call this pattern passivate / reactivate in SimPy 3.

Waiting for another process to terminate

The example above has a problem: it may happen that the vehicle wants to park for a shorter duration than it takes to charge the battery (this would be the case if both charging and parking took 60 to 90 minutes).

To fix this problem we have to slightly change our model. A new bat_ctrl() will be started every time the EV starts parking. The EV then waits until the parking duration is over and until the charging has stopped:

>>> class EV:
...     def __init__(self, env):
...         self.env = env
...         self.drive_proc = env.process(self.drive(env))
...
...     def drive(self, env):
...         while True:
...             # Drive for 20-40 min
...             yield env.timeout(randint(20, 40))
...
...             # Park for 1–6 hours
...             print('Start parking at', env.now)
...             charging = env.process(self.bat_ctrl(env))
...             parking = env.timeout(randint(60, 360))
...             yield charging & parking
...             print('Stop parking at', env.now)
...
...     def bat_ctrl(self, env):
...         print('Bat. ctrl. started at', env.now)
...         # Intelligent charging behavior here …
...         yield env.timeout(randint(30, 90))
...         print('Bat. ctrl. done at', env.now)
...
>>> env = simpy.Environment()
>>> ev = EV(env)
>>> env.run(until=310)
Start parking at 29
Bat. ctrl. started at 29
Bat. ctrl. done at 83
Stop parking at 305

Again, nothing new or special (if you've read the events guide) is happening. SimPy processes are events, too, so you can yield them and will thus wait for them to get triggered. You can also wait for two events at the same time by concatenating them with &.

Interrupting another process

As usual, we now have another problem: Imagine, a trip is very urgent, but with the current implementation, we always need to wait until the battery is fully charged. If we could somehow interrupt that …

By a fortunate coincidence, there is indeed a way to do exactly this. You can call interrupt() on a Process. This will throw an Interrupt exception into that process, resuming it immediately:

>>> class EV:
...     def __init__(self, env):
...         self.env = env
...         self.drive_proc = env.process(self.drive(env))
...
...     def drive(self, env):
...         while True:
...             # Drive for 20-40 min
...             yield env.timeout(randint(20, 40))
...
...             # Park for 1 hour
...             print('Start parking at', env.now)
...             charging = env.process(self.bat_ctrl(env))
...             parking = env.timeout(60)
...             yield charging | parking
...             if not charging.triggered:
...                 # Interrupt charging if not already done.
...                 charging.interrupt('Need to go!')
...             print('Stop parking at', env.now)
...
...     def bat_ctrl(self, env):
...         print('Bat. ctrl. started at', env.now)
...         try:
...             yield env.timeout(randint(60, 90))
...             print('Bat. ctrl. done at', env.now)
...         except simpy.Interrupt as i:
...             # Onoes! Got interrupted before the charging was done.
...             print('Bat. ctrl. interrupted at', env.now, 'msg:',
...                   i.cause)
...
>>> env = simpy.Environment()
>>> ev = EV(env)
>>> env.run(until=100)
Start parking at 31
Bat. ctrl. started at 31
Stop parking at 91
Bat. ctrl. interrupted at 91 msg: Need to go!

What process.interrupt() actually does is remove the process' _resume() method from the callbacks of the event that it is currently waiting for, and schedule an event that will throw the Interrupt exception into the interrupted process as soon as possible.

Since we don't do anything special to the event, the interrupted process can yield the same event again after catching the Interrupt. Imagine someone waiting for a shop to open. The person may get interrupted by a phone call. After finishing the call, he or she checks if the shop has already opened and either enters or continues to wait.
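
As a small illustration (this snippet is not from the guide, and the times and names are invented), the waiting person can simply loop and yield the same event again after each interruption:

>>> def customer(env, shop_opens):
...     """Wait for the shop to open; phone calls interrupt, but we keep waiting."""
...     while True:
...         try:
...             yield shop_opens
...             print('Entering the shop at', env.now)
...             return
...         except simpy.Interrupt as interrupt:
...             print('Interrupted at', env.now, 'by', interrupt.cause)
...
>>> def caller(env, customer_proc):
...     yield env.timeout(4)
...     customer_proc.interrupt('a phone call')
...
>>> env = simpy.Environment()
>>> shop_opens = env.timeout(10)  # the shop opens at t=10
>>> cust = env.process(customer(env, shop_opens))
>>> call = env.process(caller(env, cust))
>>> env.run()
Interrupted at 4 by a phone call
Entering the shop at 10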

This guide is now, of course, also part of the SimPy documentation.

Mike C. Fletcher: Enums and Output Variables

 ∗ Planet Python

Walter on PyOpenGL Users pointed me at the chromium "regal" tool, which has a table of constant definitions for output parameter sizings. I have a similar mechanism inside PyOpenGL's source code, so I parsed the table out of regal and used it to spot and plug gaps in the PyOpenGL table. The regal stuff is all-one-table for any COMPSIZE() value, while PyOpenGL has traditionally had many tables, one for each output-parameter... seems that could be simplified now to just use the one table, though there are definitely some corner-cases to be addressed (cases where e.g. another API call is required to get the proper output size). There also seem to be cases where COMPSIZE() is either wrong or weird (basically looking like it's keyed off the wrong parameter).

Anyway, the comparison is automated now, so I can pull updates into PyOpenGL's tables. I still can't get the khronos xml registry downloaded, however, so I'm blocked from updating the extensions/features for now.

[Update] and now I've largely automated the output parameter marking so that almost everything that the old .spec file said was an output parameter is now marked as such. Doesn't yet handle multiple outputs/function or cases where the COMPSIZE is complex, but should handle the static, dynamic, dynamic-with-multiple and table lookup cases.

End Point: Speeding Up Saving Millions of ORM Objects in PostgreSQL

 ∗ Planet Python

The Problem

Sometimes you need to generate sample data, like random data for tests. Sometimes you need to generate it with the huge amount of code you have in your ORM mappings, just because an architect decided that all the logic needs to be stored in the ORM and the database should be just a dummy data container. The real reason is not important - the problem is: let's generate lots of rows, millions of them, for a sample table from ORM mappings.

Sometimes the data is read from a file, but due to business logic kept in the ORM, you need to load the data from the file into ORM objects and then save those millions of ORM objects to the database.

This can be done in many different ways, but here I will concentrate on making that as fast as possible.

I will use PostgreSQL and SQLAlchemy (with psycopg2) for the ORM, so all the code will be implemented in Python. I will create a couple of functions, each implementing a different solution for saving the data to the database, and I will test them using 10k and 100k generated ORM objects.

Sample Table

The table I used is quite simple, just a simplified blog post:

CREATE TABLE posts (
  id SERIAL PRIMARY KEY,
  title TEXT NOT NULL,
  body TEXT NOT NULL,
  payload TEXT NOT NULL
);

SQLAlchemy Mapping

I'm using SQLAlchemy for the ORM, so I need a mapping; I will use this simple one (shown with the imports and declarative base it relies on):

from sqlalchemy import Column, Integer, Text
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class BlogPost(Base):
    __tablename__ = "posts"

    id = Column(Integer, primary_key=True)
    title = Column(Text)
    body = Column(Text)
    payload = Column(Text)

The payload field is just to make the object bigger, to simulate real life where objects can be much more complicated, and thus slower to save to the database.
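
The snippets below also rely on a SQLAlchemy engine, a session and a MAX_COUNT constant that are not shown in the post; a minimal setup could look like this (the connection string and the MAX_COUNT value are placeholders of mine):

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

MAX_COUNT = 10000  # number of objects to generate per test run

# Placeholder DSN - adjust user, password and database name to your setup.
engine = create_engine('postgresql+psycopg2://pgtest:pgtest@localhost/pgtest')
Session = sessionmaker(bind=engine)
session = Session()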

Generating Random Object

The main idea for this test is to have a randomly generated object. However, what I really measure is the database speed, and all the randomness happens on the client side, so a truly random object doesn’t matter at this point: the overhead of a fully random generator would be the same regardless of the method used for saving the data to the database. So instead of randomly generating the object, I will use a static one, with static data, built by the function below:

TITLE   = "title"      * 1764
BODY    = "body"       * 1764
PAYLOAD = "dummy data" * 1764

def generate_random_post():
    "Generates a kind of random blog post"
    return BlogPost(title=TITLE, body=BODY, payload=PAYLOAD)

Solution Ideas

Generally there are two main ideas for such a bulk inserting of multiple ORM objects:

  • Insert them one-by-one with autocommit
  • Insert them one-by-one in one transaction

Save One By One

This is the simplest way. Usually we don’t save just one object, but instead save many different objects in one transaction; making a couple of related changes spread over multiple transactions is a great way to end up with a database full of bad data.

For generating millions of unrelated objects this shouldn’t cause data inconsistency, but it is highly inefficient. I’ve seen it multiple times in code: create an object, save it to the database, commit, create another object and so on. It works, but is quite slow. Sometimes it is fast enough, but at the cost of a very simple change to this algorithm we can make it 10 times faster.

I’ve implemented this algorithm in the function below:

def save_objects_one_by_one(count=MAX_COUNT):
    # One transaction per object: commit after every single insert.
    for i in xrange(1, count + 1):
        post = generate_random_post()
        session.add(post)
        session.commit()

Save All in One Transaction

This solution is as simple as it sounds: create the objects, add them to the session, and commit the transaction once at the end, so everything happens in one huge transaction.

The implementation differs from the previous one only by four spaces of indentation: commit() is called once, after all objects have been added:

def save_objects_one_transaction(count=MAX_COUNT):
    # One huge transaction: commit only once, after adding all objects.
    for i in xrange(1, count + 1):
        post = generate_random_post()
        session.add(post)
    session.commit()

Time difference

I ran the tests multiple times, truncating the table each time. The average results of saving 10k objects were quite predictable:

  • Multiple transactions - 268 seconds
  • One transaction - 25 seconds

The difference is not surprising. The whole table is only 4.8 MB, but after each transaction the database needs to write the changes to disk, which slows the procedure down a lot.

Copy

So far, I’ve described the most common methods of generating and storing many ORM objects. I was wondering about another one, which may seem a little surprising at first.

PostgreSQL has a great COPY command which can copy data between a table and a file. The file format is simple: one table row per file line, fields separated by a configurable delimiter, and so on. It can be a normal CSV or TSV file.

My crazy idea was: how about using COPY to load all the generated ORM objects? To do that, I need to serialize them to a text representation that COPY understands. So I created a simple function which does that; it lives outside the BlogPost class, so I don't need to change the data model.

def serialize_post_to_out_stream(post, out):
    import csv
    # Write one tab-separated line per post; csv handles quoting and escaping.
    writer = csv.writer(out, delimiter="\t", quoting=csv.QUOTE_MINIMAL)
    writer.writerow([post.title, post.body, post.payload])

The function above gets two parameters:

  • post - the object to be serialized
  • out - the output stream the serialized row will be written to; in Python this is a file-like object, i.e. an object with the methods a file object has

Here I use the standard csv module, which supports reading and writing CSV files. I really don’t want to write my own function for escaping all the possible forms of data I could have - that usually leads to many tricky bugs.
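
As a quick sanity check (this snippet is mine, not from the original post), serializing one post into an in-memory buffer shows the tab-separated format that will later be fed to psql:

from StringIO import StringIO  # Python 2, matching the xrange-based code above

buf = StringIO()
serialize_post_to_out_stream(generate_random_post(), buf)
print(buf.getvalue()[:60])     # beginning of one tab-separated row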

The only thing left is to use the COPY command. I don’t want to create a file with data and load that later; the generated data can be really huge, and creating temporary files can just slow things down. I want to keep the whole procedure in Python, and use pipes for data loading.

I will use the psql program for accessing the PostgreSQL database. Psql has its own client-side command called \COPY, which can read the CSV data from psql's standard input. This can be done using e.g.: cat file.csv | psql database.

To use it in Python, I’m going to use the subprocess module, and create a psql process with stdin=subprocess.PIPE which will give me write access to the pipe psql reads from. The function I’ve implemented is:

def save_objects_using_copy(count=MAX_COUNT):
    import subprocess
    # Stream the rows straight into psql's \COPY through a pipe.
    p = subprocess.Popen([
        'psql', 'pgtest', '-U', 'pgtest',
        '-c', '\\COPY posts(title, body, payload) FROM STDIN',
        '--set=ON_ERROR_STOP=true'
        ], stdin=subprocess.PIPE
    )
    for i in xrange(1, count + 1):
        post = generate_random_post()
        serialize_post_to_out_stream(post, p.stdin)
    p.stdin.close()
    p.wait()  # wait for psql to finish loading the data

Results

I tested this on the same database table, truncating it before each run. After that I also checked this function and the previous one (with one transaction) on a bigger sample - 100k BlogPost objects.

The results are:

Sample size   Multiple Transactions   One Transaction   COPY
10k           268 s                   25 s              5 s
100k          (not tested)            262 s             51 s

I haven’t tested the multiple transactions version for the 100k sample, as I just didn’t want to wait multiple hours for it to finish (I ran each of the tests multiple times to get more reliable results).

As you can see, the COPY version is the fastest, about 5 times faster than the full ORM version with one huge transaction. This version is also memory friendly: no matter how many objects you want to generate, it only ever needs to keep one ORM object in memory, which can be discarded after it is written.

The Drawbacks

Of course using psql poses a couple of problems:

  • you need to have psql available; sometimes that’s not an option
  • calling psql creates another connection to the database; sometimes that could be a problem
  • you need to set up a password in the ~/.pgpass file (an example entry is shown after this list); you cannot provide it on the command line
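
For reference (this note is mine, not from the original post), a ~/.pgpass entry has the form host:port:database:user:password, one connection per line. An entry matching the placeholder setup above might look like this:

localhost:5432:pgtest:pgtest:secret

The file must be readable only by its owner (chmod 600 ~/.pgpass), otherwise psql ignores it.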

You could also get the psycopg2 cursor directly from the SQLAlchemy connection and then use the copy_from() function, but then you need to have all the data already prepared in memory, as copy_from() reads from a file-like object such as a StringIO. This is not a good solution for inserting millions of objects, as the data can be quite huge - streaming is much better in this case.

Another solution is to write a file-like object that generates the data lazily, so that copy_from() can read from it directly; copy_from() calls the object's read() method, trying to read 8192 bytes per call. This can be a good idea when you don't have access to psql, however due to the overhead of generating the 8192-byte strings it should be slower than the psql version. A sketch of such an object follows.
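
As an illustration (my sketch, not code from the original post; the PostStream name and the way the raw psycopg2 connection is obtained are assumptions), such a lazily-serializing file-like object could look like this:

class PostStream(object):
    "File-like object that serializes posts lazily for cursor.copy_from()."

    def __init__(self, count):
        self.buffer = ""
        # Generator producing one tab-separated row per post. The static test
        # data contains no tabs or newlines, so a plain join is enough here.
        self.rows = ("\t".join([TITLE, BODY, PAYLOAD]) + "\n"
                     for _ in xrange(count))

    def read(self, size=8192):
        # copy_from() keeps calling read() until it receives an empty string.
        while len(self.buffer) < size:
            try:
                self.buffer += next(self.rows)
            except StopIteration:
                break
        chunk, self.buffer = self.buffer[:size], self.buffer[size:]
        return chunk

def save_objects_using_copy_from(count=MAX_COUNT):
    # Assumption: session.connection().connection exposes the raw psycopg2 connection.
    cursor = session.connection().connection.cursor()
    cursor.copy_from(PostStream(count), 'posts',
                     columns=('title', 'body', 'payload'))
    session.commit()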

Mike C. Fletcher: Auto-generating Output Declarations

 ∗ Planet Python

So the good news is that I've now got far more PyOpenGL output parameters automatically wrapped such that they can be passed in or automatically generated. That drops a *lot* of the manually maintained code. OpenGLContext works with the revisions, but the revision currently does *not* support non-contiguous input arrays (basically the cleanup included removing manually maintained code that said "this is an input array, so allow it to be non-contiguous"). I'll be moving the logic for doing that wrapping into the run-time eventually.

The upshot, however, is that there are a lot of changes, and while they are all bug-fixes, they are going to need to be tested (once they are finished). There are also a few hundred entry points that can't currently be auto-wrapped; I'm intending to make the auto-wrapper more capable there when possible.