rawtech

Thursday, 24 May

16:54

Diversity is important in a workplace environment. Having different points of view from people with different life experiences brings creative new ideas and innovative solutions to the software development process. As a team of web developers that designs and builds custom web applications, we depend on creativity, and I would argue that creativity and gender diversity are closely tied and both crucial to the success of our projects.

Software development is traditionally a field dominated by men. The presence of women in programming is commanding more attention at conferences and events, particularly in recent years. There are many initiatives within the Python community to promote involvement, such as the group Women Who Code, and the long-term benefits of efforts like these cannot be overstated. At Caktus, three of the ten current members of our Django development staff are female, and we are actively working to further equalize the gender balance through future hires.

One part of closing the gender gap in programming communities involves creating a safe and inclusive environment for all parties, both in the workplace and at community events. This begins with strong employment policies on harassment, which we have implemented here at Caktus, and also extends to implementing policies at conferences or other events relevant to the development community. This blog post comes out of several recent events in the broader tech community that Caktus is a part of. These include two blog posts that drew a lot of attention, and two corporate events that demonstrated clearly for us how pervasive sexism still is in our industry.

In March, Katie Cunningham, a fellow Django developer and friend made through DjangoCon, wrote an excellent and widely read blog post titled "Lighten Up." The post details a story that is all too common in our industry, that of ongoing, but usually subtle sexist jokes and comments that serve only, whether intentionally or not, to gradually demoralize and discourage those to whom they are addressed. She discusses how the typical response, should someone get upset by such a "joke" or comment, is simply to "lighten up" and "stop being so sensitive." I can see how that would get really old. Fast. In fact, just in reading up for writing this post, I was blown away by the number of comments in response to posts such as Katie's that fit exactly that description.

Just a couple weeks ago, the computer manufacturer Dell hosted a summit in Copenhagen. Dell hired a "comedian" to MC the event, who proceeded to entertain the attendees with a variety of sexist jokes, acclaiming the "success" of the IT community as indicated by its male-dominated culture, and included a suggestion to the mostly male attendees that they go home and tell their partners to "shut up bitch." Dell has finally put out an apology about what happened, but the consensus seems to be that it's weak at best and, given its tardiness, lost any sincerity it may once have had.

In closing, I’d like to draw attention to one other aspect of the software development community, particularly at conferences, that often aggravates any sexism already present in the community. Last month, Ryan Funduk wrote an extensive and much-debated blog post, titled "Our Culture of Exclusion," about the pervasiveness of drinking at many programming conferences. He also discusses how, all too often, these environments are rife with sexist or racist jokes (or worse) that quickly erode any efforts to further diversify such communities. One example that he highlights is the daily deal API provider Sqoot, which alienated and offended many when it suggested that women are better fit to serve beer than to program. Sqoot made several apologies about this unfortunate advertisement, but many of them fell short and only worsened the offense, as this response from Gayle McDowell aptly sums up:

Your language, as far as I understand it, is making the assumption that all coders are male, that all are straight, that women are sexual objects offered as a reward (HIGHLY inappropriate in a professional context), and that the women are there to serve them.

She goes on to suggest an appropriate response if and when events like this do happen:

People screw up sometimes. It happens. A TON of people (both men and women) have done equally offensive things. // But when you do it, you need to own up to it and think about what you've done. Not lie about it, as you're doing now.

Clearly, we need to have policies in place for handling instances of sexist or racist jokes and sexual harassment. When they do happen, McDowell says, one needs to take responsibility and own up to the mistakes one has made in an honest way.

The bar has been set, and I think it’s worth noting that several lessons can be taken from these events and applied to our own communities. Caktus is a long-time sponsor of several conferences relevant to the work that we do. Along with this blog post, Caktus is asking conference organizers and other sponsors to join us in the following effort: Moving forward, Caktus will require that a zero-tolerance sexual harassment policy is established and enforced by the organizers of any conference that we sponsor or attend. We want to ensure that our community events are safe, welcoming, and supportive for all of our colleagues — both male and female. While drinking can certainly make things worse, it is not the root of the problem. To overcome these issues, we need to talk about them more, continue to raise awareness, and treat all people in our workplaces and communities with the professionalism and respect that they deserve.

We were dismayed to learn of the federal government’s April 30th decision to severely cut the budget of Library and Archives Canada and to eliminate the National Archival Development Program and the body that administers it, the Canadian Council of Archives.

NADP funding has helped the Archives describe and digitize many photographs like this one from the Jack Lindsay Photographers Ltd fonds. Reference Code AM1184-S1-: CVA 1184-3225

The CCA supports a network of archival institutions and advisory services across the country and has played an integral part in the development of Canadian archival standards and in the support of the development of all the provincial and territorial online archival databases across the country (BC’s is MemoryBC). It brings the latter together under Archives Canada, the national database of archival descriptions. These databases were built on and are supported by NADP funding, as are numerous archival and preservation advisory services in the provinces and territories.

 At the City of Vancouver Archives, we have benefited from NADP funding since the mid-1980s, and a lot of the content you see online is there, in one way or another, because of an NADP-funded project. For example, we’ve used NADP funds to: 

  • Create thousands of descriptions of records in all formats (textual records, photographs, maps, movies) in our database for the likes of the Townley, Matheson and Partners fonds, the Junior League of Greater Vancouver fonds, the Gordon Price fonds, the (Mayor) LD Taylor family fonds and the Yaletown Productions Ltd fonds
  • Describe and digitize major photographic collections and fonds such as the Major Matthews Collection, the Steffens-Colmer collection and Jack Lindsay Ltd Photographers fonds
  • Clean and/or rehouse glass plate and acetate negatives
  • Purchase supplies for housing acetate negatives in cold storage
  • Purchase digitization equipment
  • Develop the standards and procedures for our audio digitization program and digitize audio recordings
  • Develop the standards and procedures for digitizing our moving image holdings, and digitize them

We were counting on funds from the NADP this year to help process the BC Sugar fonds. With the elimination of the program, we will not be able to make these records available as soon as we would like to.

For more information about the impact of the cuts on the archival community and users of archives please visit the CCA’s Facebook page.

16:00

Please don't get me wrong, I hate Microsoft and Windows as much as the next OSS or Python web developer. I just feel it's important to be honest with ourselves about why they're bad. I grow a little tired of all the griping sessions whenever the sorry state of Plone's Windows installer support is discussed. I think there's plenty to gripe about and there's plenty that Microsoft does poorly, but let's also be honest with ourselves. If Microsoft didn't do some things really well, they wouldn't be a problem for us and we wouldn't even be discussing whether to support them as a platform.

At any rate, the discussions about the Windows installer at the 2012 Cioppino Sprint were as frustrating and disappointing as ever. When everyone was through griping, however, I couldn't find anyone who at the end of that discussion thought we could afford strategically to drop the Windows installer or leave it to a 3rd party.

I've started putting Microsoft's Web Platform Installer, Web Matrix, and IIS Express together for Plone 4.2 and Python 2.7 and have a proof-of-concept working on my Windows VM that actually gives us Open Source all the way up to IIS, see the screenshot above. Plone is actually really snappy running this way even on my very slow Windows VM.

Below I document what I've learned as I proceeded and my final findings. If you read nothing else please read the Help Needed! section and let me know if you or anyone else can help me get this to a place where others can try it out. I'll be sprinting on this at the post-PSE sprints so anyone who has IIS or Windows installer experience, I'd love to sprint with you on this.

Help Needed!

Primarily, I need help with things that I can't do with just Web Deploy and WebPI. For example, the way IIS does FastCGI means that a configuration change has to be made to the global IIS config for each FastCGI app. IOW, it can't be done in the app/deployment-local web.config file. I know the shell command to run to make the change, but I don't know how to package that properly so that it runs with escalated privileges when necessary.

Similarly, we need to figure out how to handle ZEO in the way that is closest to "correct" for IIS. Should we just install an autostart service? IIS has a database provisioning and control framework. Can that be adapted to also manage ZODB databases and control running ZEO processes? Or should we just wrap it up in such a way that ZEO is started the first time, and only the first time, that IIS launches the app process, and is shut down when the app is stopped? If so, how?

Since we're using WSGI via FastCGI using Flup, we're dependent on Zope's WSGI server. Unfortunately, it lacks the publication hooks used by things like plone.app.theming and plone.app.caching. This is really a bug in the Zope2 WSGI publisher and as such affects all WSGI deployments, not just Windows. @Hanno and @davisagli think they'll be able to get to this in the next few days.

Finally, if I'm wrong about any of the technical stuff in here, I'd love to hear about it. The documentation is crap for all this stuff and it has been way too hard to figure it all out, so I'd love any leg up I can get.

Supporting Windows

One thing that came out of the discussion that I thought was interesting was the sentiment that if we were going to support Windows, we shouldn't do it while telling Windows users to take a leap at the same time. IOW, we say we support Windows out of one side of our mouths, and blame users for choosing Windows out of the other side when they encounter fundamental problems or find themselves facing an unfamiliar learning curve. I see two things behind such problems: lack of integration with typical Windows tool chains, and differences in the documentation.

The differences in the documentation largely come from the fact that the current Windows installer isn't based on the unified installer, so the buildout/package layout is different from what's in most of our docs. To that end, I started looking into what it would take to build a Windows installer from the unified installer.

The lack of integration with typical Windows tool chains is a much bigger issue and much of that issue is a problem for most Python web applications, not just Plone. My research did, however, turn up some very promising new tools from Microsoft that we may be able to use to provide a better experience for Windows Plone developers and Windows Plone deployments.

There are also two major audiences that tend to make use of Plone's Windows support: developers and deployments. Steve McMahon believes that most of the Windows downloads are by developers looking to get started doing Plone integrations or custom Plone development. Another target the installer has been used for, and this is more of what I've seen in my experience as a developer, is as a basis for production Plone deployments running on Windows servers. I definitely trust Steve's sense more than my own, but the good news is that this new Windows tool chain seeks to provide a nicely integrated experience from web app development through to web app deployment.

So for the latter half of the sprint and ever since then, I've been obsessed with the Web Platform Installer (WebPI or WPI), IISExpress, and Web Matrix tool chain. The WebPI is actually a fairly open framework for describing web frameworks and web apps including dependencies and arbitrary installation commands in an extended atom feed. Helicon uses this to provide a Django install story nicely integrated with this Windows tool chain. Using IISExpress and Web Matrix also allows developers to work in a local environment isolated to their user directory without needing to have the full IIS ($$$) installed.

Notes and Findings

Below is a loosely organized grab-bag of notes and findings I recorded while working on this. I publish it here only for reference.

Hosting

  • ISAPI-WSGI doesn't support IISExpress

    The ISAPI-WSGI OSS project is used by a number of Python WSGI projects to support IIS. ISAPI-WSGI depends on pywin32, which in turn depends on IIS 6 or the IIS 7 plugin providing IIS 6 compatibility. The IIS 6 compatibility plugin isn't supported on IISExpress.

  • IIS only supports FastCGI

    This is how the WebPI and Web Matrix tool chain supports PHP apps or other apps, such as Plone, that need long-running separate processes to run efficiently. This FCGI support is also restricted to using Windows named pipes. IOW, no TCP sockets to a separately running FCGI server process.

  • No current FCGI to WSGI gateway works with Windows

    The FastCGI spec calls for the STDIN_FILENO to refer to a socket, which is then used for two-way communication between the server and the process handling requests. Naturally, Microsoft has embraced and extended this standard in IIS such that instead of a single socket it uses two Windows named pipes, one each for receiving and sending. IIS may also support TCP sockets behind the scenes. This means that anything that expects to use normal sockets for FCGI, like flup, won't work with IIS. I can find no other OSS FCGI to WSGI gateway that supports Windows named pipes.

  • IIRF doesn't support IISExpress

    We might be able to use Ionic's Isapi Rewrite Filter to proxy IIS requests through to a separately running Python process. This is less than ideal since it may make the Web Matrix experience less integrated and require more un-Windows-like knowledge. IIRF doesn't support IISExpress, at any rate, though it may be possible to manually install it into IISExpress.

  • Helicon Zoo Module

    I suppose the lack of other working options is exactly why Helicon Tech built its own solution for this. I prefer to have OSS all the way up to IIS itself, but that's difficult when you play in the Microsoft sandbox. At least it looks like Helicon has paid some attention to performance. Furthermore, having company support may yield better long term maintenance for IIS support than an OSS project in the Microsoft universe. That still doesn't mean I like it.

    Part of the Helicon Zoo Module is a zoofcgi.py script which is their own FCGI-WSGI gateway that seems to use STDIN_FILENO instead of sockets to do the FCGI communication. IOW, Helicon has written a totally new FCGI-WSGI gateway that works with IIS's broken FCGI implementation. In their implementation I see a lot that looks familiar from flup.

  • Modifying zoofcgi.py to run Plone

    Unfortunately, Helicon's zoofcgi.py only supports a Django WSGI app or an example WSGI app, with no way that I saw to use it to run an arbitrary WSGI app. Replace the run_example_app() function with the following to enable loading of an arbitrary WSGI app from a Paste *.ini file:

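    import logging
    import os

    from paste.script.util.logging_config import fileConfig
    from paste.deploy import loadapp

    def run_example_app():
        # WSGI_CONFIG_FILE points at a Paste *.ini file (set in web.config)
        config = os.environ.get('WSGI_CONFIG_FILE')
        if config:
            config = os.path.abspath(config)
            fileConfig(config)  # configure logging from the same ini file
            application = loadapp('config:%s' % (config,))
        else:
            # fall back to the example app already defined in zoofcgi.py
            application = example_application
        if __debug__: logging.info('run_fcgi: STARTED')
        # FCGIServer is the gateway class already defined in zoofcgi.py
        FCGIServer(application).run()
        if __debug__: logging.info('run_fcgi: EXITED')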
    

    In ~\My Documents\IISExpress\config\applicationhost.config change <engine name="python.2.7.pipe"... to:

    <engine name="python.2.7.pipe"
            fullPath="%SystemDrive%\Plone42\zeocluster\bin\zopeskelpy.exe"
            arguments="%SystemDrive%\ZooExpress\Workers\python\zoofcgi.py"
            transport="NamedPipe"
            protocol="fastcgi" />
    

    web.config in the site root:

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <system.webServer>
        <heliconZoo>
          <application name="wsgi.project" >
            <environmentVariables>
              <add name="WSGI_CONFIG_FILE" value="%APPL_PHYSICAL_PATH%\wsgi.ini" />
            </environmentVariables>
          </application>
        </heliconZoo>
        <handlers>
          <add name="wsgi.project"
               scriptProcessor="python.2.7.pipe"
               path="*"
               verb="*"
               modules="HeliconZoo_x86" />
        </handlers>
      </system.webServer>
    </configuration>
    

    It should be possible to re-use zoofcgi.py without modifying it by making it importable, IOW putting its dir on sys.path. Ideally, what I'd like to see instead is a library that can wrap the named pipes provided by IIS such that they emulate a normal socket.

  • iisfcgi and filesocket

    I implemented a crude socket-like wrapper called filesocket that wraps two open files and acts like a socket. Then I copied the bits from flup that depend too rigidly on actual sockets, or on things not available on Windows, and adapted them as needed. After adding bits from the WSGI buildout to the unified installer, I wrapped all that up with the necessary paste.deploy bits so that IIS can launch Plone as a WSGI app from a Paste ini file, using the paster config without a server since we're interfacing directly with IIS. Finally, add the following to configuration/system.webServer/fastCgi in ~\My Documents\IISExpress\config\applicationhost.config:

    <application fullPath="C:\Python27\python.exe"
               arguments="-u C:\Plone42\zeocluster\bin\iisfcgi-script.py -c C:\Plone42\zeocluster\production.ini"
               monitorChangesTo="C:\Plone42\zeocluster\production.ini"
               maxInstances="1">
    </application>
    

    This is one of the things I need help with. This can be done using the IIS appcmd.exe program but I need to know how to do this during the Web PI install process.

    It works and now we have Open Source all the way up to IIS!

  • Zope namespace URLs

    Zope has a lot of special URL structures like ++resource++foo.css but IIS chokes on these. Add the following to web.config:

    <system.webServer>
      <security>
        <requestFiltering
            allowDoubleEscaping="true">
        </requestFiltering>
      </security>
    ...
    
  • WebMatrix not launching instance until it's been run in the foreground

    I'm noticing that the Plone instance keeps restarting. Before I fixed the pluses in URL issue, it at least eventually got running stably after several restarts, but now it's restarting much more frequently. I thought it was caused by putting the resource registries in development mode, because that means more requests, so the number of requests that IIS lets build up before restarting the FCGI process is reached. When Plone is warming up, almost all of these requests pile up. But I've also seen it be perfectly stable with the resource registries development mode on, so I'll need to catch this in the act again to debug.

    Since then, I think the problem is that for some reason it needs to be started in foreground mode before WebMatrix/IISExpress can successfully start it.

  • How to restart in debug-mode? Logging?

    What is the best way to make debugging information accessible to the MS toolchain? What's the best way to let WebMatrix developers and IIS admins restart the instance in foreground-mode and/or debug-mode? What about giving them more integrated access to the logs? Should the default buildout configuration of the unified installer for Windows use event log handlers?

    Currently, the error reporting out of WebMatrix/IISExpress when launching the instance or anything else fails is miserable. Is there something we can do to better integrate with its trace logs such that more Zope/Plone/Console/UNIXy specific output is reflected somewhere?

  • Fixed two Windows bugs

    Docutils 0.7 conflicts with PIL causing:

    AccessInit: hash collision: 3 for both 1 and 1
    

    Upgrading to Docutils 0.9 fixes this.

    I also contributed a fix to Zope2 that addresses many of the stale lock file problems on Windows.
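
The filesocket idea from the "iisfcgi and filesocket" note above can be sketched roughly as follows. This is a minimal illustration and not the actual filesocket code: the class name FileSocket and its three methods are assumptions, chosen only to show how two one-way file objects can stand in for one two-way socket.

```python
import io


class FileSocket(object):
    """Socket-like wrapper around two file objects (hypothetical sketch)."""

    def __init__(self, infile, outfile):
        self._in = infile    # stream that requests are read from
        self._out = outfile  # stream that responses are written to

    def recv(self, bufsize):
        # Mimic socket.recv: return at most bufsize bytes, b"" at end of stream.
        return self._in.read(bufsize)

    def sendall(self, data):
        # Write everything, then flush so the peer is not left waiting.
        self._out.write(data)
        self._out.flush()

    def close(self):
        self._in.close()
        self._out.close()


# In-memory streams stand in for the two pipes in this sketch.
request = io.BytesIO(b"hello")
response = io.BytesIO()
sock = FileSocket(request, response)
chunk = sock.recv(1024)
sock.sendall(chunk.upper())
```

With real pipes, a buffered read(bufsize) may block until bufsize bytes arrive, so a real gateway would probably want unbuffered streams or exact-length reads driven by the FCGI record headers.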

Packaging

  • Creating a Web Deploy Package

    Microsoft provides some docs for creating Web Deploy packages. It may also be possible to use the msdeploy tool to make a package in a more automated way.

  • Bootstrapping the Unified Installer

    The first time building the Web Deploy package based on the Unified Installer, some things need to be installed and configured that can't really be automated. Some of these steps shouldn't be necessary in the long run, since it should be possible to use the existing Web PI feed to install the dependencies now that the feed is working. Much of this should also be added to the Unified Installer in some sort of platform specific way. But if the Web PI feed ever needs to be created anew, or maybe when switching Python versions, it may be necessary to do the same things I had to do on my Windows VM to begin creating the Web PI feed and Plone Web Deploy package. I'm documenting all that in a README in the Unified Installer.

  • Non-buildout root Web Deploy Package?

    For my proof-of-concept, I manually created a web.config file in the zeocluster folder of the buildout created by the Unified Installer and then manually added that as a "site" in WebMatrix. Eventually, we need to figure out how to make the Unified Installer buildout root also be the root of the site for WebMatrix/IISExpress/IIS or how to have a web deploy package where the root of the installed site is a subdirectory of the web deploy package.

  • Deploying to a path with spaces

    I ran into the following error when the buildout was at a location with spaces in the path:

    WindowsError: [Error 87] The parameter is incorrect
    

    To try to narrow down the issue, I used buildout to create a debug-mode only script with the following part:

    [debug]
    recipe = zc.recipe.egg
    eggs = ${instance:eggs}
    entry-points = debug=Zope2.Startup.run:run
    initialization =
        import sys
        sys.argv.extend(["-C", r"${instance:location}\etc\zope.conf",
                         "-X", "debug-mode=on"])
    

    The script this produces works just fine, so the problem is in the plone.recipe.zope2instance recipe.

Installing

  • Writing the Web Platform Installer Atom Feed

    Microsoft provides a reference to what elements and attributes it adds to the ATOM namespace.

  • bdist_wininst Python Installers aren't silent

    Windows MSI installers seem to support a /verysilent flag. Unfortunately, the Python distutils support for building Windows installers has no silent option. A workaround may be to use Web PI's support for arbitrary installer commands to extract the installer without running it. It may also work to convert the wininst packages into MSI packages.

    In short, we need MSIs for PIL and pywin32, and binary Windows eggs for lxml.

    I'm researching free software to build MSIs. We need custom actions support, so Advanced Installer won't do.

  • Get the SHA1 Hash of the Web Deploy Package

    Microsoft provides a tool for generating this hash. I'm not sure if using this tool is strictly necessary or if there may be a way to get the msdeploy tool to do this as a part of a more automated packaging process.

  • Web App Gallery

    Microsoft actually has a web application gallery that they say applications can submit their application to. If approved, these applications would be available in Web PI without requiring the user to enter a custom feed.
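
For the SHA1 step above, one option for a more automated packaging process might be to compute the hash from Python rather than with a separate tool. The following is a minimal sketch using only the standard library; the function name sha1_of_file is made up for illustration, and I have not verified that Web PI accepts a hash produced this way in place of the one from Microsoft's tool.

```python
import hashlib


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA1 digest of the file at path, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as package:
        # Read fixed-size chunks so large packages need not fit in memory.
        for chunk in iter(lambda: package.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling sha1_of_file on the built package file would give a hex string that could then be pasted into, or templated into, the feed.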

14:54

Presentation

IV World Plone Day, Venezuela 2011

The Venezuelan Pythonistas group (PyVE) and the local Plone Venezuela group are pleased to invite the general public to the “V World Plone Day, Venezuela 2012”.

Venue: Online, via the video conferencing platform of the Canaima GNU/Linux community.

Date: 22 May 2012

Time: 9 a.m. to 6 p.m. GMT-04:30 Caracas, Venezuela

Objectives of the event:

  • Spread the word about the Plone project and its contributions to free technologies worldwide.
  • Promote closer ties among the different actors working in the development and research of free technologies.
  • Encourage and strengthen the creation of free technology products and services in Venezuelan society.
  • Promote the inclusion and participation of Popular Power in building technological sovereignty and independence.

Intended audience:

  • Students interested in free technologies.
  • Free software specialists.
  • Entrepreneurs in free technologies.
  • The productive sector of the National Free Technologies Industry (cooperatives, EPS, MIPYMEs).
  • Public institutions interested in migrating and transforming toward free technologies.
  • Public and private companies interested in migrating and transforming toward free technologies.
  • The general public.

Program

  • "¿Por que Plone es Chevere?" (Why is Plone cool?) by Flamel Canto and José Subero, IT consultants. 09:00:00 GMT-04:30
  • Experiences using Plone CMS as a tool for publishing methodological content, by Kiberley Santos. 11:30:00 GMT-04:30
  • Developing Web news and multimedia content publishing systems for media outlets with Plone at the teleSUR news network, by Leonardo Caballero, IT consultant. 14:30:00 GMT-02:30
  • Developing Web applications with a relational database in Plone CMS, by Victor Teran and Leonardo Caballero, IT consultants. 17:00:00 GMT-04:30

Organizer:

  • Plone Venezuela.

“Leonardo Caballero” <lcaballero@cenditel.gob.ve>

Sponsors:

Contact

Registration: just a few simple steps

  1. Register in the Events system
    http://eventos.solve.web.ve/accounts/register/
  2. Activate your user account using the message sent to your email address. Check that the activation message has not landed in your spam folder.
  3. Fill in your user profile details
    http://eventos.solve.web.ve/profiles/edit/
  4. Click the following link: http://eventos.solve.web.ve/suscriptor/registro/7/
  5. Create a user on the platform http://envivo.canaima.softwarelibre.gob.ve/ by clicking the “Registrar” button and filling in your user details.
  6. Finally, log in with your username and password to the chat room named “Plone Venezuela”.

This event is free and includes an electronic certificate of attendance endorsed by the Plone Venezuela foundation and Armadillo Integración Tecnológica C.A.

Schedule of activities:

http://plone.org/events/wpd/2012/hosts/caracas-venezuela


The Pre-PSE12-Strategicesque-Sprintacular was held in State College, PA in the days preceding Plone Symposium East, 2012.


This sprint was different from most. Typically sprints are a very active, very vocal event as sprinters debate and decide how a new feature should be implemented. The Sprintacular’s goals were centered around code cleanup and removal, which resulted in a very quiet and highly focused environment.

Here’s what happened…

Hanno Schlichting:

  • Removed zope.app.* dependencies in Plone core.
  • Investigated upgrading Plone to CMF 2.3. Ran into issues integrating the membership tool into Plone; changes in tool registration make upgrading problematic in a 4.x release.

David Glick:

  • Refactored Archetypes-specific code in preparation for Plone’s future switch to Dexterity.
  • Moved several portal_scripts Python scripts to browser views.
  • Helped Hanno with zope.app.* cleanup.
  • Removed around 30 total packages as dependencies.

Craig Haynal, Eric Steele, Nathan van Gheem:

  • Worked on removal of KSS from Plone core.
  • Migrated existing KSS scripts to use jQuery instead.
  • Removed KSS as a dependency of Plone.

Joel Kleier:

  • PEP8, PyFlakes clean up of Products.CMFPlone, Products.PlonePAS, plone.outputfilters, and Products.PloneTestCase

Ross Patterson, Ed Manlove, Michael Mulich:

  • Created wrappers for older testing frameworks to ease their move to plone.app.testing.
  • Investigated conversion operations for older tests, but none have panned out.
  • Began work on a testcase migration guide.
  • Migrated PloneTestCase tests in Products.CMFPlone to use plone.app.testing.

Many thanks to all who attended!

13:45

ISC Feature of the Week: Country Report

Overview As promised in the Data/Reports Feature Diary, this week we will cover the Country Report ...(more)...

13:00

Robin Dunn, creator and mastermind behind wxPython, announced today on his blog and the wxPython-dev mailing list that he had gotten wxPython 2.9 (Phoenix) to build successfully for Python 3.2 on Mac. In fact, he posted a Quicktime video that shows the build and the tests running in Python 3! According to wxPython-dev, once they have some Python 3 buildbot slaves set up, then snapshot builds can be made and posted here.

I’m pretty excited! Now if only the Python Imaging Library would convert too…

10:54



A short screencast of the Diazo theme editor, slated for Plone 4.3. The theme editor allows mapping of Plone content into any HTML/CSS theme, HTML, CSS and Diazo ruleset editing, and theme package export.

Massive kudos to Martin Aspeli for yet another amazing Plone feature.

Watch as the Phoenix spreads her wings over Python 3:

http://wxpython.org/Phoenix/ItsAlive/

We are glad to announce that registration for the second PyCon DE in Leipzig is open. You can buy tickets at the early-bird rate until the end of June, after which prices will go up. Don't miss the opportunity to come to the largest meeting of the German-speaking Python community, and secure your ticket now.

If you plan not only to come but also to contribute, you can submit a proposal for a talk or a tutorial. A wide variety of Python-related topics are welcome. More details in an earlier post.

The second PyCon DE will be in Leipzig from October 29 through November 3, 2012. One tutorial day, three days with talks, and two days with a barcamp, code retreat and sprints will provide different ways to communicate about Python. There will be social events to give everybody ample opportunity to network with like-minded Pythonistas.

The Windows build of Python 3.3 has recently seen changes that could use a look from the community throughout our alpha and beta cycle. The first change is the long requested addition of Python to the system Path variable, which was completed in the installer. Secondly, the build was upgraded to Visual Studio 2010.

Python on the Path

A long requested feature, especially from beginners and from those involved in education and training, has been the ability for the Python installer to place itself in the system Path environment variable. Having the following message appear when you try to run a simple exercise is not a great first experience:
'python' is not recognized as an internal or external command, operable program or batch file.
Because of that, the first post-install step for many users is to edit the Path environment variable manually to insert the C:\Python33 directory. This allows the user to simply type python on the command line and have it open C:\Python33\python.exe -- a very desirable feature for a majority of users. In fact, it's such a common post-install step that there are a huge number of tutorials either about this step by itself or whose setup introduces this step before moving on.
http://i.imgur.com/aixuY.png
The easiest part of the whole thing was the code. Path manipulation in the installer consists of adding a new feature to the Feature table, then the Environment table may be updated based on selection of the Path feature. If the feature was selected, the Environment table is modified in a way that the Path is prepended to and will be correctly cleaned up on uninstallation.

The harder part was deciding how to go about the change. If you're going to provide Path manipulation, the major questions are to do it by default or not, and to prepend or append to the Path.

We decided that it wasn't appropriate to make this a default feature. For one, in the dual-version state many users are running in, we run the risk of users running through the installer and putting their system into a state they aren't prepared for. We don't want to change the meaning of python when executed on the command line without the user asking for it. On one hand it's a very beginner focused feature in that it gets a first-timer successfully up and running with ease. However, it's also an advanced feature in that it takes a good understanding of what it's going to do to the users who have 2.6, 2.7, 3.2, and now 3.3 on their machines. We think the best solution for all is to leave it up to them and include an explanation.

The other part we had to think about was whether to prepend or append to the path. While some believe that appending to the path is the more friendly way to work with the system, it would seem to be of limited utility given that the feature is added this late in the game. Instead we went the route of prepending the installation folder, e.g., C:\Python33, in order to make sure this feature is actually useful to our users.
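To make the prepend-versus-append trade-off concrete, here is a minimal sketch in Python. It does not touch the real environment or the installer; the directory names and the `INSTALLED` mapping are hypothetical, and it assumes a user who already added an older Python to their Path by hand. It only models the fact that Windows scans Path entries from left to right.

```python
# Hypothetical mapping of directories that contain a python.exe.
# These entries are illustrative only; nothing is read from a real system.
INSTALLED = {
    r"C:\Python27": "Python 2.7",
    r"C:\Python33": "Python 3.3",
}


def resolve_python(path_value):
    """Return the interpreter found first when scanning Path left to right."""
    for directory in path_value.split(";"):
        if directory in INSTALLED:
            return INSTALLED[directory]
    return None


# Assumed starting point: an older version was added to Path manually.
existing_path = r"C:\Windows\system32;C:\Python27"

appended = existing_path + ";" + r"C:\Python33"
prepended = r"C:\Python33" + ";" + existing_path

print(resolve_python(appended))   # the older entry earlier in Path still wins
print(resolve_python(prepended))  # the newly installed version is found first
```

Under these assumptions, appending leaves `python` pointing at the older install, which is why appending would be of limited use to someone who explicitly asked for the Path feature.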

If you have questions or comments, please feel free to raise them on python-dev or see Issue 3561.

Transition to Visual Studio 2010

In time for the last alpha release, we've updated our build tools from Visual Studio 2008 to 2010.
Many potential contributors as well as general Python users have long moved to work environments that use Visual Studio 2010. During a "bug day" some months ago, we had two or three patches come from interested first-timers who found our VS2008 solution not working in VS2010. Over time we received a few more contributions and bug reports on the topic, as well as some chatter in IRC about being behind the curve.

On top of that, my employer at the time moved to VS2010 as well as the employers of at least one other core maintainer, so we were already operating on ports for our companies.

When it came time to think about what to do for Python 3.3, moving to VS2010 became a must have due to our release schedule. Staying with VS2008 for 3.3 would put us into the middle of 2014 as the next time we could release on a new version. That would leave us at least two versions behind, with VS2010 as well as VS11 being available by then.

Another reason is the relative ease of porting between VS2010 and VS11. Once we got ourselves on to 2010, moving on to 11 would not be that hard. VS11 currently reads our VS2010 files without change if you want to use the IDE features of VS11. However, there'd need to be another port in order to use the VS11 compiler suite, but it seems to require minimal effort. Just following the VS11 wizard produced a functioning executable, although it didn't build cleanly.

Where to get Visual Studio 2010?
As usual, Microsoft provides a zero-cost version of Visual Studio 2010 under the name Visual C++ Express, available at http://www.microsoft.com/visualstudio/en-us/products/2010-editions/visual-cpp-express. While there are some differences between the Express version and the for-purchase versions, the Express version is used successfully by many contributors.

The fine folks at Microsoft's Open Source Technology Center have provided the core contributors with MSDN licenses free of charge, allowing for access to the full versions of Visual Studio among other products. The full versions of Visual Studio support 64-bit compilation which comes in handy for our amd64 releases, which have been available since 2.5.

Help us out -- try the alphas and betas!

With a change to the installer, a new build system, and the other great changes we have in store, the more feedback we hear from the community during the development cycle, the better we can make this release. If you have a chance to run your projects on Python 3.3, http://bugs.python.org is always open for your reports. You've even got a month to get feature requests in and completed!

The last alpha release is scheduled for this weekend, and the first beta release is scheduled for June 24. You can download our 3.3.0 releases at http://www.python.org/download/releases/3.3.0/.

Today we have for you not just one, but two exciting EPD releases — an update of EPD to 7.3 and a beta release previewing new features coming in EPD 8.0. The EPD 7.3 update adds several new packages including Shapely, openpyxl, and a new package from Enthought named Enaml.  Enaml is a new package for [...]

07:54

Examining a recent crash case, I stumbled across this code in frameobject.c:

PyFrameObject *
PyFrame_New(PyThreadState *tstate, PyCodeObject *code, PyObject *globals,
            PyObject *locals)
...
    if (code->co_zombieframe != NULL) {
        f = code->co_zombieframe;
        code->co_zombieframe = NULL;
        _Py_NewReference((PyObject *)f);
        assert(f->f_code == code);
    }

Intrigued by the name, I examined the header where it is defined, code.h:

...
    void *co_zombieframe;     /* for optimization only (see frameobject.c) */
...
} PyCodeObject;

It turns out that for every PyCodeObject object that has been executed, a PyFrameObject of a suitable size is cached and kept with the code object. Now, caching is fine and good, but this cache is unbounded. Every code object has the potential to hang on to a frame, which may then never be released.
Further, there is a separate freelist cache for PyFrameObjects already, in case a frame is not found on the code object:

    if (free_list == NULL) {
        f = PyObject_GC_NewVar(PyFrameObject, &PyFrame_Type,
                               extras);
        if (f == NULL) {
            Py_DECREF(builtins);
            return NULL;
        }
    }
    else {
        assert(numfree > 0);
        --numfree;
        f = free_list;
        free_list = free_list->f_back;
        ...

Always conscious about memory these days, I tried disabling this in version 3.3 and running the pybench test. I was not able to see any conclusive difference in execution speed.
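The difference between the two caches can be modeled in a few lines of Python. This is only a toy model, not CPython's implementation: the `Frame` and `Code` classes, the `MAX_FREE` cap, and the helper names are all hypothetical, and the shared free list is assumed to be bounded. It simply counts how many frames stay cached after many distinct code objects have each run once.

```python
class Frame:
    """Stand-in for PyFrameObject; only its size matters here."""

    def __init__(self, size):
        self.size = size


class Code:
    """Stand-in for PyCodeObject with a per-code 'zombie' slot."""

    def __init__(self, frame_size):
        self.frame_size = frame_size
        self.zombie_frame = None


MAX_FREE = 200  # hypothetical cap on the shared free list
free_list = []


def new_frame(code, use_zombie):
    # First choice: the frame parked on the code object itself.
    if use_zombie and code.zombie_frame is not None:
        frame, code.zombie_frame = code.zombie_frame, None
        return frame
    # Second choice: any frame from the shared free list.
    if free_list:
        frame = free_list.pop()
        frame.size = code.frame_size
        return frame
    return Frame(code.frame_size)


def release_frame(code, frame, use_zombie):
    if use_zombie and code.zombie_frame is None:
        code.zombie_frame = frame      # one cached frame per code object
    elif len(free_list) < MAX_FREE:
        free_list.append(frame)        # shared cache, capped at MAX_FREE
    # otherwise the frame is simply dropped


def cached_frames(codes):
    zombies = sum(1 for c in codes if c.zombie_frame is not None)
    return zombies + len(free_list)


def run_once(codes, use_zombie):
    for code in codes:
        release_frame(code, new_frame(code, use_zombie), use_zombie)


codes = [Code(frame_size=8) for _ in range(1000)]
run_once(codes, use_zombie=True)
with_zombie = cached_frames(codes)

free_list.clear()
codes = [Code(frame_size=8) for _ in range(1000)]
run_once(codes, use_zombie=False)
without_zombie = cached_frames(codes)

print(with_zombie, without_zombie)
```

In this model the per-code cache grows with the number of code objects that have ever run, while the shared free list alone keeps a single frame alive for the same workload.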

Update:

Disabling the zombieframe on the PS3 shaved off some 50k on startup.  Not the jackpot, but still, small things add up.

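-------------------------------------------------------------------------------
PYBENCH 2.1
-------------------------------------------------------------------------------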
* using CPython 3.3.0a3+ (default, May 23 2012, 20:02:34) [MSC v.1600 64 bit (AMD64)]
* disabled garbage collection
* system check interval set to maximum: 2147483647
* using timer: time.perf_counter
* timer: resolution=2.9680909446810176e-07, implementation=QueryPerformanceCounter()

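-------------------------------------------------------------------------------
Benchmark: nozombie
-------------------------------------------------------------------------------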

Rounds: 10
Warp: 10
Timer: time.perf_counter

Machine Details:
Platform ID: Windows-7-6.1.7601-SP1
Processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel

Python:
Implementation: CPython
Executable: D:pydevhgcpython2pcbuildamd64python.exe
Version: 3.3.0a3+
Compiler: MSC v.1600 64 bit (AMD64)
Bits: 64bit
Build: May 23 2012 20:02:34 (#default)
Unicode: UCS4

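-------------------------------------------------------------------------------
Comparing with: zombie
-------------------------------------------------------------------------------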

Rounds: 10
Warp: 10
Timer: time.perf_counter

Machine Details:
Platform ID: Windows-7-6.1.7601-SP1
Processor: Intel64 Family 6 Model 26 Stepping 5, GenuineIntel

Python:
Implementation: CPython
Executable: D:pydevhgcpython2pcbuildamd64python.exe
Version: 3.3.0a3+
Compiler: MSC v.1600 64 bit (AMD64)
Bits: 64bit
Build: May 23 2012 20:00:42 (#default)
Unicode: UCS4

Test                           minimum run-time        average run-time
                               this   other    diff    this   other    diff
---------------------------------------------------------------------------
BuiltinFunctionCalls:          51ms    52ms   -3.3%    52ms    53ms   -2.0%
BuiltinMethodLookup:           33ms    33ms   +0.0%    34ms    34ms   +0.8%
CompareFloats:                 50ms    50ms   +0.1%    50ms    50ms   +0.4%
CompareFloatsIntegers:         99ms    98ms   +0.8%    99ms    99ms   +0.6%
CompareIntegers:               77ms    77ms   -0.5%    77ms    77ms   -0.3%
CompareInternedStrings:        60ms    60ms   +0.0%    61ms    61ms   -0.1%
CompareLongs:                  46ms    45ms   +1.5%    46ms    45ms   +1.2%
CompareStrings:                61ms    59ms   +3.6%    61ms    59ms   +3.6%
ComplexPythonFunctionCalls:    60ms    58ms   +3.3%    60ms    58ms   +3.2%
ConcatStrings:                 48ms    47ms   +2.4%    48ms    47ms   +2.1%
CreateInstances:               58ms    57ms   +1.3%    59ms    58ms   +1.3%
CreateNewInstances:            43ms    43ms   +1.1%    44ms    44ms   +1.1%
CreateStringsWithConcat:       79ms    79ms   -0.3%    79ms    79ms   -0.1%
DictCreation:                  71ms    71ms   +0.4%    72ms    72ms   +1.0%
DictWithFloatKeys:             72ms    70ms   +2.1%    72ms    71ms   +1.8%
DictWithIntegerKeys:           46ms    46ms   +0.7%    46ms    46ms   +0.4%
DictWithStringKeys:            41ms    41ms   +0.0%    41ms    41ms   -0.1%
ForLoops:                      35ms    37ms   -4.0%    35ms    37ms   -4.0%
IfThenElse:                    64ms    64ms   -0.1%    64ms    64ms   -0.4%
ListSlicing:                   49ms    50ms   -1.0%    53ms    53ms   -0.8%
NestedForLoops:                54ms    51ms   +6.7%    55ms    51ms   +6.7%
NestedListComprehensions:      54ms    54ms   -0.7%    54ms    55ms   -2.2%
NormalClassAttribute:          94ms    94ms   +0.1%    94ms    94ms   +0.1%
NormalInstanceAttribute:       54ms    54ms   +0.3%    54ms    54ms   +0.2%
PythonFunctionCalls:           58ms    57ms   +0.8%    58ms    58ms   +0.6%
PythonMethodCalls:             65ms    61ms   +6.3%    66ms    62ms   +5.9%
Recursion:                     84ms    85ms   -1.0%    85ms    85ms   -0.9%
SecondImport:                  74ms    76ms   -2.5%    74ms    77ms   -3.5%
SecondPackageImport:           75ms    78ms   -3.8%    76ms    79ms   -3.9%
SecondSubmoduleImport:        163ms   169ms   -3.4%   164ms   170ms   -3.3%
SimpleComplexArithmetic:       43ms    43ms   +1.0%    43ms    43ms   +1.0%
SimpleDictManipulation:        80ms    78ms   +2.2%    81ms    79ms   +2.4%
SimpleFloatArithmetic:         42ms    42ms   +0.1%    42ms    42ms   -0.0%
SimpleIntFloatArithmetic:      52ms    53ms   -1.2%    52ms    53ms   -1.1%
SimpleIntegerArithmetic:       52ms    52ms   -0.7%    52ms    53ms   -0.8%
SimpleListComprehensions:      45ms    45ms   -0.2%    45ms    45ms   +0.3%
SimpleListManipulation:        44ms    46ms   -4.0%    44ms    46ms   -3.9%
SimpleLongArithmetic:          32ms    32ms   -0.9%    32ms    32ms   -0.1%
SmallLists:                    58ms    57ms   +1.2%    58ms    67ms  -12.8%
SmallTuples:                   64ms    65ms   -0.5%    65ms    65ms   -0.2%
SpecialClassAttribute:        148ms   149ms   -0.8%   149ms   150ms   -1.0%
SpecialInstanceAttribute:      54ms    54ms   +0.2%    54ms    54ms   +0.0%
StringMappings:               120ms   117ms   +2.5%   120ms   117ms   +2.5%
StringPredicates:              62ms    62ms   +0.9%    62ms    62ms   +1.0%
StringSlicing:                 69ms    68ms   +1.6%    69ms    68ms   +2.1%
TryExcept:                     37ms    37ms   +0.0%    37ms    37ms   +0.5%
TryFinally:                    40ms    37ms   +6.7%    40ms    37ms   +6.5%
TryRaiseExcept:                19ms    20ms   -1.0%    20ms    20ms   -0.4%
TupleSlicing:                  65ms    65ms   +0.5%    66ms    65ms   +1.2%
WithFinally:                   57ms    56ms   +1.9%    57ms    56ms   +2.1%
WithRaiseExcept:               53ms    53ms   +0.3%    54ms    54ms   -0.8%
---------------------------------------------------------------------------
Totals:                      3154ms  3145ms   +0.3%  3176ms  3177ms   -0.0%

(this=nozombie, other=zombie)

I’m going to remove this weird, unbounded cache from the python interpreter we use on the PS3.

The Foundation wishes to thank Carl Trachte and Audrey Roy for their work in the Python community with Community Service Awards for the first quarter of 2012.

Carl has put significant effort into diversifying and supporting non-English speaking writers for the Python Wiki.

Audrey also put in a lot of time diversifying the community with her work in creating the PyLadies group, as well as speaking on outreach issues at numerous conferences.

On behalf of the Python community, the PSF thanks Carl and Audrey for their time and effort!

Wednesday, 23 May

22:55

At Inigo, we believe in helping local FOSS communities grow. We help out at community events where we can, present FOSS talks, and provide platforms for local communities to grow. One such platform is our consolidated community site infrastructure on Plone.

The system/infra and its components were originally developed for the Fedora Malaysia website, while keeping them generic enough that other communities could use the same components for their own community sites. The infra is already in a usable state, and we can add new sites easily with just a few clicks.

Features in this consolidated infra are:
  • Document/Content management (Plone built-in)
  • Calendar system (powered by solgema.fullcalendar addon)
  • Conference/BarCamp system (powered by collective.conference addon, which was developed for FUDCon Kuala Lumpur 2012)
  • Blog (powered by Products.Scrawl)
  • Simple yet powerful theming engine (powered by plone.app.theming/Diazo) - Check out Diazo, you'll love it. Don't worry, it's not Plone-specific.
  • OpenID and OpenID Selector - we even have Fedora's OpenID as an option in the OpenID Selector
Currently we have several sites hosted in this consolidated infra which are:
Besides those listed above, we are also planning to move some of these sites (which are also already on our infrastructure, albeit using our old, non-consolidated infra) to the current infra:
Two local university FOSS groups, NUMOSS and UCTI-FOSSSIG, have also recently expressed interest in having their sites on our current infra.

Code?

The code for this Plone buildout is here: https://dev.inigo-tech.com/svn/izhar/fedoramy.site/trunk/ . The buildout is kind of messy, as it was hacked together at random times without much thought. WARNING: lack of docs. If you need help, ping me :-)

Help Needed

Help is needed to properly refactor some of the code and make the infra more generic and reusable for everyone. I also need help migrating some of the old infra sites to the current infra.

For those who are based in Malaysia, we will be having a hackathon after the Python Malaysia meetup this Saturday to migrate the Python Malaysia website to the new infra.

Main tasks:
  • Set up a new site for pythonmy on the current infra
  • Write diazo theme transform rules for pythonmy site.
Secondary tasks:
  • Rename fedoramy.site to a new buildout, with a more generic name
  • Get rid of apprepo.org dependencies from the buildout. Apprepo.org should be outside of this shared infra, as it introduces a number of external deps that are not relevant for other sites.
Do drop by at the upcoming Python Malaysia meetup which will be at MindValley, Bangsar. Check out this page for map and time : http://www.eventbrite.com/event/3414166865 .

20:36

A rant

Do you ever get the urge to kill? How many of us cringe whenever we see these words? Lately I’ve been spending a lot of time developing pythonpackages.com (now running on Heroku!), during which time I see a lot of these kinds of packages being released:

I kid about the killing part, but seriously: this is a problem. Fortunately for us, our PyPI overlords see fit to occasionally remove these packages, and for this I am grateful. I love seeing this:

I mean it makes me dance-around-the-room happy! Ahem. But are they really all gone? Close enough. A quick crate.io search now shows only 2 packages instead of 4 pages of results:

Hallelujah! But is this the best we can do? I know that some well-meaning person wrote a book containing the example that is leading some poor, misguided souls to spam PyPI (if only the author listed the test site instead: http://testpypi.python.org/pypi). And I have to assume that this was just some terrible mistake. But do we all have to live with this mistake?

I’m asking because I honestly don’t know the answer. I remember when I started pythonpackages.com, the Deliverance documentation was being updated something like every 5 minutes (kidding again, but it was frequent enough to be annoying). After grousing about it in public, it stopped happening!

I wonder if some good-natured grousing about our friends (read: enemies) the simple printers of nested lists will do the same?


19:54

I’m a little excited today as my first patch (and first ticket even!) has been accepted. And it really didn’t take very long either. Less than 24 hours after I had submitted my first patch, I got my contribution added. I did have to submit two more variations of the patch though, as my first one wasn’t quite right. I wanted to give a shout out to Brian Curtin and Eli Bendersky, who helped me figure all this stuff out and made my first foray into core Python development a success. Personally, I think it would have been a success even if the patch wasn’t accepted, as I still learned a lot along the way.

Things to take away from the experience:

  • Try to stay on topic! I actually found a second issue with the paragraph I was fixing in the devguide and that probably should have gone in a separate bug report.
  • Number your patches! I don’t know why I didn’t think of that, but Eli told me I should do that in the future to make it less confusing for the committer. That was a face palm moment.

I’ve been reading some of the supposedly “easy bugs” and trying to figure out where else I can help. I already spotted another typo in the docs that are included with Python itself which I’ll probably try to fix. Of course, I want to actually contribute to the code, not just the documentation, but I am probably more likely to be able to find documentation bugs I can help with. Hopefully with more experience I’ll be able to contribute more effectively. Happy hacking my fellow Pythoneers!

16:54

The main Python site at www.python.org was redesigned in 2005-2006 -- over six years ago. It's time for a redesign, to improve the organization of the site and its appearance, and to simplify the task for the volunteers who maintain the content. The PSF would therefore like to fund the design and implementation of a new look and architecture for python.org. The web development landscape has also changed a lot since 2006, and we look forward to seeing what this community can produce.

The Request for Proposal for the python.org redesign has been published on readthedocs.org. Questions and comments can be e-mailed to the psf-redesign mailing list at psf-redesign at python.org. Proposals are due by July 21st 2012, two months from today.

The RFP was initially drafted by Jesse Noller, and feedback from the python.org site maintainers was incorporated by Andrew Kuchling.

Just because I have to use a callback-oriented style on the client doesn't mean I want to use a callback-oriented style on the server. Now, before anyone gets all upset and tells me that I don't know the difference between async and a kitchen sink, let me explain :)

The client is necessarily an event-oriented place. If I don't know which button the user is going to press, it makes a lot of sense to use a different callback for each button. The server is different. If I'm waiting for the result of a database query before I can continue processing a request, it sure is convenient to just block and wait.

My key point is that it's important to separate what style you want to code with and what performance and scalability characteristics you want. You shouldn't necessarily pick a callback-oriented style just because you want the performance and scalability characteristics of asynchronous networking APIs.

My favorite two examples are gevent and Erlang, but Go is similar. When you code using gevent or Erlang, your code looks like synchronous, blocking code. However, below the covers, they use asynchronous networking APIs. Now, before anyone tells me that it's impossible, buggy, or that it'll never work, let me point out that these tricks have been in production for decades at Ericsson, Yahoo Groups, and IronPort Cisco.

Furthermore, I should point out that asynchronous networking APIs aren't a perfect fit for every problem. For instance, if your goal is to send 10 gigabytes of information to another server, it turns out that synchronous networking APIs will actually outperform asynchronous networking APIs. The reason asynchronous networking APIs are so popular is because they can handle a larger number of clients than synchronous networking APIs can and because they use less memory than a large number of threads, which each have to have their own stack. gevent and Erlang can handle a large number of clients, don't use up much memory, and don't require a real OS-level stack per client.

So what's my problem with the callback-oriented style? I find it a lot harder to read. I've coded projects in Twisted, Node.js, etc., and I prefer the gevent approach. You get roughly the same performance and scalability characteristics, but with much easier to read code. Of course, what's readable to me may not be readable to other people. I've met people who are perfectly happy using Twisted Web 1 and don't think that callback-oriented code poses any real challenge.
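To illustrate the readability difference, here is a small sketch of the same two-step request written both ways. gevent is not in the standard library, so this swaps in asyncio, which uses explicit await rather than gevent's implicit switching; it is a stand-in for the contrast, not a gevent example. The `fake_query` function and the handler names are hypothetical.

```python
import asyncio


async def fake_query(user_id):
    """Hypothetical stand-in for a database query that completes later."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return {"id": user_id, "name": "user%d" % user_id}


def handle_request_callbacks(loop, user_id, on_done):
    """Callback-oriented style: the logic is split across nested callbacks."""

    def on_user(task):
        user = task.result()

        def on_friend(task2):
            friend = task2.result()
            on_done("%s knows %s" % (user["name"], friend["name"]))

        loop.create_task(fake_query(user_id + 1)).add_done_callback(on_friend)

    loop.create_task(fake_query(user_id)).add_done_callback(on_user)


async def handle_request_sequential(user_id):
    """Sequential-looking style: reads top to bottom like blocking code."""
    user = await fake_query(user_id)
    friend = await fake_query(user_id + 1)
    return "%s knows %s" % (user["name"], friend["name"])


async def main():
    loop = asyncio.get_running_loop()
    done = loop.create_future()
    handle_request_callbacks(loop, 1, done.set_result)
    from_callbacks = await done
    from_sequential = await handle_request_sequential(1)
    return from_callbacks, from_sequential


results = asyncio.run(main())
print(results)
```

Both handlers run on the same event loop and produce the same result; the only difference is how the control flow reads on the page.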

If you're interested in hearing more about my thoughts on async and concurrency, check out my other blog posts, which include a link to my Dr. Dobb's Journal article on Python concurrency.

13:45

Someone needs to kick this off… Plone 5!

The whole point of this is focusing/refocusing our development efforts, so I’m going to keep this short and expect the rest of you to run with the discussion.

Plone 5 will drop in March, 2013 and will be about two things:
1) Dexterity becomes Plone’s default content type story. We handle multilingual content beautifully. We figure out a migration path for Archetypes content.
2) Diazo becomes Plone’s default theming story. CMSUI/plone.app.toolbar gets awesome and wraps the whole thing up nicely.

Those two features alone are more than enough to make for an awesome major release, with a reasonable upgrade from Plone 4. Everything we accomplish past that is icing on the cake.

Go.

Edit: Added some links to the relevant technologies. Developer discussion happens here.

08:18

There is a fair amount of chatter in Microsoft forums regarding problems caused by recent Microsoft p ...(more)...

07:54

Well, it’s official — a labor of love from myself and many others — with special thanks to Andrew Kuchling for getting it over the finish line. The Python Software Foundation has officially announced a call for proposals for the redesign of the Python.org site and properties.

You can see the RFP here: http://pythonorg-redesign.readthedocs.org/en/latest/

It’s taken me several years of false starts, other attempts (including skunkworks attempts), political and social discussions, and the hard work of many to make this come to fruition. Now, we can only sit back and hope that we see some amazing proposals from the community and others.

I sincerely hope this will be successful, and that we will see a modern, well designed Python.org that showcases not only the language, but the vibrant, open, welcoming and active community we are all part of. 

06:54

Earlier this month, I had the pleasure of exhibiting at the 17th Annual Healthcare Marketing Strategies Summit in Orlando, Florida. It was an interesting year to be there, as many of the healthcare marketers in attendance were genuinely worried. “Now what do we do?”

Regardless of the pending Supreme Court decision on the constitutionality of the ObamaCare mandate, the healthcare industry is in a serious state of transition. Those who spend their days consumed with how to best market their hospitals, physician practices, clinics, and other healthcare organizations, are feeling nervous. Budgets are being cut, but marketers are expected to accomplish more. Meanwhile, tried-and-true methods of marketing aren’t working as well.

As one speaker noted, “It's amazing how small the budgets are for web but how high the expectations are for web.”

My own take is that this industry is being awakened out of a long complacency and that there is no better time to be a healthcare marketer than right now.

Here are my top five takeaways from the conference.

1. Healthcare marketing is behind the curve

The industry, as a whole, is a few years behind less risk-averse industries when it comes to social media, mobile, search, and other online marketing technologies. The reasons why are probably obvious. It can be difficult to innovate in an industry that is so highly regulated, where the stakes are so high, and where mistakes can bring litigation.

Even so, you’d expect that, at a modern convention attended by hundreds of “marketing experts,” at least this group would be a bit more, well, “present” online.

They were not. Only about a dozen or so people were actively tweeting out of several hundred attendees, all of whom work in marketing or technology.

2. Healthcare marketing is at a tipping point

Notwithstanding all of the above, it felt like a really exciting time to be in healthcare marketing. The industry is starting to jump into the fun part of the innovation hockey stick, and I think healthcare will be transformed as a result. Some really wonderful things are happening now, especially in mobile.

According to David Friedersdorf of iTriage and Joseph Cazayoux of HCA Healthcare, 20% of healthcare consumers now use mobile phones to research healthcare purchases. Chris Catallo and Michael Schneider from Greystone.net informed us that nearly 90% of all Americans now have a mobile device.

Social media marketing is also now starting to take hold in healthcare. Chris and Michael also let us know that a third of people over 65 years old now are active on at least one social media platform, and this is only increasing.

Healthcare marketers are fully aware of these trends and are starting to meet the demand.

3. Healthcare marketing is getting more strategic

Here are some of the best tweets on this topic posted during the conference:

"digital media fluency is no longer optional." Web, social and mobile technologies require new skills

The brand experience starts long before you take my temperature

Marketers need to be at the nexus of analytics and creativity

Experimentation is the new market research

4. Content marketing is surging

I had numerous conversations with marketers at hospitals and other healthcare organizations who told me that content has become the keystone of their marketing efforts and that their biggest issue is finding the time and resources to create and manage content. They also expressed concerns about losing control of their own content.

As Chris and Michael at Greystone.net stated, “More people see your content off your site than on your own website.”

Of course, real content marketing requires a modern and highly flexible content management system like Plone.

5. Intra-organizational collaboration has a ways to go

Many of these organizations attempt to use SharePoint to provide a collaboration and document sharing platform, but a lot of the people I spoke with expressed dissatisfaction. It wasn’t hard to get them to sign up for Six Feet Up’s upcoming webinar, “Alternatives to SharePoint.” (Hint: If this is an area of interest you can sign up too.)

In particular, there was a ton of interest in KARL, an open source collaboration and knowledge management platform widely used for intranets in a number of industries.

Conclusion

As one keynote speaker pointed out, even if healthcare spending slows, it will still be one of the fastest growing sectors of the economy for the foreseeable future. Still, like any transition, it can be unnerving. The old methods don’t work as well as they did and the new methods are, well, new.

Still, I maintain there has never been a more exciting and dynamic time to be a healthcare marketer, and offer that if you’re feeling unsure about how to create content, how to manage and publish content, how to take advantage of social media, search, or mobile technologies, or how to better collaborate internally, please reach out to us and mention you read this post.

06:09

Using overlapping IP fragmentation to avoid detection by an IDS has been around for a long time. ...(more)...

05:15

I am starting to enjoy working with Web2py. I was recently provided with a review copy of a recent Web2Py book, and I have started reading through it. I will have a full review of this book in the next couple weeks. When I first went on to learn Web2Py, I used the free online book, and that for the most part didn't change me over. The free online, like Django's online book, is just that. A free online book, and didn't provide me with enough knowledge and confidence to develop using the framework. I got into Django using it's very extensive documentation website.

The book I am reviewing has many examples which can easily be applied to real-world web applications, which I praise. As I have been reading through the book, I have also been playing around and exploring Web2Py in general. It has some very interesting design choices and features. It is definitely good for someone who can never remember what to import, as there is rarely any need to import anything in web2py, besides when you need to import a 3rd party Python module. At first, this design choice caught me off guard, as in Python, I am used to explicitly importing my modules for use. Web2Py includes it's technical reference with the application itself, in the form of Epydoc. I am not terribly fond of this documentation format, but it is how I learned Pyjamas, as it uses the same documentation system.

Web2Py has some interesting magic, which frankly does make web development a breeze. Generating forms and cruds is literally a one-liner, further customization can be made by using more lines of code. Web2Py also comes with a nice default template, and the application wizard, has many additional templates for use as well. This feature is definitely well welcomed in my book. Sure, it implies that tons of websites made with Web2Py will have the same look and feel, but is that such a bad thing? It just means that visitors will know where most resources can be found, and how to use the included form widgets. Think of Wordpress, most Wordpress sites I visit keep the included wordpress theme. Blogspot is another example of a set of websites which share a common theme. The included themes allow one to get to developing the website as quickly as possible without having to worry about how it will look and feel, but rather focus on the functionality. I am more of a functionality person, not a look-and-feel type of person.

All in all, I am very excited to be trying out Web2Py for a second time, and I hope that when I complete this book I can provide a review of both the book itself and the framework in general. Neither review will be a comparison with anything else; each will be strictly about the book or the framework itself. I am thinking of building a future website with Web2Py to see where it goes. I will most likely keep my blog running on Django, as I have used many Django-specific features that would be troublesome to migrate. These articles contain Django template code, for example.

02:15

Yesterday I showed how to implement a simple email form for Django using Class Based Views. Today I'm going to extend yesterday's work to use the excellent RECAPTCHA service to help reduce spam content.

This version requires pip installing the following into your virtualenv.

  • pip install django-crispy-forms so we can do Python driven layouts.
  • pip install django-floppyforms so we get HTML5 elements for free.
  • pip install django-recaptcha to do the RECAPTCHA work.

Don't forget to add the apps to your INSTALLED_APPS in settings.py:

INSTALLED_APPS += (
    'crispy_forms',
    'floppyforms',
    'captcha',
)

Generate your KEYs from the RECAPTCHA site and add them in settings.py:

RECAPTCHA_PUBLIC_KEY = '6LcVu9ESAAAAANVWwbM5-PLuLES94GQ2bIYmSNTG'
RECAPTCHA_PRIVATE_KEY = '6LcVu9ESAAAAAGxz7aEIACWRa3CVnXN3mFd-cajP'

In myapp/forms.py:

from captcha.fields import ReCaptchaField  # Only import different from yesterday
from crispy_forms.helper import FormHelper
from crispy_forms.layout import Submit
import floppyforms as forms

class ContactForm(forms.Form):

    name = forms.CharField(required=True)
    email = forms.EmailField(required=True)
    subject = forms.CharField(required=True)
    message = forms.CharField(widget=forms.Textarea)
    captcha = ReCaptchaField()  # Only field different from yesterday

    def __init__(self, *args, **kwargs):
        self.helper = FormHelper()
        self.helper.add_input(Submit('submit', 'Submit'))
        super(ContactForm, self).__init__(*args, **kwargs)

In myapp/views.py:

# Unchanged from yesterday. :-)
from django.conf import settings
from django.core.mail import send_mail
from django.views.generic import FormView

from myapp.forms import ContactForm

class ContactFormView(FormView):

    form_class = ContactForm
    template_name = "myapp/email_form.html"
    success_url = '/email-sent/'

    def form_valid(self, form):
        message = "{name} / {email} said: ".format(
            name=form.cleaned_data.get('name'),
            email=form.cleaned_data.get('email'))
        message += "\n\n{0}".format(form.cleaned_data.get('message'))
        send_mail(
            subject=form.cleaned_data.get('subject'),
            message=message,
            from_email='contact-form@myapp.com',
            # Assumes LIST_OF_EMAIL_RECIPIENTS is already a list of addresses
            recipient_list=settings.LIST_OF_EMAIL_RECIPIENTS,
        )
        return super(ContactFormView, self).form_valid(form)

In templates/myapp/email_form.html:

{# Also unchanged from yesterday. :-)  #}
{% extends 'base.html' %}
{% load crispy_forms_tags %}

{% block title %}Send an email{% endblock %}

{% block content %}
    <div class="row">
        <div class="span6">
            <h1>Send an email</h1>
            {% crispy form form.helper %}
        </div>
    </div>
{% endblock %}

{% block extrajs %}
<script src="{{ STATIC_URL }}js/jquery-1.7.1.min.js"></script>
<script type="text/javascript">
$(function() {
    $('#id_name').focus()
});
</script>
{% endblock %}

What I did

  • Using pip I installed three packages into my Python environment.
  • Added those three packages into the INSTALLED_APPS setting.
  • Set the RECAPTCHA keys for my site.
  • Modified the forms.py file from yesterday to include the RECAPTCHA field.
  • Reduced spam content.

What I could do

  • Pin the app versions for a particular release. This is what you should be doing in normal development and in production, but for a blog entry I'm avoiding it because release numbers become quickly dated.
  • Rather than change the ContactForm from yesterday, I could have extended it via inheritance.

Want to learn more?

If you live in the Los Angeles area and want to learn more about Django, everything from the basics to setting up a Content Management System or E-Commerce system, check out our Django (and Python) training at Cartwheel Academy.

00:45

Plone is an awesome community and a wonderfully versatile platform. The Plone Innovation Awards will showcase the most innovative new features and developments in a very visual way, to both the developer community and a worldwide audience.

This serves the following goals:

  • emphasizing the ongoing development and innovation in Plone
  • highlighting the most interesting innovations for the community
  • praising and celebrating the individuals moving Plone forward
  • demonstrating the viability and reach of Plone for a wide audience
  • generating buzz and traffic around Plone

Poster session

At the Plone Conference, an exhibit will show poster-format prints of the top ranking award entries. The posters will provide a visual presentation and text summary of each innovation. A QR code on the posters facilitates last-minute voting in the run-up to the award ceremony.

Award ceremony

In a plenary session at the Plone Conference 2012, the top rated entries will be shown on screen and briefly narrated. A panel staffed by respected community members will select the winners from the top-voted entries in several categories: core, add-ons, design, business impact, ...

 

More info on where and how to submit your entry very shortly!

Tuesday, 22 May

23:45

I have never been asked to talk at a commencement event. I’m not surprised — I’m not nearly famous enough. However, I’m pretty sure that after reading this blog post, nobody will make the mistake of inviting me. Here, I’ll be talking about a career as a software engineer, since this is the only career I can speak confidently about.

Anyone who goes to college, especially a good college, for a CS degree is making a huge mistake. Well, almost anyone. If you want to be an academic researcher in CS, this might be the right program for you. However, if you want a career as a software engineer, that was not the optimal path to take. Sure, a degree is fine, but you can get one just as good from a minor college, and do a lot better for yourself.

But first, let me digress a bit from what you should have done better, and tell you what your college should have done better. Instead of teaching you for 9 months out of the year, and having you intern for three, it should have been the reverse. When you started, they should have given you a three-month crash course in programming; C, C++, Python and Java would have been my choices. In three months, at 40 hours per week, you can learn the basics of programming in each of those. At the end, you would write a basic web server which calls out to C for low-level string crunching. Then, your college would help you intern in some “code monkey” position, writing code for a large hairy web application written in a mixture of five languages. I’m sure many companies would love to hire a minimum wage employee to fix those bugs. Those would not be the most pleasant of nine months, but by the end of them you would have learned what bug tracking systems are, what source control is, how to work with co-workers and other important life skills.

For the next three months, you would have studied networking and operating systems. You would finish by writing a device driver and a TCP stack. With these skills, and one internship already under your belt, you would again be farmed out for an internship at one of the big-but-interesting companies (Cisco, Oracle and the like). There you would work as a temporary employee on one of the products that they are bringing to market. You would learn how to delve into code bases, how to ask the right questions and how to take initiative.

The next-to-last three months would be devoted to theory — finite automata and Turing machines, statistics, calculus and linear algebra. The nine months after that you would, ideally, find an internship in something that interests you.

The senior year (well, three months) you would spend on electives — technical writing, public speaking, advanced math, security, genetics or multiple other topics would be provided for you to specialize in. With a year’s worth of studies and more than two years’ worth of work experience, you will be snatched up by many employers.

However, your college has been optimized to train academic researchers, not to provide the software industry with ready, valuable employees. So what can one person, without the ability to change the system, do?

Spend the first two years getting a CS associate degree at a no-name college. Do the minimum work to get a half-reasonable GPA, and in the rest of the time, choose a well-known open source project and start contributing to it. Most reasonable projects have public bug lists; just start fixing bugs. It will be hard at first, but keep at it. Two years of this should be enough for you to be a well-recognized contributor.

After that, get into a CS program at any local college that will accept you. Continue contributing to your open source project of choice, but now start answering questions related to it on Stack Overflow, and also start your own projects. What projects? Think for five minutes (on the clock!) about 20 new ways to improve your life using software, and choose the 3 most appealing. Publish them with a reasonable license (say, BSD Lite) on GitHub. Make sure to try for interesting internships every summer.


23:27

Aaron Sorkin has been tapped to write the TV movie about the aging prince's eventual election to Pat Toomey's Senate seat, currently titled either 'FRESHman Senator' or 'Mr. Smith Goes to Washington'.

18:36



Fedora User Developer Conference (FUDCon) Kuala Lumpur 2012, for which I was the event owner, ended successfully on Sunday, 20th May 2012. With only around 3 months of planning (1 month for the bid and 8 weeks for organizing), FUDCon Kuala Lumpur might have been one of the shortest FUDCon planning cycles ever, if not the shortest!

With a total of roughly 34 crew and volunteers, we managed to pull it off. For some of the crew, the experience of pulling off the crazy stunt of organizing FOSS.My 2008 in 30 days might have helped us prepare for the worst.

We were hit by Murphy's Law several times before and during the conference. One instance was Harish's unfortunate tennis calf injury, which left him unable to come to FUDCon KL. This created a challenge for on-site payments, but thanks to Harish's swift action in sending us some cash through Western Union, we managed to cover a number of them.

Pre-conference

With only 8 weeks to the event after winning the FUDCon APAC bid, the team scrambled to start knocking down tasks. As usual for Fedora Malaysia events, we simply picked UCTI as our venue, primarily because Gurdip is there and will almost always get us what we need from the college.

We also hacked together a conference management system on Plone for the event site in less than a day's worth of effective hours. Code is available here: https://github.com/inigoconsulting/collective.conference. The system served its purpose OK, though during the conference we discovered that certain things could be done better, namely session organizing and listing. For example, the agenda should display the speaker of each session, and the attendee listing needs to be easily filterable.

The funding request approval was one of the tough stages of the organizing process. We wanted to bring as many contributors as possible to FUDCon Kuala Lumpur, but we also needed to keep within our budget limit. Deciding on the selection fairly was tough, and to those who did not manage to get funding: please, no hard feelings, and we hope you try again.

We also created artwork for swag and banners for the FUDCon. We produced 170 T-shirts, 1000 Fedora 1.5" buttons (500 for the event, 500 for distributing across APAC), 10 banners (2 roll up generic fedora/fedora-my, 6 x-stand generic, 1 x-stand FUDCon-specific, 1 big gate banner), and 350 FUDCon stickers. We chose to focus on generic swag so that we can reuse it in the future, especially the banners.

Nearing the event, OSCC-MAMPU contacted us for a meeting, at which they offered 2 Samsung Galaxy Tabs for a lucky draw in exchange for us getting attendees to fill in a survey form about FOSS. However, we could not announce this early, as we only got the final confirmation on the tablets on Saturday, 19th May.

Murphy's Law did not leave us alone, however. In the final 2 weeks before the event, we received the sad news from Heherson, David Ramsey and Harish that they could not make it. Harish's absence created a money challenge, as we needed petty cash for a number of things during preparation and the FUDCon days. As a temporary measure, I put up my own money, cut some items from our things-to-buy list, and arranged for some items, like lunch and tea, to be paid later. Harish also sent some cash to us on Saturday. Fortunately, it kind of worked out.


Day 0 : Thursday, May 17 2012

Ankur (ankursinha), Praveen (kumarpraveen), Aditya (adimania), Kushal (kushal) and Soumya (soumyac) arrived early this morning, and I was in charge of picking them up. Ankur, Praveen and Aditya arrived roughly on time at 12:30am; however, Kushal and Soumya's arrival was delayed by almost 3 hours, and we only managed to leave the airport around 3:30am.

We arrived at the hotel around 5am. After check-in, I went home to swap the van I was driving for my car, then returned to the hotel and managed to get a short nap. Afterwards, I took the group to the LRT station for a short tour around KL. We dropped by Low Yat Plaza and grabbed some SIM cards for the group. After briefing the group on how to take the train lines and on places they might want to visit, I left them on their own at noon and headed to UCTI to prepare for FUDCon KL.

When I arrived at UCTI, I was surprised to find that Tuan (tuanta) was already there. After giving the volunteers instructions on the tasks that needed to be done, I drove tuanta to the hotel to check in. In the hotel lobby we met Christoph (cwickert), who had apparently already arrived and was hanging out there. After tuanta left his belongings in his room, we headed back to the venue, this time with cwickert.

Preparation ended at around 10:00pm as the venue was closing. There were, however, some tasks left, which some of the team continued later in the crew hotel room.

Mahay (mak), Buddhike (bckurera), Danishka (snavin), Kalpa (callkalpa) and Uditha (udinnet) also arrived some time on this day, and they checked into their rooms themselves.

Day 1: Friday, May 18 2012

Caius (kaio) and Ratnadeep (rtnpro) arrived early this morning around 1am, and they were picked up by Meng (seatux86) from the airport.

The crew started the day very early at around 6:00am and headed to the venue at 7:00am. The registration desk finished setting up at around 8:20am, but a few people had already arrived at around 8:00am. From around 9:00am, the rate of arrivals started increasing.

The opening keynote started on time at 10:00am. Harish was supposed to give it, but as he could not make it, we had cwickert do it. Cwickert gave a talk on "Leadership in Leaderless Organization" which, in my opinion, fit very well with what came right after the keynote: a Fedora BarCamp!

After the keynote, I gave the attendees a short briefing on what a BarCamp is and how the BarCamp voting process would flow. So many talks were submitted (15++?) that we shortened our 1 hour slots to 30 minutes. Plenty of the topics were really interesting, which made me somewhat regret setting Day 2 and Day 3 to a fixed schedule. I had never been in a BarCamp with more than 20 geeks able to give talks before, and was not expecting such interesting outcomes. If we had made all 3 days a BarCamp, we might have been able to go deeper, depending on what the audience was interested in.

After the day ended we headed back to the hotel. I then went out to a Western Union agent to cash out the money Harish had sent us for petty cash. At around 10pm, the APAC Ambassadors met in the crew hotel room to discuss future activities for APAC. We had a very interesting discussion and sharing session which lasted until 1am. If I remember correctly, Yogi (jurank_dankkal) was typing up the notes from the meeting; I am not sure whether they have been uploaded somewhere.

Day 2 : Saturday, May 19, 2012

Like yesterday, the crew started the day at around 6am and headed to the venue at 7am. The registration desk opened at 8:30am, and at around 9am a bus full of students from the German Malaysian Institute arrived to register and attend the event.

The day started without incident, with Joshua Wulf (jwulf) giving the opening keynote for Day 2, in which he introduced his project: a crowdsourced Fedora book. The day then continued according to schedule.

After lunch we had a little hiccup, as there was some confusion about a session that was meant to be a discussion but had been submitted as a talk. We simply cancelled the session and proceeded with the day as scheduled.

After the final session, we held a lightning talk session. Afterwards, there was a lucky draw for a Red Hat keyboard, which Harish had sent through Alan Ho the day before. Swee Meng (sweester), one of our local FOSS community geeks, won the keyboard.

Afterwards, we headed to the hotel. The FUDPub / FUDCon Dinner started at 8pm in a function room at the hotel. Cwickert brought his Beefy Miracle costume and surprised us in the middle of the FUDPub! After everyone had finished enjoying the food, we took out the cakes for an early celebration of Fedora 17 and also of FUDCon KL. While cwickert was cutting the cake, I noticed kushal whispering something to him. I was then caked by kushal!! I should have remembered what he did to Rahul (rahulsundaram) during FUDCon Pune!! That inevitably started a cake war in which everybody caked each other!!

Dinner ended at around 10pm, and we all headed back to our rooms and houses, full of food and exhausted from running around being caked.

Day 3 : Sunday, May 20, 2012

Final day of FUDCon Kuala Lumpur. Unlike the earlier days, the crew started late today, at around 7:00am, and only headed to the venue at 8:00am. The attendee count was also lower than on the earlier days.

The day started slow, with people only starting to come in at around 10am. Due to the low attendee count and the very slow day, at lunchtime we decided to scrap whatever had been planned and run a BarCamp instead. All submitted sessions were put up for voting again and shortened, and we reorganized the schedule. This turned out to be a good decision, as it injected some life back into the event.

At the end of the day, Abu Mansur, a well-respected member of the local FOSS community who is also an employee of Red Hat Malaysia, gave the closing keynote. Right after the keynote, I took some time to thank everyone who had contributed to the event and made FUDCon Kuala Lumpur a success. Following that, we held 2 more lucky draws for the 2 Galaxy Tabs sponsored by OSCC-MAMPU, and the day ended with a final group photo in front of UCTI.

Post conference

Some of the Fedora contributors left early (jwulf on Saturday afternoon, kaio on Sunday morning). The rest left after the event; cwickert was one of the earliest, leaving right after the group photo session, followed by Mahay around 1.5 hours later.

Those who had yet to leave met up for dinner at a nearby Old Town kopitiam. During dinner, we got an emergency call from Mahay: he had taken the wrong train and was in danger of missing his flight. Fortunately, he managed to reach the airport just before boarding closed.

After dinner, I asked the Fedora Ambassadors who were still around to come to the crew room to pick up one generic Fedora x-stand banner and some swag for their home regions and future events.

At around 2am, kushal, bckurera, snavin, udinnet, soumya, tuanta and callkalpa checked out and left the hotel in a van for the airport, as their flights were at 6am~7am.

rtnpro was the only one left from the sponsored attendee list, as his flight was on Tuesday. All rooms were checked out on Monday morning, so rtnpro had to stay at Yee Myat (MavJS)'s place for a night.

After checking out all of the rooms on Monday, I went around to drop off swag at OSCC-MAMPU to thank them for sponsoring the Galaxy Tabs, and also met up with Siva of Red Hat Singapore, whom Harish had sent over to settle some payment issues we had with the hotel. I passed some swag to him, along with a T-shirt for Harish.

There is still some post-conference work left for me for FUDCon KL, primarily sorting out the receipts, paying some final invoices, and claiming back the money Eric and I spent on FUDCon because Harish could not attend. Hopefully all of this will be done by the end of this week or early next week.

Remarks/Notes

Things we learned and discovered, and some advice/suggestions for future FUDCons, or even for other events.
  • A free-to-participate event is much, much, much easier to run and less stressful than a paid event, primarily due to the lack of overhead related to registration and fewer obligations around swag and food. That removes a lot of the headaches and stress we had during FOSS.My 2008/2009.
  • In an event with plenty of attendees who are willing to talk or run sessions, the BarCamp format rocks. I kind of regret not running the whole conference as a BarCamp, considering almost 20 Fedora contributors attended. The BarCamp format also helps in adding sessions that are of more interest to the audience and dropping sessions that are of less interest.
  • Keynotes help in getting everyone into the same room. Try to have a keynote at the start and end of each day. This is useful for announcements, for ensuring everyone is in the same room before BarCamp pitches+voting start, and for closing the event.
  • Walkie talkies are essential for large events. Ensure several are handy for the core crew members.
  • Generic swag rocks: anything extra can be used for future local events.
  • FUDCon is one of the few times we are able to gather a lot of contributors in the same place. Make full use of it for discussions, meetings, hacking and distributing swag across the region.
  • Sponsoring attendees who can contribute to the event is a GoodIdea(TM).
  • Close registration before printing the registrant list, so that nobody says "Oh I have registered, but my name is not on it".
  • Tags and food coupons should be printed earlier. We printed them late and almost screwed ourselves up.
  • Give food coupons to volunteers before the event, not during it. Giving them out during the event can cause confusion.
  • Have a crew room in the hotel, regardless of whether the crew are staying nearby. The crew room can be used for meetings and preparation.
  • Always have a whiteboard for the schedule. Printed schedules cannot be reorganized, yet people still need a place to check the updated schedule. An online schedule does not really work, as attendees did not have time to load the site during the event.
  • Do not rearrange sessions in the schedule after they have been assigned and the event has started. Cancel the session and move it to later in the day or to the next day. Do not shift sessions up, as that causes confusion.
  • Be flexible with the event. If you think something can be done better by changing some things, change it.
  • Getting the cafe to open is easier than providing food for all attendees.
  • As usual, the 50% no-show rule applies to free events in Malaysia. We had almost 500 registrants on the website, but only about 260 to 280 attended across the 3 days.
  • Coffee - meh .. Red Bull - win .. Managing coffee and a water heater is difficult. Just provide cans of Red Bull.
  • If there are cakes nearby, and Kushal Das is at the event, be EXTREMELY careful.

Thanks to everyone who contributed to FUDCon Kuala Lumpur 2012 and made this event a success. See you guys at other events. Kuala Lumpur won't be bidding for at least the next FUDCon APAC, so to whoever is going to bid for it, we wish you all the best and hope to see you there.

16:54

As I mentioned in my last article, I figured I’d try to find something that I could patch in Python and submit it. While writing the other article, I stumbled on a minor error in the Python devguide in the Windows section. While it’s nowhere near as cool to patch a piece of documentation as I think it would be to patch Python, I think it’s rather appropriate for me as I tend to contribute more documentation than anything else lately. So I am going to explain the process as I found it.

Getting Started

First off, you need to get an account with the Bug Tracker for Python. If you hope to become a core developer, then you’ll need to make sure your username follows their guidelines, which are really simple:


firstname.lastname

Once you’ve got that, you can start looking for something to patch. There’s a link that says “Easy issues” that is a good starting place. You can also do a search for a component that you’re competent in using and see if there are any bugs in there that you think you can fix. Once you find something, you’ll need to make sure you update your local repo and then read the devguide’s patch page.

Creating the Patch

Assuming you have the necessary repository checked out on your local machine, all you need to do is go edit the appropriate file. In my case, I had to check out the devguide (which you can read about here) and edit the setup.rst file. If you’re editing Python code, then you’ll have to conform to PEP 8. Once I finished editing the file, I saved my changes and then used Mercurial to create the patch. Here’s the command I used, per the Python patch instructions.


hg diff > setup.patch

And here are the contents of that patch file:


diff -r b1c1d15271c0 setup.rst
--- a/setup.rst Tue May 22 00:33:42 2012 +0200
+++ b/setup.rst Tue May 22 13:55:09 2012 -0500
@@ -173,7 +173,7 @@
 To build from the Visual Studio GUI, open pcbuild.sln to load the project
 files and choose the Build Solution option from the Build menu, often
 associated with the F7 key. Make sure you have chosen the "Debug" option from
-the build configuration drop-down first.
+the configuration toolbar drop-down first.
 
 Once built you might want to set Python as a startup project. Pressing F5 in
 Visual Studio, or choosing Start Debugging from the Debug menu, will launch
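
If you have not met this patch format before, it is called a unified diff: lines starting with "-" are removed, lines starting with "+" are added, and unchanged context lines are prefixed with a space. As a rough illustration only (this is not how hg diff is implemented, and the two-line "file" below is just an excerpt I typed in from the patch above), Python's standard difflib module can produce the same kind of output:

```python
import difflib

# Two versions of a tiny excerpt of setup.rst, as lists of lines.
old_lines = [
    'associated with the F7 key. Make sure you have chosen the "Debug" option from\n',
    "the build configuration drop-down first.\n",
]
new_lines = [
    old_lines[0],
    "the configuration toolbar drop-down first.\n",
]

# unified_diff yields the header lines, a hunk marker and the changed lines.
patch = "".join(
    difflib.unified_diff(
        old_lines, new_lines, fromfile="a/setup.rst", tofile="b/setup.rst"
    )
)
print(patch)
```

The printed result has the same shape as the Mercurial patch: a "---" / "+++" header pair, an "@@" hunk marker, and then the context, removed and added lines.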

Now that we have a patch we need to submit it!

Submitting a Patch

Put your shields up, we’re going in! Submitting a patch is a little daunting. What will people think of you? I suspect if you plan to work on something major, then you better start growing some thick skin. In my case, I’m going to submit a really simple typo fix, so I’m hoping that sort of thing isn’t flame-worthy. Then again, this is my first patch, so I may submit it in a completely erroneous way. Since my patch will be for something (presumably) new, I did a quick search to make sure it hadn’t already been reported. Seeing nothing, I clicked the “Create New” link with some trepidation and chose “devguide” as my component. I also chose the latest version of Python. I don’t see anything in the devguide that says it applies to just one set of Python versions, so I’m just going to leave it at that. I didn’t really see a “type” that fit a devguide edit, so I left that blank for my betters to fix. Finally, I attached my patch file to the bug ticket. You can see my bug ticket here if you like.

When contributing a patch to Python, you should fill out a contributor agreement form which allows the Python Software Foundation to license your code for use with Python while you get to keep the copyright. Yes, you too can become famous just for writing Python code! Assuming people read the source or those acknowledgement pages.

Wrapping Up

I don’t know what will happen to my rather lame contribution. Maybe it’ll get accepted, maybe not. But I think I’ll spend some time trying to figure out some other bugs and just see if there’s anything I can do to help out the Python community. Feel free to join me on this adventure!

Behold:

OSX’s iTerm 2, and maybe some other terminal applications, support ANSI control sequence extensions which allow the shell to set the color of the terminal tab.

Below is a Python script which

  • Randomizes a color based on the server host name. The same hostname always results in the same color.
  • The color is randomized in HSL color space, so that only the hue component varies and saturation and lightness are locked. This prevents the creation of ugly color combinations like black text on black tab background.
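
The idea in the two bullets above can be sketched in a few lines. This is a minimal illustration rather than the script itself: it assumes Python 3 and swaps in a SHA-256 digest of the host name as the source of the hue (the actual script further down seeds the random module instead), and the helper names hue_for_host and color_for_host are made up for this example.

```python
import colorsys
import hashlib


def hue_for_host(hostname):
    """Map a host name to a repeatable hue in the range [0, 1)."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    # Interpret the first 4 bytes as an unsigned integer and scale it down.
    value = int.from_bytes(digest[:4], "big")
    return value / 2 ** 32


def color_for_host(hostname, lightness=0.7, saturation=0.4):
    """Return (r, g, b) floats in 0..1; only the hue depends on the host."""
    hue = hue_for_host(hostname)
    # Note the argument order: colorsys works in HLS, not HSL.
    return colorsys.hls_to_rgb(hue, lightness, saturation)


if __name__ == "__main__":
    # "db1.example.com" is a hypothetical host name.
    print(color_for_host("db1.example.com"))
```

Because lightness and saturation are fixed, every host gets a color of the same brightness and only the tint changes, which is what keeps the text on the tab readable.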

Note: The effect can be also applied on terminal windows  – for those who don’t use tabs.

The effective result is that

  • You learn to identify terminal tabs by the color
  • You can switch between tabs much faster, because you can visually pick out the terminal without needing to read the text on it or remember its location in the list

Note: If your puny terminal does not support setting the color of window decorations, you can always set the terminal background color. This is useful e.g. if you want a red background for the danger zone ™ when you are logged in as root on the production server at 23:00 on a Friday night.

Note: Naturally you also need to have the script installed on the servers you are ssh’ing into

precmd() hook

You can run the script once and the tab color is set. However, if you SSH from that computer to another and then exit back, the tab would keep the color of the last server.

This can be avoided by

  1. Calculating the OSC control code sequence needed to set the terminal tab color when the shell starts
  2. Having a precmd() hook (zsh terminology, not sure what other shells use) reset the tab color every time the shell prompt is displayed
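
Step 1 can be sketched as a function that builds the escape sequence as a string instead of writing it out immediately, so that the shell can compute it once at startup and the precmd() hook only has to print the stored value. This is an illustration and not the ztanesh code: the function name tab_color_sequence is made up here, and the escape codes are the same iTerm 2 ones the script further down uses.

```python
def tab_color_sequence(color):
    """Build the iTerm 2 tab color escape sequence for an (r, g, b) tuple.

    Each component is a float in 0..1 and is scaled to 0..255.
    """
    parts = []
    for name, value in zip(("red", "green", "blue"), color):
        parts.append("\033]6;1;bg;%s;brightness;%d\a" % (name, int(value * 255)))
    return "".join(parts)


if __name__ == "__main__":
    import sys

    # Example color only; a real setup would derive it from the host name.
    sys.stdout.write(tab_color_sequence((0.82, 0.58, 0.70)))
    sys.stdout.flush()
```

The returned string can be kept in a shell variable when the shell starts; printing that variable before each prompt then restores the tab color without running Python again.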

A friend and I maintain (yet another) zsh toolkit called ztanesh (github). There you can find precmd() example code in 1) 98-server-color and 2) 80-statusbar.

rainbow-parade.py

The script code lives on Github. Currently it supports iTerm 2 only and we plan to expand support to Konsole. Patches for other terminals are welcome.

(This probably could be done in pure shell code too, but Python is just so much more fun…)

#!/usr/bin/env python
"""

       Set terminal tab / decoration color by the server name.

       Get a random colour which matches the server name and use it for the tab colour:
       the benefit is that each server gets a distinct color which you do not need
       to configure beforehand.

"""

import socket
import random
import colorsys
import sys

# http://stackoverflow.com/questions/1523427/python-what-is-the-common-header-format
__copyright__ = "Copyright 2012 Mikko Ohtamaa - http://opensourcehacker.com"
__author__ = "Mikko Ohtamaa <mikko@opensourcehacker.com>"
__license__ = "WTFPL"
__credits__ = ["Antti Haapala"]

USAGE = """
Colorize terminal tab based on the current host name.

Usage: rainbow-parade.py [0-1.0] [0-1.0] # Lightness and saturation values

An iTerm 2 example (recolorize dark grey background and black text):

    rainbow-parade.py 0.7 0.4
"""

def get_random_by_string(s):
    """
    Get always the same 0...1 random number based on an arbitrary string
    """

    # Initialize random gen by server name hash
    random.seed(s)
    return random.random()

def decorate_terminal(color):
    """
    Set terminal tab / decoration color.

    Please note that iTerm 2 / Konsole have different control codes for this.
    Not sure what other terminals support this behavior.

    :param color: tuple of (r, g, b)
    """

    r, g, b = color

    # iTerm 2
    # http://www.iterm2.com/#/section/documentation/escape_codes
    sys.stdout.write("\033]6;1;bg;red;brightness;%d\a" % int(r * 255))
    sys.stdout.write("\033]6;1;bg;green;brightness;%d\a" % int(g * 255))
    sys.stdout.write("\033]6;1;bg;blue;brightness;%d\a" % int(b * 255))
    sys.stdout.flush()

    # Konsole
    # TODO
    # http://meta.ath0.com/2006/05/24/unix-shell-games-with-kde/

def rainbow_unicorn(lightness, saturation):
    """
    Colorize terminal tab by your server name.

    Create a color in HSL space where lightness and saturation are locked; only the hue is tuned by the server name.

    http://games.adultswim.com/robot-unicorn-attack-twitchy-online-game.html
    """

    name = socket.gethostname()

    hue = get_random_by_string(name)

    color = colorsys.hls_to_rgb(hue, lightness, saturation)

    decorate_terminal(color)

def main():
    """
    From Toholampi with love http://www.toholampi.fi/tiedostot/119_yleisesite_englanti_naytto.pdf
    """
    if len(sys.argv) < 3:
        sys.exit(USAGE)

    lightness = float(sys.argv[1])
    saturation = float(sys.argv[2])

    rainbow_unicorn(lightness, saturation)

if __name__ == "__main__":
    main()
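
One caveat about get_random_by_string(): random.seed() turns a string into a seed differently in Python 2 and Python 3, so the same host name can end up with a different color depending on which interpreter runs the script. A minimal sketch of an alternative, assuming Python 3 and swapping in a SHA-256 digest (this is not what the script above does, and the name stable_hue is hypothetical):

```python
import hashlib


def stable_hue(name):
    """Map a string to a float in [0, 1) using the first 4 bytes of its SHA-256 digest."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # 4 bytes give an integer below 2**32, so the quotient is always below 1.0
    return int.from_bytes(digest[:4], "big") / float(2 ** 32)
```

The result could be passed to colorsys.hls_to_rgb() in place of the seeded random number.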

16:09

ENJOY A LIST APART’S SPECIAL two-part issue on digital publication standards.

Publication Standards Part 1:
The Fragmented Present

by NICK DISABATO

ebooks are a new frontier, but they look a lot like the old web frontier, with HTML, CSS, and XML underpinning the main ebook standard, ePub. Yet there are key distinctions between ebook publishing’s current problems and what the web standards movement faced. The web was founded without an intent to disrupt any particular industry; it had no precedent, no analogy. E-reading antagonizes a large, powerful industry that’s scared of what this new way of reading brings—and they’re either actively fighting open standards or simply ignoring them. In part one of a two-part series in this issue, Nick Disabato examines the explosion in reading, explores how content is freeing itself from context, and mines the broken ebook landscape in search of business logic and a way out of the present mess.

Publication Standards Part 2:
A Standard Future

by NICK DISABATO

The internet is disrupting many content-focused industries, and the publishing landscape is beginning its own transformation in response. Tools haven’t yet been developed to properly, semantically export long-form writing. Most books are encumbered by Digital Rights Management (DRM), a piracy-encouraging practice long since abandoned by the music industry. In the second article of a two-part series in this issue, Nick Disabato discusses the ramifications of these practices for various publishers and proposes a way forward, so we can all continue sharing information openly, in a way that benefits publishers, writers, and readers alike.


Illustration by Kevin Cornell for A List Apart.

FROM THE HOME PAGE of today’s newly announced, totally disruptive, completely free product powered by Readability: “What’s a Readlist? A group of web pages—articles, recipes, course materials, anything—bundled into an e-book you can send to your Kindle, iPad, or iPhone.”

For some time now, people who miss the point have seen Readability as an app that competes in the read-it-later space. That’s like viewing Andy Warhol as a failed advertising art director. Readability is a platform that radically rethinks how we consume, and who pays for, web content. It monetizes content for authors and its technology is available to all via the API. It scares designers, angers some advertisers. Its transformative potential is huge. Readlists are the latest free product to manifest some of that potential.

With Readlist, anyone can create ebooks out of existing web content. It’s easy. Sign in with your Readability account or sign up for one, and start making books of your favorite web articles.

There are still some bugs being worked out, but hey.

I was honored to beta test the product and create one of the first Readlists, along with Erin Kissane, Anil Dash, Aaron Lammer, David Sleight, and Chris Dary.

Disclaimer: I am on the advisory board of Readability and cofounded The Deck advertising network with Jim Coudal and Jason Fried. Readability removes clutter (including ads) from the reading experience; The Deck sells ads. Conflict of interest? Here’s another: I design content websites so as to make Readability unnecessary (because I design for readers); yet I strongly support Readability as a platform and above all as a web idea that is at least 15 years overdue. Either designers will design for their end-users, or third-party apps will remove designers from the transaction. As a designer, I’m not afraid of that. Rather, it inspires me.

Enjoy Readlists.

13:00

I teach Python classes and enjoy exploring language features from the perspective of newcomers to the language. Usually I can explain the rationale for Python language features by showing a compelling use case. But what about generator functions?

12:54

The benefits of two factor authentication are pretty much Security 101 material. And we are also tol ...(more)...

10:54

Years ago (but still within the last decade) I was involved in a source control trade study for a large multi-national corporation. Management had let a non-software developer select the original "source control tool" and they had picked something that required custom scripting just to do a baseline (I wish I was kidding).

So a bunch of candidate replacements were put forward for consideration, and CVS won because it was free, thus there would be fewer arguments with management about rolling it out on a project that was already over budget and behind schedule. (The fact that Subversion wasn't considered as a candidate should give you some additional hints about the precise timing of this - Subversion 1.0 was released in February 2004. Yes, for those that are new to this game, you read that right: it is only within the last decade that the majority of the open source VCS world began to enjoy the benefits of atomic commits).

Other interesting aspects of that system included the fact that one of the developers on that project basically had to write a custom xUnit testing system from scratch in order to start putting together a decent automated test suite for the system, there was no code review tool, and you couldn't include direct links to bug tracker items in emails or anything else - you had to reference them by name or number, and people would then look those names or numbers up in the dedicated bug tracking application client.

High level design documentation, if it existed at all, was in the form of Microsoft Word documents. Low level API documentation? Yes, that would have been nice (there were some attempts to generate something vaguely readable with Doxygen but, yeah, well, C++).

Less than ten years later, though, and there are signs our industry is starting to grow up (although I expect many enterprise shops are still paying extortionate rates to the likes of IBM for the "Rational" suite of tools only to gain a significantly inferior development experience):

  1. You can get genuinely high quality code hosting for free. Sure, SourceForge was already around back then, but Git and Mercurial stomp all over CVS from a collaboration point of view. These also come with decent issue trackers and various other collaboration tools. If you don't want to trust a service provider with your code, then tools like GitLab let you set up similar environments internally.
  2. Web based issue trackers are everywhere, with the ubiquitous "issue URL" allowing effective cross-linking between tracker issues, documentation, code comments, source control browsers, code review systems, etc.
  3. Dedicated code review tools like Gerrit and Rietveld are published as open source (and, in the case of the latter, even available as a free service on Google App Engine).
  4. Services like ReadTheDocs exist, allowing you to easily build and publish high quality documentation. All with nice URLs so you can link it from emails, tracker issues, source code, etc.
  5. Organisations like Shining Panda CI and Travis CI provide hosted continuous integration services that put the internal capabilities of many large companies to shame.
  6. Language communities provide cross-platform distribution services to reach a global audience.
  7. Depending on the language you use, you may even have tools like SonarSource available.
  8. Once you go into production in the web application world, service components like Sentry, Piwik, and Graphite are again available for no charge.

And to access all this good stuff for free? All you have to do is be willing to share your work (and sometimes not even that). If you don't want to share your work, then the service providers generally have very reasonable fees - you could probably put together a state of the art suite of tools for less than a few hundred bucks a month.

Take my own hobby projects as an example:
  • they're hosted on BitBucket as Mercurial projects (I happen to prefer Mercurial, although I can definitely see why people like Git, too). That gives me integrated issue tracking and online source code browsing, too. (OK, so I could have had essentially that back in the early SourceForge days, but the UI aspects have improved in many respects in the intervening years)
  • I can publish my projects on the Python Package Index with a simple "setup.py sdist upload". They're then available for anyone in the world to install with a straightforward command like "pip install walkdir"
  • thanks to Shining Panda CI, I know the downloads from PyPI work, and I also know that the projects work on all the versions and implementations of Python I want to support
  • thanks to ReadTheDocs and Sphinx, you can read nicely formatted documentation like this rather than trying to decipher plain text files or wiki pages.

I'm living in the future and it is seriously cool (and that's just looking at things purely from a software development infrastructure point of view - the rise of "Infrastructure as a Service" and "Platform as a Service" providers, including Red Hat's own OpenShift, has massive implications on the deployment side of things, and there are of course the implications of the many open source wheels that don't need to be reinvented).

The best part from my point of view is that these days I get to work for a company that already genuinely understands the long term significance of the power of collaborative development. It also doesn't hurt that there's still a lot of money to be made in helping the rest of the enterprise world come to grips with that reality :)

Problem thirteen from Project Euler is one of those problems that's so simple, I don't understand why it's in the double digits section. The problem reads: “Work out the first ten digits of the sum of the following one-hundred 50-digit numbers.”
It then proceeds to list 100 long numbers. I'm not going to paste them here because they are in the code solutions below and I don't want to clog up the “tubez” with more redundant information than I'm about to.

Enough of my jibber-jabber. Here is my Haskell solution first (trying to change things up here):

module Main where

main :: IO ()
main = do
    print . take 10 . show $ sum big_number
  where big_number = [ 37107287533902102798797998220837590246510135740250
                     , 46376937677490009712648124896970078050417018260538
                     , 74324986199524741059474233309513058123726617309629
                     , 91942213363574161572522430563301811072406154908250
                     , 23067588207539346171171980310421047513778063246676
                     , 89261670696623633820136378418383684178734361726757
                     , 28112879812849979408065481931592621691275889832738
                     , 44274228917432520321923589422876796487670272189318
                     , 47451445736001306439091167216856844588711603153276
                     , 70386486105843025439939619828917593665686757934951
                     , 62176457141856560629502157223196586755079324193331
                     , 64906352462741904929101432445813822663347944758178
                     , 92575867718337217661963751590579239728245598838407
                     , 58203565325359399008402633568948830189458628227828
                     , 80181199384826282014278194139940567587151170094390
                     , 35398664372827112653829987240784473053190104293586
                     , 86515506006295864861532075273371959191420517255829
                     , 71693888707715466499115593487603532921714970056938
                     , 54370070576826684624621495650076471787294438377604
                     , 53282654108756828443191190634694037855217779295145
                     , 36123272525000296071075082563815656710885258350721
                     , 45876576172410976447339110607218265236877223636045
                     , 17423706905851860660448207621209813287860733969412
                     , 81142660418086830619328460811191061556940512689692
                     , 51934325451728388641918047049293215058642563049483
                     , 62467221648435076201727918039944693004732956340691
                     , 15732444386908125794514089057706229429197107928209
                     , 55037687525678773091862540744969844508330393682126
                     , 18336384825330154686196124348767681297534375946515
                     , 80386287592878490201521685554828717201219257766954
                     , 78182833757993103614740356856449095527097864797581
                     , 16726320100436897842553539920931837441497806860984
                     , 48403098129077791799088218795327364475675590848030
                     , 87086987551392711854517078544161852424320693150332
                     , 59959406895756536782107074926966537676326235447210
                     , 69793950679652694742597709739166693763042633987085
                     , 41052684708299085211399427365734116182760315001271
                     , 65378607361501080857009149939512557028198746004375
                     , 35829035317434717326932123578154982629742552737307
                     , 94953759765105305946966067683156574377167401875275
                     , 88902802571733229619176668713819931811048770190271
                     , 25267680276078003013678680992525463401061632866526
                     , 36270218540497705585629946580636237993140746255962
                     , 24074486908231174977792365466257246923322810917141
                     , 91430288197103288597806669760892938638285025333403
                     , 34413065578016127815921815005561868836468420090470
                     , 23053081172816430487623791969842487255036638784583
                     , 11487696932154902810424020138335124462181441773470
                     , 63783299490636259666498587618221225225512486764533
                     , 67720186971698544312419572409913959008952310058822
                     , 95548255300263520781532296796249481641953868218774
                     , 76085327132285723110424803456124867697064507995236
                     , 37774242535411291684276865538926205024910326572967
                     , 23701913275725675285653248258265463092207058596522
                     , 29798860272258331913126375147341994889534765745501
                     , 18495701454879288984856827726077713721403798879715
                     , 38298203783031473527721580348144513491373226651381
                     , 34829543829199918180278916522431027392251122869539
                     , 40957953066405232632538044100059654939159879593635
                     , 29746152185502371307642255121183693803580388584903
                     , 41698116222072977186158236678424689157993532961922
                     , 62467957194401269043877107275048102390895523597457
                     , 23189706772547915061505504953922979530901129967519
                     , 86188088225875314529584099251203829009407770775672
                     , 11306739708304724483816533873502340845647058077308
                     , 82959174767140363198008187129011875491310547126581
                     , 97623331044818386269515456334926366572897563400500
                     , 42846280183517070527831839425882145521227251250327
                     , 55121603546981200581762165212827652751691296897789
                     , 32238195734329339946437501907836945765883352399886
                     , 75506164965184775180738168837861091527357929701337
                     , 62177842752192623401942399639168044983993173312731
                     , 32924185707147349566916674687634660915035914677504
                     , 99518671430235219628894890102423325116913619626622
                     , 73267460800591547471830798392868535206946944540724
                     , 76841822524674417161514036427982273348055556214818
                     , 97142617910342598647204516893989422179826088076852
                     , 87783646182799346313767754307809363333018982642090
                     , 10848802521674670883215120185883543223812876952786
                     , 71329612474782464538636993009049310363619763878039
                     , 62184073572399794223406235393808339651327408011116
                     , 66627891981488087797941876876144230030984490851411
                     , 60661826293682836764744779239180335110989069790714
                     , 85786944089552990653640447425576083659976645795096
                     , 66024396409905389607120198219976047599490197230297
                     , 64913982680032973156037120041377903785566085089252
                     , 16730939319872750275468906903707539413042652315011
                     , 94809377245048795150954100921645863754710598436791
                     , 78639167021187492431995700641917969777599028300699
                     , 15368713711936614952811305876380278410754449733078
                     , 40789923115535562561142322423255033685442488917353
                     , 44889911501440648020369068063960672322193204149535
                     , 41503128880339536053299340368006977710650566631954
                     , 81234880673210146739058568557934581403627822703280
                     , 82616570773948327592232845941706525094512325230608
                     , 22918802058777319719839450180888072429661980811197
                     , 77158542502016545090413245809786882778948721859617
                     , 72107838435069186155435662884062257473692284509516
                     , 20849603980134001723930671666823555245252804609722
                     , 53503534226472524250874054075591789781264330331690]

followed by my Python solution:

#!/usr/bin/python
"""
code solution for project euler's problem #13 in python.
"""
from __future__ import print_function

def print_10(number):
    print(str(number)[0:10])

if __name__ == "__main__":

    big_number = [ 37107287533902102798797998220837590246510135740250,
                   46376937677490009712648124896970078050417018260538,
                   74324986199524741059474233309513058123726617309629,
                   91942213363574161572522430563301811072406154908250,
                   23067588207539346171171980310421047513778063246676,
                   89261670696623633820136378418383684178734361726757,
                   28112879812849979408065481931592621691275889832738,
                   44274228917432520321923589422876796487670272189318,
                   47451445736001306439091167216856844588711603153276,
                   70386486105843025439939619828917593665686757934951,
                   62176457141856560629502157223196586755079324193331,
                   64906352462741904929101432445813822663347944758178,
                   92575867718337217661963751590579239728245598838407,
                   58203565325359399008402633568948830189458628227828,
                   80181199384826282014278194139940567587151170094390,
                   35398664372827112653829987240784473053190104293586,
                   86515506006295864861532075273371959191420517255829,
                   71693888707715466499115593487603532921714970056938,
                   54370070576826684624621495650076471787294438377604,
                   53282654108756828443191190634694037855217779295145,
                   36123272525000296071075082563815656710885258350721,
                   45876576172410976447339110607218265236877223636045,
                   17423706905851860660448207621209813287860733969412,
                   81142660418086830619328460811191061556940512689692,
                   51934325451728388641918047049293215058642563049483,
                   62467221648435076201727918039944693004732956340691,
                   15732444386908125794514089057706229429197107928209,
                   55037687525678773091862540744969844508330393682126,
                   18336384825330154686196124348767681297534375946515,
                   80386287592878490201521685554828717201219257766954,
                   78182833757993103614740356856449095527097864797581,
                   16726320100436897842553539920931837441497806860984,
                   48403098129077791799088218795327364475675590848030,
                   87086987551392711854517078544161852424320693150332,
                   59959406895756536782107074926966537676326235447210,
                   69793950679652694742597709739166693763042633987085,
                   41052684708299085211399427365734116182760315001271,
                   65378607361501080857009149939512557028198746004375,
                   35829035317434717326932123578154982629742552737307,
                   94953759765105305946966067683156574377167401875275,
                   88902802571733229619176668713819931811048770190271,
                   25267680276078003013678680992525463401061632866526,
                   36270218540497705585629946580636237993140746255962,
                   24074486908231174977792365466257246923322810917141,
                   91430288197103288597806669760892938638285025333403,
                   34413065578016127815921815005561868836468420090470,
                   23053081172816430487623791969842487255036638784583,
                   11487696932154902810424020138335124462181441773470,
                   63783299490636259666498587618221225225512486764533,
                   67720186971698544312419572409913959008952310058822,
                   95548255300263520781532296796249481641953868218774,
                   76085327132285723110424803456124867697064507995236,
                   37774242535411291684276865538926205024910326572967,
                   23701913275725675285653248258265463092207058596522,
                   29798860272258331913126375147341994889534765745501,
                   18495701454879288984856827726077713721403798879715,
                   38298203783031473527721580348144513491373226651381,
                   34829543829199918180278916522431027392251122869539,
                   40957953066405232632538044100059654939159879593635,
                   29746152185502371307642255121183693803580388584903,
                   41698116222072977186158236678424689157993532961922,
                   62467957194401269043877107275048102390895523597457,
                   23189706772547915061505504953922979530901129967519,
                   86188088225875314529584099251203829009407770775672,
                   11306739708304724483816533873502340845647058077308,
                   82959174767140363198008187129011875491310547126581,
                   97623331044818386269515456334926366572897563400500,
                   42846280183517070527831839425882145521227251250327,
                   55121603546981200581762165212827652751691296897789,
                   32238195734329339946437501907836945765883352399886,
                   75506164965184775180738168837861091527357929701337,
                   62177842752192623401942399639168044983993173312731,
                   32924185707147349566916674687634660915035914677504,
                   99518671430235219628894890102423325116913619626622,
                   73267460800591547471830798392868535206946944540724,
                   76841822524674417161514036427982273348055556214818,
                   97142617910342598647204516893989422179826088076852,
                   87783646182799346313767754307809363333018982642090,
                   10848802521674670883215120185883543223812876952786,
                   71329612474782464538636993009049310363619763878039,
                   62184073572399794223406235393808339651327408011116,
                   66627891981488087797941876876144230030984490851411,
                   60661826293682836764744779239180335110989069790714,
                   85786944089552990653640447425576083659976645795096,
                   66024396409905389607120198219976047599490197230297,
                   64913982680032973156037120041377903785566085089252,
                   16730939319872750275468906903707539413042652315011,
                   94809377245048795150954100921645863754710598436791,
                   78639167021187492431995700641917969777599028300699,
                   15368713711936614952811305876380278410754449733078,
                   40789923115535562561142322423255033685442488917353,
                   44889911501440648020369068063960672322193204149535,
                   41503128880339536053299340368006977710650566631954,
                   81234880673210146739058568557934581403627822703280,
                   82616570773948327592232845941706525094512325230608,
                   22918802058777319719839450180888072429661980811197,
                   77158542502016545090413245809786882778948721859617,
                   72107838435069186155435662884062257473692284509516,
                   20849603980134001723930671666823555245252804609722,
                   53503534226472524250874054075591789781264330331690]

    print_10(sum(big_number))

and to continue adding in the spice, I have included a solution in Scala:

import BigInt._

object problem_13 {
  def main (args : Array[String]){
    val big_number = List("37107287533902102798797998220837590246510135740250",
      "46376937677490009712648124896970078050417018260538",
      "74324986199524741059474233309513058123726617309629",
      "91942213363574161572522430563301811072406154908250",
      "23067588207539346171171980310421047513778063246676",
      "89261670696623633820136378418383684178734361726757",
      "28112879812849979408065481931592621691275889832738",
      "44274228917432520321923589422876796487670272189318",
      "47451445736001306439091167216856844588711603153276",
      "70386486105843025439939619828917593665686757934951",
      "62176457141856560629502157223196586755079324193331",
      "64906352462741904929101432445813822663347944758178",
      "92575867718337217661963751590579239728245598838407",
      "58203565325359399008402633568948830189458628227828",
      "80181199384826282014278194139940567587151170094390",
      "35398664372827112653829987240784473053190104293586",
      "86515506006295864861532075273371959191420517255829",
      "71693888707715466499115593487603532921714970056938",
      "54370070576826684624621495650076471787294438377604",
      "53282654108756828443191190634694037855217779295145",
      "36123272525000296071075082563815656710885258350721",
      "45876576172410976447339110607218265236877223636045",
      "17423706905851860660448207621209813287860733969412",
      "81142660418086830619328460811191061556940512689692",
      "51934325451728388641918047049293215058642563049483",
      "62467221648435076201727918039944693004732956340691",
      "15732444386908125794514089057706229429197107928209",
      "55037687525678773091862540744969844508330393682126",
      "18336384825330154686196124348767681297534375946515",
      "80386287592878490201521685554828717201219257766954",
      "78182833757993103614740356856449095527097864797581",
      "16726320100436897842553539920931837441497806860984",
      "48403098129077791799088218795327364475675590848030",
      "87086987551392711854517078544161852424320693150332",
      "59959406895756536782107074926966537676326235447210",
      "69793950679652694742597709739166693763042633987085",
      "41052684708299085211399427365734116182760315001271",
      "65378607361501080857009149939512557028198746004375",
      "35829035317434717326932123578154982629742552737307",
      "94953759765105305946966067683156574377167401875275",
      "88902802571733229619176668713819931811048770190271",
      "25267680276078003013678680992525463401061632866526",
      "36270218540497705585629946580636237993140746255962",
      "24074486908231174977792365466257246923322810917141",
      "91430288197103288597806669760892938638285025333403",
      "34413065578016127815921815005561868836468420090470",
      "23053081172816430487623791969842487255036638784583",
      "11487696932154902810424020138335124462181441773470",
      "63783299490636259666498587618221225225512486764533",
      "67720186971698544312419572409913959008952310058822",
      "95548255300263520781532296796249481641953868218774",
      "76085327132285723110424803456124867697064507995236",
      "37774242535411291684276865538926205024910326572967",
      "23701913275725675285653248258265463092207058596522",
      "29798860272258331913126375147341994889534765745501",
      "18495701454879288984856827726077713721403798879715",
      "38298203783031473527721580348144513491373226651381",
      "34829543829199918180278916522431027392251122869539",
      "40957953066405232632538044100059654939159879593635",
      "29746152185502371307642255121183693803580388584903",
      "41698116222072977186158236678424689157993532961922",
      "62467957194401269043877107275048102390895523597457",
      "23189706772547915061505504953922979530901129967519",
      "86188088225875314529584099251203829009407770775672",
      "11306739708304724483816533873502340845647058077308",
      "82959174767140363198008187129011875491310547126581",
      "97623331044818386269515456334926366572897563400500",
      "42846280183517070527831839425882145521227251250327",
      "55121603546981200581762165212827652751691296897789",
      "32238195734329339946437501907836945765883352399886",
      "75506164965184775180738168837861091527357929701337",
      "62177842752192623401942399639168044983993173312731",
      "32924185707147349566916674687634660915035914677504",
      "99518671430235219628894890102423325116913619626622",
      "73267460800591547471830798392868535206946944540724",
      "76841822524674417161514036427982273348055556214818",
      "97142617910342598647204516893989422179826088076852",
      "87783646182799346313767754307809363333018982642090",
      "10848802521674670883215120185883543223812876952786",
      "71329612474782464538636993009049310363619763878039",
      "62184073572399794223406235393808339651327408011116",
      "66627891981488087797941876876144230030984490851411",
      "60661826293682836764744779239180335110989069790714",
      "85786944089552990653640447425576083659976645795096",
      "66024396409905389607120198219976047599490197230297",
      "64913982680032973156037120041377903785566085089252",
      "16730939319872750275468906903707539413042652315011",
      "94809377245048795150954100921645863754710598436791",
      "78639167021187492431995700641917969777599028300699",
      "15368713711936614952811305876380278410754449733078",
      "40789923115535562561142322423255033685442488917353",
      "44889911501440648020369068063960672322193204149535",
      "41503128880339536053299340368006977710650566631954",
      "81234880673210146739058568557934581403627822703280",
      "82616570773948327592232845941706525094512325230608",
      "22918802058777319719839450180888072429661980811197",
      "77158542502016545090413245809786882778948721859617",
      "72107838435069186155435662884062257473692284509516",
      "20849603980134001723930671666823555245252804609722",
      "53503534226472524250874054075591789781264330331690") map {BigInt(_)}
    val sums = big_number sum
    val su = sums toString
    val su10 = su take 10
    println(su10)
  }
}

Some of you may be wondering, “Why a Scala solution?” To which I respond, “Why not?” Because that's a little short, I'll add that it has something to do with Scala starting to gain traction in the industry and me seeing if I would like to get paid to program in it.

The solution, in all three languages, is pretty simple. The recipe essentially says, “Put all numbers into a list. Get the sum of that list, turn that number into a string, and get the first 10 characters of that string.”

Times:
Haskell (compiled) : real 0m0.004s
Haskell (runghc) : real 0m0.314s
Python : real 0m0.059s
Scala (compiled) : real 0m0.757s

For the most part it's pretty standard in these tests to see performance times such that Haskell (compiled) < Python < Haskell (runghc). Java and Perl usually fall somewhere between Haskell (compiled) and Python, in that order. To see Scala be more than 2x slower than Haskell (runghc) was a shocker. The only thing that makes sense to me for the slowdown is having to use the BigInt library. That is probably the biggest thing I took away from these time tests: if I want to do REALLY large number crunching and performance DOES matter, JVM-based languages might not be the best option.

A few thoughts on Scala:
If I haven't stated it already in this blog, I should now give the disclaimer that I'm not a Java fan. I know it still has its loyal followers, but I'm not one of them. Moving on. This was my first time working with Scala, and I'd like to finally welcome Java to the 21st century. While doing some research on the Scala language itself I read that “the industry” was moving to replace Java with Scala. I welcome that change. Does that mean I “like” Scala? The honest answer is, to butcher the quote from the appliances on the Flintstones, “Eh, it's a language.” Scala is definitely an improvement over Java – not really that hard to do in my opinion – but the language still feels unpolished. One quick way to kill the interpreter in Scala is to type “Int” then hit the enter key. Instead of erroring out, the interpreter does a great job of interpreting a crash test car hitting a cement wall (I had to restart the whole thing). When I tried the same “technique” in the Python interpreter, I got a response rather than a crash, and Haskell's interpreter gave me “Not in scope: data constructor `Int'”. I also found Scala's function composition to be a little lacking when compared to Haskell. I wasn't able to cleanly change the BigInt data type to String, and then only print out ten characters, without requiring three separate vals. Yes, I could have used one var instead, but that's beside the point. I will admit it could be my inexperience with the language showing, so if anyone knows a smoother way to do this in Scala please share it in the comments.

All that being said, I do like the way Scala is trying to handle the reducing of Java's dot notation, and I think it's starting to make strides in the right direction in other areas. I'm open to working with Scala more, and look forward to seeing how it evolves over the next few years.

Here's a simple example of Django's FormView class-based view, used here to build a basic email form. There is no CAPTCHA in this example; that's the topic of a future blog post.

This version requires the following packages to be pip installed into your virtualenv:

  • django-crispy-forms so we can do Python driven layouts.
  • django-floppyforms so we get HTML5 elements for free.

They also need to be added to your list of INSTALLED_APPS:

INSTALLED_APPS += (
    'crispy_forms',
    'floppyforms',
)

In myapp.forms.py:

from crispy_forms.helper import FormHelper
from crispy_forms.layout import Submit
import floppyforms as forms

class ContactForm(forms.Form):

    name = forms.CharField(required=True)
    email = forms.EmailField(required=True)
    subject = forms.CharField(required=True)
    message = forms.CharField(widget=forms.Textarea)

    def __init__(self, *args, **kwargs):
        self.helper = FormHelper()
        self.helper.add_input(Submit('submit', 'Submit'))
        super(ContactForm, self).__init__(*args, **kwargs)

In myapp.views.py:

from django.conf import settings
from django.core.mail import send_mail
from django.views.generic import FormView

from myapp.forms import ContactForm

class ContactFormView(FormView):

    form_class = ContactForm
    template_name = "myapp/email_form.html"
    success_url = '/email-sent/'

    def form_valid(self, form):
        message = "{name} / {email} said: ".format(
            name=form.cleaned_data.get('name'),
            email=form.cleaned_data.get('email'))
        message += "\n\n{0}".format(form.cleaned_data.get('message'))
        send_mail(
            subject=form.cleaned_data.get('subject'),
            message=message,
            from_email='contact-form@myapp.com',
            # LIST_OF_EMAIL_RECIPIENTS is assumed to already be a list of addresses
            recipient_list=settings.LIST_OF_EMAIL_RECIPIENTS,
        )
        return super(ContactFormView, self).form_valid(form)
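
The message assembly inside form_valid is plain string formatting, so it can be tried out without Django. Here is a minimal sketch, assuming a hypothetical helper called build_message (it is not part of the code above) that takes a dict shaped like form.cleaned_data. The sample name and address are made up.

```python
def build_message(cleaned_data):
    """Build the email body the same way form_valid does above.

    build_message is a hypothetical helper for illustration; cleaned_data
    is any dict with 'name', 'email' and 'message' keys.
    """
    header = "{name} / {email} said: ".format(
        name=cleaned_data.get('name'),
        email=cleaned_data.get('email'))
    return header + "\n\n{0}".format(cleaned_data.get('message'))


# Made-up sample data standing in for form.cleaned_data
sample = {'name': 'Ada', 'email': 'ada@example.com', 'message': 'Hello!'}
body = build_message(sample)
print(body)
```

Keeping the formatting in a small function like this is one way to unit test the email text separately from the view.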

In templates/myapp/email_form.html:

{% extends 'base.html' %}
{% load crispy_forms_tags %}

{% block title %}Send an email{% endblock %}

{% block content %}
    <div class="row">
        <div class="span6">
            <h1>Send an email</h1>
            {% crispy form form.helper %}
        </div>
    </div>
{% endblock %}

{% block extrajs %}
<script src="{{ STATIC_URL }}js/jquery-1.7.1.min.js"></script>
<script type="text/javascript">
$(function() {
    $('#id_name').focus()
});
</script>
{% endblock %}

Tomorrow's blog post

In tomorrow's post I'll show how to add CAPTCHA into your project to help reduce spam messages.

Want to learn more?

If you live in the Los Angeles area and want to learn more about Django, everything from the basics to setting up a Content Management System or E-Commerce system, check out our Django (and Python) training at Cartwheel Academy.

08:18



“The Cleveland Tourism Board gave me 14 million dollars about 8 months ago to make a promotional video to bring people to Cleveland. As usual, I waited till the last minute and I ended up having to shoot and edit it in about an hour yesterday afternoon.” — bishopvids

07:09

A recent proposal, supported by many current web browsers, suggests the addition of a Do Not T ...(more)...

01:00

Only 10 days are left before the early bird tickets for the Plone Conference 2012 are no longer available. Buy your ticket now at the reduced rate of 275,- euro. From the 1st of June the rate will go up to the regular rate of 325,- euro. Save yourself 50,- euro.

For only 275,- euro you get 3 days of Plone-related talks about integration, development, marketing and use cases about projects done in Plone. The conference ticket includes: conference entrance fee, lunch and coffee breaks during conference days, goodie bag, VAT, a big splashing party including free drinks, dinner before the party and, maybe best of all, two sprint days to work with all your favorite Plone people to make Plone even better.

Time and tickets are ticking

Not only are there just 10 days left, but the early bird rate is also limited to 200 tickets, so buy your ticket now, before that limit is reached! Yes, join people from all over the world. Meet the people from our Gold sponsors, like the 10 people from GW20e, or Zest Software, who have already submitted their talk and are giving a training!

Want to be a speaker at the Plone Conf 2012? Submit your talk! Want to know everything about Plone? How about following a training first and then meeting up at the conference days? In both cases, buy a ticket, book your hotel, sit back and relax, as you know you will be at the Plone Conf 2012.

Monday, 21 May

22:54

Furtively he looks round, then takes from the desk drawer a comic-book entitled ‘Thrills and Adventure’. We see the frames of the comic strip. A Superman-type character and a girl are shrinking from an explosion. She is saying ‘My God, his nose just exploded with enough force to destroy his kleenex’. In the next frame, the Superman character is saying ‘If only I had a kleenex to lend him – or even a linen handkerchief – but these trousers…!! No back pocket!’ In the frame beneath, he flies from side to side attempting to escape; finally he breaks through, bringing the two frames above down on himself. Cut to a picture of a safety curtain.

Last tutorial we covered ‘parsing’.  We broke a standard config file up into separate lines and then we broke each line into pairs, each pair having a ‘key’ and a ‘value’.   If we’re to edit the file, we need a way to edit the values (and after that we’ll write the edited values back to the config file).  We’re going to see how to use Tkinter to do that in this tutorial.

Let’s start thinking about how a single key/value pair will look.  I am thinking of having the key on the left hand side with a space to input the value on the right hand side.  To do this we will use the Label widget that we’ve met before and the Text widget.  Label is used to display static text – ie text that will not be edited – while the Text widget allows the user to edit the text which is displayed in the widget, and allows the program to read what is entered in the widget.  The Text widget is the GUI equivalent of raw_input() that we met so long ago.

from Tkinter import *

root = Tk()
labelWidget = Label(root,text="A key:")
textWidget = Text(root)

textWidget.insert('1.0',"A Value")
labelWidget.pack(side=LEFT)
textWidget.pack(side=RIGHT)

root.mainloop()

Here we first create a Tk() object, then create a label widget and a text widget in that object.   We pack the label first and we pack it on the LEFT side (LEFT is actually the name of a constant in Tk which Tk translates to ‘put this on the left’) and pack the text widget on the right side.  At location “1.0” we add the text “A Value” to the Text widget.  Here the index “1.0” means “row 1, at character position 0”, which is to say, at the very start of the text.

If I run this code I get something like this:

Which is sort of what I wanted – a label on the left, and an editable text box on the right (can you see the cursor in the screen shot?) – click the close window widget in the top right corner to close the window.

Exercise:  Type something into the text box.  See if you can do it to the label.

However, this isn’t really what I wanted.  I wanted a little text box, not the enormous one I’ve got here.  Since I didn’t specify a height and width for it, the Text widget used its default size (which is way too big).  The user also doesn’t have any way to tell the program to use (or cancel) the edit.  Let’s change the code to add an ok and cancel button, and to change the size of the text widget.

Here is a revised version which is nearer to what I was looking for:

from Tkinter import *

def okClicked():
    '''Get the edited values and write them to the file then quit'''
    #TODO: get values and write them to the file!
    exit()
def cancelClicked():
    '''Cancel edits and quit'''
    exit()

root = Tk()
labelWidget = Label(root,text="A key:")
textWidget = Text(root, width=60, height = 1)
okWidget = Button(root, text= "Ok", command = okClicked)
cancelWidget = Button(root, text="Cancel", command = cancelClicked)

textWidget.insert('1.0',"A Value")
labelWidget.pack(side=LEFT)
cancelWidget.pack(side=RIGHT)
okWidget.pack(side=RIGHT)

textWidget.pack(side=RIGHT)

root.mainloop()

This gives:

I have added a couple of functions to be run when the ok and cancel buttons are clicked.  The ok button’s function is still a little empty at the moment though…  I have specified the width to be 60 characters and the height to be one row.  Note these are not pixel measurements.  If you change the size of the font the text box will also change.

Notice also the way the geometry is working.  The widgets which are pack(side=RIGHT) are added at the right in the order they are packed.  If the buttons were packed last they would be between the label and the text window.

Exercise: change the program so that the widgets are packed in a different order. What happens if you try side=TOP or side=BOTTOM?

The value can be edited by the user typing directly into the text box.  The text in the text box can also be edited programmatically, which is to say its contents can be changed by the program without the user typing.  See the Tkinter documentation for details – or clamour on this site and I’ll add a tute.

One thing that I don’t like about this layout is the fact the buttons are on the same line as the key label and value text.  When you did the exercise above, you should have noticed that side=TOP and side=BOTTOM don’t really help, since you can’t position the ok and cancel buttons on the same line.  What we need to use is the Frame widget.  Frames can be thought of as empty spaces in which you can group widgets together.  By treating the widgets within the Frame as a group, additional layouts can be achieved.   Frames can be packed inside other frames. We will use two frames, one on top of the other.  In the first frame we pack the label and text widgets.  In the second frame we pack the two buttons.

Here is the code:

from Tkinter import *

def okClicked():
    '''Get the edited values and write them to the file then quit'''
    #TODO: get values and write them to the file!
    exit()
def cancelClicked():
    '''Cancel edits and quit'''
    exit()

root = Tk()

topFrame = Frame(root)
bottomFrame = Frame(root)
labelWidget = Label(topFrame,text="A key:")
textWidget = Text(topFrame, width=60, height = 1)
okWidget = Button(bottomFrame, text= "Ok", command = okClicked)
cancelWidget = Button(bottomFrame, text="Cancel", command = cancelClicked)

textWidget.insert('1.0',"A Value")
labelWidget.pack(side=LEFT)
cancelWidget.pack(side=RIGHT)
okWidget.pack(side=LEFT)
textWidget.pack(side=RIGHT)

topFrame.pack(side=TOP)
bottomFrame.pack(side=BOTTOM)

root.mainloop()

This is more like what I wanted (notice you can’t see the individual frames).  One might quibble with the location of the ok and cancel buttons.  Maybe they should be offset a little from the centre?  Maybe they should be off to one side.  In any event they are in the general layout that I was looking for: a key label and an editable value text above the ok and cancel buttons. Notice in the code that the widgets we used before have had their parent changed from root to either topFrame or bottomFrame.  These Frame widgets, however, have root as their parent.   So the original widgets we were using have effectively been pushed down one level in the hierarchy.

We still don’t know how to get what the user has typed into the text box, but maybe I can leave that as an exercise for the reader (try textWidget.get("1.0", END)).

Homework:  change the okClicked() function so that it prints the contents of the text box before exiting.  Use the hint in the previous paragraph.


Mr. Simpson:     Good. Well I have this large quantity of string, a hundred and twenty-two thousand miles of it to be exact, which I inherited, and I thought if I advertised it–
Wapcaplet:     Of course! A national campaign. Useful stuff, string, no trouble there.
Mr. Simpson:     Ah, but there’s a snag, you see. Due to bad planning, the hundred and twenty-two thousand miles is in three inch lengths. So it’s not very useful.

In the previous tutorial we learnt that when we pack Tkinter objects, the order in which we pack them affects how they are displayed in the GUI.  We used a Text widget to enable the user to edit text in the GUI.  We also learnt to use a Frame widget to help with the layout of our GUI.  In particular, we put a Label and a Text widget together into one Frame, and put an ok and cancel Button into another.   For homework you needed to get data from the Text widget.

In order to display all of the configuration options we are going to go a bit nutty using Frames.   We will eventually (but not today) use one Frame to hold all of the configuration options, and another Frame to hold the Ok and Cancel buttons.  But that’s not all!  We will also use a Frame to house each configuration option (ie key and value pair).  Before we do that though, we need to remember where we were up to reading and parsing the server.properties file.

Here is the code we finished with two tutorials ago, excluding the last couple of lines (which printed out the results), for your reference if you need it:

'''Minecraft config editor:
This is an editor for the Minecraft server.properties file.  It:
* opens the file server.properties
* reads, then closes the file
* parses each line by
-- stripping leading and trailing whitespace
-- if the line starts with "#", marks it as a comment
-- splits the line into a key, value pair, with the pair separated by a "=" sign
-- if the value of the pair is either "true" or "false", the entry is marked as a boolean (ie its only values are either true or false)
* displays each key, value entry on the screen allowing you to edit it
* renames the server.properties file to server.properties.bup (overwriting any existing file of that name from earlier edits)
* opens a new file called server.properties
* writes each of the entries to that new file
* closes the server.properties file.
'''

class configItem(object):  # name of the class, it is based on an object called 'object'
  def __init__(self, line):# this is called each time an instance of the class is created
    line = line.strip()  # this removes any white space at the start or end of the line
    # if it starts with a # it's a comment so check for it
    if line[:1] == "#":
       self.configKey = "#"
       self.configVal = line[1:]
    else:  # otherwise assume it's of the form x = y
       spam = line.split("=")
       self.configKey = spam[0]
       self.configVal = spam[1]
    # now check to see whether the config item takes only the values "true" and "false"
    if self.configVal.lower() in ["true","false"]:
       self.isTrueFalse = True
    else:
       self.isTrueFalse = False

# get data from the file

fileName = "server.properties"
fileObject = open(fileName,'rb')
fileData = fileObject.read()
fileObject.close()

configLines = []

for line in fileData.split('\n'):  # this splits it into individual lines
    if line.strip()=='':
      continue
    configLines.append(configItem(line))
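
The parsing rules in configItem can be tried on a few sample lines without touching a real file. The sketch below is a standalone variation rather than the tutorial's class: parse_line is a hypothetical name, and it uses str.partition so that a value containing its own "=" is kept whole and a line with no "=" at all does not raise an error (line.split("=") followed by spam[1] would truncate the first and fail with an IndexError on the second). The sample lines are made up.

```python
def parse_line(line):
    """Return (key, value, is_true_false) for one server.properties line.

    Hypothetical helper mirroring configItem.__init__; it is not part of
    the tutorial's code.
    """
    line = line.strip()
    if line[:1] == "#":
        # comment: the key is "#" and the value is the rest of the line
        key, value = "#", line[1:]
    else:
        # partition splits on the first "=" only; value is "" if there is none
        key, _, value = line.partition("=")
    is_true_false = value.lower() in ["true", "false"]
    return key, value, is_true_false


for sample in ["#Minecraft server properties", "pvp=true", "motd=a=b"]:
    print(parse_line(sample))
```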

If you remember we defined a class called configItem.  We read the lines from the config file and used each line to create instances of configItem.  We stored those in an array called configLines.  Each instance has two attributes, configKey and configVal (that is, the things on the left and right hand side of the equals respectively).  In the last tutorial for one key, value pair we:

  • created a label and set it equal to configKey;
  • created a text widget and set its value to configVal; and, finally,
  • created a frame in which to pack each of these.

Now we have to do that for each and every entry in the array.  There are plenty of ways to do this.  However, I am going to do it by “subclassing” the configItem class.  That is, I am going to create a new class which is based on (“inherits from” or “is a subclass of”) the configItem class.  It has the features of the configItem class but will also store some stuff relating to the Tkinter widgets that we will need.  This is the new class which I’ve called guiConfigItem:

class guiConfigItem(configItem):
  def __init__(self,line):
    super(guiConfigItem,self).__init__(line)  # run configItem's __init__ method
    self.frame = Frame()
    self.keyLabel = Label(self.frame, text = self.configKey)
    self.valueEntry = Entry(self.frame, width="60")
    self.valueEntry.insert("0",self.configVal)
    self.keyLabel.pack(side=LEFT )
    self.valueEntry.pack(side=RIGHT)
    self.frame.pack(side=TOP)

Some things to note about this class:

  • instead of the first line ending “(object):” like the other classes we’ve seen, this one ends “(configItem):”.  This means that guiConfigItem’s immediate parent is configItem.  However, since configItem is based on object, in the end, so is guiConfigItem.
  • it takes the same initialisation parameters as configItem (that is, self and line)
  • the first thing it does in initialising stuff is to call super(guiConfigItem,self).__init__(line).  This runs configItem’s __init__ method, so every guiConfigItem starts with the same initialisation that configItem would have
  • it starts by creating a Frame, stores it in self.frame, then, inside the frame, it creates a label and an Entry widget.  An Entry widget is the single line version of the Text widget we used last time, and it should be good enough for our purposes.
  • you can tell that the Label and Entry widgets are created inside the frame which has been created because the first parameter passed to them is self.frame.
  • the Label widget is packed to the LEFT, and the Entry widget to the RIGHT.  the frame is also packed, but it is packed to TOP (ie it will make a list from top to bottom)

This class is added after the definition of the configItem class.  In order to get it working we just have to make four changes to the program.  We will:

  • import Tkinter – from Tkinter import *;
  • create a root window in which to pack things - root =Tk();
  • change the loop to create guiConfigItems rather than configItems – configLines.append(guiConfigItem(line=line)); and
  • we will start the gui with a mainloop() – root.mainloop()

Here is the updated source code:


'''Minecraft config editor:
This is an editor for the Minecraft server.properties file.  It:
* opens the file server.properties
* reads, then closes the file
* parses each line by
-- stripping leading and trailing whitespace
-- if the line starts with "#", marks it as a comment
-- splits the line into a key, value pair, with the pair separated by a "=" sign
-- if the value of the pair is either "true" or "false", the entry is marked as a boolean (ie its only values are either true or false)
* displays each key, value entry on the screen allowing you to edit it
* renames the server.properties file to server.properties.bup (overwriting any existing file of that name from earlier edits)
* opens a new file called server.properties
* writes each of the entries to that new file
* closes the server.properties file.
'''

from Tkinter import *

class configItem(object):  # name of the class, it is based on an object called 'object'
  def __init__(self, line):# this is called each time an instance of the class is created
    line = line.strip()  # this removes any white space at the start or end of the line
    # if it starts with a # it's a comment so check for it
    if line[:1] == "#":
       self.configKey = "#"
       self.configVal = line[1:]
    else:  # otherwise assume it's of the form x = y
       spam = line.split("=")
       self.configKey = spam[0]
       self.configVal = spam[1]
    # now check to see whether the config item takes only the values "true" and "false"
    if self.configVal.lower() in ["true","false"]:
       self.isTrueFalse = True
    else:
       self.isTrueFalse = False

class guiConfigItem(configItem):
  def __init__(self,line):
    super(guiConfigItem,self).__init__(line)  # run configItem's __init__ method
    self.frame = Frame()
    self.keyLabel = Label(self.frame, text = self.configKey)
    self.valueEntry = Entry(self.frame, width="60")
    self.valueEntry.insert("0",self.configVal)
    self.keyLabel.pack(side=LEFT )
    self.valueEntry.pack(side=RIGHT)
    self.frame.pack(side=TOP)

# get data from the file

fileName = "server.properties"
fileObject = open(fileName,'rb')
fileData = fileObject.read()
fileObject.close()

root = Tk()
configLines = []

for line in fileData.split('\n'):  # this splits it into individual lines
    if line.strip()=='':
      continue
    configLines.append(guiConfigItem(line=line))

root.mainloop()

When I run this I get:


Wow, is that magic? The way we defined the class meant that each of the instances packed itself for us as we created them.  This is an example of why using classes can be so much fun.

Exercise: how might you do the same thing without using classes?

That said, the alignment is a little wonky.   This is because each of the individual frames (there is one on each line) is a different size.  The overall window is big enough to fit the biggest, but that means that the smaller lines aren’t big enough.  This can be remedied by adding fill="x" (that is, fill in the x (horizontal) direction if necessary) to the pack command for each of the Frames:

self.frame.pack(side=TOP, fill="x")

Now the window looks much better:

Exercise: confirm that you can edit the values on the right.

Exercise 2: check through our docstring to see what we’ve done so far and what we’ve got left to do.

The complete source code with the final edit is below.

'''Minecraft config editor:
This is an editor for the Minecraft server.properties file.  It:
* opens the file server.properties
* reads, then closes the file
* parses each line by
-- stripping leading and trailing whitespace
-- if the line starts with "#", marks it as a comment
-- splits the line into a key, value pair, with the pair separated by a "=" sign
-- if the value of the pair is either "true" or "false", the entry is marked as a boolean (ie its only values are either true or false)
* displays each key, value entry on the screen allowing you to edit it
* renames the server.properties file to server.properties.bup (overwriting any existing file of that name from earlier edits)
* opens a new file called server.properties
* writes each of the entries to that new file
* closes the server.properties file.
'''

from Tkinter import *

class configItem(object):  # name of the class, it is based on an object called 'object'
  def __init__(self, line):# this is called each time an instance of the class is created
    line = line.strip()  # this removes any white space at the start or end of the line
    # if it starts with a # it's a comment so check for it
    if line[:1] == "#":
       self.configKey = "#"
       self.configVal = line[1:]
    else:  # otherwise assume it's of the form x = y
       spam = line.split("=")
       self.configKey = spam[0]
       self.configVal = spam[1]
    # now check to see whether the config item takes only the values "true" and "false"
    if self.configVal.lower() in ["true","false"]:
       self.isTrueFalse = True
    else:
       self.isTrueFalse = False

class guiConfigItem(configItem):
  def __init__(self,line):
    super(guiConfigItem,self).__init__(line)  # run configItem's __init__ method
    self.frame = Frame()
    self.keyLabel = Label(self.frame, text = self.configKey)
    self.valueEntry = Entry(self.frame, width="60")
    self.valueEntry.insert("0",self.configVal)
    self.keyLabel.pack(side=LEFT )
    self.valueEntry.pack(side=RIGHT)
    self.frame.pack(side=TOP, fill="x")

# get data from the file

fileName = "server.properties"
fileObject = open(fileName,'rb')
fileData = fileObject.read()
fileObject.close()

root = Tk()
configLines = []

for line in fileData.split('\n'):  # this splits it into individual lines
    if line.strip()=='':
      continue
    configLines.append(guiConfigItem(line=line))

root.mainloop()


Mr Mann     Ee ecky thump! (indicates more power)
Third Booth     Ee ecky thump!
Mr Mann     Excellent.
Third Booth     Thank you, sir. (puts earphones on, listens)
Mr Mann     It’s a really quick method of learning.

There are two more things to do with our Minecraft config file editor before we’ve got the main part of it working (we may do some tweaking later).  We need to:

  • add the Ok and Cancel buttons back; and
  • when someone clicks Ok, we need to update the server.properties file

We’re doing the first of these today. We saw earlier how to do the Ok and Cancel buttons, although at the time we didn’t actually put any meat in the functions they called.  So, let’s fill that out now.  For Cancel, we are just going to quit the editor without making any changes – that’s pretty easy.  For the Ok button though, we’re going to have to:

  1. somehow read all of the values from the screen (since we don’t know which ones have been changed we need to read them all);
  2. make a backup of the server.properties file
  3. write all of the key:value pairs to the new server.properties file.
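
Steps 2 and 3 are left for a later tutorial, but the file handling itself is small. Here is a rough sketch under stated assumptions: save_config is a hypothetical function name, the configuration lines have already been turned into strings, and Python 3.3 or later is available for os.replace (which overwrites an existing backup, as the docstring describes). The demonstration runs in a temporary directory so it cannot touch a real server.properties.

```python
import os
import tempfile


def save_config(file_name, lines):
    """Back up file_name to file_name + '.bup', then write lines to a new file.

    Hypothetical helper; `lines` is a list of strings such as 'pvp=true'.
    """
    backup_name = file_name + ".bup"
    if os.path.exists(file_name):
        # os.replace overwrites any backup left over from an earlier edit
        os.replace(file_name, backup_name)
    with open(file_name, "w") as file_object:
        for line in lines:
            file_object.write(line + "\n")
    return backup_name


# Demonstration on a throwaway file with made-up contents
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "server.properties")
with open(demo_file, "w") as f:
    f.write("pvp=true\n")

demo_backup = save_config(demo_file, ["pvp=false"])
print(open(demo_file).read())
print(open(demo_backup).read())
```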

Unlike variables, widgets are not the same as what is stored in them.  If we have an Entry widget called E and we want to store what has been typed there in a variable called text, we can’t just write text = E.   This is because E is not a variable as we understand it; E is actually an instance of a class.  Writing text = E would just make another reference to the same Entry widget with the name text.  Rather, we want to “get” the current value of the text entered into E.  It turns out that the Entry widget has a method (called get()) which gets that text for you.

>>> from Tkinter import *
>>> E = Entry()  # this should pop up a Tkinter window
>>> E.pack()  # the widget should appear in your Tkinter window now
>>> type(E)
<type 'instance'>
>>> type(Entry)
<type 'classobj'>
>>> text = E
>>> type(text)
<type 'instance'>
>>> print text
.140543131462184
>>> # now type "Hi P4K!" in the entry widget
...
>>> text = E.get()
>>> print text
Hi P4K!
>>> # now add " - Again" to the end of the entry widget (leave the "Hi P4K!" there)
...
>>> text = E.get()
>>> print text
Hi P4K! - Again
>>> # you can also print the value which you get() without storing it first:
...
>>> print E.get()
Hi P4K! - Again

So what we’re going to do in our code is get() all these edited values when someone clicks “Ok”.  We could do that directly, for example by finding the relevant guiConfigItem and calling the get() method on the valueEntry attribute of that item.   That would also mean we’d have to make a copy of the key for that item and then combine them together with “=” before we wrote them to the server.properties file.  This would mean that logic which is relevant to the configItem class would be stored somewhere other than inside the class - which rather defeats the purpose of having a class to keep track of these things.  Instead, we’re going to add a method to the guiConfigItem class which updates the values it has stored.  That turns out to be pretty easy:

  def update(self):
    ''' Get the value which is currently in the Entry widget and save it to configVal'''
    if self.isTrueFalse:
      '''if isTrueFalse is True, then we should only have the values 'true' and 'false' in this
      Entry.  So, only update the configuration value if it is one of these two.  Otherwise, ignore it. '''
      spam = self.valueEntry.get()
      if spam in ['true','false']:
        self.configVal = spam
    else:
      '''this is not a variable which is limited to 'true' and 'false', so store the whole text'''
      self.configVal = self.valueEntry.get()

Note here that we are referencing the attribute configVal which is defined in the parent class.  Also note that we’ve included a bit of logic here to ensure that those configuration values which start as ‘true’ or ‘false’ can only be ‘true’ or ‘false’.  If you type something else into them it will be ignored.  It is sufficient here to just say if self.isTrueFalse rather than if self.isTrueFalse is True (the “is True” is redundant).
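
The true/false guard can be checked without opening a window. This sketch swaps the Entry widget for a stand-in: StubEntry and StubItem are hypothetical names used only here, and StubItem.update copies the logic of the update method above.

```python
class StubEntry(object):
    """Stand-in for a Tkinter Entry: it only knows how to get() its text."""
    def __init__(self, text):
        self.text = text

    def get(self):
        return self.text


class StubItem(object):
    """Holds just the attributes that update() reads and writes."""
    def __init__(self, config_val, typed_text):
        self.configVal = config_val
        self.isTrueFalse = config_val.lower() in ["true", "false"]
        self.valueEntry = StubEntry(typed_text)

    def update(self):
        # same logic as the update method shown above
        if self.isTrueFalse:
            spam = self.valueEntry.get()
            if spam in ['true', 'false']:
                self.configVal = spam
        else:
            self.configVal = self.valueEntry.get()


flag = StubItem("true", "maybe")      # invalid edit to a true/false item
flag.update()
print(flag.configVal)                 # the invalid edit is ignored

name = StubItem("world", "my world")  # free-text item
name.update()
print(name.configVal)                 # the new text is stored
```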

We also need a way to prepare the lines of the server.properties file to be printed or written to the file.  We do this by adding a method to the configItem class (since it doesn’t have anything to do with the graphical interface we don’t add it to the subclass):

  def item2ConfigLine(self):
    if self.configKey=="#":
      '''If the key is '#' then this is a comment, so don't include an '=' sign'''
      return "%s%s"%(self.configKey, self.configVal)
    else:
      '''otherwise, it has the form key=value'''
      return "%s=%s"%(self.configKey,self.configVal)

See this tutorial for an explanation of the %s stuff…
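
As a quick illustration of the % operator used in item2ConfigLine: each %s is replaced, in order, by the matching item of the tuple on the right. The key and value below are made-up sample data.

```python
key, value = "level-name", "world"

# a comment line: "#" followed directly by the comment text
comment_line = "%s%s" % ("#", "Minecraft server properties")
# a normal line: key and value joined by "="
setting_line = "%s=%s" % (key, value)

print(comment_line)
print(setting_line)
```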

We don’t have a way to test these methods out yet. So let’s hook up the Ok and Cancel buttons. The Ok button will run through each of the guiConfigItems and update it, then print out the configuration line. After all items have been processed this way, the program will exit. The cancel button will just exit without doing anything. So, we need to:
1. create callbacks for each of the buttons,
2. create a frame for the buttons to go in
3. create the buttons, hooking each one up to its callback
4. pack the buttons, then, finally,
5. pack the frame.
These go before the root.mainloop() line.

# 1. create callbacks for each of the buttons,
def okClicked():
  '''Get the edited values and write them to the file then quit'''
  for c in configLines:
    c.update()
    print c.item2ConfigLine()
  # have updated and printed each line, now exit
  exit()

def cancelClicked():
  '''Cancel edits and quit'''
  exit()

# 2. create a frame for the buttons to go in
bottomFrame = Frame(root)

# 3. create the buttons, hooking up each of the buttons up to the callback
okWidget = Button(bottomFrame, text= "Ok", command = okClicked)
cancelWidget = Button(bottomFrame, text="Cancel", command = cancelClicked)

# 4. pack the buttons, then, finally,
okWidget.pack(side=LEFT)
cancelWidget.pack(side=RIGHT)
# 5. then pack the frame:
bottomFrame.pack(side=BOTTOM)

At the moment, the Ok button just prints out the values of the items. This is so that we can test how it is working before we let it go editing the actual file.

Here is the complete source code:

# -*- coding: utf-8 -*-

'''Minecraft config editor:
This is an editor for the Minecraft server.properties file.  It:
* opens the file server.properties
* reads, then closes the file
* parses each line by
-- stripping leading and trailing whitespace
-- if the line starts with "#", marks it as a comment
-- splits the line into a key, value pair, with the pair separated by a "=" sign
-- if the value of the pair is either "true" or "false", the entry is marked as a boolean (ie its only values are either true or false)
* displays each key, value entry on the screen allowing you to edit it
* renames the server.properties file to server.properties.bup (overwriting any existing file of that name from earlier edits)
* opens a new file called server.properties
* writes each of the entries to that new file
* closes the server.properties file.
'''

from Tkinter import *

class configItem(object):  # name of the class, it is based on an object called 'object'
  def __init__(self, line):# this is called each time an instance of the class is created
    line = line.strip()  # this removes any white space at the start or end of the line
    # if it starts with a # it's a comment so check for it
    if line[:1] == "#":
       self.configKey = "#"
       self.configVal = line[1:]
    else:  # otherwise assume it's of the form x = y
       spam = line.split("=", 1)  # split on the first "=" only, so the value itself may contain "="
       self.configKey = spam[0]
       self.configVal = spam[1]
    # now check to see whether the config item takes only the values "true" and "false"
    if self.configVal.lower() in ["true","false"]:
       self.isTrueFalse = True
    else:
       self.isTrueFalse = False

  def item2ConfigLine(self):
    if self.configKey=="#":
      '''If the key is '#' then this is a comment, so don't include an '=' sign'''
      return "%s%s"%(self.configKey, self.configVal)
    else:
      '''otherwise, it has the form key=value'''
      return "%s=%s"%(self.configKey,self.configVal)

class guiConfigItem(configItem):
  def __init__(self,line):
    super(guiConfigItem,self).__init__(line)  # run configItem's __init__ method
    self.frame = Frame()
    self.keyLabel = Label(self.frame, text = self.configKey)
    self.valueEntry = Entry(self.frame, width="60")
    self.valueEntry.insert("0",self.configVal)
    self.keyLabel.pack(side=LEFT )
    self.valueEntry.pack(side=RIGHT)
    self.frame.pack(side=TOP, fill="x")

  def update(self):
    ''' Get the value which is currently in the Entry widget and save it to configVal'''
    if self.isTrueFalse:
      '''if isTrueFalse is True, then we should only have the values 'true' and 'false' in this
      Entry.  So, only update the configuration value if it is one of these two.  Otherwise, ignore it. '''
      spam = self.valueEntry.get()
      if spam in ['true','false']:
        self.configVal = spam
    else:
      '''this is not a variable which is limited to 'true' and 'false', so store the whole text'''
      self.configVal = self.valueEntry.get()

# get data from the file

fileName = "server.properties"
fileObject = open(fileName,'rb')
fileData = fileObject.read()
fileObject.close()

root = Tk()
configLines = []

for line in fileData.split('\n'):  # this splits it into individual lines
    if line.strip()=='':
      continue
    configLines.append(guiConfigItem(line=line))

# 1. create callbacks for each of the buttons,
def okClicked():
  '''Get the edited values and write them to the file then quit'''
  for c in configLines:
    c.update()
    print c.item2ConfigLine()
  # have updated and printed each line, now exit
  exit()

def cancelClicked():
  '''Cancel edits and quit'''
  exit()

# 2. create a frame for the buttons to go in
bottomFrame = Frame(root)

# 3. create the buttons, hooking up each of the buttons up to the callback
okWidget = Button(bottomFrame, text= "Ok", command = okClicked)
cancelWidget = Button(bottomFrame, text="Cancel", command = cancelClicked)

# 4. pack the buttons, then, finally,
okWidget.pack(side=LEFT)
cancelWidget.pack(side=RIGHT)
# 5. then pack the frame:
bottomFrame.pack(side=BOTTOM)

root.mainloop()

Exercise: run the code and confirm that: (a) your edits are captured and printed out; (b) if you enter anything but “true” or “false” for an item that takes only true and false, then the edit is ignored; and (c) that if you change a true to a false or vice versa, that that edit is captured.


Boss     (unfolding big map across table; talking carefully) Right … this is the plan then. … At 10:52, I shall approach the counter and purchase a watch costing £5.18.3d. I shall then give the watch to you, Vic. You’ll go straight to Norman’s Garage in East Street. You lads continue back up here at 10:56 and we rendezvous in the back room at the Cow and Sickle, at 11:15. All right, any questions?
Larry     We don’t seem to be doing anything illegal.
Boss     What do you mean?
Larry     Well … we’re paying for the watch.
Boss     (patiently) Yes…
Larry     (hesitating) Well… why are we paying for the watch?
Boss     (heavily) They wouldn’t give it to us if we didn’t pay for it, would they… eh?

This is our final instalment [sic] of our Minecraft config editor.   In the earlier tutorials we have done everything except actually updating the file.  Before we do update the file though, we need to make a backup of it, so it’s these two things that we’re going to do now.

All of the action will be in changing the behaviour of the ‘Ok’ button so that it makes a copy of server.properties into a new file called server.properties.bup and then writes the updated data to the server.properties file.  There is a small amount of work in making a copy of the file so we are going to do it in a separate function.  The function is not very smart.  It reads all of the data from the existing file and then just writes it out to the new file:

def backupFile(fileName):
  ''' Quick and dirty copy of file to fileName+".bup" - might also use os.rename(), but behaviour of os.rename is platform dependent '''
  fileObject = open(fileName,'rb')
  fileData = fileObject.read()
  fileObject.close()
  fileObject= open(fileName+".bup",'wb')  # will overwrite if it exists
  fileObject.write(fileData)
  fileObject.close()

The function could also use the rename() method from the os module.  Unfortunately, this behaves differently depending on the operating system you are using, so to keep it simple I have avoided it.

Exercise: Work out what this function is supposed to do, then confirm that it does it eg: start up a Python console, paste the definition in and then call the function with “server.properties” as a parameter.

Extra Points: If you are on a Unix based system, use the diff command to show the differences between the original and .bup (there shouldn’t be any).
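
If you don’t have diff available, a rough equivalent can be sketched in Python with the standard filecmp module.  The sketch below is self-contained: it uses a throwaway file in a temporary directory (the contents are made up) and a stand-in function, backup_copy, which does the same job as backupFile above.

```python
import filecmp
import os
import tempfile

def backup_copy(file_name):
    '''Stand-in for backupFile: copy file_name to file_name + ".bup"'''
    with open(file_name, 'rb') as source:
        data = source.read()
    with open(file_name + ".bup", 'wb') as target:  # overwrites any old backup
        target.write(data)

# make a throwaway file to test with (contents are made up)
temp_dir = tempfile.mkdtemp()
test_file = os.path.join(temp_dir, "server.properties")
with open(test_file, 'wb') as f:
    f.write(b"#Minecraft server properties\nmax-players=20\n")

backup_copy(test_file)

# shallow=False makes filecmp compare the actual contents, not just size and date
same = filecmp.cmp(test_file, test_file + ".bup", shallow=False)
print(same)
```

If the copy worked, the comparison reports that the two files have identical contents.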

With that done we can hook up the backup to the ‘Ok’ callback, and also write the new data:

def okClicked():
  '''Get the edited values and write them to the file then quit'''
  #TODO: add a confirmation dialog
  global fileName
  backupFile(fileName)
  dataToWrite = []
  for c in configLines:
    c.update()
    dataToWrite.append(c.item2ConfigLine())
  # have updated each line, now write the data to the file
  fileObject = open(fileName,'wt')
  fileObject.write('\n'.join(dataToWrite)+'\n')
  # '\n' is technically not the newline character on Windows
  # but by default Python converts \n to the correct character on write
  # not sure if minecraft needs a final '\n', so included one just in case
  fileObject.close()
  exit()

Here we have introduced a list called dataToWrite.  Where, in the last tute, we just printed out each line, in this tute we append those lines to the dataToWrite list.  Then, once they have been accumulated, we open the server.properties file (clearing it) and write our new data into it.  Only one generation of backup is saved.
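
As an aside, the write step can also be sketched with a with block, which closes the file for you even if the write fails part way.  This is just an alternative, not what the tutorial code uses; the file location and the lines below are made up so that the sketch can be run on its own.

```python
import os
import tempfile

data_to_write = ["#Minecraft server properties", "max-players=20", "pvp=true"]

# write to a throwaway file rather than a real server.properties
out_name = os.path.join(tempfile.mkdtemp(), "server.properties")

with open(out_name, 'w') as file_object:   # 'w' clears any existing contents
    file_object.write('\n'.join(data_to_write) + '\n')
# the file is closed automatically at the end of the with block

with open(out_name) as file_object:
    written = file_object.read()
print(written)
```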

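We have used the global statement here to use the value of the fileName variable.  This is a little messy, but is a consequence of how the program has evolved.  (Strictly speaking, global is only needed when a function assigns to a module-level variable.  Just reading fileName works without it, so here the statement mainly documents where the name comes from.)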

One of the things that you might include here is a confirmation step.  Before the data gets overwritten we might ask the user to confirm that they are going to write over their data, giving them a second chance if they clicked “Ok” by mistake.
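
One way that confirmation step might be sketched is below.  The helper name save_if_confirmed and its two parameters are made up for illustration.  In the real program, the confirm argument could be a function that shows a Tkinter dialog (for example tkMessageBox.askokcancel in Python 2) and the save argument could do the backup and the write.

```python
def save_if_confirmed(confirm, save):
    '''Call save() only if confirm() returns True.  Returns True if saved.'''
    if confirm():
        save()
        return True
    return False

# Stand-ins so that the sketch runs without a GUI:
saved_log = []

def fake_save():
    saved_log.append("saved")

user_says_yes = lambda: True
user_says_no = lambda: False

print(save_if_confirmed(user_says_no, fake_save))   # user backed out, nothing is saved
print(save_if_confirmed(user_says_yes, fake_save))  # user confirmed, fake_save is called
print(saved_log)
```

Passing the dialog in as a function keeps the decision logic separate from the GUI, which also makes it easy to test.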

Complete code here:

# -*- coding: utf-8 -*-

'''Minecraft config editor:
This is an editor for the Minecraft server.properties file.  It:
* opens the file server.properties
* reads, then closes the file
* parses each line by
-- stripping leading and trailing whitespace
-- if the line starts with "#", marks it as a comment
-- splits the line into a key, value pair, with the pair separated by a "=" sign
-- if the value of the pair is either "true" or "false", the entry is marked as a boolean (ie its only values are either true or false)
* displays each key, value entry on the screen allowing you to edit it
* renames the server.properties file to server.properties.bup (overwriting any existing file of that name from earlier edits)
* opens a new file called server.properties
* writes each of the entries to that new file
* closes the server.properties file.
'''

from Tkinter import *


class configItem(object):  # name of the class, it is based on an object called 'object'
  def __init__(self, line):# this is called each time an instance of the class is created
    line = line.strip()  # this removes any white space at the start or end of the line
    # if it starts with a # it's a comment so check for it
    if line[:1] == "#":
       self.configKey = "#"
       self.configVal = line[1:]
    else:  # otherwise assume it's of the form x = y
       spam = line.split("=", 1)  # split on the first "=" only, so the value itself may contain "="
       self.configKey = spam[0]
       self.configVal = spam[1]
    # now check to see whether the config item takes only the values "true" and "false" 
    if self.configVal.lower() in ["true","false"]:
       self.isTrueFalse = True
    else:
       self.isTrueFalse = False
       
  def item2ConfigLine(self):
    if self.configKey=="#":
      '''If the key is '#' then this is a comment, so don't include an '=' sign'''
      return "%s%s"%(self.configKey, self.configVal)
    else:
      '''otherwise, it has the form key=value'''
      return "%s=%s"%(self.configKey,self.configVal)
      
   

class guiConfigItem(configItem):
  def __init__(self,line):
    super(guiConfigItem,self).__init__(line)  # run configItem's __init__ method
    self.frame = Frame()
    self.keyLabel = Label(self.frame, text = self.configKey)
    self.valueEntry = Entry(self.frame, width="60")
    self.valueEntry.insert("0",self.configVal)
    self.keyLabel.pack(side=LEFT )
    self.valueEntry.pack(side=RIGHT)
    self.frame.pack(side=TOP, fill="x")
 
  def update(self):
    ''' Get the value which is currently in the Entry widget and save it to configVal'''
    if self.isTrueFalse:
      '''if isTrueFalse is True, then we should only have the values 'true' and 'false' in this
      Entry.  So, only update the configuration value if it is one of these two.  Otherwise, ignore it. '''
      spam = self.valueEntry.get()
      if spam in ['true','false']:
        self.configVal = spam
    else:
      '''this is not a variable which is limited to 'true' and 'false', so store the whole text'''
      self.configVal = self.valueEntry.get()
    
    
# get data from the file

fileName = "server.properties"
fileObject = open(fileName,'rb')
fileData = fileObject.read()
fileObject.close()



root = Tk()
configLines = []

for line in fileData.split('\n'):  # this splits it into individual lines
    if line.strip()=='':
      continue
    configLines.append(guiConfigItem(line=line))

# 1. create callbacks for each of the buttons, 
def okClicked():
  '''Get the edited values and write them to the file then quit'''
  #TODO: add a confirmation dialog
  global fileName
  backupFile(fileName)
  dataToWrite = []
  for c in configLines:
    c.update()
    dataToWrite.append(c.item2ConfigLine())
  # have updated each line, now write the data to the file
  fileObject = open(fileName,'wt')
  fileObject.write('\n'.join(dataToWrite)+'\n')
  # '\n' is technically not the newline character on Windows
  # but by default Python converts \n to the correct character on write
  # not sure if minecraft needs a final '\n', so included one just in case
  fileObject.close()
  exit()

def backupFile(fileName):
  ''' Quick and dirty copy of file to fileName+".bup" - might also use os.rename(), but behaviour of os.rename is platform dependent '''
  fileObject = open(fileName,'rb')
  fileData = fileObject.read()
  fileObject.close()
  fileObject= open(fileName+".bup",'wb')  # will overwrite if it exists
  fileObject.write(fileData)
  fileObject.close()

def cancelClicked():
  '''Cancel edits and quit'''
  exit()

# 2. create a frame for the buttons to go in
bottomFrame = Frame(root)

# 3. create the buttons, hooking up each of the buttons up to the callback
okWidget = Button(bottomFrame, text= "Ok", command = okClicked)
cancelWidget = Button(bottomFrame, text="Cancel", command = cancelClicked)

# 4. pack the buttons, then, finally,
okWidget.pack(side=LEFT)
cancelWidget.pack(side=RIGHT) 
# 5. then pack the frame:
bottomFrame.pack(side=BOTTOM)

root.mainloop()


Exercise: confirm that “Ok” saves your edits (open server.properties in a text editor or (extra points) write some Python to read and print the contents of the file) and that “Cancel” doesn’t.

Comments:

The code is a little messy because of how it has evolved in the course of explaining it.  Having code growing organically and getting messy is not unusual.  Every once in a while you need to stop and clean it up.  Cleaning it up can also allow you to restructure your code in ways you didn’t realise when you were writing it in the first place.

PS

My class names are naughty.  They should start with a capital letter.


Voice Over     This man is Ernest Scribbler… writer of jokes. In a few moments, he will have written the funniest joke in the world… and, as a consequence, he will die … laughing.
    Ernest stops writing, pauses to look at what he has written… a smile slowly spreads across his face, turning very, very slowly to uncontrolled hysterical laughter… he staggers to his feet and reels across room helpless with mounting mirth and eventually collapses and dies on the floor.

Summary: id(), copy, copy.copy(), copy.deepcopy()

A short tutorial this week on some oddities with the way Python stores and references (“binds to”) data.   You might remember, a long time ago, we talked about how, when we store data in a variable, it’s like putting your stuff in a bucket so that you can access it later.  Variables in Python actually turn out to be references to objects.   A side effect of this is that, in some cases, Python doesn’t work out how you think it will – typically this is where your object is a list or dictionary (actually any object, but you only notice this effect with compound objects) that you think you have copied, but you actually haven’t.

In particular, you can do this with ‘plain’ variables:

 >>> a = 5
 >>> b = a
 >>> a = 6
 >>> b
 5

You can also do this with lists:

 >>> c = [1,2]
 >>> d= c
 >>> c =[5,6]
 >>> d
 [1, 2]

But there’s a gotcha with lists where you change one of the list’s entries:

 >>> c= [1,2]
 >>> d = c
 >>> d
 [1, 2]
 >>> c[0]=3
 >>> d
 [3, 2]

Can you see that, even though we only changed the first entry in the list c (that is, c[0]), the first entry of d has also changed?  That’s because there is an underlying list object that both c and d are pointing to.  That is, they are both pointing to the same thing.  In a sense they are both windows to the same room (the list object).  Looking in either window allows you to “see” the changes made in the room.   You can see that the objects are the same because you can check their location in memory using the id() function which is built in to Python (try help(id)):

 >>> id(c)
 139636641421288
 >>> id(d)
 139636641421288
 >>> id(c) == id(d)
 True

The number (139636641421288) is where in the computer’s memory the object is stored.  It will change, probably each time you run the program.  If we assign a different list to d, it will have a different id, even though the values in the list are the same:

 >>> d = [3,2]       # note this new list has the same values as the old one
 >>> id(c) == id(d)
 False
 >>> id(d)
 139636640593824
 >>>

We can see that this other list is stored in a different location because the id() of the lists is different.  It turns out that this referencing behaviour is actually what you want to happen in most cases.  However, every so often you want your lists to be separate.  For that there is a special module called copy.  The copy module has a function (also called copy) which allows you to copy across the values of an object, rather than simply referencing (called “binding” to) an existing object:

>>> import copy
>>> d = copy.copy(c)
>>> d
[3, 2]
>>> c[0]=1
>>> d
[3, 2]
>>> c
[1, 2]

When you use copy.copy() the two lists will be separate and can be used independently.  Changes to the entries of one won’t show up in the other.  This is only a “shallow” copy, though: the outer object is copied, but anything inside it is still shared.  So where a compound object like a list or a dictionary has values which themselves are compound objects – for example a list where each entry in the list is itself a list – use the copy.deepcopy() function instead.   deepcopy() keeps track of what it has already copied, so objects which refer to themselves are handled, but it is not guaranteed to work for every kind of object (an open file, for example, can’t be copied).  Generally you will be fine.
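
To see why nested lists matter, here is a small sketch (the list values are arbitrary) comparing copy.copy() with copy.deepcopy() on a list of lists:

```python
import copy

grid = [[1, 2], [3, 4]]        # a list whose entries are themselves lists

shallow = copy.copy(grid)      # new outer list, but the inner lists are shared
deep = copy.deepcopy(grid)     # new outer list and new inner lists

grid[0][0] = 99                # change an entry of one of the inner lists

print(shallow[0][0])           # the shallow copy sees the change
print(deep[0][0])              # the deep copy does not
print(shallow[0] is grid[0])   # same inner list object
print(deep[0] is grid[0])      # different inner list object
```

The "windows to the same room" picture still applies to the inner lists after copy.copy(); only deepcopy() builds separate rooms all the way down.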


19:54

Three years ago I wrote a post about my disappointment using SciPy with IronPython. A lot has changed since then, so I thought I’d write a short follow-up post.

To install NumPy and SciPy for use with IronPython, follow the instructions here. After installation, NumPy works as expected.

There is one small gotcha with SciPy. To use SciPy with IronPython, start ipy with the command line argument -X:Frames. Then you can use SciPy as you would from CPython. For example:

c:\> ipy -X:Frames
>>> import scipy as sp
>>> sp.pi
3.141592653589793

Without the -X:Frames option you’ll get an error when you try to import scipy.

AttributeError: 'module' object has no attribute '_getframe'

According to this page,

The issue is that SciPy makes use of the CPython API for inspecting the current stack frame which IronPython doesn’t enable by default because of a small runtime performance hit. You can turn on this functionality by passing the command line argument “-X:Frames” to on the command line.
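
For the curious, the _getframe in the error message refers to sys._getframe, the CPython call for looking at the current stack frame. The sketch below only shows what that call does under CPython; it is not taken from SciPy, and the function names are made up.

```python
import sys

def who_called_me():
    # frame 0 is this function, frame 1 is whoever called it
    caller_frame = sys._getframe(1)
    return caller_frame.f_code.co_name

def some_function():
    return who_called_me()

print(some_function())
```

Keeping frame objects available like this is the bookkeeping that the quoted explanation says IronPython leaves off by default.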

A problem that sometimes comes up with source-controlled code is to find a revision in which some line was deleted, or otherwise modified in a way that blame can’t decipher. In other words, we want to grep over all revisions of some file to know which revisions contain a certain pattern. Note that the goal is not to search in the commit log (which is trivial), but rather in the code itself.

Well, if you’re using Mercurial or Git, you’re lucky because both provide built-in methods for doing this.

With Mercurial, use hg grep.

With Git, you can either use git grep in conjunction with git rev-list, or git log -S (more details in this SO thread).

What about Subversion, though? SVN, to the best of my knowledge, does not have this functionality built in. Moreover, SVN’s design makes this task inherently slow because no revisions other than the latest one are actually kept on your machine (unless the repository is local) and you have to ask the server for each revision. That’s a lot of network traffic.

That said, if you’re willing to tolerate the slowness (and sometimes there’s no choice!), then the following script – svnrevgrep – makes it as simple as with Git or Mercurial:

import re, sys, subprocess

def run_command(cmd):
    """ Run shell command, return its stdout output.
    """
    return subprocess.check_output(cmd.split(), universal_newlines=True)

def svnrevgrep(filename, s):
    """ Go over all revisions of filename, checking if s can be found
        in them.
    """
    log = run_command('svn log ' + filename)
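    # revision lines in the log start with the revision, e.g. "r123 | ..."
    for ver in re.findall(r'^r\d+', log, flags=re.MULTILINE):
        # strip the leading 'r' to get the bare revision number
        cmd = 'svn cat -r %s %s' % (ver.lstrip('r'), filename)
        contents = run_command(cmd)
        print('%s: %s' % (ver, 'found' if re.search(s, contents)
                                       else 'not found'))

if __name__ == '__main__':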
    if len(sys.argv) != 3:
        print('Usage: %s <path> <regex>' % sys.argv[0])
    else:
        svnrevgrep(sys.argv[1], sys.argv[2])

It basically goes over all revisions of the file starting with the most recent one and looks for the pattern.
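
To make the revision-extraction step concrete, here is a small sketch that runs the same kind of regular expression over a made-up excerpt in the style of svn log output (the revisions, author and messages are invented). Anchoring the pattern at the start of a line keeps it from picking up revision-like text inside commit messages.

```python
import re

# Invented sample in the usual `svn log` layout
sample_log = """\
------------------------------------------------------------------------
r12 | alice | 2012-05-20 10:00:00 +0000 (Sun, 20 May 2012) | 1 line

Revert the change from r10
------------------------------------------------------------------------
r10 | alice | 2012-05-18 09:00:00 +0000 (Fri, 18 May 2012) | 1 line

Add helper function
------------------------------------------------------------------------
"""

# anchored: only matches at the start of a line (needs re.MULTILINE)
anchored = re.findall(r'^r\d+', sample_log, flags=re.MULTILINE)
# unanchored: also matches the "r10" mentioned inside a commit message
unanchored = re.findall(r'r\d+', sample_log)

print(anchored)
print(unanchored)
```

With the unanchored pattern, a revision that is merely mentioned in a log message would be fetched and searched a second time.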

Note that while one could imagine using some kind of binary searching to find the first revision in which the regex appears (or doesn’t), this won’t work in the general case because code sometimes is added, then deleted, then re-added, then deleted again (this happens when refactoring or when reverting problematic commits).

Finally, if you find yourself doing the above frequently for a given repository, you may be better off with:

git svn clone <path>
git grep <...>

I was reading Hynek Schlawack’s excellent article on becoming a Python core developer and decided to find out just how hard it would be to get set up on my machine so that I could be ready to do core development myself, should I ever get the honor of being a part of the team. Since I run on Windows the most, I’m just going to talk about how I got set up for that OS. I’ve been thinking about trying to help with core development for a while anyway, so now’s as good a time as any. Let’s find out just how easy or hard the setting up process is!

What You’ll Need

To get up and running as a developer of Python on Windows, you’ll need a Mercurial client to download Python, update and create patches. You can use a command-line tool or you can get TortoiseHg, a shell GUI. Once you have that configured correctly, you can do a


hg clone http://hg.python.org/cpython

Or use Tortoise to check out the repository. This will get you the latest Python 3.x version. If you want to help with a maintenance release, then you’ll want to read the documentation. The last major tool you’ll need is a compiler and the one that’s needed for the latest Python is Microsoft Visual Studio 2010. Fortunately, you don’t need to purchase the whole thing. In fact, you can just get the Express version of Visual C++ 2010 and you’ll be good to go. Note that this tool is not lightweight and ends up taking over one gigabyte of space on disk. Sad, but true.

Compiling Python on Windows 7

As you may have guessed, we’ll be doing the compiling of Python on Windows 7. I have Windows XP too, but that OS is practically dead now, so I’m not going to cover it. I doubt it’s much different anyway. Regardless, according to the dev guide documentation, you need to go into the repo that you just created on your machine and go into the PCBuild folder. Then find the pcbuild.sln file and run it with your new Visual Studio C++ application. You may see a warning from it about how the Express version doesn’t support Solutions, but just ignore that. Once the project is loaded, go into the Debug menu and select Build Solution. Oddly enough, the official docs say to go into the Build menu, NOT the Debug menu, but my copy doesn’t have a Build menu to choose from.

When I ran the build I got the following result at the end:


========== Build: 20 succeeded, 8 failed, 0 up-to-date, 3 skipped ==========

Looking through the log, it looks like it was unable to find the header files for sqlite3 and tcl and it had some issues with the bzip lib. It also complained that I don’t have ActivePerl / OpenSSL installed. However, it still compiled Python and I had a fresh python_d.exe file in my PCBuild folder that I could run. I ran it by double-clicking it, but you can also run it from within Visual Studio using F5 or by going to the Debug menu and clicking Start Debugging.

I think at this point, I’m ready to figure out how to create a patch. If I do, I may try patching that messed up documentation so that people won’t spend time looking for a non-existent menu. Then I’ll write an article about how to submit a patch and use the issue tracker system for Python.

10:54

On May 12th, 2012, over 50 Python, C++, Ruby, PHP, JavaScript, and Node.js developers arrived to code on a variety of projects. It was awesome! Tons of open source projects saw contributions, and people across languages and frameworks worked together.

http://farm9.staticflickr.com/8007/7193954598_1b071cb5e4.jpg

Event Background

Less than two weeks before May 12, a bunch of us Los Angeles area Python developers were hanging out and wishing we had a local sprint to attend that was just about developers working on open source projects. It was then that Audrey Roy and I, along with an army of hardworking volunteers, decided to stop wishing and make it happen on May 12th.

We lined up a venue and contacted our awesome sponsors: Spire.io, Heroku, Github, Cars.com, and Cartwheel Academy. As we did that, we also invited people from the many Los Angeles programming communities to join us. The result of everyone's hard work? We filled up all sixty spots in less than 96 hours!

Some of the projects worked on included:

  • Salt Stack: https://github.com/saltstack/salt
  • A node.js-powered streaming terminal, allowing for shared input at a terminal among several participants.
  • A JavaScript powered astrolabe.
  • Settlers of Catan analytics in JavaScript.
  • OpenFrameworks, a cross-platform toolkit for creative coding in C++.
  • My own time at the sprint was spent with Audrey Roy and Randall Degges on engineering cleanup and fixing bugs on OpenComparison.

More open source sprinting on July 15

There's going to be another Los Angeles open source event on July 15 at Originate. Instead of less than two weeks to plan, we have nearly two months - so it's going to be better!

RSVP here: http://www.meetup.com/LA-Hackathons/events/64542582/

If you want to sponsor or volunteer, email me at pydanny (at) cartwheelweb.com or audreyr (at) cartwheelweb.com. We go out of our way to ensure that sponsors and volunteers feel appreciated.

http://farm9.staticflickr.com/8003/7193961164_b26d27093d.jpg

The other day, I thought it would be fun to create a little program that could generate QR codes and show them onscreen with wxPython. Of course, I wanted to do it all with Python, so after a little looking, I came across 3 candidates:

  • python-qrcode
  • pyqrnative
  • pyqrcode

I tried python-qrcode and pyqrnative since they worked on Windows as well as Mac and Linux. They also didn’t require anything except the Python Imaging Library. The pyqrcode project requires several other prerequisites and didn’t work on Windows, so I didn’t want to mess with it. I ended up taking some old code based on my Photo Viewer application and modified it slightly to make this a QR Code viewer. If I’ve piqued your interest, then read on!

Getting Started

As I noted at the beginning, you will need the Python Imaging Library. We’ll be using wxPython for the GUI part, so you’ll need that as well. And you’ll want to download python-qrcode and pyqrnative. The main difference I have found is that python-qrcode is much faster at generating the images and it generates the type you’ve probably seen the most of. For some reason, pyqrnative takes a lot longer to run and it creates a much denser looking QR code. There may be options for both of these projects that allow you to generate different kinds of codes, but the documentation for both projects is abysmal. I ended up using the source and Wingware’s IDE to traverse the code more than anything else.

Generating QR Codes

Anyway, once you have all the prerequisites, you can run the following code and see what Python can do:

import os
import wx
 
try:
    import qrcode
except ImportError:
    qrcode = None
 
try:
    import PyQRNative
except ImportError:
    PyQRNative = None
 
########################################################################
class QRPanel(wx.Panel):
    """Panel that creates QR codes from text and displays them"""
 
    #----------------------------------------------------------------------
    def __init__(self, parent):
        """Constructor"""
        wx.Panel.__init__(self, parent=parent)
        self.photo_max_size = 240
        sp = wx.StandardPaths.Get()
        self.defaultLocation = sp.GetDocumentsDir()
 
        img = wx.EmptyImage(240,240)
        self.imageCtrl = wx.StaticBitmap(self, wx.ID_ANY,
                                         wx.BitmapFromImage(img))
 
        qrDataLbl = wx.StaticText(self, label="Text to turn into QR Code:")
        self.qrDataTxt = wx.TextCtrl(self, value="http://www.mousevspython.com", size=(200,-1))
        instructions = "Name QR image file"
        instructLbl = wx.StaticText(self, label=instructions)
        self.qrPhotoTxt = wx.TextCtrl(self, size=(200,-1))
        browseBtn = wx.Button(self, label='Change Save Location')
        browseBtn.Bind(wx.EVT_BUTTON, self.onBrowse)
        defLbl = "Default save location: " + self.defaultLocation
        self.defaultLocationLbl = wx.StaticText(self, label=defLbl)
 
        qrcodeBtn = wx.Button(self, label="Create QR with qrcode")
        qrcodeBtn.Bind(wx.EVT_BUTTON, self.onUseQrcode)
        pyQRNativeBtn = wx.Button(self, label="Create QR with PyQRNative")
        pyQRNativeBtn.Bind(wx.EVT_BUTTON, self.onUsePyQR)
 
        # Create sizer
        self.mainSizer = wx.BoxSizer(wx.VERTICAL)
        qrDataSizer = wx.BoxSizer(wx.HORIZONTAL)
        locationSizer = wx.BoxSizer(wx.HORIZONTAL)
        qrBtnSizer = wx.BoxSizer(wx.VERTICAL)
 
        qrDataSizer.Add(qrDataLbl, 0, wx.ALL, 5)
        qrDataSizer.Add(self.qrDataTxt, 1, wx.ALL|wx.EXPAND, 5)
        self.mainSizer.Add(wx.StaticLine(self, wx.ID_ANY),
                           0, wx.ALL|wx.EXPAND, 5)
        self.mainSizer.Add(qrDataSizer, 0, wx.EXPAND)
        self.mainSizer.Add(self.imageCtrl, 0, wx.ALL, 5)
        locationSizer.Add(instructLbl, 0, wx.ALL, 5)
        locationSizer.Add(self.qrPhotoTxt, 0, wx.ALL, 5)
        locationSizer.Add(browseBtn, 0, wx.ALL, 5)
        self.mainSizer.Add(locationSizer, 0, wx.ALL, 5)
        self.mainSizer.Add(self.defaultLocationLbl, 0, wx.ALL, 5)
 
        qrBtnSizer.Add(qrcodeBtn, 0, wx.ALL, 5)
        qrBtnSizer.Add(pyQRNativeBtn, 0, wx.ALL, 5)
        self.mainSizer.Add(qrBtnSizer, 0, wx.ALL|wx.CENTER, 10)
 
        self.SetSizer(self.mainSizer)
        self.Layout()
 
    #----------------------------------------------------------------------
    def onBrowse(self, event):
        """Let the user choose the directory where QR images are saved"""
        dlg = wx.DirDialog(self, "Choose a directory:",
                           style=wx.DD_DEFAULT_STYLE)
        if dlg.ShowModal() == wx.ID_OK:
            path = dlg.GetPath()
            self.defaultLocation = path
            self.defaultLocationLbl.SetLabel("Save location: %s" % path)
        dlg.Destroy()
 
    #----------------------------------------------------------------------
    def onUseQrcode(self, event):
        """

https://github.com/lincolnloop/python-qrcode

        """
        qr = qrcode.QRCode(version=1, box_size=10, border=4)
        qr.add_data(self.qrDataTxt.GetValue())
        qr.make(fit=True)
        x = qr.make_image()
 
        qr_file = os.path.join(self.defaultLocation, self.qrPhotoTxt.GetValue() + ".jpg")
        img_file = open(qr_file, 'wb')
        x.save(img_file, 'JPEG')
        img_file.close()
        self.showQRCode(qr_file)
 
    #----------------------------------------------------------------------
    def onUsePyQR(self, event):
        """

http://code.google.com/p/pyqrnative/

        """
        qr = PyQRNative.QRCode(20, PyQRNative.QRErrorCorrectLevel.L)
        qr.addData(self.qrDataTxt.GetValue())
        qr.make()
        im = qr.makeImage()
 
        qr_file = os.path.join(self.defaultLocation, self.qrPhotoTxt.GetValue() + ".jpg")
        img_file = open(qr_file, 'wb')
        im.save(img_file, 'JPEG')
        img_file.close()
        self.showQRCode(qr_file)
 
    #----------------------------------------------------------------------
    def showQRCode(self, filepath):
        """"""
        img = wx.Image(filepath, wx.BITMAP_TYPE_ANY)
        # scale the image, preserving the aspect ratio
        W = img.GetWidth()
        H = img.GetHeight()
        if W > H:
            NewW = self.photo_max_size
            NewH = self.photo_max_size * H / W
        else:
            NewH = self.photo_max_size
            NewW = self.photo_max_size * W / H
        img = img.Scale(NewW,NewH)
 
        self.imageCtrl.SetBitmap(wx.BitmapFromImage(img))
        self.Refresh()
 
 
########################################################################
class QRFrame(wx.Frame):
    """"""
 
    #----------------------------------------------------------------------
    def __init__(self):
        """Constructor"""
        wx.Frame.__init__(self, None, title="QR Code Viewer", size=(550,500))
        panel = QRPanel(self)
 
if __name__ == "__main__":
    app = wx.App(False)
    frame = QRFrame()
    frame.Show()
    app.MainLoop()

The code for changing and showing the picture is explained in the previous article I wrote (and linked to above), so the only parts that you’ll probably care about are the two methods for generating the QR codes: onUseQrcode and onUsePyQR. I just took some examples from their respective websites and modified them slightly to create the QR code images. They’re very straightforward, but not well documented, so I can’t really tell you what’s going on. Sadly, at the time of this writing, the code for these projects is seriously lacking in docstrings, with only a few here and there. Still, I was able to generate some decent QR codes. The following was done using python-qrcode:

As you can see, it’s a pretty standard code. The next one is created with PyQRNative and is much denser looking:

I tried scanning both images with my Android cell phone’s barcode scanning application and both QR codes were read correctly by it. So if you’re in need of generating QR code images for your project, I hope one of these projects will fit your needs!

UPDATE 5/21/2012

One of my readers (Mike Farmer) contacted me recently about his experiments with PyQRNative and told me that the “first argument is container size and the second is redundancy/error correction”. I kind of guessed what the second one was, but I don’t really know what the error correction levels do. Fortunately, Mr. Farmer explained it to me: if the error correction is low, the tag will not tolerate smears or tears without failing to read. If you crank up the error level you will obviously get a bigger QR code, but what you have done is create duplicate data inside the tag. So if the tag is smeared or torn, it can still be read and the remaining data recovered. If your application is creating tags that can be damaged, it’s wise to crank up the error correction. You can also do cool things with this, like superimposing pictures or text on the tag: crank the error correction up so the data is redundant and the tag can tolerate the “damage”.

Anyway, if you change the first number, you can grow or shrink the QR code image size. Why would you do that? Well, the more information you need to store in the image, the bigger the image will need to be. Mr. Farmer came up with some fun test code to help him figure out exactly what the minimum size of a QR code has to be. I am reproducing the code below:

import PyQRNative
 
def makeQR(data_string,path,level=1):
    quality={1: PyQRNative.QRErrorCorrectLevel.L,
             2: PyQRNative.QRErrorCorrectLevel.M,
             3: PyQRNative.QRErrorCorrectLevel.Q,
             4: PyQRNative.QRErrorCorrectLevel.H}
    size=3
    while 1:
        try:
            q = PyQRNative.QRCode(size,quality[level])
            q.addData(data_string)
            q.make()
            im=q.makeImage()
            im.save(path,format="png")
            break
        except TypeError:
            size+=1
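
To make the trade-off between error correction and size a little more concrete, here is a rough sketch that does not use PyQRNative at all. It hard-codes the byte-mode capacities of QR versions 1 to 5 as I understand them from the QR code specification, and looks for the smallest version that can hold a string at each error correction level. The CAPACITY table and the min_version name are my own illustration, not part of either library.

```python
# Byte-mode capacity (in bytes) of QR versions 1-5 per error correction level.
# Hard-coded for illustration only; larger versions exist but are omitted.
CAPACITY = {
    "L": [17, 32, 53, 78, 106],
    "M": [14, 26, 42, 62, 84],
    "Q": [11, 20, 32, 46, 60],
    "H": [7, 14, 24, 34, 44],
}


def min_version(data, level="L"):
    """Return the smallest version (1-5) that can hold data, or None."""
    size = len(data.encode("utf-8"))
    for version, capacity in enumerate(CAPACITY[level], start=1):
        if size <= capacity:
            return version
    return None  # too big for the versions listed above


url = "http://www.mousevspython.com"
for level in "LMQH":
    # Higher error correction leaves less room for data,
    # so the same URL needs a larger code.
    print(level, min_version(url, level))
```

The loop in Mr. Farmer's code finds the same kind of answer by trial and error: it keeps bumping the size until the library stops raising an exception.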


What follows is an account of how I found and fixed an insidious bug in Stackless Python which has been there for years. It’s one of those war stories. Perhaps a bit long-winded and technical and full of exaggerations, as such stories tend to be.

Background

Some weeks ago, because of a problem in the client library we are using, I had to switch the http library we are using on the PS3 from non-blocking IO to blocking. Previously, we were issuing all the non-blocking calls, the “select” and the tasklet blocking / scheduling on the main thread. This is similar to how gevent and other such libraries do things. Switching to blocking calls, however, meant doing things on worker threads.

The approach we took was to implement a small pool of Python workers which could execute arbitrary jobs. A new utility function, stacklesslib.util.call_async(), then performed the asynchronous call by dispatching it to a worker thread. The idea of call_async() is to have a different tasklet execute the callable while the caller blocks on a channel. The return value, or error, is then propagated to the originating tasklet using that channel. Stackless channels can be used to communicate between threads too. And synchronizing threads in stackless is even more convenient than in regular Python because there is stackless.atomic, which not only prevents involuntary scheduling of tasklets, it also prevents automatic yielding of the GIL (cPython folks, take note!).
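
For readers without Stackless at hand, the shape of that idea can be sketched in plain Python. This is not the stacklesslib implementation: a queue.Queue stands in for the stackless channel, the caller is an ordinary thread rather than a tasklet, and a fresh thread is started per call instead of using a pool.

```python
import queue
import threading


def call_async(func, *args, **kwargs):
    """Run func on a worker thread and block the caller until it is done."""
    channel = queue.Queue(maxsize=1)  # stand-in for a stackless channel

    def worker():
        try:
            channel.put((True, func(*args, **kwargs)))
        except Exception as exc:
            # Errors travel over the same channel as results.
            channel.put((False, exc))

    threading.Thread(target=worker, daemon=True).start()
    ok, payload = channel.get()  # the caller blocks here
    if not ok:
        raise payload
    return payload


print(call_async(len, "some blocking job"))
```

In the real thing only the calling tasklet blocks, so other tasklets on the main thread keep running while the worker does the blocking IO.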

This worked well, and has been running for some time. The drawback to this approach, of course, is that we now need to keep python threads around, consuming stack space. And Python needs a lot of stack.

The problem

The only problem was, that there appeared to be a bug present. One of our developers complained that sometimes, during long downloads, the http download function would return None, rather than the expected string chunk.

Now, this problem was hard to reproduce. It required a specific setup and geolocation was also an issue. This developer is in California, using servers in London. Hence, there ensued a somewhat prolonged interaction (hindered by badly overlapping time-zones) where I would provide him with modified .py files with instrumentation, and he would provide me with logs. We quickly determined, to my dismay, that apparently, sometimes a string was turning into None while in transit through a channel, from a channel.send() to a channel.receive(). This was most distressing. Particularly because the channel in question was transporting data between threads and this particular functionality of stackless has not been as heavily used as the rest.

Tracking it down

So, I suspected a race condition of some sort. But a careful review of the channel code and the scheduling code presented no obvious candidates. Also, the somewhat unpopular GIL was being used throughout, which if done correctly ensures that things work as expected.

To cut a long story short, by a lucky coincidence I managed to reproduce a different manifestation of the problem. In some cases, a simple interaction with a local HTTP server would cause this to happen.

When a channel sends data between tasklets, it is temporarily stored on the target tasklet’s “tempval” attribute. When the target wakes up, this is then taken and returned as the result from the “receive()” call. I was able to establish that after sending the data, the target tasklet did indeed hold the correct string value in its “tempval” attribute. I then needed to find out where and why it was disappearing from that place.

By adding instrumentation code to the stackless core, I established that this was happening in the last line of the following snippet:

PyObject *
slp_run_tasklet(void)
{
    PyThreadState *ts = PyThreadState_GET();
    PyObject *retval;

    if ( (ts->st.main == NULL) && initialize_main_and_current()) {
        ts->frame = NULL;
        return NULL;
    }

    TASKLET_CLAIMVAL(ts->st.current, &retval);

By setting a breakpoint, I was able to see that I was in the top level part of the “continue” bit of the “stack spilling” code.

Stack spilling is a feature of stackless where the stack slicing mechanism is used to recycle a deep callstack. When it detects that the stack has grown beyond a certain limit, it is stored away, and a hard switch is done to the top again, where it continues its downwards crawl. This can help conserve stack address space, particularly on threads where the stack cannot grow dynamically.

So, something wrong with stack spilling, then.  But even so, this was unexpected. Why was stack spilling happening when data was being transmitted across a channel? Stack spilling normally occurs only when nesting regular .py code and other such things.

By setting a breakpoint at the right place, where the stack spilling code was being invoked, I finally arrived at this callstack:

Type Function
PyObject* slp_eval_frame_newstack(PyFrameObject* f, int exc, PyObject* retval)
PyObject* PyEval_EvalFrameEx_slp(PyFrameObject* f, int throwflag, PyObject* retval)
PyObject* slp_frame_dispatch(PyFrameObject* f, PyFrameObject* stopframe, int exc, PyObject* retval)
PyObject* PyEval_EvalCodeEx(PyCodeObject* co, PyObject* globals, PyObject* locals, PyObject** args, int argcount, PyObject** kws, int kwcount, PyObject** defs, int defcount, PyObject* closure)
PyObject* function_call(PyObject* func, PyObject* arg, PyObject* kw)
PyObject* PyObject_Call(PyObject* func, PyObject* arg, PyObject* kw)
PyObject* PyObject_CallFunctionObjArgs(PyObject* callable)
void PyObject_ClearWeakRefs(PyObject* object)
void tasklet_dealloc(PyTaskletObject* t)
void subtype_dealloc(PyObject* self)
int slp_transfer(PyCStackObject** cstprev, PyCStackObject* cst, PyTaskletObject* prev)
PyObject* slp_schedule_task(PyTaskletObject* prev, PyTaskletObject* next, int stackless, int* did_switch)
PyObject* generic_channel_action(PyChannelObject* self, PyObject* arg, int dir, int stackless)
PyObject* impl_channel_receive(PyChannelObject* self)
PyObject* call_function(PyObject*** pp_stack, int oparg)

Notice the “subtype_dealloc”. This callstack indicates that in the channel receive code, after the hard switch back to the target tasklet, a Py_DECREF was causing side effects, which again caused stack spilling to occur. The place was this, in slp_transfer():

/* release any objects that needed to wait until after the switch. */
Py_CLEAR(ts->st.del_post_switch);

This is code that does cleanup after tasklet switch, such as releasing the last remaining reference of the previous tasklet.

So, the bug was clear then. It was twofold:

  1. A Py_CLEAR() after switching was not careful enough to store the current tasklet’s “tempval” out of harm’s way of any side effects a Py_DECREF() might cause, and
  2. Stack slicing itself, when it happened, clobbered the current tasklet’s “tempval”.

The bug was subsequently fixed by repairing stack spilling and spiriting “tempval” away during the Py_CLEAR() call.
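
The actual fix is in C, but the pattern is easy to model in a few lines of Python. The following is only a toy: the Tasklet class, the receive() function and the clobber() side effect are all made up for illustration, with clobber() standing in for whatever code a deallocation happens to trigger.

```python
class Tasklet:
    def __init__(self):
        self.tempval = None


def receive(current, del_post_switch, protect_tempval):
    """Model the tail end of a switch: clean up, then claim the sent value."""
    if protect_tempval:
        # The fix: move tempval out of the way before cleaning up.
        saved, current.tempval = current.tempval, None
    # Stand-in for Py_CLEAR(ts->st.del_post_switch). Dropping the last
    # reference to an object can run arbitrary code.
    for side_effect in del_post_switch:
        side_effect(current)
    if protect_tempval:
        current.tempval = saved
    # Stand-in for TASKLET_CLAIMVAL: take the value and empty the slot.
    value, current.tempval = current.tempval, None
    return value


def clobber(tasklet):
    # Hypothetical side effect that reuses the tempval slot.
    tasklet.tempval = None


t = Tasklet()
t.tempval = "chunk of http data"
print(receive(t, [clobber], protect_tempval=False))  # the string is lost

t.tempval = "chunk of http data"
print(receive(t, [clobber], protect_tempval=True))   # the string survives
```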

Post mortem

The inter-thread communication turned out to be a red herring. The problem was caused by an unfortunate juxtaposition of channel communication, tasklet deletion, and stack spilling.
But why had we not seen this before? I think it is largely due to the fact that stack spilling only rarely comes into play on regular platforms. On the PS3, we deliberately set the threshold low to conserve memory space. This is also not the first stack-spilling related bug we have seen on the PS3, but the first one for two years. Hopefully it will be the last.

Since this morning, the fix is in the stackless repository at http://hg.python.org/stackless

I'm now back from Copenhagen, where I attended the mercurial 2.3 sprint with twenty other people. A huge amount of work was done in a very friendly atmosphere.

Regarding mercurial's core:

  • Bookmark behaviour was improved to get closer to named branch's behaviour.
  • Several performance improvements regarding branches and heads caches. The heads cache refactoring improves rebase performance on huge repositories (thanks to Facebook and Atlassian).
  • The concept I'm working on, Obsolete markers, was a highly discussed subject and is expected to get partly into the core in the near future. Thanks to my employer Logilab for paying me to work on this topic.
  • General code cleanup and lock validation.
http://www.logilab.org/file/92956?vid=download

Regarding the bundled extensions:

  • Some fixes were made to progress, which is now closer to getting into mercurial's core.
  • Histedit and keyring extensions are scheduled to be shipped with mercurial.
  • Some old and unmaintained extensions (children, hgtk) are now deprecated.
  • The LargeFile extension got some new features (thanks to the folks from Unity3D)
  • Rebase will use the --detach flag by default in the next release.
http://www.logilab.org/file/92958?vid=download

Regarding the project itself:

http://www.logilab.org/file/92955?vid=download

Regarding other extensions:

http://www.logilab.org/file/92959?vid=download

And I'm probably forgetting some stuff. Special thanks to Unity3D for hosting the sprint and providing power, network and food during these 3 days.

I developed software to find a maximum common subgraph (MCS) given a set of molecules represented as a chemical graph. It's called fmcs. My previous three essays were about the background of the MCS problem, introducing fmcs, and an example of when MCS is used.

What I didn't describe was the mental effort it took to develop this program. This is the second time I've written code to find the multiple structure MCS, and both times it took a couple of months and put my head in a very strange place. You would think the second time is easier, but it means that I spent more time adding features and doing things that my first version couldn't begin to handle. (Did someone just mutter "second system effect"? Pshaw!)

This time too I had a better understanding of the development process. I think I know why it's so much harder than most of the software I develop.

In this essay, I reflect on some of the reasons why this is a hard problem to test and I consider how unit tests, or any other incremental-based testing approach, are not well-suited to a certain class of complex algorithm development. While unit tests can provide a basic sanity check during development, they fail to provide the test coverage which one might expect from applying them to algorithmically simpler modules. Further, development methodologies built primarily on unit tests, like test-first style test-driven development, aren't that applicable. Instead, other methods, like system testing and a deep understanding of the algorithm and possible problems, are required. This strengthens my view, explored in Problems with TDD that "TDD is a weak method for developing those tests of confidence."

How would you implement an MCS algorithm?

Think about the problem for a moment. How would you write an algorithm to find the largest common subgraph between two graphs?

Take your time. This really is a hard problem. I dug up some of the early papers from the Journal of Chemical Documentation. People ended up simplifying the problem by, for example, not supporting rings.

Finished thinking?

You probably came up with a graph walking algorithm which tries different match pairings and uses backtracking or lazy evaluation to search all of graph space. The more mathematically inclined might have converted this into a maximum clique algorithm.

In both cases there are a lot of arbitrary decisions to make. Which pairings should you investigate first? Are there times when you can prune the search space? Did you make any assumptions about the nature of the chemical graph (e.g., that it's topologically a planar graph)?

How would you test your MCS algorithm?

Back in the late 1990s, I almost never wrote automated tests, and never wrote an extensive test suite. Nowadays I'm pretty thorough, and use coverage-guided techniques to get good, extensive tests through some semi-permanent API layer that will allow me to refactor the internals without changing the tests.

I couldn't do that here.

There are a lot of heuristics, and some of them are only triggered under unusual circumstances, after a bunch of combinatorial possibilities. Outside of a few minor components, I couldn't figure out how to write unit tests for the code.

I ended up pushing most of the testing into validation testing of the complete system, which meant I wrote some 1,000+ lines of code without strong testing. Moreover, I used a new approach to the MCS problem, so the algorithm I was working on doesn't have the track record of the standard clique-based or backtracking approaches.

So stress factors included not knowing if the algorithm would work, and not being able to develop enough test cases to provide good validation during development.

As a rule of thumb, it's easiest to fix bugs which are caught early. Unit tests and evolutionary prototyping are two of the techniques that people use to tighten the feedback loop between specification, implementation, and validation. I think another stress factor is proportional to the size of the feedback loop.

What testing did I do?

I mean, I did have tests during development. I came up with a few examples by hand, I did a substructure search of a large data set and I verified that the MCS code found that substructure, but I know that's not enough tests. I know this because after six weeks of development and over 1000 lines of code, I spent another three weeks doing a large amount of post-development testing, and found several bugs. In the process of writing this essay I also found that four days of that development work ended up making things slower, so I'll have to remove it.

I did most of my tests based on the ChEMBL data: 10,000 random pairs of structures, the k=2, k=10, and k=100 nearest neighbors, and the k<=100 neighbors with similarities of at least 0.95, 0.9, and 0.8. I also did, at the end, tests based on the ChEBI structure ontology. There were easily 20,000 different machine-generated test cases, although in most cases I didn't know the expected results beforehand.

What bugs did I find?

What bugs did I find? I think it's educational to characterize a few of them.

Typo caught by an assertion check

One of the simplest bugs was a poorly formatted string. I used "%d" when I should have used "%%%d". A bit of jargon for those in the know: I generate SMARTS strings for the intermediate subgraphs. If there are more than 9 open rings then the syntax goes from a single digit closure number to a closure number like "%10". I forgot to include the '%' for that case, probably because the '%' was already there for the format string.

This wasn't triggered by my random-pairs test nor my various similarity-search based tests. Only when I ran through the ChEBI data, did I get an assertion failure when RDKit refused to accept my SMARTS string. That was the first time where I had a SMARTS with 10 unclosed rings.

As it happens, this error could have been caught by unit testing, as some people practice it. It's a four line function which takes a string and a number. Testing it is trivial. I didn't test it because I feel that testing directly against internal functions inhibits refactoring. I prefer to test against higher-level, more stable APIs.
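
To show how small the function and its test are, here is a sketch of the corrected formatting logic. The name ring_closure and its single-argument signature are made up for this example; the real fmcs helper also takes a string and may look different.

```python
def ring_closure(n):
    """Format ring-closure number n for use in a SMARTS string."""
    if n < 10:
        return "%d" % n
    # The buggy version used "%d" here as well, which writes closure 10
    # as "10", that is, closure 1 followed by closure 0.
    return "%%%d" % n


# The kind of low-level unit test I chose not to write.
assert ring_closure(9) == "9"
assert ring_closure(10) == "%10"
```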

My view is that it's usually easy to come up with high-level test cases which trigger a certain function call. But not in this case. The MCS search algorithm, while deterministic, uses so many arbitrarily defined values that I couldn't on my own come up with a test case with at least 10 open ring closures. And even if I did, a new search heuristic might change things so that only, say, 7 open ring closures were needed.

I felt that my system testing and the assertion check would be enough to identify if there was a problem, and it did. A low-level unit test might have helped, especially as I still don't have a specific test for that failure.

I think the right thing to do is add that failure as a specific test case, and use monkey-patching to insert wrong code for a repeat of that test case. The first one tests that the code is correct, and the second tests that the test case is still exercising the code under test.
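
A minimal sketch of that two-part check, using unittest.mock from the standard library. Everything here is a stand-in: smarts_writer plays the role of the module under test, and closure_test_passes() plays the role of the (much larger) system-level test case.

```python
import types
from unittest import mock


def _ring_closure(n):
    return "%d" % n if n < 10 else "%%%d" % n


# Stand-in for the module under test, so that there is something to patch.
smarts_writer = types.SimpleNamespace(ring_closure=_ring_closure)


def closure_test_passes():
    """The regression test: closure 10 must be written as '%10'."""
    return smarts_writer.ring_closure(10) == "%10"


def buggy_ring_closure(n):
    return "%d" % n  # the original typo


# First run: the real code must pass.
correct_code_passes = closure_test_passes()

# Second run: with the bug patched back in, the same test must fail.
# If it still passes, the test no longer exercises the code under test.
with mock.patch.object(smarts_writer, "ring_closure", buggy_ring_closure):
    broken_code_passes = closure_test_passes()

print(correct_code_passes, broken_code_passes)
```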

Cross-validation testing

The first MCS papers are about as old as I am. Many people have written implementations, although relatively few are available to me both at no cost and with no prohibitions on using them to develop a new MCS algorithm. (Some commercial companies don't like people using their software to write new software which is competitive to it, or even to use their software for benchmark comparisons.)

I tested pairs of structures against SMSD and I did more extensive tests against Indigo's MCS implementation.

This is cross-validation testing. It's a relatively rare technique because the cost of producing multiple identical implementations usually isn't worth the benefits. Even here the results aren't exactly identical because of differences in how the toolkits perceive chemistry, and more specifically, aromaticity. I ended up spending a lot of investigation time staring at cases with different answers and trying to figure out if it was a chemistry problem or an MCS algorithm implementation problem.

I found the SMSD had a bug in one of its options, which I reported back to the author. The code had been fixed internally but not pushed to the outside world. Its default mode and fmcs matched quite well, except for a couple of chemistry differences. The new version is out now - I need to test it again.

The only problem I found in the Indigo code was a part of their setup code which didn't check the timeout. That's also been fixed after I reported it.

The cross-validation with Indigo found problems in my code. For example, I was often getting smaller MCSes than Indigo. After looking at them, I figured out that my code didn't correctly handle the case when a molecule was fragmented after a bond was removed because its bondtype wasn't in all of the structures.

Why didn't my hand-written test cases find it? None of them had a case where there was a bond in the "middle" of a structure chosen as the reference structure, and where the MCS was not in the first fragment.

My code usually got the right answer when using highly similar structures, for obvious reasons. It was only the random pair testing where the problem really stood out.

Could I create a simple unit test for this error? Perhaps, but it's not easy. I don't know which of the two fragments will be first - it depends on so many arbitrary decisions which could change as the algorithm is improved. The only test I can think of for this was to generate a diverse set of tests, make sure some fail if the code isn't implemented correctly, record the results, and make sure that future tests never find a worse (or better?) test case.

Bad heuristic to determine the maximum possible subgraph growth

Most of the MCS implementation is heuristics. There's a branch and bound search, there's subgraph canonicalization to reduce the number of substructure tests, and so on. Each of these is supposed to help make the code faster.

One of the tests takes the current subgraph and the list of "outgoing" bonds to see how much of the remainder of the graph is accessible for growth from the current subgraph. The rest of the molecule might not be accessible because it's on another fragment, but it also might not be accessible due to an earlier decision to exclude certain parts of the molecule from future growth. (My algorithm tests all subgraphs which include a given bond, then all subgraphs which don't include the given bond.)

It took a couple of days to think of the algorithm, write the code, and get it working. I then ran the timing tests and found that it was only 1% faster while giving the same answers.

Does that mean my code worked? I only had a suspicion that the new algorithm should be faster. Perhaps the overhead of searching for the accessible atoms was too costly?

After looking at my implementation - a lot - I finally realized that I told the algorithm to exclude the subgraph from consideration for growth but had forgotten to also exclude the set of previously excluded bonds from consideration. With two changed lines, the overall performance doubled for the random-pairs case. Most of the time it's faster and sometimes it's slower, so it takes an aggregate of tests to measure this correctly.
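
The idea behind the heuristic, and the mistake, can be shown with a small breadth-first search. This is a sketch and not the fmcs code: the adjacency-dict representation, the reachable_atoms name and the use of frozensets for bonds are all assumptions made for the example.

```python
from collections import deque


def reachable_atoms(adjacency, subgraph_atoms, excluded_bonds):
    """Count atoms outside the subgraph that it could still grow into."""
    seen = set(subgraph_atoms)
    todo = deque(subgraph_atoms)
    count = 0
    while todo:
        atom = todo.popleft()
        for neighbor in adjacency[atom]:
            if neighbor in seen:
                continue
            # This is the check I had forgotten: a bond that was excluded
            # earlier in the search must not be used for growth.
            if frozenset((atom, neighbor)) in excluded_bonds:
                continue
            seen.add(neighbor)
            todo.append(neighbor)
            count += 1
    return count


# A five-atom chain 0-1-2-3-4, with the current subgraph covering atoms 0 and 1.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(reachable_atoms(chain, {0, 1}, set()))
print(reachable_atoms(chain, {0, 1}, {frozenset((2, 3))}))
```

Without the exclusion check the bound is too generous, so the branch and bound search prunes less often than it could. The answers stay the same, but the speedup mostly disappears.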

I don't think this heuristic could have been written as a unit test. No, I take that back. It could have, but it would have required some careful thought to set up. Not only was this part of the code not "built for test", but setting up the right conditions requires a mental understanding of the problem which I know I didn't have when I was doing the development.

BTW, I am not saying that unit tests can't be used to measure performance. Some algorithms have simple timing characteristics where it's easy to say that the code must complete by a given time, or that it must be at least twice as fast as a reference implementation. Occasionally these tests will hiccup due to unusual loads on the test machine, but not usually enough to be a problem.

Indeed, the fmcs code has some unit tests for the timeout code. I found a pair of structures which takes over 10 seconds to find the MCS, I set the timeout to 0.1 second, and assert that no more than 0.5 seconds elapsed during the function call. (I don't require a high precision for this code.) I do worry though that future changes to the MCS search code might speed things up by a factor of 20. On the other hand, the effect of broken timeout code is very obvious when running the validation suite, so I'm not going to worry about it.
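
The shape of such a timeout test is roughly as follows. The slow_search() function is a hypothetical stand-in for an MCS search on a hard pair of structures; it only pretends to work while polling its deadline.

```python
import time


def slow_search(timeout):
    """Hypothetical stand-in for a long MCS search that honors a timeout."""
    deadline = time.monotonic() + timeout
    steps = 0
    while time.monotonic() < deadline:
        steps += 1         # pretend to examine one more subgraph
        time.sleep(0.001)
    return steps           # a partial result once time has run out


def test_timeout():
    start = time.monotonic()
    slow_search(timeout=0.1)
    elapsed = time.monotonic() - start
    # A loose upper bound; high precision isn't needed here.
    assert elapsed < 0.5, elapsed
    return elapsed


print("elapsed: %.2fs" % test_timeout())
```

Note that the test only means something as long as the search would otherwise run well past the upper bound, which is exactly the worry about future speedups.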

As is widely acknowledged, this stretches the idea of a "unit test." Unit tests are supposed to be fast; preferably several hundred or more per second. The goal is that these tests run often, perhaps even after every save. Stick in a few 0.1 second timeouts into the system and it bogs everything down, which discourages people from running the tests so often.

But let's go back to this new code. On average, over a large number of tests, the performance is 50% faster. What's a good test case? Can I pick one structure or a small set of structures? Does the speedup occur only when there are large molecules? Only for molecules with lots of internal symmetry? Only for those which are easily fragmented?

Only now, when writing this essay, did I find good test cases. When I use a 10 second timeout using the k=10 nearest neighbors tests, I get 8 timeouts in the first 32 records using the old algorithm, and only 1 timeout using the new algorithm. The time for the very first test goes from over a minute (I finally gave up) using the old algorithm to 0.08 seconds using the new one.

Obviously (in retrospect) the right solution is to pull out a couple of those from my validation suite and turn them into test cases.

I never would have come up with these test cases before implementing the new code - up until ten minutes ago I thought that the new code only added a factor of two to the code, not possible factors of 1,000!

Canonicalization: an in-depth analysis

I'm not finished with testing. I haven't fully characterized all the parts of the implementation. For that matter, there are heuristics I can still add. Here I'll describe what I did to analyze subgraph SMARTS canonicalization. I conclude by finding that while it works, it slows down the code and several days of development work ought to be removed.

My MCS algorithm enumerates subgraphs of one structure, converts each one into a SMARTS string, and tests if the pattern exists in the other structures.

There are many alternate ways to make a SMARTS string from a subgraph. Given the linear chain of C O N, there's the reasonable CON or NOC variations, the more confusing O(C)N and O(N)C, and crazy variations like C1.N1O and O%987.N7.C%98. It's easy to make the non-insane versions by generating a spanning tree. The question is where to start the search and how to handle branches.

I believe that there are many duplicate patterns in a query. For example, a benzene ring will generate many "ccc" queries. I can minimize the number of substructure matches by caching earlier SMARTS match results. But caching doesn't work if sometimes I get a CON and sometimes I get a NOC. The solution is to generate a "canonical" SMARTS for the subgraph - a unique representation for all of the variations.
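
Here is a toy version of why the cache needs a canonical key. It only handles unbranched chains of single-letter atoms, the "canonical" form is simply the smaller of the two reading directions, and the matcher is a plain substring test rather than a real SMARTS match. All of the names are made up for the example.

```python
def canonical_chain(atoms):
    """Toy canonical form for an unbranched chain: the smaller direction."""
    forward = "".join(atoms)
    backward = "".join(reversed(atoms))
    return min(forward, backward)


class MatchCache:
    def __init__(self, matcher):
        self.matcher = matcher  # the expensive substructure test
        self.results = {}
        self.calls = 0          # how often the matcher really ran

    def matches(self, atoms, target):
        key = (canonical_chain(atoms), target)
        if key not in self.results:
            self.calls += 1
            self.results[key] = self.matcher(key[0], target)
        return self.results[key]


def toy_matcher(pattern, target):
    # Stand-in for a real substructure match on single-letter atoms.
    return pattern in target or pattern[::-1] in target


cache = MatchCache(toy_matcher)
cache.matches(["C", "O", "N"], "CCONC")
cache.matches(["N", "O", "C"], "CCONC")  # same subgraph, other direction
print(cache.calls)
```

With the raw string as the key, the second lookup would have been a cache miss even though it asks the same question.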

I implemented the CANGEN algorithm from "SMILES. 2. Algorithm for generation of unique SMILES notation" by Weininger, Weininger, and Weininger, J. Chem. Inf. Comput. Sci., 1989, 29 (2), pp 97-101, in order to choose a canonical SMARTS. (But see "Assigning Unique Keys to Chemical Compounds for Data Integration: Some Interesting Counter Examples", Neglur, Grossman, and Liu, 2nd International Workshop on Data Integration in the Life Sciences (DILS 2005), La Jolla.)

Canonicalization is notoriously hard to get right. It too has a lot of tricky corner cases. For example, in the late 1990s Daylight tracked down a reported bug in their canonicalization implementation. Their algorithm requires a stable sort algorithm, but they used an unstable sort. Their own internal testing never found the problem - it was a customer who found that the Solaris and IRIX boxes occasionally returned different values.

There could be similar bugs in my canonicalization algorithm. I don't know. The usual solution is to generate large numbers of test cases, randomize the order of the atoms and bonds, and verify that all of them are the same. Here too it may be that only certain rare topologies, or rare bond patterns, triggers an error. I'm not convinced that I could come up with the right set of test cases for it.
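
The randomization test looks something like the sketch below. To keep the example self-contained, the canonicalizer is a brute-force one which tries every atom ordering and keeps the smallest encoding. That is only practical for tiny graphs and it is not the CANGEN algorithm; in a real test, canonical_form() would be replaced by the implementation being checked.

```python
import itertools
import random


def canonical_form(labels, bonds):
    """Brute force: the smallest encoding over all atom orderings."""
    n = len(labels)
    best = None
    for perm in itertools.permutations(range(n)):  # perm[old] = new
        new_labels = [None] * n
        for old, new in enumerate(perm):
            new_labels[new] = labels[old]
        new_bonds = sorted(tuple(sorted((perm[a], perm[b]))) for a, b in bonds)
        encoding = (tuple(new_labels), tuple(new_bonds))
        if best is None or encoding < best:
            best = encoding
    return best


def shuffled(labels, bonds, rng):
    """Return the same graph with its atoms and bonds in a random order."""
    order = list(range(len(labels)))
    rng.shuffle(order)  # order[old] = new
    new_labels = [None] * len(labels)
    for old, new in enumerate(order):
        new_labels[new] = labels[old]
    new_bonds = [(order[a], order[b]) for a, b in bonds]
    rng.shuffle(new_bonds)
    return new_labels, new_bonds


def check_invariance(labels, bonds, trials=50, seed=0):
    """True if every random reordering gives the same canonical form."""
    rng = random.Random(seed)
    expected = canonical_form(labels, bonds)
    return all(canonical_form(*shuffled(labels, bonds, rng)) == expected
               for _ in range(trials))


# A three-membered ring with an oxygen attached to one of the carbons.
print(check_invariance(["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 0), (2, 3)]))
```

The hard part is not this harness but choosing input graphs with the rare topologies that actually trip up a canonicalizer.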

What makes it harder is that the underlying search algorithm doesn't need a canonical SMARTS, so a canonicalization failure doesn't ever show up in the output. I could have an infrequent bug and not notice it! Canonicalization is only there for performance reasons, but my implementation is in Python, while the substructure matching is in C++, so the canonicalization might even take more time than it saves.

I can make that a bit more nuanced. Canonical SMARTS generation has three phases: 1) initial ranking, 2) canonicalization/tie-breaking, and 3) SMARTS generation. I can disable the canonicalization step and get a "semi-canonical" SMARTS (my initial ranking is more like a circular fingerprint of diameter 3 than the atom characteristic from Weininger et al.). I can also disable caching. Both of these are doable with only a couple of changes to the code.

This leads to three cases: canonical SMARTS with caching, semi-canonical SMARTS with caching, and semi-canonical SMARTS without caching. (There should be a fourth case, which is arbitrary SMARTS, but that requires more extensive changes.)
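To make the difference between the three cases concrete, here is a toy sketch of how the choice of cache key changes the number of expensive tests. Everything in it is an assumption for illustration: `count_tests` is a made-up helper, the patterns are two-letter strings rather than real SMARTS, and `toy_canonical` (which just sorts the characters) stands in for a real canonicalization.

```python
def count_tests(patterns, key_func=None):
    """Return how many expensive tests run for a stream of patterns.

    key_func=None means no cache; otherwise results are cached under
    key_func(pattern) and a test only runs on a cache miss.
    """
    tests = 0
    cache = {}
    for pattern in patterns:
        if key_func is None:
            tests += 1            # no cache: always run the test
            continue
        key = key_func(pattern)
        if key not in cache:
            tests += 1            # cache miss: run the test once
            cache[key] = True     # placeholder for the real match result
    return tests

def toy_canonical(pattern):
    """Toy stand-in for canonicalization: ignore the character order."""
    return "".join(sorted(pattern))

# Six patterns, four distinct spellings, two distinct "structures"
patterns = ["CO", "OC", "CO", "CN", "NC", "OC"]

no_cache = count_tests(patterns)                      # every pattern tested
semi_canonical = count_tests(patterns, lambda p: p)   # one test per spelling
canonical = count_tests(patterns, toy_canonical)      # one test per structure
```

With this input the three counts are 6, 4, and 2. The better the key, the fewer the tests, but computing the key is not free, which is the trade-off measured in the benchmarks.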

What I want to understand most is the effect of canonicalization on my code. Again, I have to resort to aggregate timings and statistics across a set of benchmarks. I used the ChEMBL-13 data set and generated various benchmarks. One is based on computing the MCS between 10,000 pairs selected at random, another contains the k=2, k=10, and k=100 nearest neighbors of randomly chosen fingerprints (timing 500 data sets each time), and the last is the total of 500 tests of the k<=100 compounds with Tanimoto score of at least 0.95 to randomly selected fingerprints. My results (times are reproducible to within a few percent) include the number of unique SMARTS (canonical or non-canonical) generated and the total number of substructure tests ("SS") which were carried out.

Random pair timings
                 Total                    Complete                Incomplete    # unique SMARTS   # SS tests
no cache         10000/467.3s (21.4/s)    9997/437.3s (22.9/s)    3/30.0s       921666            1339292
semi-canonical   10000/421.3s (23.7/s)    9997/391.3s (25.6/s)    3/30.0s       921666            921666
canonical        10000/442.1s (22.6/s)    9997/412.0s (24.3/s)    3/30.0s       709636            709636

k=2 nearest neighbors
                 Total                    Complete                Incomplete    # unique SMARTS   # SS tests
no cache         500/287.3s (1.7/s)       484/127.1s (3.8/s)      16/160.2s     264057            320238
semi-canonical   500/276.7s (1.8/s)       484/116.6s (4.2/s)      16/160.2s     264057            264057
canonical        500/298.6s (1.7/s)       483/128.4s (3.8/s)      17/170.2s     186682            186682

k=10 nearest neighbors
                 Total                    Complete                Incomplete    # unique SMARTS   # SS tests
no cache         500/520.5s (1.0/s)       471/230.0s (2.0/s)      29/290.5s     447648            3525563
semi-canonical   500/490.1s (1.0/s)       472/209.5s (2.3/s)      28/280.6s     447648            2872040
canonical        500/509.3s (1.0/s)       471/218.7s (2.2/s)      29/290.6s     330010            2109332

k=100 nearest neighbors
                 Total                    Complete                Incomplete    # unique SMARTS   # SS tests
no cache         500/414.4s (1.2/s)       486/271.4s (1.8/s)      14/143.0s     128781            9932263
semi-canonical   500/363.5s (1.4/s)       487/230.8s (2.1/s)      13/132.7s     128781            7456877
canonical        500/361.8s (1.4/s)       488/239.5s (2.0/s)      12/122.4s     107648            6196764

k<=100 at or above Tanimoto threshold of 0.95
                 Total                    Complete                Incomplete    # unique SMARTS   # SS tests
no cache         500/642.1s (0.8/s)       458/220.3s (2.1/s)      42/421.9s     468661            4388074
semi-canonical   500/624.5s (0.8/s)       461/232.8s (2.0/s)      39/391.7s     468661            3828813
canonical        500/640.9s (0.8/s)       460/239.2s (1.9/s)      40/401.8s     364802            3222366

Whew! That's a lot of data to throw at you. Please believe me when I say that it took two days to generate correctly. BTW, "complete" means that the MCS search algorithm went to completion, the "incomplete" means that it timed out - here after 10 seconds - and gave a partial solution.

What does this tell me? Obviously, caching is good. As expected, the "semi-canonical" solution is always better than the "no cache" case, and the canonicalization always reduces the number of substructures and substructure tests. This means the canonicalization code is doing something right.

Unfortunately, it seems like canonicalization has a big performance impact. In the pairwise tests I do 922,000 canonicalizations in Python to save 212,000 substructure tests, and I lose 21 seconds in doing so. For the k=100 nearest neighbor benchmark, which is the only one where the canonicalization code saves time, I do 129,000 canonicalizations to save 1,260,000 comparisons, and gain about 2 seconds. This suggests that each canonicalization takes as long as 10 substructure matches, and that RDKit can do 60,000 SMARTS matches per second. A quick check using one SMARTS gives 100,000 SMARTS matches per second, which helps validate this estimate.

This means I should take out the canonicalization until it can be implemented in C++. Granted, there may be a bug in the code, like there was with an earlier heuristic. But there would have to be an order of magnitude performance increase for this to be effective, and I don't think that's likely.

Unit tests aren't enough to drive development

Which leads me back to my thesis. It would have been possible to develop a few more unit tests for the canonicalization code. There would have been some extra scaffolding in order to do that, but that's a minor cost. I suspect that many of the tests would be very implementation-dependent, and tied to specific internal function calls. I don't like this because it means that a re-implementation of this internal component in C++, which submerges many of the internal helper functions and places them out of reach of Python-based unit tests, would cause the unit tests to be un-runnable. And I am certain that those unit tests would not be rigorous enough to be confident that the code was working correctly.

Think about the question "should I write this code in Python or C++?". It's a development-time question. Test Driven Development (TDD) is a development methodology which uses unit tests to help make development-time decisions. I think TDD is most helpful when used to establish the minimum requirements for a project.

I don't see how TDD is helpful here. I can't think of any unit tests which would guide this development decision. Sometimes you can write a "spike solution" (also called a "prototype"):

A spike solution is a very simple program to explore potential solutions. Build the spike to only addresses the problem under examination and ignore all other concerns.

This sounds good, but what development style should you use to develop the spike solution? Spikes are supposed to be throw-away code, but I can't figure out how something other than a complete, tested implementation of the canonical algorithm or the remaining growth heuristics would have been useful. After all, a two line change in the latter went from "working but 1% faster" to "working and 50% faster", and I didn't even know if it was going to give a speedup in the first place.

If not TDD, what methodology do I use?

Read Peter Seibel's Unit testing in Coders at Work. The author interviewed various famous programmers and learned more about how they tested their software. You may also want to read the related comments on Hacker News.

Seibel recounts that Joshua Bloch "wrote a monstrous 'basher'" which found "that occasionally, just occasionally, the basher would fail its consistency check." Bloch then wrote tests to pin down the failure, eventually finding the fault in a system mutex. Note that under normal unit test development you assume that your underlying components are correct, so this isn't something you would normally code for.

Seibel also reports that others assemble a large number of test cases and use them to check their code. This is of course the technique I used for the MCS problem.

It feels as trite as saying that the secret to losing weight is diet and exercise, but my method for programming is to understand the problem, implement it, and test it until you are confident that it's good enough. The knowledge of how to do that comes from experience, practice, reflection, and discussion.

For those who scoff and call this "big design up front," I merely point you to earlier parts of this essay. When I started this code I had a rough idea that it would work. I had previously implemented substructure enumeration, and the Weininger et al. canonicalization algorithm and SMARTS generation, so I knew that the components were possible, but I didn't know how they went together. Instead, I let the code development itself guide me. I thought some, I implemented code, I "ran the code in my head", I reflected on what the tricky parts were, I thought about how to make them more clear, and I tried various ways to improve the code.

That was a lot to keep in my head, I wasn't sure if the result would be fast enough, and I had no good way to test its usefulness until most of the code was in place. No wonder it was mentally taxing!

I believe tests - including unit tests implemented during development - are important. I don't believe that unit tests are good enough to guide complex algorithm development. However, most programming (over 95%?) is not complex algorithm development; complicated? yes, but not complex.

I once jokingly said that TDD is not useful if there's two or more embedded for loops. That's a bit of a simplification, but not far from my feelings. Some classes of problems, like complex algorithm development and security analysis, require a different attitude towards programming, emphasizing "what can go wrong?" and "how can I improve my confidence that the code is working correctly?" It's my view that this doubt-based philosophical attitude is missing from most discussions of software development practices.

And perhaps as fundamental, when I have to be that critical of my own work and probe it for failures and be open to the possibility of nasty gotchas lurking in the deep dark corners, then some of that doubt passes over into my personal life. I empathize with the idea of not living in that "strange place" all the time, and I can see why this isn't a common attitude.

Comments

This essay is meant to be a thoughtful reflection on the difficulties of programming, using the MCS problem to provide specific structure. If you want to leave comments about the MCS portion then use the MCS comment site. Otherwise, leave a comment on testing hard algorithms.

This is a fairly technical post talking about the structural changes I would like to see in CubicWeb's near future. Let's call that CubicWeb 4.0! It also drafts ideas on how to go from here to there. Draft, really. But that will eventually turn into a nice roadmap hopefully.

The great simplification

Some parts of cubicweb are sometimes too hairy for different reasons (some good, most bad). This adds to the difficulty of getting started quickly. The goal of CubicWeb 4.0 should be to make things simpler:

  • Fix some bad old design.
  • Stop reinventing the wheel and use widely used libraries in the Python Web World. This extends to benefitting from state of the art libraries to build nice and flexible UI such as Bootstrap, on top of the JQuery foundations (which could become as prominent as the Python standard library in CubicWeb, the development team should get ready for it).
  • If there is a best way to do something, just do it and refrain from providing configurability and options.

On the road to Bootstrap

First, a few simple things could be done to simplify the UI code:

  • drop xhtml support: always return text/html content type, stop bothering with this stillborn stuff and use html5
  • move away everything that should not be in the framework: calendar?, embedding, igeocodable, isioc, massmailing, owl?, rdf?, timeline, timetable?, treeview?, vcard, wdoc?, xbel, xmlrss?

Then we should probably move the default UI into some cubes (i.e. the content of cw.web.views and cw.web.data). Besides making the move to Bootstrap easier, this should also have the benefit of making clearer that this is the default way to build an (automatic) UI in CubicWeb, but one may use other, more usual, strategies (such as using a template language).

At a first glance, we should start with the following core cubes:

  • corelayout, the default interface layout and generic components. Modules to backport there: application (not an appobject yet), basetemplates, error, boxes, basecomponents, facets, ibreadcrumbs, navigation, undohistory.
  • coreviews, the default generic views and forms. Modules to backport there: actions, ajaxedit, baseviews, autoform, dotgraphview, editcontroller, editforms, editviews, forms, formrenderers, primary, json, pyviews, tableview, reledit, tabs.
  • corebackoffice, the concrete views for the default back-office that let you handle users, sources, debugging, etc. through the web. Modules to backport here: cwuser, debug, bookmark, cwproperties, cwsources, emailaddress, management, schema, startup, workflow.
  • coreservices, the various services, not directly related to display of something. Modules to backport here: ajaxcontroller, apacherewrite, authentication, basecontrollers, csvexport, idownloadable, magicsearch, sessions, sparql, staticcontrollers, urlpublishing, urlrewrite.

This is a first draft that will need some adjustments. Some of the listed modules should be split (e.g. actions, boxes) and their content moved to different core cubes. Also some modules in cubicweb.web packages may be moved to the relevant cube.

Each cube should provide an interface so that one could replace it with another one. For instance, move from the default coreviews and corelayout cube to bootstrap based ones. This should allow a nice migration path from the current UI to a Bootstrap based UI. Bootstrap should probably be introduced bottom-up: start using it for tables, lists, etc. then go up until the layout defined in the main template. The Orbui experience should greatly help us by pointing at hot spots that will have to be tackled, as well as by providing a nice code base from which we should start.

Regarding current implementation, we should take care that Contextual components are a powerful way to build "pluggable" UI, but we should probably add an intermediate layer that would make more obvious / explicit:

  • what the available components are
  • what the available slots are
  • which component should go in which slot when possible
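To give an idea of what such an explicit layer could look like, here is a deliberately small sketch. None of it is existing CubicWeb API: the `Layout` class, its method names, and the slot and component names are all hypothetical, chosen only to show slots and components being declared in one visible place.

```python
class Layout(object):
    """Hypothetical registry making slots and components explicit."""

    def __init__(self):
        self.slots = {}        # slot name -> ordered list of component names
        self.components = {}   # component name -> callable returning HTML

    def declare_slot(self, name):
        """Announce that the layout offers a slot with this name."""
        self.slots.setdefault(name, [])

    def register_component(self, name, render, slot=None):
        """Register a component and optionally place it in a declared slot."""
        if slot is not None and slot not in self.slots:
            raise KeyError("unknown slot: %r" % (slot,))
        self.components[name] = render
        if slot is not None:
            self.slots[slot].append(name)

    def render_slot(self, slot):
        """Concatenate the output of the components placed in a slot."""
        return "".join(self.components[name]() for name in self.slots[slot])

layout = Layout()
layout.declare_slot("header")
layout.declare_slot("left-column")
layout.register_component("breadcrumbs", lambda: "<nav>home</nav>", slot="header")
layout.register_component("search-box", lambda: "<form></form>", slot="left-column")

header_html = layout.render_slot("header")
```

The point of the sketch is only that `layout.slots` and `layout.components` answer the three questions above by inspection. How this would map onto the existing contextual components is left open.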

Also at some point, we should take care to separate view's logic from HTML generation: our experience with client works shows that a common need is to use the logic but produce a different HTML. Though we should wait for more use of Bootstrap and related HTML simplification to see if the CSS power doesn't somewhat fulfill that need.

On the road to proper tasks management

The current looping task / repo thread mechanism is used for various sorts of things and has several problems:

  • tasks don't behave similarly in a multi-instances configuration (some should be executed in a single instance, some in a subset); the tasks system has been originally written in a single instance context; as of today this is (sometimes) handled using configuration options (that will have to be properly set in each instance configuration file);
  • tasks is a repository only api but we also need web-side tasks;
  • there is probably some abuse of the system that may lead to unnecessary resources usage.

Analyzing a sample http://www.logilab.org/ instance, below are the running looping tasks by category. Tasks that have to run on each web instance:

  • clean_sessions, automatically closes unused repository sessions. Notice cw.etwist.server also records a twisted task to clean web sessions. Some changes are imminent on this, they will be addressed in the upcoming refactoring session (that will become more and more necessary to move on several points listed here).
  • regular_preview_dir_cleanup (preview cube), cleanup files in the preview filesystem directory. Could be executed by a (some of the) web instance(s) provided that the preview directory is shared.

Tasks that should run on a single instance:

  • update_feeds, update copy based sources (e.g. datafeed, ldapfeed). Controlled by 'synchronize' source configuration (persistent source attribute that may be overridden by instance using CWSourceHostConfig entities)
  • expire_dataimports, delete CWDataImport entities older than an amount of time specified in the 'logs-lifetime' configuration option. Not controlled yet.
  • cleanup_auth_cookies (rememberme cube), delete CWAuthCookie entities whose life-time is exhausted. Not controlled yet.
  • cleaning_revocation_key (forgotpwd cube), delete Fpasswd entities with past revocation_date. Not controlled yet.
  • cleanup_plans (narval cube), delete Plan entities instance older than an amount of time specified in the configuration. If 'plan-cleanup-delay' is set to an empty value, the task isn't started.
  • refresh_local_repo_caches (vcsfile cube), pull or clone vcs repositories cache if the Repository entity ask to import_revision_content (hence web instance should have up to date cache to display files content) or if 'repository-import' configuration option is set to 'yes'; import vcs repository content as entities if 'repository-import' configuration option and it is coming from the system source.

Some deeper thinking is needed here so we can improve things. That includes thinking about:

  • the inter-instances messages bus based on zmq and introduced in 3.15,
  • the Celery project (http://celeryproject.org/), an asynchronous task queue, widely used and written in Python.

Remember the more cw independent the tasks are, the better it is. Though we still want an 'all-integrated' approach, e.g. not relying on external configuration of Unix specific tools such as CRON. Also we should see if a hard-dependency on Celery or a similar tool could be avoided, and if not if it should be considered as a problem (for devops).

On the road to an easier configuration

First, we should drop the different behaviour according to presence of a '.hg' in cubicweb's directory. It currently changes the location where cubicweb external resources (js, css, images, gettext catalogs) are searched for. Speaking of implementation:

  • shared_dir returns the cubicweb.web package path instead of the path to the shared cube,
  • i18n_lib_dir returns the cubicweb/i18n directory path instead of the path to the shared/i18n cube,
  • migration_scripts_dir returns the cubicweb/misc/migration directory path instead of share/cubicweb/migration.

Moving web related objects as proposed in the Bootstrap section would resolve the problem for the content web/data and most of i18n (though some messages will remain and additional efforts will be needed here). By going further this way, we may also clean up some schema code by moving cubicweb/schemas and cubicweb/misc/migration to a cube (though only a small benefit is to be expected here).

We should also have fewer environment variables... Let's see what we have today:

  • CW_INSTANCES_DIR, where to look for instances configuration
  • CW_INSTANCES_DATA_DIR, where to look for instances persistent data files
  • CW_RUNTIME_DIR, where to look for instances run-time data files
  • CW_MODE, set to 'system' or 'user' will predefine above environment variables differently
  • CW_CUBES_PATH, additional directories where to look for cubes
  • CW_CUBES_DIR, location of the system 'cubes' directory
  • CW_INSTALL_PREFIX, installation prefix, from which we can compute path to 'etc', 'var', 'share', etc.

I would propose the following changes:

  • CW_INSTANCES_DIR is turned into CW_INSTANCES_PATH, and defaults to ~/etc/cubicweb.d if it exists and /etc/cubicweb.d (on Unix platforms) otherwise;
  • CW_INSTANCES_DATA_DIR and CW_RUNTIME_DIR are replaced by configuration file options, with smart values generated at instance creation time;
  • the above change should make CW_MODE useless;
  • CW_CUBES_DIR is to be dropped, CW_CUBES_PATH should be enough;
  • regarding CW_INSTALL_PREFIX, I'm lacking experience with non-hg-or-debian installations and don't know if this can be avoided or not.
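The proposed lookup rule for CW_INSTANCES_PATH can be sketched as follows. This is a sketch under stated assumptions, not existing CubicWeb code: the function name `default_instances_path` is invented, and splitting the variable on `os.pathsep` is an assumption based on the new `_PATH` suffix. The `isdir` and `expanduser` parameters exist only so the rule can be tried without touching the real filesystem.

```python
import os

def default_instances_path(environ=None, isdir=os.path.isdir,
                           expanduser=os.path.expanduser):
    """Resolve the proposed CW_INSTANCES_PATH to a list of directories.

    Assumption: like other *_PATH variables, several directories may be
    given, separated by os.pathsep.
    """
    if environ is None:
        environ = os.environ
    value = environ.get("CW_INSTANCES_PATH")
    if value:
        # An explicit setting always wins
        return [part for part in value.split(os.pathsep) if part]
    # Otherwise prefer the per-user directory when it exists ...
    user_dir = expanduser("~/etc/cubicweb.d")
    if isdir(user_dir):
        return [user_dir]
    # ... and fall back to the system-wide one (Unix platforms)
    return ["/etc/cubicweb.d"]

explicit = default_instances_path(
    {"CW_INSTANCES_PATH": "/srv/a" + os.pathsep + "/srv/b"})
fallback = default_instances_path({}, isdir=lambda path: False)
```

With an explicit setting the two directories come back in order. With nothing set and no per-user directory, the result is the system-wide default.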

Last but not least (for the moment), the 'web' / 'repo' / 'all-in-one' configurations, and the fact that the associated configuration file changes stinks. Ideas to stop doing this:

  • one configuration file per instance, with all options provided by installed parts of the framework used by the application.
  • activate 'services' (or not): web server, repository, zmq server, pyro server. Default services to be started are stored in the configuration file.

There is probably more that can be done here (fewer configuration options?), but that would already be a great step forward.

On the road to...

The following projects should be investigated to see if we could benefit from them:

Discussion

Remember the following goals: migration of legacy code should go smoothly. In a perfect world every application should be able to run with CubicWeb 4.0 until the backwards compatibility code is removed (and CubicWeb 4.0 will probably be released as 4.0 at that time).

Please provide feedback:

  • do you think choices proposed above are good/bad choices? Why?
  • do you know some additional libraries that should be investigated?
  • do you have other changes in mind that could/should be done in cw 4.0?

John MacCuish, from Mesa Analytics, pointed out that the MCS problem takes polynomial time if the graphs are planar. He writes: "Are there graphs in your likely sets, that are not planar? I have never seen a non-planar small molecule drug for example, but they may be out there. Tests for planarity are also in P. Of course it doesn't mean that solutions in P will be faster than the usual methods since N is small an the overhead for planar MCS may be large, such that N may need to be large for the planar method to beat the non-planar heuristics."

This leads to a couple of questions. One is, what is the shape of the run-time of my MCS algorithm? I don't actually know. Some tests take a very long time, but that's not enough to establish that it's polynomial for real-world compounds or exponential. I'm not going to research this now.

Another is, are there real-world small molecules which are non-planar, in the topological sense, and not the chemistry sense where all of the non-hydrogen atoms lie on or near a plane?

Previous work in topologically non-planar compounds

A quick literature search finds Synthesis of the first topologically non-planar molecule, which says:

We report here the synthesis and characterization of the tris-ether 2,5,14-trioxahexacyclo-[5.5.2.1.24,10.O4,17.O10,17]-heptadecane 3; this topologically unique (graph theory) molecule is prepared via a novel intramolecular rearrangement of either of two isomeric propellane spiro-epoxides, 1 and 2.

There's also Topological stereochemistry. 9. Synthesis and cutting "in-half" of a molecular Möbius strip by Walba, Homan, Richards, and Haltiwanger in New. J. Chem., 1993, 17, 661-681.

Quoting from the above link to Modern Physical Organic Chemistry by Anslyn and Dougherty:

We mention briefly here another topological issue that has fascinated chemists. For the overwhelming majority of organic molecules, we can draw a two-dimensional representation with no bonds crossing each other. ... It may seem surprising, but most molecules have planar graphs.

Recent efforts have produced chemical structures that successfully realize many interesting and novel topologies. A landmark was certainly the synthesis of a trefoil knot using Sauvage's Cu+/phenanthroline templating strategy.... Vögtle and co-workers have described an "all organic" approach to amide-containing trefoil knots, and have been able to separate the two enantiomeric knots using chiral chromatography. Another seminal advance in the field was the synthesis and characterization of a 'Möbius strip' molecule..."

So it's well-established that there are topologically non-planar structures. But are they in compound databases which I can access?

Searching PubChem for topologically non-planar compounds

Take a look at Scaffold Topologies II: Analysis of Chemical Databases by Wester, Pollock, Coutsias, Allu, Muresan and Oprea in PMC 2010 January 15., Published in final edited form as: J Chem Inf Model 2008 July; 48(7): 1311-1324.

Only 12 nonplanar and 2,099 spiro node topologies (all of which are planar) are present in the merged database. 9 of the nonplanar topologies are found only in PubChem and the total number of molecules represented by such topologies in the merged database is a mere 44, agreeing with Walba's assessment concerning the rarity of chemicals with nonplanar graphs.

This establishes that in 2008 there were no more than 44 topologically non-planar structures in PubChem. Can I find them?

I don't think any of the database search engines support this capability, so I need to write a program.

For that I need a method for planarity testing. Various sources say that the linear time algorithm is "widely regarded as being quite complex", but that one of the linear time planarity algorithms is part of Sage.

Sage is a comprehensive mathematical software system, which uses Python. There's a pre-built binary distribution for my OS, which I downloaded. It includes Python, the IPython shell, and everything else it needs, so it really is a system and not a set of Python modules.

With Sage it's easy to make a graph and test for planarity.

% sage
----------------------------------------------------------------------
| Sage Version 5.0, Release Date: 2012-05-14                         |
| Type notebook() for the GUI, and license() for information.        |
----------------------------------------------------------------------
sage: d = {1: [2,3], 2: [3], 3: [4, 6], 4: [5]}
sage: g = Graph(d)
sage: g.is_planar()
True
sage: 
sage: k3_3 = {1: [2,3,4,5,6], 2: [3,4,5,6], 3: [4, 5, 6]}
sage: Graph(k3_3).is_planar()
False
sage: 

The dictionary I use as input to "Graph" contains an upper-triangle connection matrix. So to test if a molecule is topologically planar, I just need to convert its connectivity information into a dictionary of the right form, turn the dictionary into a Graph, and test the graph for is_planar().

Roadblock! My Python 2.6 modules and Sage's 2.7 Python don't mix

The prebuilt binary distribution includes its own Python, and when you use "sage", it replaces the PYTHONPATH with its own settings. This means that when I run sage I don't have access to the cheminformatics tools I've already set up for my system:

% sage
----------------------------------------------------------------------
| Sage Version 5.0, Release Date: 2012-05-14                         |
| Type notebook() for the GUI, and license() for information.        |
----------------------------------------------------------------------
sage: import rdkit
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)

/Users/dalke/<ipython console> in <module>()

ImportError: No module named rdkit
sage: from openeye.oechem import *
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)

/Users/dalke/<ipython console> in <module>()

ImportError: No module named openeye.oechem
sage: os.environ["PYTHONPATH"]
'/Users/dalke/ftps/sage/local/lib/python'

I tried to force it to include the right path, and that failed.

% printenv PYTHONPATH
/Users/dalke/ftps/openeye//wrappers/v2011.Oct.1/python/:/Users/dalke/envs/RDKit_2011_12-svn
% sage
   ...
sage: import sys
sage: sys.path.insert(0, "/Users/dalke/envs/RDKit_2011_12-svn")
sage: import os
sage: os.environ["DYLD_LIBRARY_PATH"]
'/Users/dalke/ftps/sage/local/lib:/Users/dalke/ftps/sage/local/lib/R/lib::/Users/dalke/ftps/openeye//wrappers/libs:/Users/dalke/envs/RDKit_2011_12-svn/lib:/Users/dalke/ftps/sage/local/lib/R/lib'
sage: from rdkit import Chem
Fatal Python error: Interpreter not initialized (version mismatch?)

------------------------------------------------------------------------
Unhandled SIGABRT: An abort() occurred in Sage.
This probably occurred because a *compiled* component of Sage has a bug
in it and is not properly wrapped with sig_on(), sig_off(). You might
want to run Sage under gdb with 'sage -gdb' to debug this.
Sage will now terminate.
------------------------------------------------------------------------
/Users/dalke/ftps/sage/spkg/bin/sage: line 312: 56741 Abort trap              sage-ipython "$@" -i

What happened here is that Sage ships with Python 2.7, while the locally installed cheminformatics toolkits use Python 2.6.

I decided to use another technique. I would have sage call out to another program to handle the cheminformatics. For each structure it will output a line containing the identifier, the SMILES, and the needed upper-triangle dictionary data structure. One line of output will look like:

('15', 'OC1C2(C(C3C(C4(C(CC3)CC(=O)CC4)C)CC2)CC1)C', {0: [1], 1: [2, 19],
 2: [3, 20, 17], 3: [4, 18], 4: [5, 9], 5: [6, 16], 6: [7, 15, 14],
 7: [8, 10], 8: [9], 10: [11], 11: [12, 13], 13: [14], 16: [17], 18: [19]})

All the sage code needs to do is read those lines, extract the data, pass the graph into Sage for analysis, and print out those which are non-planar.
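As a small illustration of the line format, the sketch below parses the example record and counts its atoms and bonds. It uses `ast.literal_eval` from the standard library, which only accepts Python literals, as a stricter alternative to a plain `eval`; the variable names are invented for this example.

```python
import ast

# The example record from above, joined back into a single line
line = ("('15', 'OC1C2(C(C3C(C4(C(CC3)CC(=O)CC4)C)CC2)CC1)C', "
        "{0: [1], 1: [2, 19], 2: [3, 20, 17], 3: [4, 18], 4: [5, 9], "
        "5: [6, 16], 6: [7, 15, 14], 7: [8, 10], 8: [9], 10: [11], "
        "11: [12, 13], 13: [14], 16: [17], 18: [19]})\n")

# Each line is the repr() of an (identifier, SMILES, graph) tuple
record_id, smiles, graph = ast.literal_eval(line.strip())

# Every bond appears exactly once, under its lower-numbered atom
num_bonds = sum(len(neighbors) for neighbors in graph.values())

# Atoms are all indices seen either as a key or as a neighbor
atoms = set(graph)
for neighbors in graph.values():
    atoms.update(neighbors)
num_atoms = len(atoms)
```

For this record there are 21 atoms and 24 bonds, which is the four-ring system visible in the SMILES.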

Even this wasn't as easy as I thought. The PYTHONPATH environment variable persists in spawned processes, which means the Python subprocess I started uses the wrong PYTHONPATH. I ended up setting the environment variables myself - including PYTHONHOME - before it would work correctly:

import sys
import os
env = os.environ.copy()
env["DYLD_LIBRARY_PATH"] = "/Users/dalke/ftps/openeye//wrappers/libs:/Users/dalke/envs/RDKit_2011_12-svn/lib"
env["PYTHONPATH"] = "/Users/dalke/envs/RDKit_2011_12-svn"
env["RDBASE"] = "/Users/dalke/envs/RDKit_2011_12-svn"
env["PYTHONHOME"] = "/System/Library/Frameworks/Python.framework/Versions/2.6"

import subprocess

# Import the "Graph" constructor
from sage.all import *

# Use the version of Python for which RDKit was built to use
p = subprocess.Popen(["/usr/bin/python2.6", "pubchem_connectivity.py"],
                     env = env,
                     stdout = subprocess.PIPE)
                 
for i, line in enumerate(p.stdout):
    id, smiles, d = eval(line)
    G = Graph(d)
    if not G.is_planar():
        print "======= Found one!!!"
        print id, smiles
    if i % 100 == 0:
        sys.stderr.write("%d ...\r" % (i,))
        sys.stderr.flush()

Now I just need the "pubchem_connectivity.py" program to generate the connectivity information.

Call another program to generate the connectivity information

I've been using RDKit these days, because the funding I got for the MCS project said I needed to use RDKit. I usually use OEChem, which (among other things) has much faster structure parsers than RDKit. Since I'm going to read tens of millions of structures, I wanted a way to reduce the number of structures to process.

Now, if a graph has no cycles then it's always planar. Even if it has one, two, or three cycles, it's still impossible for it to be non-planar. The number of cycles in a single component graph is E-V+1, which is the number of edges (bonds), minus the number of vertices (atoms), plus 1. So a simple test is to exclude molecules where the number of bonds is less than the number of atoms plus three.

Mind you, this is an exclusion test which rejects graphs that cannot be non-planar. It does not prove that the remaining graphs are non-planar: there are plenty of molecules with dozens of rings which are topologically planar. Also, PubChem has plenty of records with multiple components, which means I may miss a few cases. But my goal is to find some non-planar structures in PubChem, not to find all of them.
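The cycle-count filter is simple enough to try on a few known graphs. The sketch below is only an illustration with invented function names. The threshold of four cycles comes from the smallest non-planar graphs: K3,3 has 9 edges and 6 vertices, giving 9-6+1 = 4 cycles, and K5 has 10 edges and 5 vertices, giving 6.

```python
def cycle_count(num_atoms, num_bonds):
    """Number of independent cycles, E - V + 1, for a connected graph."""
    return num_bonds - num_atoms + 1

def might_be_nonplanar(num_atoms, num_bonds):
    """False when the graph has too few cycles to be non-planar.

    True only means "needs a real planarity test", not "is non-planar".
    """
    return cycle_count(num_atoms, num_bonds) >= 4

# (number of atoms, number of bonds), counting heavy atoms only
examples = {
    "benzene": (6, 6),    # one ring: excluded
    "cubane": (8, 12),    # five independent cycles, yet planar
    "K3,3": (6, 9),       # non-planar
    "K5": (5, 10),        # non-planar
}

results = dict((name, might_be_nonplanar(*counts))
               for name, counts in examples.items())
```

Benzene is excluded, while cubane passes the filter even though the cube graph is planar. That is why the filter can only reduce the amount of work for the real planarity test and cannot replace it.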

I had previously converted my local copy of PubChem to a set of compressed SMILES files. A few months ago I wrote a set of Ragel definitions for SMILES, which includes a demonstration of how to count the number of atoms and bonds in a SMILES file. I can use that unmodified as a co-process to do high-speed counting for me.

import glob
import subprocess
import gzip
import sys
from collections import defaultdict
from rdkit import Chem

# Start a co-process to count the number of atoms and bonds in a SMILES string
p = subprocess.Popen(["/Users/dalke/opensmiles-ragel/smiles_counts"],
                     stdin = subprocess.PIPE,
                     stdout = subprocess.PIPE)

filenames = glob.glob("/Users/dalke/databases/pubchem/*.smi.gz")
for i, filename in enumerate(filenames):
    msg = "Processing %d/%d\n" % (i+1, len(filenames))
    sys.stderr.write(msg)
    sys.stderr.flush()

    for line in gzip.open(filename):
        # Read a line from the data file
        smiles, id = line.split()

        # Send it to the co-process
        p.stdin.write(smiles + "\n")
        p.stdin.flush()

        # Get the counts (looks like "atoms: 34 bonds: 38")
        line = p.stdout.readline()
        _, atoms, _, bonds = line.split()
        num_atoms = int(atoms)
        num_bonds = int(bonds)

        # Skip records which cannot be non-planar
        # (Note: assumes single component structures!)
        if num_bonds < num_atoms + 3:
            continue

        # Extract the topology into upper-triangle dictionary form
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            continue
        d = defaultdict(list)
        for bond in mol.GetBonds():
            b1 = bond.GetBeginAtomIdx()
            b2 = bond.GetEndAtomIdx()
            if b1 < b2:
                d[b1].append(b2)
            else:
                d[b2].append(b1)

        # print to stdout so the Sage program can read it
        data = (id, smiles, dict(d))
        print repr(data)

BTW, if you haven't been paying attention, I have one process which tests if a SMILES string has enough bonds in it, another to convert structure information into a simple topology, and a third program to do the graph planarity test. "Baling wire and chewing gum!" to repeat an old phrase.
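For anyone wiring up their own consumer for this output, here is a minimal sketch of how one line could be read back. It assumes each line is exactly the repr() of an (id, smiles, dictionary) tuple as printed above. The helper name and the sample line are hand-written for illustration and are not real output; a molecule this small would not pass the bond-count filter.

```python
import ast

def read_record(line):
    # Each line is the repr() of (id, smiles, {atom index: [higher atom indices]})
    record_id, smiles, topology = ast.literal_eval(line)
    # Flatten the upper-triangle dictionary into a plain list of edges
    edges = [(a, b) for a in sorted(topology) for b in topology[a]]
    return record_id, smiles, edges

# Hand-written toy line (cyclopropane), not actual program output
sample = "('example-1', 'C1CC1', {0: [1, 2], 1: [2]})"
record_id, smiles, edges = read_record(sample)
print(edges)   # [(0, 1), (0, 2), (1, 2)]
```

Using ast.literal_eval instead of eval means the reader only accepts Python literals, which is all this format needs.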

The structures!

It took many hours for my computer to chug along (while I slept). I ended up with 224 non-planar SMILES in the subset of 28.5 million PubChem structures I have on my machine. (In other words, bear in mind that this is not a complete search!)

Here are a few structures to show you what they look like:

That first structure, silicon nitride, is a ceramic, which cannot be expressed in SMILES. (That is, "[C]" is an equally poor SMILES representation of graphite as it is of diamond.) The others look like progressively more realistic chemical representations.

The non-planar structures were a bit too verbose to show as a SMILES file. Instead, here's the full list of identifiers, which you can easily use to get the structures yourself (or follow the hyperlink to look at the non-planar depictions).

390566, 498002, 636755, 3084099, 4868274, 5104674, 6712449, 10019039, 10882690, 10895017, 10898366, 10994362, 11027681, 11126005, 11131973, 11187418, 11340053, 11350583, 11360285, 11384163, 11407796, 11672382, 14381365, 14381430, 14381432, 14381435, 16132679, 16132681, 16132995, 16132999, 16133150, 16133259, 16133262, 16133397, 16133412, 16133413, 16133414, 16133878, 16145580, 16146229, 16146230, 16148442, 16148609, 16148632, 16148888, 16148900, 16149007, 16149114, 16149222, 16149238, 16149361, 16149362, 16149482, 16149579, 16149602, 16149827, 16149958, 16150054, 16150193, 16150399, 16150419, 16150654, 16150658, 16150833, 16150994, 16151169, 16151337, 16151360, 16151567, 16151729, 16151960, 16152362, 16152566, 16152608, 16152699, 16152729, 16153098, 16153207, 16153649, 16154275, 16154327, 16154453, 16154584, 16154971, 16155013, 16155014, 16155015, 16155069, 16155075, 16155076, 16155130, 16155140, 16155152, 16155193, 16155194, 16155197, 16155399, 16155442, 16155443, 16155456, 16155607, 16155626, 16155630, 16155631, 16155632, 16155633, 16155652, 16155784, 16155884, 16155916, 16155917, 16155920, 16155972, 16156042, 16156080, 16156082, 16156145, 16156198, 16156260, 16156261, 16156264, 16156265, 16156312, 16156411, 16156413, 16156417, 16156432, 16156495, 16156539, 16156544, 16156545, 16156609, 16156610, 16156613, 16157557, 16214951, 17749011, 21597602, 21597607, 21597610, 21597611, 21770498, 22294696, 22835058, 22835161, 22835262, 22835624, 22835636, 22835637, 23327291, 23584643, 23726086, 23727886, 23955822, 24764125, 24770227, 24770228, 24770229, 24770290, 24770291, 24871221, 24940071, 25200029, 44239114, 44303783, 44303799, 44303821, 44382489, 44397933, 44397934, 44566282, 44566284, 44575206, 44575207, 44575208, 44592641, 44592642, 44592645, 44592646, 44592647, 44592648, 44592945, 44593582, 44606373, 44606374, 46882313, 46882314, 46882315, 46882316, 46882317, 46882318, 46882319, 46882320, 46882321, 46882322, 46882323, 46882324, 46882325, 46882326, 46882327,
46882328, 46891923, 46895835, 49799159, 49799160, 49873810, 50897242, 50900298, 50900299, 50900300, 50919058, 51004304, 51026319, 52945815, 52952313, 52952314, 52952315, 52952316, 52952317, 53468167, 53468168, 53468169, 53468170, 53468171

Still, having some SMILES on hand is nice, so here are some of the shorter SMILES strings. At the least, you can use these as test cases for an MCS search engine, and perhaps force a linear-time planar-graph-only method to fail :)

O1C23C4C5(C16OCC7C4(CC(C2(OC(=O)C3(CC5C8C6CC=C9C8(C(=O)C=CC9)C)O)C)OC7=O)C)O 390566
O=C1N2C34N5CCC(C3CCC4(C1)C=CC5)c6c2cccc6 498002
O=C1N2C34N5CCC(C3CCC4(C1)C=CC5)c6c2cc(cc6)OC 636755
[Si]123N4[Si]56N1[Si]4(N25)N36 3084099
C123C45C16C24C6C7C(C35)CC(C(C7)C)C 4868274
O1C23C4C5(C16OCC7C4(CC(C2(OC(=O)C3(CC5C8C6CC=C9C8(C(=O)C=CC9)C)O)C)OC7=O)C)O 5104674
O1C23C4C5(C16OCC7C4(CC(C2(OC(=O)C3(CC5C8C6CC=C9C8(C(=O)C=CC9)C)O)C)OC7=O)C)O 6712449
O=C1N2C34N5CCC(C3CCC4(C1)C=CC5)c6c2cc(cc6)OC 10019039
O=C1N2C34N5CCC(C3CCC4(C1)C=CC5)c6c2cc(c(c6)OC)OC 10882690
O1C2C34C5C(CN(C3C(C5OC)c6c(cc(c(c6)OC)O)C4C1)CC)(C(C2)O)CO 10895017
O1c2c3cccc2Cc4c5c(ccc4)Cc6c7c(ccc6)Cc8c(c(ccc8)C3)OCCOCCN(CCOCCO5)CCOCCOc9c(cccc9)OCCOCCN(CCOCC1)CCOCCO7 10898366
O1C2C34C5C(C(C2)OC(=O)C)(CN(C3C(C5OC)c6c(cc(c(c6)OC)O)C4C1)CC)COC 10994362
C123C45c6c(cccc6)C17c8c(cccc8)C2(c9c(cccc9)C3(c1c4cccc1)c1c7cccc1)c1c5cccc1 11027681
OC1N2C34N5CCC(C3CCC4(C1)C=CC5)c6c2cc(c(c6)OC)OC 11131973
O1c2c3c(ccc2)OB4Oc5c6c(ccc5)OB1Oc7c(c(ccc7)O4)N63 11187418
C123C45C6C7C8C19C1C(C4C4C%10C22C%11C(C5C5C%12C33C(C6C5)C8CC5C9C(C2C(C35)CC%12%11)CC1%10)C4)C7 11340053
O1C2C34C5C(CN(C3C(C5OC)c6c(cc(c(c6)O)O)C4C1)CC)(C(C2)O)COC 11350583
C123C4C5C1C6C5(C4C26)C=CC#CC=CC78C9C1C7C2C1(C9C82)C=CC#CC=C3 11360285
C123C4C5C1C6C5(C4C26)C=CC#CC#CC=CC78C9C1C7C2C1(C9C82)C=CC#CC#CC=C3 11384163
O1C2C34C5C(CN(C3C(C5OC)c6c(cc(c(c6)OC)O)C4C1)CC)(C(C2)O)COC 11407796
O(c1cc2c(cc1OC)C34C56C27c8c(cc(c(c8)OC)OC)C5(c9c3cc(c(c9)OC)OC)c1c(cc(c(c1)OC)OC)C6(c1c7cc(c(c1)OC)OC)c1c4cc(c(c1)OC)OC)C 11672382
O(C12NC(=O)C3C4C56C1C7C8C59C15C6(C3C3C1C(C9C43)C#N)C2C7C5C8=O)C(=O)C 14381365
O1C2C3C45C67C2C8C3C9C42C6(C8C9=O)C3C4C7C(C5C4C2C3C#N)C1=O 14381430
O1C2C3C45C67C2C8C3C9C42C63C8C9OC(=O)C4C2C2C5C(C7C2C34)C1=O 14381432
BrC12C34C56C7(C8C9C5C5C3C9C1C8C(=O)OC1C2C2C4C(C6C2C71)OC5=O)Br 14381435
O1c2c3c4c5c6c7c3c(cc8c7c(cc6Oc9cc(ccc9)OCCOCCOc3cc1ccc3)C(=O)N(C8=O)C(CCCCCC)C)Oc1cc(ccc1)OCCOCCOc1cc(ccc1)Oc5cc1c4c(c2)C(=O)N(C1=O)C(CCCCCC)C 16214951
BrC12C34C56C7(C8C9C5C5C3C9C1C8C(=O)OC1(C2C2C4C(C6C2C71)OC5=O)Br)Br 21597602
BrC12C34C56C7C8C9C3C7C(=O)OC3C4C4C1C(=O)C(C5(C8C(C29)C(=O)OCC)Br)C4C63 21597607
BrC12C34C56C7C8C9C5C5C3C9C1C8C(=O)OC1C2C2C4C(C6C2C71)OC5=O 21597610
O1C2C3C45C67C2C8C3C9C42C6(C8C9O)C3C4C7C(C5C4C2C3C(=O)O)C1 21597611
O1C23OC(OC4C25C6C7(C1=O)C(C3OC(OC5C(=C)C4CC6)(C)C)C(CCC7)(C)C)(C)C 21770498
ClC1=CC2OC3C1C4OC5C2C4C3C5 22294696
O=C1N2C34N5CCC(C3CCC4(C1)C=CC5)c6c2cc(cc6)OC 23327291
O1c2cc3c4cc2OCCOCCOCCOc5c(cc6c(c5)c(c7c(c6C)cc8c(c7)OCCOCCOCCOc9cc2c(cc9OCCOCCOCCO8)C4(c4c(cccc4)C32C)C)C)OCCOCCOCC1 23584643
O1C2(OCC34C5C2(CCC3OC(=O)C67C4C(OC51)CC(C6)C(=C)C7=O)C)C 23727886
O=C1N2C34N5CCC(C3CCC4(C1)C=CC5)c6c2cc(c(c6)OC)OC 23955822
O1C2C3C45C6C1OC(C6(CCC4OC(=O)C37CC(C2)C(=C)C7=O)C)OC5 24764125
O=C1N2C34N5CCC(C3CCC4(C1)C=CC5)C6=C2CC(=C(C6)OC)OC 24871221
O1C2C34C56C(C(C2(C=C5)OC)C(O)(CCCC)C)CCN(C6Cc7c3c1c(cc7)O)CC4 44303783
O1C2C34C56C(C(C2(C=C5)OC)C(O)(CCCC)C)CN(C6Cc7c3c1c(cc7)O)CC4 44303799
O1C2C34C56C(C(C2(C=C5)OC)C(O)(CCCC)C)CCN(C6Cc7c3c1c(cc7)O)CC4 44303821
S(=O)(=O)(OCC12C3C4C(C1)CC(C3)CC4C2)N 44382489
O=C1N2C34N5CCC(C3CCC4(C1)C=CC5)c6c2cc(c(c6)OC)OC 44592945
O1C23C4C5(C16OCC7C4(CC(C2(OC(=O)C3(CC5C8C6CC=C9C8(C(=O)C=CC9)C)O)C)OC7=O)C)O 49873810
O=C1N2C34N5CCC(C3CCC4(C1)C=CC5)c6c2cccc6 50897242
O=C1N2C34N5CCC(C3CCC4(C1)C=CC5)c6c2cc(cc6)OC 50919058
O1C23C4C5(C16OCC7C4(CC(C2(OC(=O)C3(CC5C8C6CC=C9C8(C(=O)C=CC9)C)O)C)OC7=O)C)O 51004304
O=C1N2C34N5CCC(C3CCC4(C1)C=CC5)c6c2cc(cc6)OC 51026319

Feel free to leave a comment.

09:27

We have a report from our reader Tuukka, who observed a flood of DNS ANY requests from likely spoofe ...(more)...

05:15

Hello all,

I’m very pleased and excited to announce that the Open Bioinformatics Foundation has selected 5 very capable students to work on OBF projects this summer as part of the Google Summer of Code (GSoC) program.

The accepted students, their projects, and their mentors (in alphabetical order):

  • Wibowo Arindrarto:
    SearchIO Implementation in Biopython
    mentored by Peter Cock
  • Lenna Peterson:
    Diff My DNA: Development of a Genomic Variant Toolkit for Biopython
    mentored by Brad Chapman, Reece Hart, James Casbon
  • Marjan Povolni:
    The worlds fastest parallelized GFF3/GTF parser in D, and an interfacing biogem plugin for Ruby
    mentored by Pjotr Prins, Francesco Strozzi, Raoul Bonnal
  • Artem Tarasov:
    Fast parallelized GFF3/GTF parser in C++, with Ruby FFI bindings
    mentored by Pjotr Prins, Francesco Strozzi, Raoul Bonnal
  • Clayton Wheeler:
    Multiple Alignment Format parser for BioRuby
    mentored by Francesco Strozzi and Raoul Bonnal

As in every year, we received many great applications and ideas. However, funding and mentor resources are limited, and we were not able to accept as many as we would have liked. Our deepest thanks to all the students who applied: we sincerely appreciate the time and effort you put into your applications, and hope you will still consider being a part of the OBF’s open source projects, even without Google funding. I speak for myself and all of the mentors who read and scored applications when I say that we were truly honored by the number and quality of the applications we received.

For the accepted students: congratulations! You have risen to the top of a very competitive application process. Now it’s time to “put your money where your mouth is”, as the saying goes. Let’s get out there and write some great code this summer!

Best regards,

Robert Buels
OBF GSoC 2012 Organization Administrator

02:15

SF PyLadies First Workshop: Build your own Blog

Zaki Akhmad: Python Indonesia Meetup #4 (Planet Python)

Yes, we’re back with a meetup. This is the fourth meetup. Well, actually it’s my first Python Indonesia meetup. The meetup was held at the detik.com office. The official meetup announcement was published by Fanani here. I came a little bit late … Continue reading

Sometimes when you are waiting for something, time goes by very slowly. But because you are so focused on that one thing, everything else in life moves really fast. I've been unemployed since the end of January. During that time I made ZenIRCBot significantly better, I wrote a simple site for tracking your workout stats, I attended two conferences, PyCon and Barcamp Portland. I visited two states that I'd never been to. Flew for the first time and took my longest train ride.

Basically I've done a ton in this time. But it feels like it has been a really long time because I've been so focused on getting my resume put together with Mozilla in mind. Then once I finally had that to a point that I was happy, I started showing it to friends for them to review and that took forever. Then I handed it off to my friend Jason to apply and write a letter of recommendation for me.

Then I waited another eternity (it felt like at least) to hear back, be flown down and get an offer from them. I was so focused on that, that everything else flew by me and I may not have gotten the most out of things. Which is fine because I'll be starting at Mozilla on the 29th of May. Working on http://addons.mozilla.org and related sites with the WebDev team.

What that should really read as is that I am incredibly lucky to be getting the chance to work at a company that I'd only really dreamt of working at before. I'm going to be working with some awesomely brilliant people, for a company whose mission is to make the web a better place, while working on some really interesting and difficult engineering, doing it in a language I love (Python) with a framework I love (Django). If you know me, then you know how much I love working on interesting hard problems.

I'm writing this mostly as a stream of consciousness because I don't have a better way to put this stuff together. In the future I'm hoping to take some time, get some peer review for my posts before I put them up, and talk about the awesome things I'm doing at Mozilla and in my free time. If you want to be someone who helps me with my writing, let me know; I could use all the help I can get.

-Wraithan

Sunday, 20 May

22:54

Good old log files are still the most reliable, versatile, and useful sources of information.
When they are also human-readable and easy to use with standard Unix tools like tail and grep, they are even more useful.

We all know that, especially in times of stress when things with a system are going south, having a quick and easy way to grep for useful information is critical. The purpose of Logsna, a small Python utility, is to provide a sane log output format that makes such grepping a bit easier.

Logsna offers a custom formatter class logsna.Formatter that can be used in a logging config file, for example:

# sanefmt.py
import logging
import logging.config
from StringIO import StringIO

CONFIG = """\
[loggers]
keys=root

[handlers]
keys=console

[handler_console]
class=logging.StreamHandler
args=(sys.stderr,)
formatter=sane

[formatters]
keys=sane

[logger_root]
level=DEBUG
handlers=console

# Our custom formatter class
[formatter_sane]
class=logsna.Formatter
"""

config = StringIO(CONFIG)
logging.config.fileConfig(config)

log = logging.getLogger('mylogger.component1')

log.debug('debug message')
log.info('info message')
log.warning('warning message')
log.critical('critical message')
try:
    1 / 0
except:
    log.exception('Houston we have a problem')

The Log Format

Here is an output from the above program:

DEBUG    [2012-05-21 01:59:23,686] mylogger.component1: debug message
INFO     [2012-05-21 01:59:23,686] mylogger.component1: info message
WARNING  [2012-05-21 01:59:23,686] mylogger.component1: warning message
CRITICAL [2012-05-21 01:59:23,686] mylogger.component1: critical message
ERROR    [2012-05-21 01:59:23,686] mylogger.component1: Houston we have a problem
! Traceback (most recent call last):
!   File "/home/alienoid/python/sanefmt.py", line 67, in <module>
!     1 / 0
! ZeroDivisionError: integer division or modulo by zero
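
To make the layout above concrete, here is a minimal sketch of a formatter that produces similar-looking lines using only the standard logging module. This is an illustration under assumptions, not Logsna's actual implementation: the class name SketchFormatter is hypothetical, and the column width and "! " prefix are simply read off the sample output.

```python
import logging
import time

class SketchFormatter(logging.Formatter):
    """Hypothetical formatter imitating the layout above; not Logsna's code."""

    # Render timestamps in UTC rather than local time
    converter = time.gmtime

    def format(self, record):
        # Level name padded to 8 columns, then timestamp, logger name, message
        text = "%-8s [%s] %s: %s" % (record.levelname,
                                     self.formatTime(record),
                                     record.name,
                                     record.getMessage())
        if record.exc_info:
            # Prefix every traceback line with "! " so it is easy to grep
            tb = self.formatException(record.exc_info)
            text += "\n" + "\n".join("! " + line for line in tb.splitlines())
        return text

handler = logging.StreamHandler()
handler.setFormatter(SketchFormatter())
demo_log = logging.getLogger("sketch.demo")
demo_log.addHandler(handler)
demo_log.setLevel(logging.DEBUG)
demo_log.info("info message")
```

Keeping the level name in the first column and marking continuation lines with a fixed prefix is what makes the grep recipes in the notes work.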

The Log Format Notes

- All timestamps are in ISO8601 and UTC format

- To grep for messages of a specific level

$ tail -f sanefmt.log | grep '^INFO'

- To grep for messages from a particular logger

$ tail -f sanefmt.log | grep 'component1:'

- To pull out full exception tracebacks with a corresponding log message

$ tail -f sanefmt.log | grep -B 1 '^\!'
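
The same "!" prefix also makes the format easy to post-process in Python. The following is a small sketch, assuming only the layout shown in the sample output; the function name is invented for this example and is not part of Logsna.

```python
def records_with_tracebacks(lines):
    """Yield (message_line, traceback_lines) for each record followed by '!' lines."""
    current = None
    traceback_lines = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("!"):
            # Continuation line belonging to the most recent log message
            traceback_lines.append(line)
        else:
            if traceback_lines:
                yield current, traceback_lines
            current, traceback_lines = line, []
    if traceback_lines:
        yield current, traceback_lines

sample = [
    "INFO     [2012-05-21 01:59:23,686] mylogger.component1: info message",
    "ERROR    [2012-05-21 01:59:23,686] mylogger.component1: Houston we have a problem",
    "! Traceback (most recent call last):",
    "! ZeroDivisionError: integer division or modulo by zero",
]
for message, tb in records_with_tracebacks(sample):
    print(message)
    print(len(tb))
```

Unlike grep -B 1, this keeps each traceback grouped with its message even when several errors follow one another.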

Installation

$ [sudo] pip install logsna