For a while now, I have been waxing lyrical (to those who will listen) about the variety of new tools and analyses available to people who want to prognosticate. If nothing else, the current craze for data within most businesses has resulted in people almost literally swimming around in the stuff without an awful lot of an idea about what to do with it, and while this has led to some unspeakably shambolic practices (those who know me will likely have heard me on my hobby horse about proving models with actual experimentation) it has also opened up new horizons for people like me.

So, I’m delighted to have been invited to give a talk I submitted to the Bartech Tech2020 conference, this coming week in London. It’s the first meeting of this particular group, and there is a great line-up of speakers, all of whom are going to be reading the runes and describing their vision of the year 2020. Wonderfully, the various talks will be recorded and available, so there will be significant opportunity come the year 2020 to look back and groan loudly at the errors and omissions that will have piled up nicely by that point.

There are some brilliant speakers lined up, and I have to confess to being particularly eager to hear from this lot (in no particular order):

  • Zoe Cunningham – the old refrain, “culture eats strategy for breakfast”, has never been more true than now. It’s also one of the most difficult things to set right and predict;
  • David Wood – working in healthcare, I’m incredibly interested in David’s talk, and am certain that what we will call healthcare in another ten years’ time will in many ways bear little resemblance to what is practised now;
  • Simon Riggs – in all honesty, I’m hoping he’s going to be talking at least in part about homomorphic encryption because I just read the Gentry paper recently and it’s fascinating, but there is so much to come in this space – particularly now that data is so large and non-local that all sorts of new strategies are needed.

I’m going to attempt to tweet through most of the conference in low volume, probably on #tech2020, and look forward to putting a few more faces to names from Bartech.

What I realised I’m missing from Gnome

Not that long ago, I did a switch on my Android phone: against all the promises I made to myself beforehand, I switched on the Google account and allowed it to sync up to GCHQ/NSA the cloud. I did this for one main reason: I had just got an Android tablet, and I despised having to do the same stuff on each device, particularly since they weren’t running the same versions of Android, and one was a Nexus – so not all the UI was the same. The benefits, I have to say, were pretty much worth it: I don’t have too much sensitive data on there, but the ease of use is incredible. What was particularly good was that when I broke my phone, and had to have a new one, once the new one was linked up everything was basically back how it was. That’s tremendously powerful.

Now, I recently acquired a bit of Apple equipment and of course installed Fedora 19 on it. Just to digress briefly: installing Fedora 19 on any new Mac hardware, particularly if you want to keep Mac OS X around (I don’t much care for OS X, but keeping it for now seems handy), is tremendously difficult. I had wired ethernet (brilliant, because I was using the netinstall – which, I should note, is a truly wonderful experience in the new Anaconda) which was lucky, since the wifi doesn’t work by default. The disk partitioning is incredibly complex, and the installation documentation is not particularly good. At some point I might try and help update the documentation, but it would feel a little like the blind leading the blind at this stage: although I have Fedora booting, the Mac OS X grub entries don’t work.

Logging into my desktop, though, I realised everything was bare. This was not like the Android experience at all – everything, from my username to my dot config files, needed to be set up again. I rarely change hardware, and previously I saw this as a reason to make a fresh start of things: but actually, now I value the convenience more highly.

It’s not like things are totally bad:

  • Gnome’s account settings can pull in some limited information, from Google or OwnCloud or other similar systems
  • Apps like Firefox have excellent built-in secure synchronisation that’s not a complete pain to set up
  • you can use apps like SparkleShare to make specific directories available elsewhere.

However, this really isn’t the experience I want:

  1. I should be able to use some online “Gnome Account” in the same way I can set up Enterprise Login during install
  2. That “Gnome Account” should have all my key configuration, including the details of other accounts I have linked up (maybe not the passwords, but at least the settings)
  3. If I have online storage / backup somewhere, it should offer to sync that up
  4. I should be able to sync my entire home data, not just specific bits
  5. If the two machines are on, I should be able to access one from the other – even if there’s a firewall in the way

I realise point five above is particularly moon-on-a-stick territory.

Technically speaking, a lot of the basic bits are kind of there, one way or another. Most Gnome apps use the standard dconf settings system, and in theory it’s possible to synchronise that stuff where it makes sense (this is, of course, handwaving: whether or not you want all settings exactly the same on each machine is an almost impossible question to answer). Discovering and syncing other data shouldn’t be that hard. Remote access to another machine is definitely much harder, but the various protocols and discovery mechanisms at least exist.
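To make the “which settings should follow you?” question concrete, here is a toy sketch of about the simplest defensible policy: a per-key whitelist merge. The key names are entirely invented for illustration – a real tool would be working over dconf paths – but it shows why the decision has to be explicit rather than “sync everything”:

```python
def merge_settings(local, remote, sync_keys):
    """Return local settings with remote values pulled in, but only
    for keys explicitly whitelisted for synchronisation."""
    merged = dict(local)
    for key in sync_keys:
        if key in remote:
            merged[key] = remote[key]
    return merged

local = {"theme": "adwaita-dark", "monitor-layout": "dual"}
remote = {"theme": "adwaita", "monitor-layout": "single"}

# Only the theme follows you between machines; hardware-specific
# settings like monitor layout stay local.
print(merge_settings(local, remote, {"theme"}))
```

Even this trivial version makes the design question visible: somebody has to decide, key by key, what is “user preference” and what is “machine configuration”.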

Annoyingly, there doesn’t seem to be much development in this direction – not even prototypes. There are lots of nasty problems (syncing home directories is fraught with danger), and even if you were willing to buy into a simpler system to get the goodies, making it work in Gnome is probably an awful lot easier than dealing with the other apps that aren’t Gnome aware.

I’m certainly not in much of a position to start developing any of this stuff right now, but it would be interesting to at least attempt to draw out a believable architecture.  A decent 70 or 80% solution might not even be too hard to prototype, given the tools available. It would be interesting to hear from anyone else who is working on this, has worked on it, or knows of relevant projects I’ve missed!


A first look at docker

In my previous post about virtualenv, I took a look at a way of making python environments a little bit more generic so that they could be moved around and redeployed at ease. I mentioned docker as a new tool that uses a general concept of “containers” to do similar things, but more broadly. I’ve dug a bit into docker, and these are my initial thoughts. Unfortunately, it seems relatively Fedora un-friendly right now.

The first thing to examine is what, exactly, a “container” is. In essence, it’s just a file system: there’s pretty much nothing special about it. I was slightly surprised by this; given the claims on the website I assumed there was something slightly more clever going on, but the only “special sauce” is the use of aufs to layer one file system upon another. So from the point of view of storage alone, there really isn’t much difference between a container and a basic virtual machine.
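The layering semantics are easy to model. This is a deliberately toy sketch – nothing like how aufs is actually implemented, and the paths and contents are invented – but it captures the lookup rule: reads fall through the stack from the topmost layer to the base, and an upper layer shadows the one beneath it.

```python
class UnionFS:
    """Toy model of aufs-style layering: lookups fall through the
    layer stack from top (most recent) to bottom (base image)."""
    def __init__(self, *layers):
        self.layers = layers  # layers[0] is the topmost layer

    def read(self, path):
        for layer in self.layers:
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

base = {"/etc/os-release": "base os", "/usr/bin/python": "interpreter"}
app  = {"/opt/app/run.py": "app code", "/etc/os-release": "patched"}

fs = UnionFS(app, base)
print(fs.read("/opt/app/run.py"))   # found in the app layer
print(fs.read("/usr/bin/python"))   # falls through to the base
print(fs.read("/etc/os-release"))   # the top layer shadows the base
```

The important property is that the upper layer only stores its own files and its overrides, which is where the disk-space cheapness comes from.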

From the point of view of the runtime, there isn’t an awful lot of difference between a virtual machine and a container either. docker sells itself as a lightweight alternative to virtual machines, but of course there is no standard definition of a “virtual machine”. At one end of the spectrum are the minimal hardware OSen that can be used to assign different host resources, including CPU cores, to virtual machines, and those types of VM are effectively not much different to real hardware – the configuration is set on the fly, but basically it’s real metal. On the other end of the spectrum you have solutions like Xen, which make little to no use of the hardware to provide virtualisation, and instead rely on the underlying OS to provide the resources that they dish out. docker is just slightly further along the spectrum than Xen: instead of using a special guest kernel, you use the host kernel. Instead of paravirtualisation ops, you use a combination of cgroups and lxc containers. Without the direct virtualisation of hardware devices, you don’t need the various special drivers to get performance, but there are also fewer security guarantees.

There are a couple of benefits of docker touted, and I’m not totally sold on all of them. One specific claim is that containers are “hardware independent”, which is only true in a quite weak way. There is no specific hardware independence in containers that I can see, except in the sense that docker only runs on x86_64 hardware anyway. If your container relies on having access to the NX bit, then it seems to me you’re relying on the underlying hardware having that feature – docker doesn’t solve that problem.

The default container file system is set up to be copy-on-write, which makes it relatively cheap diskspace-wise. Once you have a base operating system file system, the different containers running on top of it are probably going to be pretty thin layers. This is where the general Fedora un-friendliness starts, though: in order to achieve this “layering” of file systems, docker uses aufs (“Another Union File System”), and right now this is not a part of the standard kernel. It looks unlikely to get into the kernel either, as it hooks into the VFS layer in some unseemly ways, but it’s possible some other file system with similar functionality could be used in the future. Requiring a patched kernel is a pretty big turn-off for me, though.

I’m also really unsure about the whole idea of stacking file systems. Effectively, this is creating a new class of dependency between containers, ones which the tools seem relatively powerless to sort out. Using a base Ubuntu image and then stacking a few different classes of daemon over it seems reasonable; having more than three layers begins to seem unreasonable. I had assumed that docker would “flatten out” images using some hardlinking magic or something, but that doesn’t appear to be the case. So if you update that underlying container, you potentially break the containers that use it as a base – it does seem to be possible to refer to images by a specific ID, but the dockerfile FROM directive doesn’t appear to be able to take those.

The net result of using dockerfiles appears to be to take various pieces of system configuration out of the realm of SCM and into the build system. As a result, it’s a slightly odd half-way house between a Kickstart file and (say) a puppet manifest: it’s effectively being used to build an OS image like a Kickstart, but it’s got these hierarchical properties that stratify functionality into separate filesystem layers that look an awful lot like packages. Fundamentally, if all your container does is take a base and install a package, the filesystem is literally going to be that package, unpacked, and in a different file format.

The thing that particularly worries me about this stacking is memory usage – particularly since docker is supposed to be a lightweight alternative. I will preface this with the very plain words that I haven’t spent the time to measure this and am talking entirely theoretically. It would be nice to see some specific numbers, and if I get the time in the next week I will have a go at creating them.

Most operating systems spend a fair amount of time trying to be quite aggressive about memory usage, and one of the nice things about dynamic shared libraries is that they get loaded into process executable memory as a read-only mapping: that is, each shared library will only be loaded once and the contents shared across processes that use it.

There is a fundamental difference between using a slice of an existing file system – e.g., setting up a read-only bind mount – and using a new file system, like an aufs. My understanding of the latter approach is that it’s effectively generating new inodes, which would mean that libraries that are loaded through such a file system would not benefit from that memory mapping process.
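A small demonstration of the underlying point – not of aufs itself, but of the fact that sharing is keyed on inode identity. A hard link is the same inode as the original, so its pages can be shared; a copy is a fresh inode, so its pages are duplicated. My understanding is that a file surfaced through a fresh aufs branch behaves like the latter:

```python
import os
import shutil
import tempfile

d = tempfile.mkdtemp()
orig = os.path.join(d, "libfoo.so")
with open(orig, "w") as f:
    f.write("fake library contents")

link = os.path.join(d, "hardlink.so")
copy = os.path.join(d, "copy.so")
os.link(orig, link)     # same inode: the page cache can share it
shutil.copy(orig, copy) # new inode: separate pages in memory

print(os.stat(orig).st_ino == os.stat(link).st_ino)  # same inode
print(os.stat(orig).st_ino == os.stat(copy).st_ino)  # different inode
```

If that understanding is right, a libc loaded via ten different aufs stacks is ten copies in RAM, where ten processes on one filesystem would share a single read-only mapping.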

My expectation, then, is that running a variety of different containers is going to be more memory intensive than a standard system. If the base containers are relatively light, then the amount of copying will be somewhat limited – the usual libraries like libc and friends – but noticeable. If the base container is quite fat, but has many minor variations, then I expect the memory usage to be much heavier than the equivalent.

This is a similar problem to the “real” virtual machine world, and there are solutions. For virtual machines, the same-page mapping subsystem (KSM) does an admirable job in figuring out which sections of a VM’s memory are shared between instances, and evicting copies from RAM. At a cost of doing more compute work, it does a better job than the dynamic loader: identical pages of data can be shared too, not just binaries. This can make virtual machines very cheap to run (although, if suddenly the memory stops being shareable, memory requirements can blow up very quickly indeed!). I’m not sure this same machinery is applicable to docker containers, though, since KSM relies on advisory flagging of pages by applications – and there is no application in the docker system which owns all those pages in the same way (for example) qemu would do.

So, enough with the critical analysis. For all that, I’m still quite interested in the container approach that docker is taking. I think some of the choices – especially the idea about layering – are poor, and it would be really nice to see them implement systemd’s idea of containers (or at least, some of those ideas – a lot of them should be quite uncontroversial). For now, though, I think I will keep watching rather than doing much active: systemd’s approach is a better fit for me, I like the additional features like container socket activation, and I like that I don’t need a patched kernel to run it. It would be amazing to merge the two systems, or at least make them subset-compatible, and I might look into tools for doing that. Layering file systems, for example, is only really of interest if you care a lot about disk space, and disk space is pretty cheap. Converting layered containers into systemd’able containers should be straightforward, and potentially interesting.

packaging a virtualenv: really not relocatable

My irregular readers will notice I haven’t blogged in ages. For the most part, I’ve been putting that effort into writing a book – more about this next week – hopefully back to normal service now though.

Recently I’ve been trying to bring an app running on a somewhat-old Python stack slightly more up-to-date. When this app was developed, the state of the art in terms of best practice was to use operating system packaging – RPM, in this case – as the means by which the application and its various attendant libraries would be deployed. This is a relatively rare mode of deployment even though it works fantastically well, because many developers are not happy maintaining the packaging-level skills required to maintain the system. From what I read, the Mozilla systems administrators deploy their applications using this system.

For various reasons, I needed to bring up an updated stack pretty quickly, and spending the time updating the various package specifications wasn’t really an option. It didn’t need to be production rock-solid, but it needed to be deployable on our current infrastructure. The approach that I took was to build a packaged virtualenv Python environment: I’ve read online about other people who had tried it to relative success, although there are not many particularly explicit guides. So, I thought I would share my experiences.

The TL;DR version of this is that it was actually a relatively successful experiment: relying on pip to grab the various dependencies of the application meant that I could reliably build a strongly-versioned environment, and packaging the entire environment as a single unit reduced the amount of devops noodling. There is a significant downside: it’s a pretty severe mis-use of virtualenv, and it requires some relatively decent understanding of the operating system to get past the various issues.

Developing the package

As I have a Fedora background, I’m not really happy slapping together packages in hacky ways. One of the things I’m definitely not happy doing is building stuff as root: it hides errors, and there’s pretty much no good reason to do anything as root these days.

In order to build a virtualenv you have to specify the directory in which it gets built, and without additional hacks that’s not going to be the directory to which it installs. So, the “no root build” thing immediately implies making the virtualenv relocatable.

The web page for virtualenv currently has this sage warning:

“The --relocatable option currently has a number of issues, and is not guaranteed to work in all circumstances. It is possible that the option will be deprecated in a future version of virtualenv.”

Wise words indeed. There are a tonne of problems moving a virtualenv. Encoding the file paths directly into files is an obvious problem, and virtualenv makes a valiant attempt at fixing up things like executable shebangs. It doesn’t catch everything, so some stuff has to be rewritten manually (by which I mean, as part of the RPM build process – obviously not doing it by hand).
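The shebang fix-up is simple enough to sketch. This is an illustrative version, not my actual build script (which did the equivalent with sed from the spec file), and the interpreter path is made up:

```python
import os

def fix_shebangs(bindir, new_interp):
    """Rewrite the shebang line of every script in bindir so it points
    at the final install location rather than the build directory."""
    for name in os.listdir(bindir):
        path = os.path.join(bindir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            content = f.read()
        if not content.startswith(b"#!"):
            continue  # leave native binaries and data files alone
        _, _, rest = content.partition(b"\n")
        with open(path, "wb") as f:
            f.write(b"#!" + new_interp.encode() + b"\n" + rest)

# e.g. fix_shebangs("build/venv/bin", "/opt/pyenv/whatever/bin/python")
```

In the real build this runs from the RPM build process over the virtualenv’s bin directory, so the installed scripts point at the deployed interpreter rather than the builder’s scratch space.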

Worse still, it actively mangles files. Consider one of pillow’s binaries, whose opening lines become:

#!/usr/bin/env python2.7

import os; activate_this=os.path.join(os.path.dirname(os.path.realpath(__file__)), ''); execfile(activate_this, dict(__file__=activate_this)); del os, activate_this

from __future__ import print_function

Unfortunately this is just syntactically invalid python – future imports have to come first. Again, it’s fixable, but it’s manual clean-up work post-facto.
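The clean-up itself can be mechanical: hoist any `from __future__` lines back above the injected activation stanza, keeping the shebang first. This is a sketch of the sort of post-hoc repair I mean, not anything virtualenv does for you:

```python
def hoist_future_imports(source):
    """Move 'from __future__ import ...' lines back to the top of the
    file (just after any shebang), undoing the breakage caused by code
    being injected above them."""
    lines = source.splitlines(True)
    shebang = [l for l in lines[:1] if l.startswith("#!")]
    body = lines[len(shebang):]
    futures = [l for l in body if l.startswith("from __future__ import")]
    rest = [l for l in body if not l.startswith("from __future__ import")]
    return "".join(shebang + futures + rest)
```

Run over the mangled script above, this puts `from __future__ import print_function` back where Python insists it lives, with the activation stanza following it.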

What to do about native libraries

Attempting to use python libraries with native portions, be they bindings or otherwise, is also an interesting problem. To begin with, you have to assume a couple of things: that native code will end up in the package, and that not all of it will be cleanly built. The obvious example of both those rules is that the system python binary is copied in.

This causes problems all over the shop. RPM will complain, for example, that the checksums of the binaries don’t match what it was expecting: this is because it reads the checksum from the binary directly rather than calculating it at package time, and prelink actually alters the binary contents (this happens after the RPM content is installed, but RPM ignores those changes for the purposes of its package verification).

Another example of native content not playing well with being packaged is that binaries will quite often have an rpath encoded into them. This is used when installing into non-standard locations, so that libraries can be easily found without having to add each custom location into the link loader search path. However, RPM rightly objects to them. It’s possible to override RPM’s checks, but that’s pretty naive. Keeping rpaths means bizarre bugs turn up when the paths actually exist (e.g., installing the environment package on the development machine building the package – which is quite plausible, given the environment package may end up being a build-time dependency of another).

Thankfully, binaries can usually be adjusted after the fact for both these things: it’s possible to remove the rpaths encoded into a binary, and to undo the changes prelink makes.

In the end, I actually made a slightly hacky choice here too: I decided that the virtualenv would allow system packages. This was the old default, but is no longer because it stops the built environments being essentially self-contained. This allowed me to build certain parts of the python stack as regular RPMs (for example, the MySQL connector library) and have that be available within the virtualenv. This is only possible if there is going to be one version of python available on the system (unless you build a separate stack on a separate path – always possible), and takes away many of the binary nasties, since the binary compilation process is then under the control of RPM (which tends to set different compiler flags and other things).

The obvious downside to doing that is that system packages are already fulfilled when you come to build the virtualenv, meaning that the virtualenv would not be complete. If that’s the intention that’s ok, but that’s not always what’s wanted. I resorted to another hack: building the virtualenv without system packages, and then removing the no-global-site-packages flag manually. This means you have to feed pip a subset of the real requirements list, leaving out those things that would be installed globally, but that seemed to work out reasonably well for me.
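Splitting the requirements list is easy to sketch. The package names below are purely illustrative, and a real version would want proper requirement parsing rather than splitting on `==`, but the idea is just: anything the system RPMs provide gets left out of what pip installs into the virtualenv:

```python
def venv_requirements(requirements, system_packages):
    """Split a pinned requirements list into what pip should install
    inside the virtualenv and what the system RPMs are expected to
    provide. Naive parsing: assumes 'name==version' pins."""
    pip_installs, from_system = [], []
    for req in requirements:
        name = req.split("==")[0].lower()
        (from_system if name in system_packages else pip_installs).append(req)
    return pip_installs, from_system

reqs = ["Django==1.5.1", "MySQL-python==1.2.4", "requests==1.2.3"]
pip_installs, from_system = venv_requirements(reqs, {"mysql-python"})
```

The `from_system` half then becomes ordinary `Requires:` lines in the spec file, so the package system still knows about the dependency even though pip never sees it.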

The rough scripts that I used, then, were these. First, the spec file for the environment itself:

%define        ENVNAME  whatever
Source:        $RPM_SOURCE_DIR/pyenv-%{ENVNAME}.tgz
BuildRoot:     %{_tmppath}/%{buildprefix}-buildroot
Provides:      %{name}
Requires:      /usr/bin/python2.7
BuildRequires: chrpath prelink

%description
A packaged virtualenv.

%prep
%setup -q -n %{name}

%install
mkdir -p $RPM_BUILD_ROOT%{prefix}
mv $RPM_BUILD_DIR/%{name}/* $RPM_BUILD_ROOT%{prefix}

# remove some things
rm -f $RPM_BUILD_ROOT/%{prefix}/*.spec

# undo prelinking
find $RPM_BUILD_ROOT/opt/pyenv/%{ENVNAME}/bin/ -type f -perm /u+x,g+x -exec /usr/sbin/prelink -u {} \;
# remove rpath from build
chrpath -d $RPM_BUILD_ROOT/opt/pyenv/%{ENVNAME}/bin/uwsgi
# re-point the lib64 symlink - not needed on newer virtualenv
rm $RPM_BUILD_ROOT/opt/pyenv/%{ENVNAME}/lib64
ln -sf /opt/pyenv/%{ENVNAME}/lib $RPM_BUILD_ROOT/opt/pyenv/%{ENVNAME}/lib64



(Standard files like name and version are missing – using the default spec skeleton fills in the missing bits). It’s not totally obvious from this, but I actually ended up building the virtualenv first and using that effectively as the source package:

virtualenv --distribute $(VENV_PATH)
. $(VENV_PATH)/bin/activate && pip install -r requirements/production.txt
virtualenv --relocatable $(VENV_PATH)
find $(VENV_PATH) -name \*py[co] -exec rm {} \;
find $(VENV_PATH) -name no-global-site-packages.txt -exec rm {} \;
sed -i "s|`readlink -f $(VENV_ROOT)`||g" $(VENV_PATH)/bin/*
cp ./conf/pyenv-$(VENV_NAME).spec $(VENV_ROOT)
tar -C ./build/ -cz pyenv-$(VENV_NAME) > $(VENV_ROOT).tgz
rm -rf $(VENV_ROOT)

Improving on this idea

There’s a lot to like about this kind of system. I’ve ended up at a point where I have a somewhat bare-bones system python packaged, with a few extras, and then some almost-complete virtualenv environments alongside to provide the bulk of dependencies. The various system and web applications are packaged depending on both the environment and the run-time. The environments tend not to change particularly quickly, so although they’re large RPMs they’re rebuilt infrequently. I consider it a better solution than, say, using a chef/puppet or other scripted system to create an environment on production servers, largely because it means all the development tools stay on the build systems, and you can rely on the package system to ensure the thing has been properly deployed.

However, it’s still a long, long way from being perfect. There are a few too many hacks in the process for me to be really happy with it, although most of those are largely unavoidable one way or another.

I also don’t like building the environment as a tarball first. An improvement would be to move pretty much everything into the RPM specfile, and literally just have the application to be deployed (or, more specifically, its requirements list) as the source code. I investigated this briefly and to be honest, the RPM environment doesn’t play wonderfully with the stuff virtualenv does, but again these are probably all surmountable problems. It would then impose the more standard CFLAGS et al from the RPM environment, but I don’t know that it would end up removing too many of the other hacks.

The future

I’m not going to make any claims about this being a “one true way” or some such – it clearly isn’t, and for me, the native RPM approach is still measurably better. Yes, it is slightly more maintenance, but for the most part that’s just the cost of doing things right.

What is interesting is that this kind of approach seems to be the way a number of other systems are going. virtualenv has been so successful that it’s now effectively a standard piece of python, and rightly so – it’s an incredible tool. Notably, pyvenv (the new stdlib tool) does not have the relocatable option available.

I’m slightly excited about the “container engine” system as well. I haven’t actually tried this yet, so won’t speak about it in too concrete terms, but my understanding is that a container is basically a filesystem that can be overlaid onto a system image in a jailed environment (BSD readers should note I’m using “jail” in the general sense of the word – sorry!). It should be noted that systemd has very similar capability in nspawn too, albeit less specialist. Building a container as opposed to an RPM is slightly less satisfying: being able to quickly rebuild small select portions of a system is great for agile development, and having to spin large chunks of data to deploy into development is less ideal, but it may well be that the benefits outweigh the costs.

A (fond) farewell to Zend Framework

I’ve been a Zend Framework user for a while. I’ve been using PHP long enough to appreciate the benefits of a good framework, and have developed enough sophisticated applications using ZF to have grown a certain fondness for it. Although it has a reputation for being difficult to get into, being slow and being overly complicated – not undeserved accusations, if we’re being honest – there is something quite appealing about it. Well, was, for me at least. ZF 1.11 looks like the last version of the framework I will be using.

Why? The simple answer is ZF 2.0. It has been busily built over the past couple of years one way or another, a number of betas have been released, and it looks likely to me that an initial release is a few months away. At this point I need to make a decision about my future use of the framework, and I don’t particularly like what I see.

Let’s be quite honest about one thing up-front: I cannot claim to have done any substantial amount of work in ZF 2.0. The criticisms within are all personal opinion based on little more than the most itinerant tinkering.

That said, I actually don’t feel like much of what I’m about to say is unfair, for one simple reason: I have tried to like ZF 2.0. There are of course other PHP frameworks, and I don’t really need to name them, and many of them are initially much nicer to get started on than ZF. Despite all that, I got quite happy with ZF1, and indeed approached ZF2 with the idea that it would take a similar amount of effort to learn to like it. I have attempted to apply that effort. I have failed.

Much of what I think is wrong with ZF2 you can quite obviously see in the ZendSkeleton example application. Now, of course, the example applications for most things are pretty poor: every JS framework has a To-do app, and things are generally chosen to show off the best features of the framework in their most flattering light. That’s actually the first thing that hits me about the skeleton application: it’s deathly, deathly dull, but there’s a pile of code needed to get you to that point. The sheer amount of boilerplate needed to get much further than ‘Hello World’ is incredible, and of truly Java-like proportions.

Generally, I like my frameworks opinionated. I like having a culture, or an ethos, which come through as a set of guiding principles for the developers of both applications using the framework, and the framework itself. And ZF certainly is and was opinionated. I suppose at this point, I find that my opinions differ with theirs too much, and that’s an issue.

The first opinion I would cite is the use of dependency injection. Now, I get DI, truly I do. I even like the idea of it. I can see how it would be useful, and how it could add a heap of value to a project. But there is “useful feature” and then there is “koolaid”, and DI in ZF2 is alarmingly close to the latter. As a case in point, just take a peek at the module config for the skeleton app.

The comment at the top of the file first sent shivers down my spine – “Injecting the plugin broker for controller plugins into the action controller for use by all controllers that extend it” – it’s the kind of enterprise buzzword bingo that again, sorry people, sounds just like a Java app.

And as you progress through what is supposed to be just a bit of config for a simple application, wading past the router config and various pieces of view configuration, you’ll see the thing which just turned me right off – ‘Zend\View\Helper\Doctype’. Seriously? A fragment of view code to manage the doctype? As if this is something you can just change at runtime? “Oh yeah, we built it in HTML5, but I can just downgrade to HTML4 by switch….” sorry, no. Doctype is totally fundamental to the templates you’ve written. This is so far from application config it’s not funny.

Other stuff I can’t abide: the default file tree. Did you think application/controllers/IndexController.php was bad enough? Now you have module/Application/src/Application/Controller/IndexController.php. I do get the reason for this, but again it’s enforced modularisation – ZF1 supported modules too, without forcing them on you.

I know how observers might respond to this: it’s a skeleton app. It’s supposed to be showing a set of best practices; you can cut corners and make things simpler. Except this isn’t true: there’s already a whole load of corners cut. Just look in the layout; there’s a pile of code at the top of the template. Since when is the view where the code is supposed to live?! I would have most of that crap in my Bootstrap.php as-was, and I can’t believe people are advocating throwing that in the layout template now (and I’m sure they’re not). But there it is, cluttering up the layout, when it really should be refactored somewhere else.

This is the issue. The skeleton app does a whole heap of things just to do nothing except chuck Twitter bootstrap on screen. I am, of course, willing to be shown how all this up-front investment will pay off in the end – but right now, I really do not see it. The more I look, the more I see things which will just require more and more code to get working – a constant investment throughout the life of a project, without any obvious pay-back for it later. As a rule of thumb, whenever I’ve used a framework before, the skeleton always looks pretty good, but a production app gets entirely more complex and hairy. Things don’t improve, generally, at best they stay as-bad. I would worry that a ZF2 app would just explode into a sea of classes entirely unnavigable by a junior programmer, held together by a DI system so abstract they would have little chance of properly comprehending it.

This is really sad. ZF1 had a number of shortcomings which I thought ZF2 looked on track to tackle – and, in all probability, has tackled. The REST controllers in ZF1 were complete bobbins, and ZF2 looks like it has that right. The Db layer in ZF1 was actually quite good, but ZF2 looks to have improved on it. PHP namespaces are of course ugly as sin and ZF2 embraces them, but they make sense and I could potentially learn to love them. But my gosh, just look at the quickstart. Remember, this is the “get up and running as fast as possible” guide for people who already know the language and just want to get cracking.

What is bad about it? Well, 12.2.2 is the start of the “you’ve already installed it – let’s get coding” section. First item on the todo list? “Create your module”. This involves downloading the skeleton, copying bits over, and being told all is well. 12.2.3: update the class map for the autoloader, using the correct namespace, ensuring configuration is enabled and being lenient with the autoloading (let’s both you and me pretend we understood what on earth this section was trying to achieve).

12.2.4, create a controller. Oh my god, I don’t want to know what Zend\Stdlib\Dispatchable is there for, or why I might pick a REST controller, since the quick start doesn’t cover REST. But never fear, we have a basic controller; it looks like this:


use Zend\Mvc\Controller\ActionController;
use Zend\View\Model\ViewModel;

class HelloController extends ActionController
{
    public function worldAction()
    {
        $message = $this->getRequest()->query()->get('message', 'foo');
        return new ViewModel(array('message' => $message));
    }
}

Unfortunately this reminds me again – I hate to use the J-word – of all the geek Java jokes. Boilerplate object-this and method-other-thing-another-method-that().

I so want to be interested in ZF2, but it’s about as far up the corporate enterprise architecture-astronaut ladder as I have ever seen PHP climb. And honestly, if I wanted to program Java, I’d use Java. And then I’d download Play or Scala and actually enjoy it. But for PHP, no. So, adieu, ZF. It has been nice knowing you.

“Dart” out in the open – what’s it all about?

This morning was the big “Dart language” unveil – the Dart websites are now up – and already many seasoned Javascripters have the knives out. I’m surprised for a couple of reasons: first, this isn’t quite as big a deal as many people thought it would be (me included), both in terms of the scope of the system and the distance from Javascript. Second, the system isn’t as finished as many predicted: this isn’t going to be usable for a little while.

That all aside, let’s look at the highlights:

It’s classically classful

Javascript has a prototypical object system. That means that instead of having classes that define what objects should look like, you simply create an object, make it look how you want it to look, and then use that as a template for all future members of that “class”. This is deeply confusing to many people who have come from languages like C#, Java, C++, Python (etc. – i.e., practically all of them) where you do things the classical way. The Dart designers have more or less thrown out the prototypical system, and a standard class-based system is available, with some degree of optional typing.
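A minimal sketch in plain Javascript (the names here are entirely made up) shows the prototypical style at work – no class declaration anywhere, just an object used as a template:

```javascript
// An object that simply *is* the template for all "points".
var protoPoint = {
  describe: function () {
    return "(" + this.x + ", " + this.y + ")";
  }
};

// "Constructing" a point means creating an object that delegates
// to the template, then shaping it by hand.
function makePoint(x, y) {
  var p = Object.create(protoPoint);
  p.x = x;
  p.y = y;
  return p;
}

var a = makePoint(1, 2);
// a.describe() === "(1, 2)" – found via the prototype chain
```

Dart’s answer is simply to let you write `class Point { ... }` instead and have done with it.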

And in fact, it seems more or less mandatory: they’ve used main() as the code entry point again, and like in Java, for anything non-trivial you’re basically going to have to write a class which implements that function.

I’m not yet sure whether this is a great thing or not – mostly, I’m ambivalent – but lots and lots of people have written their own “quacks like a class” system on top of Javascript, including Doug Crockford. Javascript is flexible enough to represent this, so the Dart-to-Javascript compilation bits will work, but obviously it’s not going to interact well with Javascript libraries that take a different view of classes or inheritance. This is probably not a problem; Perl has a number of different ways of implementing objects and there doesn’t generally seem to be much trouble with it.

Wider standard library set

Javascript has been let down by its standard library set in many ways. First, there really aren’t many data types available: you have an associative array, you have a non-associative array, and that’s about it. Objects are more or less associative arrays. But also, there aren’t other APIs to do useful things in the primary heartland of Javascript, the browser. The answer to all this, of course, has been the rather well designed Javascript library projects that have sprung into being: the JQuery, Mootools and YUIs of this world. And there are many of them, and the competition is fierce, and the end results are actually very good.

Dart goes a slightly different way with this. The library sets that come with Dart do a lot more than Javascript is capable of. There are lots more basic types, and many basic behaviours (interfaces) that describe in what context you can use such data – for example, any type that implements ‘Iterable’ can be used in a loop. It’s pretty great that this is all standard. Sadly, the DOM library makes a re-appearance, which is a bit of a shame because it’s literally one of the suckiest APIs ever invented, but on the flip side it does mean that the likes of JQuery could be ported to Dart easily.

Sugar for asynchronicity

Javascript, particularly when used in the browser, is deeply asynchronous. Unfortunately, the language itself doesn’t have an awful lot of support for that. You can pass functions around as first-class objects, so a lot of APIs are designed with function call-backs to execute “later on”. This leads to a kind of “macaroni code” where roughly procedural code (“Do this, then do that, then do this other thing”) is broken up over many functions just so it can be passed around like this. Dart gives the programmer a little bit of help here by implementing Promises.

In Dart, the Promise is an interface which looks an awful lot like a thread in many ways. The key sugar here is that the language still has the callbacks, but you can chain them together with then() instead of nesting them one inside another. You can also check on how they’re doing, cancel them if you like, and other stuff – again, nothing that Javascript doesn’t have, but slightly more elegant. Combined with the class system, it also means the end of ‘var $this = this’ and other such scoping hacks.
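To see what that sugar buys you, here’s a toy, synchronous stand-in for the idea in plain Javascript – emphatically not Dart’s actual Promise interface, just the shape of the nesting-versus-chaining difference:

```javascript
// A pretend async call that happens to invoke its callback immediately.
function fetchSync(value, callback) { callback(value); }

// Toy "promise": wraps a value and lets you chain transformations
// with then() instead of nesting callbacks.
function Toy(value) {
  this.value = value;
}
Toy.prototype.then = function (fn) {
  return new Toy(fn(this.value));
};

// Macaroni style: each step nests inside the last.
var nested;
fetchSync(2, function (a) {
  fetchSync(a * 3, function (b) {
    fetchSync(b + 1, function (c) {
      nested = c;
    });
  });
});

// The same pipeline, flattened into a chain.
var chained = new Toy(2)
  .then(function (a) { return a * 3; })
  .then(function (b) { return b + 1; })
  .value; // 7
```

Same result either way; one of them you can actually read top to bottom.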

Message passing

This is probably more important than the Promises interface. Dart has message passing built-in, like many suspected. And, it looks nice and simple: you have ports, and you can either receive messages from them or send messages to them. The receivers are basically event-driven in the same way a click handler would be. Seeing the value here is difficult in some ways: it will be interesting to see how the balance is struck, because if you are designing a class you could either make an API which creates Promises, or send/receive messages – the net effect is roughly the same. You probably don’t want to implement both, but which system you use is up to you. The message passing interface is slightly more decoupled; but it’s probably easier to abuse in the longer term.
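Here’s a hypothetical sketch of the port idea in plain Javascript – the names are invented, not Dart’s API – just to show the event-driven receive/send shape:

```javascript
// A port has two ends: something registers a handler to receive
// messages, and anything holding the port can send to it.
function makePort() {
  var handler = null;
  return {
    receive: function (fn) { handler = fn; },
    send: function (message) {
      if (handler) handler(message);
    }
  };
}

var port = makePort();
var log = [];
port.receive(function (msg) { log.push("got: " + msg); });
port.send("hello");
// log is now ["got: hello"]
```

The receiver never knows who sent the message – that’s the decoupling; it’s also exactly what makes the pattern easy to abuse once everything is talking to everything.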

It’s all sugar

I think this is the thing which surprises me most about Dart: it’s actually pretty close to Coffeescript, but with a more Javascript-like syntax. And that’s why I can see this being successful: you can turn it into standard Javascript, but it gives us a lot of the bells and whistles that programmers have been crying out for. Classes have getters and setters like C#, strings can have variables that interpolate within them, you can write really lightweight functions in a new => syntax, and classes can have built-in factories – just to name a few of the highlights.
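As a taste of what the getter sugar saves you, here’s the ES5 Javascript equivalent using Object.defineProperty – considerably noisier, which is rather the point:

```javascript
function Circle(radius) {
  this.radius = radius;
}

// A computed property: reads like a field, runs code.
// In Dart this is a one-line "get area => ..." declaration.
Object.defineProperty(Circle.prototype, "area", {
  get: function () {
    return Math.PI * this.radius * this.radius;
  }
});

var c = new Circle(2);
// c.area looks like plain field access but invokes the getter (~12.566)
```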

There are some extras, like the ability to reference needed CSS, which point to a slightly grander future where Dart scripts are rolled up with their related resources into something more easily distributable. And maybe this is the point: the unveiling of Dart was not really a beginning itself, but the beginning of a beginning. They’ve designed the language to attempt to grow with your application: you can start small and simple, but as your application grows you can add more to it (like types and interfaces) to make it more structured and, hopefully, safer to run. And in the same sense, Dart itself is going to grow over time as well.


Is package management failing Fedora users?

(For those looking for an rpm rant, sorry, this isn’t it….!)

Currently there’s a ticket in front of FESCo asking whether or not alternative dependency solvers should be allowed in Fedora’s default install. For those who don’t know, the dependency solver is the algorithm which picks the set of packages to install/remove when a user requests something. So, for example, if the user asks for Firefox to be installed, the “depsolver” is the thing which figures out which other packages Firefox needs in order to work. On occasion, there is more than one possible solution – an obvious example often being language packs; applications usually need at least one language installed, but they don’t care which.
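The core of the job can be sketched as a walk over declared requirements. This toy model (package names invented) ignores versions, conflicts and the “more than one valid answer” problem – which is exactly where real depsolvers earn their keep and where they can disagree:

```javascript
// Toy dependency graph: package -> list of requirements.
var deps = {
  firefox: ["gtk3", "nss"],
  gtk3: ["glib"],
  nss: ["nspr"],
  glib: [],
  nspr: []
};

// Collect everything needed to install `name`, depth-first.
function resolve(name, seen) {
  seen = seen || {};
  if (seen[name]) return seen;
  seen[name] = true;
  (deps[name] || []).forEach(function (d) { resolve(d, seen); });
  return seen;
}

var plan = Object.keys(resolve("firefox")).sort();
// plan === ["firefox", "glib", "gtk3", "nspr", "nss"]
```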

I don’t have much skin in this particular game; but what I would note is that I find it particularly bizarre that this task is delegated to an algorithm. What we’re saying, basically, is that the configuration of a given installation is chosen by a bit of software. So long as the various package requirements – which could be library versions, files, or something entirely synthetic – are all met, the configuration is “valid”. Of course, that doesn’t necessarily mean it works – it may be totally untested by anyone else, and things get particularly grisly if you’re doing something “fun”. Such fun includes:

  • deploying “multi-arch” packages. Maybe you want a 32-bit browser plugin on your 64-bit PC, for example;
  • installing third-party packages. Maybe it’s RPM Fusion, maybe it’s an ISV – but wherever it’s from, it’s another set of variables the depsolver looks at;
  • installing your own packages. See above.

The package management system doesn’t have a concept of “OS” versus “other stuff”. Having that concept, with the ability to override it, would be a feature; lacking it entirely is not.

Now, fans of package management frequently tout its many benefits, and they are real. It makes it easy to install new software and be reasonably sure it works (it may need a lot of configuration, though). Splitting stuff up into a thousand different bits makes security updates more straightforward (at least in theory – see later). But in general, conflating all these issues is a bit of a mistake: there are other forms of installation system which provide these benefits as well.

So, what’s wrong with this? We’ve already seen that the choice of depsolver can potentially make or break the system, or at least lead to configurations which were not intended by the packagers, but to some extent that could be solved by tightening the specification/package dependencies, so that the “right choice” is obvious and algorithm-independent. But, there are other issues.

It’s difficult to estimate the number of Fedora users, but the statistics wiki page makes a reasonable effort. And looking at that, we can see that about 28 million installs of almost 34 million (that are connecting for software updates) are currently using unsupported releases of Fedora. That’s over 80% of installs using a release which is no longer supported.

This of course has security implications, because these users are no longer getting security updates. No matter how fancy the package management, these people are all on their own. And unfortunately, the package management tools are not much use here: effectively, unless you use the installer in one of its guises, the procedure is difficult and potentially error prone.

You’re also out of luck with third-party repos: the package manager doesn’t insulate them from each other, so mixing is frowned upon. It may work, it may not. You may be able to upgrade, you may not. It may alter core functionality like your video driver, and you might be able to downgrade if it failed, but let’s hope it didn’t manually fiddle with things.

In the meantime, we’re also failing to deal adequately with many types of software. The Firefox update process causes enough problems with the current setup; Google’s Chromium on the other hand appears to be almost entirely impervious to being packaged in a Fedora-acceptable way. Web applications also don’t work well; Javascript libraries don’t fit well/at all into the concept of libraries a la rpm, so there’s loads of duplication.

There’s probably an awful lot more that can be written on this topic, and of course package management right now, for the most part, works pretty well. But I worry that it’s a concept which has pretty much had its day.

Speculation on Google’s “Dart”

Just yesterday people jumped on the biographies and abstract for a talk at goto: the Keynote is Google’s first public information on Dart, a “structured programming language for the world-wide web”. Beyond knowing a couple of the engineers involved – which allows a certain amount of inference to take place – there’s also some speculation that Dart is what this “Future of Javascript” email referred to as “Dash” (this seems entirely possible: a dash language already exists; Google already used ‘Dart’ for an advertising product but have since stopped using that name, potentially to make way for the language).

I thought it would be interesting to have a look at some of the details of this new language. One thing seems quite certain: Google’s Javascript engine, V8, is going to target this, because it’s going to target client-side application programming to begin with. V8 is, of course, very popular – it’s in Chrome, it’s in Node.js, it’s going to be put in Qt. However, it hasn’t really been a brilliantly standalone project (witness the problems getting Chromium into Fedora, as an example) and the addition of Dart will almost certainly make this worse.

So, what else do we know?

Compiles to Javascript

It seems likely that the language will, at least in a proper subset, compile into Javascript – a lot like Coffeescript does. Personally, I cannot stand Coffeescript for the same reasons I really don’t like Python, but there is some obvious win to this approach: you get backwards compatibility with existing systems and, usually, a method of interacting with existing code and libraries.

I suppose the first question is, how different to Javascript will it be? It will almost certainly be object-oriented, but that need not imply prototypical inheritance – it could be the Javascript compiler will do some fancy trick with objects to make things appear more classical. Coffee does this to a large extent too, and I think we’ll see a similar approach. I doubt much of Coffee’s syntax would be copied – it’s almost Perl-like in its terseness sometimes – but I think there will be a similar approach to the object model.

There will be other differences. Javascript is relatively typeless, I suspect Dart will have types of some sort at least optionally. The scoping rules will probably be a bit different as well – the “let” keyword has never caught on wildly, but some level of block scoping (as an example) would be an obvious improvement.
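The classic var-scoping trap illustrates why block scoping would be an improvement: var is function-scoped, so every closure created in a loop shares one variable, and the “obvious” reading of the code is wrong.

```javascript
function counters() {
  var fns = [];
  for (var i = 0; i < 3; i++) {
    // Every closure captures the *same* i, which ends up as 3.
    fns.push(function () { return i; });
  }
  return fns.map(function (f) { return f(); });
}
// counters() returns [3, 3, 3], not the [0, 1, 2] most people expect
```

With block scoping, each iteration would get its own binding and the intuitive answer would be the correct one.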

Not just a language

I think it’s relatively clear from the “Dash” discussion that this isn’t just going to be a language: templating and possibly even MVC will be available alongside, somehow. I expect to see some interesting things here, actually – there might not be much impact on the language (although a way of embedding HTML templates might be handled specially) but I think it will be closely aligned to these key tools. The Javascript world has been doing some interesting stuff – see Backbone.js and Knockout.js as two obvious examples – but it will be really interesting to see how much “platform” is put into Dart.

There is a worry here, of course, that it’s too restrictive. Knockout is actually a great example: it’s MVVM architecture, not MVC, and for a lot of jobs I’ve actually been really, really impressed with it. It’s simple, straightforward, but most of all productive. It would be a shame if you can’t do something similar in Dart, but I would bet you can. Binding data onto a web page is so fundamental, so basic, that I really think there will be some interesting stuff there.

Binary Dart?

I’m not really sure about this, but I’ll chuck it out there anyway: clearly, writing Dart in a text editor is going to be fine. However, I also expect that there would be alternative delivery mechanisms. Right now, people use tools like Closure to “compile” Javascript into a more compact representation. Clearly, if you’re starting with a new language, you could specify a binary format right from the start. This would also sit beside NaCl/Pepper quite nicely, and allow multiple resources to be included into a project without having to have multiple file downloads into the browser.

Google are going to be focussed on deployment of large, highly-interactive apps, I think – so although the small page enhancement stuff would still be on the table, really I think Dart is going to be about writing GMail and Google Docs. In that context, being able to wrap up everything into a nice deployment package makes a whole heap of sense.

A month to wait?

Sadly, I don’t think we’re going to know too much more before goto;. I had a look around the V8 source, there aren’t really many clues in there to what’s coming. If they’re offering a compile-to-Javascript option, that might be the only thing released at first – so Dart would effectively be a standalone compiler only to begin with, growing features to target specific engines later on.

The quality of Fedora releases

Scott James Remnant blogged his ideas about how to improve the quality of Ubuntu releases recently, triggering some discussion at LWN about the topic. I offered some opinions about Ubuntu which are not terribly interesting because I don’t get to use it often; however, I did also write about Fedora based on the last couple months’ experience of Fedora 15 & 16.

Before I get to that, at roughly the same time, Doug Ledford was posting his thoughts about the “critical path” process – essentially, saying it was broken. I’m pretty sure he will find vociferous agreement with his views, based on previous feedback, but not (alas) from me.

I made a number of claims about Fedora in the LWN comments, that it is essentially unusable for anyone non-expert. I stand by this: if you use Fedora, and care about things continuing to work on an ongoing basis, you have to be entirely au fait with:

  • upgrading your distribution every six months;
  • rolling back an update (including config) when (not if) things break;
  • knowing how to distinguish which part of the stack is broken (and oh boy, this isn’t easy).

People complain about Windows “rotting” over time and getting slower and slower. Fedora is worse than this: it works great until you get an update which blows out something critical. At that point it stops working: maybe you stop receiving updates, maybe you can’t boot, maybe you can’t log in. I can’t recall a release yet in which something crucial wasn’t broken.

Of course, in Fedora, we have the critpath process, which is supposed to stop this kind of thing. And this is what Doug is complaining about, because the process which puts roadblocks in the way of updates to crucial packages naturally gets in the way of “fixes”.

This is where I depart from Doug, I suspect. I have sympathy for his situation, particularly in F16. But the point of critpath is that it does cause pain.

The issue with critpath isn’t that it gets in the way; it’s that it highlights release problems. When major issues turn up in critical packages and get through to release, it hurts – it cannot be fixed quickly. And here’s my point of disagreement: we shouldn’t allow more direct fix releases just to avoid the pain, we should address the root cause of the problem – the release of bad code.

(Interlude: as a point of order, it should probably be made clear that the specific issue Doug is facing is a lack of karma on a specific release branch with a package which, although critical, is not necessarily widely used. That’s an obvious bummer, and I’m not sure is terribly instructive about the process as a whole because of that).

There have been a variety of other solutions proposed. AutoQA is something I have a huge amount of time for, and in particular would help reduce the incidence of obvious brown paper bag bugs. It’s not a solution itself, though. Equally, there will always be stuff which evades testing – hardware support in device drivers being an obvious case in point.

I am extremely jealous, though, of the quality Debian achieves, and I say this as a Fedora user. It’s stable, it’s easily upgradable from release to release, and generally the use:surprise ratio is reassuringly high. To a large extent, I think they illustrate that the specific processes don’t really matter a huge amount: what matters is actually getting to a point where maintainers don’t release bad packages often. And to me, that is the point of critpath: to encourage packages of sufficient quality that the pain is generally avoided. Simply short-cutting the process doesn’t encourage that; it just encourages more of the same in Fedora.

Who can program?

Over the past couple of weeks, I’ve been pondering the above question for a number of different reasons. For people who really study programming, like I attempt to, there are a number of claims/myths/legends/tales that are commonly held about people who cut code for a living, such as:

  1. some programmers, the “alphas”, are as much as ten times more efficient than the common programmer;
  2. there are people who “get” computers, and those who don’t. Cooper splits these into “humans” and “homo logicus”. Those who don’t grok computers are destined to never be able to program;
  3. there are people who are paid to cut code, and who simply can’t – they rely on auto-complete IDEs, cut’n’paste library code, etc.;

For the purposes of this post, I’ll separate between these different concepts: the “goats” (people who cannot code, at all), the “sheep” (people who code, perhaps professionally, but poorly) and the alphas. Sheep and alphas are collectively referred to as coders.

Saeed Dehnadi’s PhD homepage cropped up recently, on Hacker News I think, and mentions some tests which have had varying degrees of success in differentiating between goats and coders. Somewhat surprisingly, it’s claimed that it’s possible to administer tests to people before they have been taught to code and yet still determine whether or not they will be able to code. The tests are disturbingly simplistic, but although they involve (pseudo-)code, they’re actually designed to determine the mental models people apply to the problem, and in particular whether people apply a consistent model.

I have to say, I remain a little bit sceptical about all of that. It reminds me of a question a lecturer once asked our class, while working on set theory: “Why do people refer to ‘∀’ as ‘upside down A’, yet refer to ‘∃’ as ‘backwards E’? They’re both simply rotated π radians”. I remember thinking to myself that he obviously had little clue how the human mind approaches novelty – and in particular attempts to label things with a communicable tag; “‘A’ transformed with a π radians rotation about the centre point” not having quite the same ring to it. But maybe there was a point in there, somewhere, about finding single consistent models that can be re-applied.

It’s really tempting to think about this in terms of tests. This is, after all, one of the reasons vendor certification programmes came into life: to reassure employers of whatever description that the person they’re hiring to do a task with a specific product really is able to do what they say they are able to. And it does work, kind of, after a fashion. If what you need is within the narrow scope of the studies of the certification, you can be generally assured that the person does at least know the theory. However, for programming, this is a bit different – frankly, there is no such thing as “narrow scope” when you’re talking about cutting code. Some people, like Paul Graham, go as far as to say questions like “how do you pick good programmers if you’re not a programmer?” are basically unanswerable (mistake 6 in his classic list of 18 mistakes that start-ups make).

It’s also difficult to talk about how you can tell the difference between sheep and alphas (let’s pretend, for a moment, that there is no spectrum in between there – that’s probably not true, but it might be a somewhat valid simplification). How many people read Reg Braithwaite’s discussion of using FizzBuzz during interviews and didn’t recognise the picture he paints? Let me repeat here his main thesis:

“199 out of 200 applicants for every programming job can’t write code at all. I repeat: they can’t write any code whatsoever.”
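For anyone who hasn’t come across it, FizzBuzz is about as small as a programming problem gets – which is rather the point of the statistic. A Javascript version might look like:

```javascript
// Print 1..n, replacing multiples of 3 with "Fizz", multiples of 5
// with "Buzz", and multiples of both with "FizzBuzz".
function fizzbuzz(n) {
  if (n % 15 === 0) return "FizzBuzz";
  if (n % 3 === 0) return "Fizz";
  if (n % 5 === 0) return "Buzz";
  return String(n);
}

for (var i = 1; i <= 15; i++) {
  console.log(fizzbuzz(i));
}
```

If a candidate cannot produce something of this shape in a few minutes, no amount of CV polish changes the conclusion.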

He refers back to another classic, Joel Spolsky’s thoughts on the hiring process from back in 2005. So, what are all these people doing applying for jobs that they are effectively incapable of doing, and how many of these people actually end up being hired and contributing to the sheep pool? It’s difficult to know exactly why they are applying, but part of the reason has got to be the modern tools available to programmers: both the editing environments, the IDEs, and the documentation and programming resources available. Some coders will have no idea about the types of tool I’m talking about, having never used the likes of IntelliSense and visual designers.

Let me give you a clue. Take a long hard look at this abomination. Just to be really clear about what’s going on there, they have a tool which can effectively write code based on semantic guessing – the words you’ve written, the types of the variables involved, the context of the code, that kind of thing. This is like a builder using a brick-laying machine to build a house without the need of any intervening thought about such trifles as “how strong should this wall be?”, “am I putting this wall in the right place?” and even “what if I need to later put a door in here?”.

Simplifying the coding process is an admirable goal, and in fact has been an ongoing process since we wrote the first assemblers to translate mnemonic instructions into machine code. However, the ideal has always been to raise the level of abstraction to the height at which it makes sense to solve the problem at hand. You don’t want to write an entire database system in assembler, it’s not really a high level enough language. But you may want to dip into assembler in specific parts. Sometimes, it will hide the details of what’s going on underneath from the programmer, and occasionally that will annoy the programmer. In general, you do not want to be writing large pieces of code where you actually have no idea of what’s going on – an alpha would never, ever stand for that; a sheep, on the other hand, would.

Jeff Atwood has another collection of thoughts on his 2007 post about coders who can’t code. The conclusion he reaches is relatively natural, based on his references and the above: you ask people to actually write some code. Amazingly, surprisingly, gob-smackingly, this still isn’t happening – even today. I could name, but won’t, places which are hiring programmers based on not much else but their claimed experiences. I know people who’ve been through such processes, and have seen such myself. Do you need to ask a question of each candidate, make them do something original? No, of course not – you could even simply ask for a sample of code and take it on trust that they wrote it. My experience on the hiring end is that it’s actually quite easy to tell whether someone wrote a piece of code or not, and the most telling piece of information is not the code itself, but the problem it solves – seeing the type of code that a candidate thought was appropriate to display actually says an awful lot about their tastes and sensibilities.

If I was doing this again right now, what would I do? Probably about the same as I did last time: ask people who turn up to do some simple tests. It’s shocking how often people with an otherwise interesting CV just totally bomb even the most simple request, but it’s nothing much more than a filter. Trying to determine higher-level skills is actually fundamentally more difficult, because the more skill you’re attempting to detect the more parochial your test necessarily becomes, to the point where you’re filtering out people who simply don’t have the exact same background / knowledge set as yourself. Much more important is the capacity to learn and apply problem-solving techniques – part of me thinks that asking them to write code in a language they’ve never seen before might be an interesting test, but it would be fantastically difficult to pitch it at the right level.

I’m going to end with a link to a discussion on StackExchange about how to detect a passionate programmer. I’m not sure I agree with much more than about 50% of it, but there are a lot of ideas in there for predictors of programming expertise. Interestingly, there are even some companies out there who claim to be able to help out in the hiring process to clear out goats and sheep. I have a lot of sympathy for people like Eric Smith who are rolling their own tests to attempt to judge skill. I have to say, though: it really shouldn’t be this hard.