On Hiring A-Players

“Steve Jobs has a saying that A players hire A players; B players hire C players; and C players hire D players. It doesn’t take long to get to Z players. This trickle-down effect causes bozo explosions in companies.” ― Guy Kawasaki

A few times recently I’ve bumped into what I call the “A-Player Theory”. This is a close relative of the “10x Engineer Theory”, and in its usual formulation states that only the best want to hire their equals or betters: everyone else, for whatever reason, hires down. If only you were brave enough to hire people better than you, you’d be creating great teams in no time!

Like a lot of ideas in the tech arena, this one feels like it has come from the world of elite sports, and that is no bad thing. When I want to explore ideas about peak performance, it’s natural to look to different types of people and try to apply their ideas to my own. However, I don’t think this theory is actually that helpful, and it’s simple to explain why.

First, I think there is a simple problem with the formulation in that hiring is a two-way street: not only does the hirer need to want to find the best people, but the best people will need to be attracted to the hirer. I think this second effect is actually the more important: the quality of the talent available to you, as a hirer, will be directly proportional to the quality of your culture. This is a difficult statement to support in this post, and is something I will explore further in the future, but I do believe that this is the case.

But second, and more importantly, I just don’t think the theory holds in elite sports. Now granted, things are rather different in American elite team sports: for example, team-building is crucial in American Football, but processes like “the draft” are deliberately designed to ensure that C-teams hire A-players (at least in the beginning), so that some degree of competitive parity is ensured.

I’d like to give some examples from the world of soccer, instead. I’ve chosen this sport for a few reasons – partly, I’m very familiar with it, but mainly because it’s a global sport with the requisite elite element, and sufficient money floating about that players move between teams, leagues and countries with extreme ease. If any teams were going to be calculating about hiring A-players, you’d see them in soccer.

To be fair, there are absolutely teams who do this. The most obvious example is the galáctico policy pursued by Real Madrid. However, this turns out to be pretty rare: teams will transfer players for a variety of different reasons, and rate them in different ways, and while there are large-money transfers for world-class players, these make up a small amount of activity at even the largest clubs as a rule.

The absolute best teams in soccer are not associated with world-class A-players; they’re actually a function of the manager. My favourite example right now is Leicester City: not least as I grew up in Leicester, but (correct at the time of writing this) because they’re flying high at the top of the Premiership with one of my favourite managers, Claudio Ranieri. Their leading scorers, with 36 goals between them, are those huge names “Jamie Vardy” and “Riyad Mahrez” (if you don’t follow soccer, you will be forgiven for never having heard of either of them).

Vardy at least was a record signing of sorts: Leicester paid £1 million for him from a non-league club, reportedly a record fee for a non-league player, and the deal could reportedly be worth even more. Let’s be clear about this, though: for a Premier League player, that sum is an absolute pittance. Mahrez came from a league club, to be sure – Ligue 2 side Le Havre – for an undisclosed fee, which again will be small in the scheme of the league (and in fact, both players joined before the team were promoted to the Premiership). Both players now have a market value in the tens of millions.

Leicester did not sign A-players; in fact, they signed a bunch of low-league and non-league players in order to rise out of the Championship, and then made the very sensible decision not to attempt to parachute in Premiership stars, and instead stick with the team they had built. This work is largely attributable to their previous manager, Nigel Pearson, and does not particularly break the mould in many ways – this is a well-trodden path that has seen many clubs rise through the leagues at the hands of an excellent manager (e.g. Nottingham Forest in the Clough era, before which they were largely forgettable and underachieving, or Leeds United under Revie, who took over an awful side that struggled to attract youth players, let alone professionals).

If anything, in fact, the record of teams within this league that have gone out to buy the best has been pretty awful – Chelsea have done well but at an absolutely enormous cost, and many teams inflated by A-players have quickly fallen once the money ran out (Newcastle, West Ham, as examples).

The manager with the most enviable record for team-building, of course, must be Alex Ferguson. While his best Manchester United teams were full of world-class players, many of whom had been bought in at some cost, his career was partly defined by the number of A-players he allowed to leave the club. Paul Ince was a huge loss to the club, as were Hughes and Kanchelskis, and (not for the first time in his career) Ferguson came under significant pressure to resign. Instead, he opted (having failed to sign some names) to bring in members of the youth team – names who are now instantly recognisable, like the Nevilles, Beckham and Scholes. Commentator Alan Hansen characterised their opening-day loss with the immortal words, “You can’t win anything with kids”. He also let Cantona go, and Ronaldo (who left at the height of his powers) brought the club a profit of almost £70 million when he was sold. Sure, Ferguson also spent money (and brought in players who didn’t perform – Taibi, Djemba-Djemba, Tosic, Zaha, Bebe, to name but a few – some of them real A-players, like Anderson), but Manchester United were one of the most sustainably successful clubs for a long period of time under his leadership.

All of this brings me back to my central point. It’s not the player that is important; it’s the team – and the decision about who to bring into a team, when, and how, is the responsibility of the manager. It’s very easy to say “I’m going to go and hire only A-players!”, but actually, it’s probably one of the worst things you can do for a team. In football terms, you need balance in a side – people with different strengths, abilities and points of view. Football is notorious for great players who leave one club for another and become totally anonymous shadows of their former selves: this is not because they’re B- or C-players in disguise, but because if a team doesn’t have an A-player-shaped hole, that player will not be able to perform like an A-player.

It is incumbent on the best team leaders to develop the team first and foremost; the aim is that based on the performance of the team others will look back and say, “That is a team of A-players”. It’s easy to state how to do this, but remarkably difficult in practice:

  1. set the right culture for the team, right from the off. This is a whole topic in and of itself, and the culture creates the environment for performance but importantly does not trigger performance itself
  2. where an existing team is in place, plan out how to adapt and grow the people within the team, guided by the culture
  3. for each new member being added to the team, think hard about the gap they’re filling and whether or not they are the right “shape” to fill the gap.

Think about the levers in the team that are used to create and maintain performance. Most managers will automatically turn to metrics and reports; these are amongst the least powerful tools. The crucial factors here are ensuring clarity of purpose, adequacy of tools and resources, autonomy to perform and passion for the work. Passion burns most fiercely when fuelled by success, and as a team leader that is your end goal. Hiring is important, and it’s best to start from the strongest position possible, but I don’t believe it’s anything more than a good start.

“The quality of a person’s life is most often a direct reflection of the expectations of their peer group.” ― Tony Robbins

Tech2020 followup

The various videos of the speakers from Tech2020 – including yours truly – are up and available for Skillsmatter members. Going back to my previous blog post, I can heartily recommend the speakers I was excited about, but I have to say, I was blown away by the overall quality of the conference. Even those topics I didn’t think would hold much interest or news for me turned out to be incredibly interesting, and I daresay the next edition of this conference will be something to watch out for.

Tech2020

For a while now, I have been waxing lyrical (to those who will listen) about the variety of new tools and analyses available to people who want to prognosticate. If nothing else, the current craze for data within most businesses has resulted in people almost literally swimming around in the stuff without an awful lot of an idea about what to do with it, and while this has led to some unspeakably shambolic practices (those who know me will likely have heard me on my hobby horse about proving models with actual experimentation), it has also opened up new horizons for people like me.

So, I’m delighted to have been invited to give a talk I submitted to the Bartech Tech2020 conference, this coming week in London. It’s the first meeting of this particular group, and there is a great line-up of speakers, all of whom are going to be reading the runes and describing their vision of the year 2020. Wonderfully, the various talks will be recorded and available, so there will be significant opportunity come the year 2020 to look back and groan loudly at the errors and omissions that will have piled up nicely by that point.

There are some brilliant speakers lined up, and I have to confess to being eager to particularly hear from this lot (in no particular order):

  • Zoe Cunningham – the old refrain, “culture eats strategy for breakfast”, has never been more true than now. It’s also one of the most difficult things to set right and predict;
  • David Wood – working in healthcare, I’m incredibly interested in David’s talk, and am certain that what we will call healthcare in another ten years’ time will in many ways bear little resemblance to what is practised now;
  • Simon Riggs – in all honesty, I’m hoping he’s going to be talking at least in part about homomorphic encryption because I just read the Gentry paper recently and it’s fascinating, but there is so much to come in this space – particularly now that data is so large and non-local that all sorts of new strategies are needed.

I’m going to attempt to tweet through most of the conference in low volume, probably on #tech2020, and look forward to putting a few more faces to names from Bartech.

What I realised I’m missing from Gnome

Not that long ago, I changed tack on my Android phone: against all the promises I made to myself beforehand, I switched on the Google account and allowed it to sync up to GCHQ/NSA the cloud. I did this for one main reason: I had just got an Android tablet, and I despised having to do the same stuff on each device, particularly since they weren’t running the same versions of Android, and one was a Nexus – so not all the UI was the same. The benefits, I have to say, were pretty much worth it: I don’t have too much sensitive data on there, but the ease of use is incredible. What was particularly good was that when I broke my phone and had to get a new one, once the new one was linked up everything was basically back how it was. That’s tremendously powerful.

Now, I recently acquired a bit of Apple equipment and of course installed Fedora 19 on it. Just to digress briefly: installing Fedora 19 on any new Mac hardware, particularly if you want to keep Mac OS X around (I don’t much care for OS X, but keeping it for now seems handy), is tremendously difficult. I had wired ethernet (brilliant, because I was using the netinstall – which, I should note, is a truly wonderful experience in the new Anaconda) which was lucky, since the wifi doesn’t work by default. The disk partitioning is incredibly complex, and the installation documentation is not particularly good. At some point I might try and help update the documentation, but it would feel a little like the blind leading the blind at this stage: although I have Fedora booting, the Mac OS X grub entries don’t work.

Logging into my desktop, though, I realised everything was bare. This was not like the Android experience at all – everything, from my username to my dot config files, needed to be set up again. I rarely change hardware, and previously I saw this as a reason to make a fresh start of things: but actually, now I value the convenience more highly.

It’s not like things are totally bad:

  • Gnome’s account settings can pull in some limited information, from Google or OwnCloud or other similar systems
  • Apps like Firefox have excellent built-in secure synchronisation that’s not a complete pain to set up
  • you can use apps like SparkleShare to make specific directories available elsewhere.

However, this really isn’t the experience I want:

  1. I should be able to use some online “Gnome Account” in the same way I can set up Enterprise Login during install
  2. That “Gnome Account” should have all my key configuration, including the details of other accounts I have linked up (maybe not the passwords, but at least the settings)
  3. If I have online storage / backup somewhere, it should offer to sync that up
  4. I should be able to sync my entire home data, not just specific bits
  5. If the two machines are on, I should be able to access one from the other – even if there’s a firewall in the way

I realise point five above is particularly moon-on-a-stick territory.

Technically speaking, a lot of the basic bits are kind of there, one way or another. Most Gnome apps use the standard dconf settings system, and in theory it’s possible to synchronise that stuff where it makes sense (this is, of course, handwaving: whether or not you want all settings exactly the same on each machine is a virtually impossible question to answer). Discovering and syncing other data shouldn’t be that hard. Remote access to another machine is definitely much harder, but the various protocols and discovery mechanisms at least exist.
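To make that slightly more concrete, here’s a very rough sketch of the kind of plumbing I have in mind, using nothing more than dconf’s dump and load subcommands plus a synced folder (the ~/Sync path is entirely hypothetical – substitute your OwnCloud or SparkleShare directory of choice):

import os
import subprocess

# hypothetical location inside a synced folder (OwnCloud, SparkleShare, ...)
SYNC_FILE = os.path.expanduser("~/Sync/dconf-settings.ini")

def export_settings():
    # "dconf dump /" serialises the whole settings tree as a keyfile
    dump = subprocess.check_output(["dconf", "dump", "/"])
    with open(SYNC_FILE, "wb") as f:
        f.write(dump)

def import_settings():
    # "dconf load /" reads a keyfile on stdin and applies it wholesale --
    # whether you actually want every setting identical on every machine is
    # exactly the hard question this sketch ignores
    with open(SYNC_FILE, "rb") as f:
        subprocess.check_call(["dconf", "load", "/"], stdin=f)

if __name__ == "__main__":
    export_settings()

That is obviously nowhere near a real solution – there’s no conflict handling, no per-machine overrides, and nothing to trigger it – but it shows how much of the raw mechanism already exists.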

Annoyingly, there doesn’t seem to be much development in this direction – not even prototypes. There are lots of nasty problems (syncing home directories is fraught with danger), and even if you were willing to buy into a simpler system to get the goodies, making it work in Gnome is probably an awful lot easier than dealing with the other apps that aren’t Gnome aware.

I’m certainly not in much of a position to start developing any of this stuff right now, but it would be interesting to at least attempt to draw out a believable architecture.  A decent 70 or 80% solution might not even be too hard to prototype, given the tools available. It would be interesting to hear from anyone else who is working on this, has worked on it, or knows of relevant projects I’ve missed!

 

A first look at docker.io

In my previous post about virtualenv, I took a look at a way of making python environments a little bit more generic so that they could be moved around and redeployed at ease. I mentioned docker.io as a new tool that uses a general concept of “containers” to do similar things, but more broadly. I’ve dug a bit into docker, and these are my initial thoughts. Unfortunately, it seems relatively Fedora un-friendly right now.

The first thing to do is examine what, exactly, a “container” is. In essence, it’s just a file system: there’s pretty much nothing special about it. I was slightly surprised by this; given the claims on the website I assumed there was something slightly more clever going on, but the only “special sauce” is the use of aufs to layer one file system upon another. So from the point of view of storage alone, there really isn’t much difference between a container and a basic virtual machine.

From the point of view of the runtime, there isn’t an awful lot of difference between a virtual machine and a container either. docker sells itself as a lightweight alternative to virtual machines, but of course there is no standard definition of a “virtual machine”. At one end of the spectrum are the minimal hardware OSen that can be used to assign different host resources, including CPU cores, to virtual machines, and those types of VM are effectively not much different to real hardware – the configuration is set on the fly, but basically it’s real metal. On the other end of the spectrum you have solutions like Xen, which make little to no use of the hardware to provide virtualisation, and instead rely on the underlying OS to provide the resources that they dish out. docker is just slightly further along the spectrum than Xen: instead of using a special guest kernel, you use the host kernel. Instead of paravirtualisation ops, you use a combination of cgroups and lxc containers. Without the direct virtualisation of hardware devices, you don’t need the various special drivers to get performance, but there are also fewer security guarantees.

There are a couple of benefits touted for docker, and I’m not totally sold on all of them. One specific claim is that containers are “hardware independent”, which is only true in a quite weak way. There is no specific hardware independence in containers that I can see, beyond the fact that docker.io only runs on x86_64 hardware in the first place. If your container relies on having access to the NX bit, then it seems to me you’re relying on the underlying hardware having such a feature – docker doesn’t solve that problem.

The default container file system is set up to be copy-on-write, which makes it relatively cheap diskspace-wise. Once you have a base operating system file system, the different containers running on top of it are probably going to be pretty thin layers. This is where the general Fedora un-friendliness starts, though: in order to achieve this “layering” of file systems, docker uses aufs (“Another Union File System”), and right now this is not a part of the standard kernel. It looks unlikely to get into the kernel either, as it hooks into the VFS layer in some unseemly ways, but it’s possible some other file system with similar functionality could be used in the future. Requiring a patched kernel is a pretty big turn-off for me, though.

I’m also really unsure about the whole idea of stacking file systems. Effectively, this creates a new class of dependency between containers, one which the tools seem relatively powerless to sort out. Using a base Ubuntu image and then stacking a few different classes of daemon over it seems reasonable; having more than three layers begins to seem unreasonable. I had assumed that docker would “flatten out” images using some hardlinking magic or something, but that doesn’t appear to be the case. So if you update that underlying container, you potentially break the containers that use it as a base – it does seem to be possible to refer to images by a specific ID, but the dockerfile FROM directive doesn’t appear to be able to take those.

The net result of using dockerfiles appears to be to take various pieces of system configuration out of the realm of SCM and into the build system. As a result, it’s a slightly odd half-way house between a Kickstart file and (say) a puppet manifest: it’s effectively being used to build an OS image like a Kickstart, but it’s got these hierarchical properties that stratify functionality into separate filesystem layers that look an awful lot like packages. Fundamentally, if all your container does is take a base and install a package, the filesystem is literally going to be that package, unpacked, and in a different file format.

The thing that particularly worries me about this stacking is memory usage – particularly since docker is supposed to be a lightweight alternative. I will preface this with the very plain words that I haven’t spent the time to measure this and am talking entirely theoretically. It would be nice to see some specific numbers, and if I get the time in the next week I will have a go at creating them.

Most operating systems spend a fair amount of time trying to be quite aggressive about memory usage, and one of the nice things about dynamic shared libraries is that they get loaded into process executable memory as a read-only mapping: that is, each shared library will only be loaded once and the contents shared across processes that use it.

There is a fundamental difference between using a slice of an existing file system – e.g., setting up a read-only bind mount – and using a new file system, like an aufs. My understanding of the latter approach is that it’s effectively generating new inodes, which would mean that libraries that are loaded through such a file system would not benefit from that memory mapping process.
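If I do find the time to measure it, the check I have in mind is something along these lines: compare the device and inode backing the read-only, executable libc mapping in one process from each container. If the numbers differ, the page cache is holding two copies of the same library. This is purely a sketch of the idea, not a benchmark:

import sys

def libc_mapping(pid):
    # each /proc/<pid>/maps line reads: address perms offset dev inode pathname
    with open("/proc/%s/maps" % pid) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 6 and "libc" in fields[5] and fields[1] == "r-xp":
                return fields[3], fields[4]   # (device, inode) backing the mapping
    return None

if __name__ == "__main__":
    for pid in sys.argv[1:]:
        print("%s: %s" % (pid, libc_mapping(pid)))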

My expectation, then, is that running a variety of different containers is going to be more memory intensive than a standard system. If the base containers are relatively light, then the amount of copying will be somewhat limited – the usual libraries like libc and friends – but noticeable. If the base container is quite fat, but has many minor variations, then I expect the memory usage to be much heavier than the equivalent.

This is a similar problem to the “real” virtual machine world, and there are solutions. For virtual machines, the kernel same-page merging subsystem (KSM) does an admirable job of figuring out which sections of a VM’s memory are shared between instances, and evicting copies from RAM. At the cost of doing more compute work, it does a better job than the dynamic loader: copies of data can be shared too, not just binaries. This can make virtual machines very cheap to run (although, if suddenly the memory stops being shareable, memory requirements can blow up very quickly indeed!). I’m not sure this same machinery is applicable to docker containers, though, since KSM relies on advisory flagging of pages by applications – and there is no application in the docker system which owns all those pages in the same way that (for example) qemu would.
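As an aside, it is at least easy to see what KSM buys you on a host that uses it; the counters under /sys/kernel/mm/ksm/ are standard, even if the “roughly” below is doing a fair amount of work:

import os

KSM_DIR = "/sys/kernel/mm/ksm"

def read_stat(name):
    with open(os.path.join(KSM_DIR, name)) as f:
        return int(f.read())

if __name__ == "__main__":
    page_size = os.sysconf("SC_PAGE_SIZE")
    shared = read_stat("pages_shared")    # de-duplicated pages actually kept in RAM
    sharing = read_stat("pages_sharing")  # mappings pointing at those shared pages
    saved = sharing * page_size / (1024.0 * 1024.0)
    print("KSM is saving roughly %.1f MiB" % saved)

It would be interesting to run the equivalent comparison across a pile of containers, if only to put some numbers behind the hand-waving above.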

So, enough with the critical analysis. For all that, I’m still quite interested in the container approach that docker is taking. I think some of the choices – especially the idea about layering – are poor, and it would be really nice to see them implement systemd’s idea of containers (or at least, some of those ideas – a lot of them should be quite uncontroversial). For now, though, I think I will keep watching rather than doing much active: systemd’s approach is a better fit for me, I like the additional features like container socket activation, and I like that I don’t need a patched kernel to run it. It would be amazing to merge the two systems, or at least make them subset-compatible, and I might look into tools for doing that. Layering file systems, for example, is only really of interest if you care a lot about disk space, and disk space is pretty cheap. Converting layered containers into systemd’able containers should be straightforward, and potentially interesting.

packaging a virtualenv: really not relocatable

My irregular readers will notice I haven’t blogged in ages. For the most part, I’ve been putting that effort into writing a book – more about this next week – hopefully back to normal service now though.

Recently I’ve been trying to bring an app running on a somewhat-old Python stack slightly more up-to-date. When this app was developed, the state of the art in terms of best practice was to use operating system packaging – RPM, in this case – as the means by which the application and its various attendant libraries would be deployed. This is a relatively rare mode of deployment even though it works fantastically well, because few developers are happy to maintain the packaging-level skills the approach requires. From what I read, Mozilla’s systems administrators deploy their applications this way.

For various reasons, I needed to bring up an updated stack pretty quickly, and spending the time updating the various package specifications wasn’t really an option. It didn’t need to be production rock-solid, but it needed to be deployable on our current infrastructure. The approach that I took was to build a packaged virtualenv Python environment: I’ve read online about other people who had tried it with relative success, although there are not many particularly explicit guides. So, I thought I would share my experiences.

The TL;DR version of this is that it was actually a relatively successful experiment: relying on pip to grab the various dependencies of the application meant that I could reliably build a strongly-versioned environment, and packaging the entire environment as a single unit reduced the amount of devops noodling. There is a significant downside: it’s a pretty severe misuse of virtualenv, and it requires a relatively decent understanding of the operating system to get past the various issues.

Developing the package

As I have a Fedora background, I’m not really happy slapping together packages in hacky ways. One of the things I’m definitely not happy doing is building stuff as root: it hides errors, and there’s pretty much no good reason to do anything as root these days.

In order to build a virtualenv you have to specify the directory in which it gets built, and without additional hacks that’s not going to be the directory to which it installs. So, the “no root build” thing immediately implies making the virtualenv relocatable.

The web page for virtualenv currently has this sage warning:

“The --relocatable option currently has a number of issues, and is not guaranteed to work in all circumstances. It is possible that the option will be deprecated in a future version of virtualenv.”

Wise words indeed. There are a tonne of problems moving a virtualenv. Encoding the file paths directly into files is an obvious problem, and virtualenv makes a valiant attempt at fixing up things like executable shebangs. It doesn’t catch everything, so some stuff has to be rewritten manually (by which I mean, as part of the RPM build process – obviously not doing it by hand).
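As a sketch of what “rewritten manually” means in practice – done from the specfile or a build script, not literally by hand – it’s something like the following, which patches up any #! lines under bin/ that still point at the build tree. The two path constants are stand-ins for whatever your build actually uses:

import os

BUILD_PREFIX = "/home/builder/build/pyenv-whatever"   # hypothetical build location
FINAL_PREFIX = "/opt/pyenv/whatever"                  # hypothetical install location

def fix_shebangs(bindir):
    for name in os.listdir(bindir):
        path = os.path.join(bindir, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            content = f.read()
        # only touch files that start with a shebang and still mention the build tree
        if content.startswith(b"#!") and BUILD_PREFIX.encode() in content:
            with open(path, "wb") as f:
                f.write(content.replace(BUILD_PREFIX.encode(), FINAL_PREFIX.encode()))

if __name__ == "__main__":
    fix_shebangs(os.path.join(BUILD_PREFIX, "bin"))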

Worse still, it actively mangles files. Consider one of pillow’s binaries, whose opening lines become:

#!/usr/bin/env python2.7

import os; activate_this=os.path.join(os.path.dirname(⏎
os.path.realpath(__file__)), 'activate_this.py');⏎
execfile(activate_this, dict(__file__=activate_this));⏎
del os, activate_this

from __future__ import print_function

Unfortunately this is just syntactically invalid python – future imports have to come first. Again, it’s fixable, but it’s manual clean-up work post-facto.
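A simplified sketch of the kind of clean-up needed might look like this: it just hoists any future imports back above the boilerplate that virtualenv prepends, and would be run over the environment’s bin/ directory as part of the build.

import re
import sys

FUTURE_RE = re.compile(r"^from __future__ import .*$", re.MULTILINE)

def fix_future_imports(path):
    with open(path) as f:
        source = f.read()
    futures = FUTURE_RE.findall(source)
    if not futures:
        return
    # strip the future imports from where they currently sit...
    lines = FUTURE_RE.sub("", source).splitlines(True)
    # ...and re-insert them immediately after the shebang line
    insert_at = 1 if lines and lines[0].startswith("#!") else 0
    lines[insert_at:insert_at] = [f + "\n" for f in futures]
    with open(path, "w") as f:
        f.writelines(lines)

if __name__ == "__main__":
    for script in sys.argv[1:]:
        fix_future_imports(script)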

What to do about native libraries

Attempting to use python libraries with native portions, be they bindings or otherwise, is also an interesting problem. To begin with, you have to assume a couple of things: that native code will end up in the package, and that not all of it will be cleanly built. The obvious example of both those rules is that the system python binary is copied in.

This causes problems all over the shop. RPM will complain, for example, that the checksums of the binaries don’t match what it was expecting: this is because it reads the checksum from the binary directly rather than calculating it at package time, and prelink actually alters the binary contents (this happens after the RPM content is installed, but RPM ignores those changes for the purposes of its package verification).

Another example of native content not playing well with being packaged is that binaries will quite often have an rpath encoded into them. This is used when installing into non-standard locations, so that libraries can be easily found without having to add each custom location into the link loader search path. However, RPM rightly objects to them. It’s possible to override RPM’s checks, but that’s pretty naive. Keeping rpaths means bizarre bugs turn up when the paths actually exist (e.g., installing the environment package on the development machine building the package – which is quite plausible, given the environment package may end up being a build-time dependency of another).

Thankfully, binaries can usually be adjusted after the fact for both of these things: it’s possible to remove the rpaths encoded into a binary, and to undo the changes prelink made.
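If you want to see what you’re dealing with before rpmbuild starts complaining, a quick scan with chrpath’s list mode does the job. This is a rough helper rather than anything clever; anything chrpath can’t make sense of (scripts, data files and so on) simply produces no match and gets skipped:

import os
import subprocess
import sys

def find_rpaths(bindir):
    offenders = []
    for name in sorted(os.listdir(bindir)):
        path = os.path.join(bindir, name)
        if not os.path.isfile(path):
            continue
        # "chrpath -l" prints any RPATH/RUNPATH present in an ELF binary
        proc = subprocess.Popen(["chrpath", "-l", path],
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, _ = proc.communicate()
        if b"RPATH=" in out or b"RUNPATH=" in out:
            offenders.append(out.strip().decode())
    return offenders

if __name__ == "__main__":
    for line in find_rpaths(sys.argv[1]):
        print(line)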

In the end, I actually made a slightly hacky choice here too: I decided that the virtualenv would allow system packages. This was the old default, but is no longer because it stops the built environments being essentially self-contained. This allowed me to build certain parts of the python stack as regular RPMs (for example, the MySQL connector library) and have that be available within the virtualenv. This is only possible if there is going to be one version of python available on the system (unless you build a separate stack on a separate path – always possible), and takes away many of the binary nasties, since the binary compilation process is then under the control of RPM (which tends to set different compiler flags and other things).

The obvious downside to doing that is that system packages are already fulfilled when you come to build the virtualenv, meaning that the virtualenv would not be complete. If that’s the intention, that’s OK, but it’s not always what’s wanted. I resorted to another hack: building the virtualenv without system packages, and then removing the no-global-site-packages flag manually. This means you have to feed pip a subset of the real requirements list, leaving out those things that would be installed globally, but that seemed to work out reasonably well for me.
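The “subset of the requirements list” part can be as dumb as filtering the real list against a hand-maintained set of distributions you’ve chosen to ship as RPMs instead. Everything here other than requirements/production.txt is made up for the sake of illustration:

SYSTEM_PROVIDED = {"mysql-python"}   # hypothetical: whatever you build as regular RPMs

def filter_requirements(infile, outfile):
    with open(infile) as src, open(outfile, "w") as dst:
        for line in src:
            # requirement lines look like "name==1.2.3"; keep comments and blanks
            name = line.split("==")[0].strip().lower()
            if name and not name.startswith("#") and name in SYSTEM_PROVIDED:
                continue
            dst.write(line)

if __name__ == "__main__":
    filter_requirements("requirements/production.txt", "requirements/venv-only.txt")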

The rough scripts that I used, then, were these. First, the spec file for the environment itself:

%define        ENVNAME  whatever
Source:        $RPM_SOURCE_DIR/pyenv-%{ENVNAME}.tgz
BuildRoot:     %{_tmppath}/%{buildprefix}-buildroot
Provides:      %{name}
Requires:      /usr/bin/python2.7
BuildRequires: chrpath prelink

%description
A packaged virtualenv.

%prep
%setup -q -n %{name}

%build
rm -rf $RPM_BUILD_ROOT
mkdir -p $RPM_BUILD_ROOT%{prefix}
mv $RPM_BUILD_DIR/%{name}/* $RPM_BUILD_ROOT%{prefix}

# remove some things
rm -f $RPM_BUILD_ROOT/%{prefix}/*.spec

%install
# undo prelinking
find $RPM_BUILD_ROOT/opt/pyenv/%{ENVNAME}/bin/ -type f -perm /u+x,g+x -exec /usr/sbin/prelink -u {} \;
# remove rpath from build
chrpath -d $RPM_BUILD_ROOT/opt/pyenv/%{ENVNAME}/bin/uwsgi
# re-point the lib64 symlink - not needed on newer virtualenv
rm $RPM_BUILD_ROOT/opt/pyenv/%{ENVNAME}/lib64
ln -sf /opt/pyenv/%{ENVNAME}/lib $RPM_BUILD_ROOT/opt/pyenv/%{ENVNAME}/lib64

%clean
rm -rf $RPM_BUILD_ROOT

%files
%defattr(-,root,root)
%{prefix}opt/pyenv/%{ENVNAME}

(Standard files like name and version are missing – using the default spec skeleton fills in the missing bits). It’s not totally obvious from this, but I actually ended up building the virtualenv first and using that effectively as the source package:

virtualenv --distribute $(VENV_PATH)
. $(VENV_PATH)/bin/activate && pip install -r requirements/production.txt
virtualenv --relocatable $(VENV_PATH)
find $(VENV_PATH) -name \*py[co] -exec rm {} \;
find $(VENV_PATH) -name no-global-site-packages.txt -exec rm {} \;
sed -i "s|`readlink -f $(VENV_ROOT)`||g" $(VENV_PATH)/bin/*
cp ./conf/pyenv-$(VENV_NAME).spec $(VENV_ROOT)
tar -C ./build/ -cz pyenv-$(VENV_NAME) > $(VENV_ROOT).tgz
rm -rf $(VENV_ROOT)

Improving on this idea

There’s a lot to like about this kind of system. I’ve ended up at a point where I have a somewhat bare-bones system python packaged, with a few extras, and then some almost-complete virtualenv environments alongside to provide the bulk of dependencies. The various system and web applications are packaged depending on both the environment and the run-time. The environments tend not to change particularly quickly, so although they’re large RPMs they’re rebuilt infrequently. I consider it a better solution than, say, using a chef/puppet or other scripted system to create an environment on production servers, largely because it means all the development tools stay on the build systems, and you can rely on the package system to ensure the thing has been properly deployed.

However, it’s still a long, long way from being perfect. There are a few too many hacks in the process for me to be really happy with it, although most of those are largely unavoidable one way or another.

I also don’t like building the environment as a tarball first. An improvement would be to move pretty much everything into the RPM specfile, and literally just have the application to be deployed (or, more specifically, its requirements list) as the source code. I investigated this briefly and to be honest, the RPM environment doesn’t play wonderfully with the stuff virtualenv does, but again these are probably all surmountable problems. It would then impose the more standard CFLAGS et al from the RPM environment, but I don’t know that it would end up removing too many of the other hacks.

The future

I’m not going to make any claims about this being a “one true way” or some such – it clearly isn’t, and for me, the native RPM approach is still measurably better. Yes, it is slightly more maintenance, but for the most part that’s just the cost of doing things right.

What is interesting is that this kind of approach seems to be the way a number of other systems are going. virtualenv has been so successful that it’s now effectively a standard piece of python, and rightly so – it’s an incredible tool. Notably, pyvenv (the new standard-library tool) does not have the relocatable option available.

I’m slightly excited about the docker.io “container engine” system as well. I haven’t actually tried this yet, so won’t speak about it in too concrete terms, but my understanding is that a container is basically a filesystem that can be overlaid onto a system image in a jailed environment (BSD readers should note I’m using “jail” in the general sense of the word – sorry!). It should be noted that systemd has very similar capability in nspawn too, albeit less specialist. Building a container as opposed to an RPM is slightly less satisfying: being able to quickly rebuild small select portions of a system is great for agile development, and having to spin large chunks of data to deploy into development is less ideal, but it may well be that the benefits outweigh the costs.

A (fond) farewell to Zend Framework

I’ve been a Zend Framework user for a while. I’ve been using PHP long enough to appreciate the benefits of a good framework, and have developed enough sophisticated applications using ZF to have grown a certain fondness for it. Although it has a reputation for being difficult to get into, being slow and being overly complicated – not undeserved accusations, if we’re being honest – there is something quite appealing about it. Well, was, for me at least. ZF 1.11 looks like the last version of the framework I will be using.

Why? The simple answer is ZF 2.0. It has been busily built over the past couple of years one way or another, a number of betas have been released, and it looks likely to me that an initial release is a few months away. At this point I need to make a decision about my future use of the framework, and I don’t particularly like what I see.

Let’s be quite honest about one thing up-front: I cannot claim to have done any substantial amount of work in ZF 2.0. The criticisms within are all personal opinion based on little more than the most itinerant tinkering.

That said, I actually don’t feel like much of what I’m about to say is unfair, for one simple reason: I have tried to like ZF 2.0. There are of course other PHP frameworks, and I don’t really need to name them, and many of them are initially much nicer to get started on than ZF. Despite all that, I got quite happy with ZF1, and indeed approached ZF2 with the idea that it would take a similar amount of effort to learn to like it. I have attempted to apply that effort. I have failed.

Much of what I think is wrong with ZF2 you can quite obviously see in the ZendSkeleton example application. Now, of course, the example applications for most things are pretty poor: every JS framework has a To-do app, and things are generally chosen to show off the best features of the framework in their most flattering light. That’s actually the first thing that hits me about the skeleton application: it’s deathly, deathly dull, but there’s a pile of code needed to get you to that point. The sheer amount of boilerplate needed to get much further than ‘Hello World’ is incredible, and of truly Java-like proportions.

Generally, I like my frameworks opinionated. I like having a culture, or an ethos, which comes through as a set of guiding principles for the developers of both applications using the framework, and the framework itself. And ZF certainly is and was opinionated. I suppose, at this point, I find that my opinions differ from theirs too much, and that’s an issue.

The first opinion I would cite is the use of dependency injection. Now, I get DI, truly I do. I even like the idea of it. I can see how it would be useful, and how it could add a heap of value to a project. But there is “useful feature” and then there is “koolaid”, and DI in ZF2 is alarmingly close to the latter. As a case in point, just take a peek at the module config for the skeleton app.

The comment at the top of the file first sent shivers down my spine – “Injecting the plugin broker for controller plugins into the action controller for use by all controllers that extend it” – it’s the kind of enterprise buzzword bingo that again, sorry people, sounds just like a Java app.

And as you progress through what is supposed to be just a bit of config for a simple application, wading past the router config and various pieces of view configuration, you’ll see the thing which just turned me right off – ‘Zend\View\Helper\Doctype’. Seriously? A fragment of view code to manage the doctype? As if this is something you can just change at runtime? “Oh yeah, we built it in HTML5, but I can just downgrade to HTML4 by switch….” sorry, no. Doctype is totally fundamental to the templates you’ve written. This is so far from application config it’s not funny.

Other stuff I can’t abide: the default file tree. Did you think application/controllers/IndexController.php was bad enough? Now you have module/Application/src/Application/Controller/IndexController.php. I do get the reason for this, but again it’s enforced modularisation – ZF1 supported modules too, without forcing them on you.

I know how observers might respond to this: it’s a skeleton app. It’s supposed to be showing a set of best practices; you can cut corners and make things simpler. Except this isn’t true: there’s already a whole load of corners cut. Just look in the layout; there’s a pile of code at the top of the template. Isn’t the view supposed to be where the code lives?! I would have most of that crap in my Bootstrap.php as-was; I can’t believe people are advocating throwing that in the layout template now (and I’m sure they’re not). But there it is, cluttering up the layout, when it really should be refactored somewhere else.

This is the issue. The skeleton app does a whole heap of things just to do nothing except chuck Twitter Bootstrap on screen. I am, of course, willing to be shown how all this up-front investment will pay off in the end – but right now, I really do not see it. The more I look, the more I see things which will just require more and more code to get working – a constant investment throughout the life of a project, without any obvious pay-back for it later. As a rule of thumb, whenever I’ve used a framework before, the skeleton always looks pretty good, but a production app gets entirely more complex and hairy. Things don’t generally improve; at best they stay as bad. I would worry that a ZF2 app would just explode into a sea of classes entirely unnavigable by a junior programmer, held together by a DI system so abstract they would have little chance of properly comprehending it.

This is really sad. ZF1 had a number of shortcomings which I thought ZF2 looked on track to tackle – and, in all probability, has tackled. The REST controllers in ZF1 were complete bobbins, and ZF2 looks like it has that right. The Db layer in ZF1 was actually quite good, but ZF2 looks to have improved on it. PHP namespaces are of course ugly as sin and ZF2 embraces them, but they make sense and I could potentially learn to love them. But my gosh, just look at the quickstart. Remember, this is the “get up and running as fast as possible” guide for people who already know the language and just want to get cracking.

What is bad about it? Well, 12.2.2 is the start of the “you’ve already installed it – let’s get coding” section. First item on the todo list? “Create your module”. This involves downloading the skeleton, copying bits over, and being told all is well. 12.2.3, update the class map for the autoloader, using the correct namespace, ensuring configuration is enabled and being lenient with the autoloading (let’s both you and me pretend we understood what on earth this section was trying to achieve).

12.2.4, create a controller. Oh my god, I don’t want to know what Zend\Stdlib\Dispatchable is there for, or why I might pick a REST controller (the quick start doesn’t cover REST). But no fear: we have a basic controller, and it looks like this:

namespace\Controller;

use Zend\Mvc\Controller\ActionController,
    Zend\View\Model\ViewModel;

class HelloController extends ActionController
{
    public function worldAction()
    {
        $message = $this->getRequest()->query()->get('message', 'foo');
        return new ViewModel(array('message' => $message));
    }
}

Unfortunately this reminds me again – I hate to use the J-word – of all the geek Java jokes. Boilerplate object-this and method-other-thing-another-method-that().

I so want to be interested in ZF2, but it’s about as far up the corporate enterprise architecture-astronaut ladder as I have ever seen PHP climb. And honestly, if I wanted to program Java, I’d use Java. And then I’d download Play or Scala and actually enjoy it. But for PHP, no. So, adieu, ZF. It has been nice knowing you.

“Dart” out in the open – what’s it all about?

This morning was the big “Dart language” unveil – the Dart websites are up at http://dartlang.org and http://dart.googlecode.com. And already many seasoned Javascripters have the knives out. I’m surprised for a couple of reasons: first, this isn’t quite as big a deal as many people thought it would be (me included), both in terms of the scope of the system and the distance from Javascript. Second, the system isn’t quite as finished as many predicted: this isn’t going to be usable for a little while.

That all aside, let’s look at the highlights:

It’s classically classful

Javascript has a prototypical object system. That means that instead of having classes that define what objects should look like, you simply create an object, make it look how you want it to look, and then use that as a template for all future members of that “class”. This is deeply confusing to many people who have come from languages like C#, Java, C++, Python (etc. – i.e., practically all of them) where you do things the classical way. The Dart designers have more or less thrown out the prototypical system, and a standard class-based system is available, with some degree of optional typing.

And in fact, it seems more or less mandatory: they’ve used main() as the code entry point again, and like in Java, for anything non-trivial you’re basically going to have to write a class which implements that function.

I’m not yet sure whether this is a great thing or not – mostly, I’m ambivalent – but lots and lots of people have written their own “quacks like a class” system on top of Javascript, including Doug Crockford. Javascript is flexible enough to represent this, so the Dart-to-Javascript compilation bits will work, but obviously it’s not going to interact well with Javascript libraries that take a different view of classes or inheritance. This is probably not a problem; Perl has a number of different ways of implementing objects and there doesn’t generally seem to be much trouble with it.

Wider standard library set

Javascript has been let down by its standard library set in many ways. First, there really aren’t many data types available: you have an associative array, you have a non-associative array, and that’s about it. Objects are more or less associative arrays. But also, there aren’t other APIs to do useful things in the primary heartland of Javascript, the browser. The answer to all this, of course, has been the rather well designed Javascript library projects that have sprung into being: the JQuery, Mootools and YUIs of this world. And there are many of them, and the competition is fierce, and the end results are actually very good.

Dart goes a slightly different way with this. The library sets that come with Dart do a lot more than Javascript is capable of. There are lots more basic types, and many basic behaviours (interfaces) that describe in what context you can use such data – for example, any type that implements ‘Iterable’ can be used in a loop. It’s pretty great that this is all standard. The DOM library sadly makes a re-appearance, which is a bit of a shame because it’s literally one of the suckiest APIs ever invented, but on the flip side it does mean that the likes of JQuery could be ported to Dart easily.

Sugar for asynchronicity

Javascript, particularly when used in the browser, is deeply asynchronous. Unfortunately, the language itself doesn’t have an awful lot of support for that. You can pass functions around as first-class objects, so a lot of APIs are designed with function call-backs to execute “later on”. This leads to a kind of “macaroni code” where roughly procedural code (“Do this, then do that, then do this other thing”) is broken up over many functions just so it can be passed around like this. Dart gives the programmer a little bit of help here by implementing Promises.

In Dart, the Promise is an interface which looks an awful lot like a thread in many ways. The key sugar here is that the language still has the callbacks, but you can chain them together with then() instead of embedding them each within itself. You can also check on how they’re doing, cancel them if you like, and other stuff – again, nothing that Javascript doesn’t have, but slightly more elegant. Combined with the class system, it also means the end of ‘var $this = this’ and other such scoping hacks.

Message passing

This is probably more important than the Promises interface. Dart has message passing built-in, like many suspected. And, it looks nice and simple: you have ports, and you can either receive messages from them or send messages to them. The receivers are basically event-driven in the same way a click handler would be. Seeing the value here is difficult in some ways: it will be interesting to see how the balance is struck, because if you are designing a class you could either make an API which creates Promises, or send/receive messages – the net effect is roughly the same. You probably don’t want to implement both, but which system you use is up to you. The message passing interface is slightly more decoupled; but it’s probably easier to abuse in the longer term.

It’s all sugar

I think this is the thing which surprises me most about Dart: it’s actually pretty close to Coffeescript, but with a more Javascript-like syntax. And that’s why I can see this being successful: you can turn it into standard Javascript, but it gives us a lot of the bells and whistles that programmers have been crying out for. Classes have getters and setters like C#, strings can have variables that interpolate within them, you can write really lightweight functions in a new => syntax, and classes can have built-in factories – just to name a few of the highlights.

There are some extras, like the ability to reference needed CSS, which point to a slightly grander future where Dart scripts are rolled up with their related resources into something more easily distributable. And maybe this is the point: the unveiling of Dart was not really a beginning itself, but the beginning of a beginning. They’ve designed the language to attempt to grow with your application: you can start small and simple, but as your application grows you can add more to it (like types and interfaces) to make it more structured and, hopefully, safer to run. And in the same sense, Dart itself is going to grow over time as well.

 

Is package management failing Fedora users?

(For those looking for an rpm rant, sorry, this isn’t it….!)

Currently there’s a ticket in front of FESCo asking whether or not alternative dependency solvers should be allowed in Fedora’s default install. For those who don’t know, the dependency solver is the algorithm which picks the set of packages to install/remove when a user requests something. So, for example, if the user asks for Firefox to be installed, the “depsolver” is the thing which figures out which other packages Firefox needs in order to work. On occasion, there is more than one possible solution – an obvious example often being language packs; applications usually need at least one language installed, but they don’t care which.
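To make the ambiguity concrete, here’s a toy version of the problem (the package names are invented): both answers below satisfy the dependency, and which one an installer actually picks is purely a matter of solver policy.

PROVIDES = {
    "langpack": ["firefox-langpack-de", "firefox-langpack-en", "firefox-langpack-fr"],
}
REQUIRES = {
    "firefox": ["langpack"],
}

def solve(package, prefer=sorted):
    """Return one valid install set; 'prefer' stands in for the solver's policy."""
    install = {package}
    for capability in REQUIRES.get(package, []):
        candidates = PROVIDES[capability]
        install.add(prefer(candidates)[0])  # a different policy gives a different set
    return install

print(solve("firefox"))                            # alphabetical policy picks -de
print(solve("firefox", prefer=lambda c: c[::-1]))  # reversed policy picks -fr

Both results are “valid” in exactly the sense described above; nothing in the metadata says which one anybody has actually tested.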

I don’t have much skin in this particular game; but what I would note is that I find it particularly bizarre that this task is delegated to an algorithm. What we’re saying, basically, is that the configuration of a given installation is chosen by a bit of software. So long as the various package requirements – which could be library versions, files, or something entirely synthetic – are all met, the configuration is “valid”. Of course, that doesn’t necessarily mean it works – it may be totally untested by anyone else, and things get particularly grisly if you’re doing something “fun”. Such fun includes:

  • deploying “multi-arch” packages. Maybe you want a 32-bit browser plugin on your 64-bit PC, for example;
  • installing third-party packages. Maybe it’s RPM Fusion, maybe it’s an ISV – but wherever it’s from, it’s another set of variables the depsolver looks at;
  • installing your own packages. See above.

The package management system doesn’t have a concept of “OS” versus “other stuff”. Being able to override such a concept would be a feature; not having it at all is not a feature, however.

Now, fans of package management frequently tout the many benefits, and they are indeed multiple. It makes it easy to install new software and be reasonably sure it works (it may need a lot of configuration, though). Splitting stuff up into a thousand different bits makes security updates more straightforward (at least, in theory – see later). But in general, to conflate all these issues is a bit of a mistake: there are other forms of installation system which provide these benefits as well.

So, what’s wrong with this? We’ve already seen that the choice of depsolver can potentially make or break the system, or at least lead to configurations which were not intended by the packagers, but to some extent that could be solved by tightening the specification/package dependencies, so that the “right choice” is obvious and algorithm-independent. But, there are other issues.

It’s difficult to estimate the number of Fedora users, but the statistics wiki page makes a reasonable effort. And looking at that, we can see that about 28 million installs of almost 34 million (that are connecting for software updates) are currently using unsupported releases of Fedora. That’s over 80% of installs using a release which is no longer supported.

This of course has security implications, because these users are no longer getting security updates. No matter how fancy the package management, these people are all on their own. And unfortunately, the package management tools are not much use here: effectively, unless you use the installer in one of its guises, the procedure is difficult and potentially error prone.

You’re also out of luck with third-party repos: the package manager doesn’t insulate them from each other, so mixing is frowned upon. It may work, it may not. You may be able to upgrade, you may not. It may alter core functionality like your video driver, and you might be able to downgrade if it failed, but let’s hope it didn’t manually fiddle with things.

In the meantime, we’re also failing to deal adequately with many types of software. The Firefox update process causes enough problems with the current setup; Google’s Chromium on the other hand appears to be almost entirely impervious to being packaged in a Fedora-acceptable way. Web applications also don’t work well; Javascript libraries don’t fit well/at all into the concept of libraries a la rpm, so there’s loads of duplication.

There’s probably an awful lot more that can be written on this topic, and of course package management right now, for the most part, works pretty well. But I worry that it’s a concept which has pretty much had its day.

Speculation on Google’s “Dart”

Just yesterday people jumped on the biographies and abstract for a talk at goto: the Keynote is Google’s first public information on Dart, a “structured programming language for the world-wide web”. Beyond knowing a couple of the engineers involved – which allows a certain amount of inference to take place – there’s also some speculation that Dart is what this “Future of Javascript” email referred to as “Dash” (this seems entirely possible: a dash language already exists; Google already used ‘Dart’ for an advertising product but have since stopped using that name, potentially to make way for the language).

I thought it would be interesting to have a look at some of the details of this new language. One thing seems quite certain: Google’s Javascript engine, V8, is going to target this, because it’s going to target client-side application programming to begin with. V8 is, of course, very popular – it’s in Chrome, it’s in Node.js, and it’s going to be put in Qt. However, it hasn’t really been brilliant as a standalone project (witness the problems getting Chromium into Fedora, as an example) and the addition of Dart will almost certainly make this worse.

So, what else do we know?

Compiles to Javascript

It seems likely that the language will, at least in a proper subset, compile into Javascript – a lot like Coffeescript does. Personally, I cannot stand Coffeescript for the same reasons I really don’t like python, but there is some obvious win to this approach: you get backwards compatibility with existing systems and, usually, a method of interacting with existing code and libraries.

I suppose the first question is, how different to Javascript will it be? It will almost certainly be object-oriented, but that need not imply prototypical inheritance – it could be that the Javascript compiler will do some fancy trick with objects to make things appear more classical. Coffee does this to a large extent too, and I think we’ll see a similar approach. I doubt much of Coffee’s syntax would be copied – it’s almost Perl-like in its terseness sometimes – but I think there will be a similar approach to the object model.

There will be other differences. Javascript is relatively typeless; I suspect Dart will have types of some sort, at least optionally. The scoping rules will probably be a bit different as well – the “let” keyword has never caught on wildly, but some level of block scoping (as an example) would be an obvious improvement.

Not just a language

I think it’s relatively clear from the “Dash” discussion that this isn’t just going to be a language: templating and possibly even MVC will be available alongside, somehow. I expect to see some interesting things here, actually – there might not be much impact on the language (although a way of embedding HTML templates might be handled specially) but I think it will be closely aligned to these key tools. The Javascript world has been doing some interesting stuff – see Backbone.js and Knockout.js as two obvious examples – but it will be really interesting to see how much “platform” is put into Dart.

There is a worry here, of course, that it’s too restrictive. Knockout is actually a great example: it’s an MVVM architecture, not MVC, and for a lot of jobs I’ve actually been really, really impressed with it. It’s simple, straightforward, but most of all productive. It would be a shame if you can’t do something similar in Dart, but I would bet you can. Binding data onto a web page is so fundamental, so basic, that I really think there will be some interesting stuff there.

Binary Dart?

I’m not really sure about this, but I’ll chuck it out there anyway: clearly, writing Dart in a text editor is going to be fine. However, I also expect that there would be alternative delivery mechanisms. Right now, people use tools like Closure to “compile” Javascript into a more compact representation. Clearly, if you’re starting with a new language, you could specify a binary format right from the start. This would also sit beside NaCl/Pepper quite nicely, and allow multiple resources to be included into a project without having to have multiple file downloads into the browser.

Google are going to be focussed on deployment of large, highly-interactive apps, I think – so although the small page enhancement stuff would still be on the table, really I think Dart is going to be about writing GMail and Google Docs. In that context, being able to wrap up everything into a nice deployment package makes a whole heap of sense.

A month to wait?

Sadly, I don’t think we’re going to know too much more before goto;. I had a look around the V8 source, and there aren’t really many clues in there as to what’s coming. If they’re offering a compile-to-Javascript option, that might be the only thing released at first – so Dart would effectively be a standalone compiler only to begin with, growing features to target specific engines later on.