Brexit confirms: storytelling is dead

This is not a post about Brexit; this is about conversations. Storytelling rose in the 1980s as a key marketing tool – phenomena like the Nescafe “Gold Blend” adverts demonstrated how the ability to tell a story could convincingly engage consumers en masse. Truth be told, this was nothing new – the “soap opera” is so-called because those ongoing serial dramas used to be sponsored by soap manufacturers. But the key insight of the storytellers was that creating a story around a message you wanted to communicate (rather than simply being associated with or referenced by the story) was very powerful.

Now, Nescafe coffee had only a tangential bit-part within their famed serial adverts, and broadcasting on television is a remarkably expensive way of telling a story – so in fact, the technique didn’t really start to take off until the early 2000s, with the advent of the internet. Of course, big names continued to tell stories in the way they always had – Guinness being a more modern exemplar – but now smaller organisations could do it too; they felt it built relationships with their consumers.

There is a lot to be said about discerning what is storytelling and what isn’t. Critically, a story ought to have an arc – a beginning, middle and end at least – but at a deeper level ought to have a structure which creates emotional engagement. Shakespeare was a master of the five-act structure, and most blockbuster movies to this day retain a very similar make-up. Advertisements alone do not lend themselves to that level of sophistication, but people started applying storytelling in many different areas of business – although seen as a marketing tool, it quickly leaked into sales, the boardroom, investment decks and beyond.

Many people get benefit from story-thinking without necessarily having a huge amount of structure. The process of thinking editorially about their message, and trying to frame it in the form of a story, is difficult and restricting. In a similar way to writing a Tweet, the added restrictions make you think carefully about what you want to say, and it turns out these restrictions actually help rather than hinder – the message has to be much more focussed. However, those restrictions (while helpful) are not the power of storytelling – more the power of subediting and careful thinking (which, it seems, is less common than you’d think).

People have said before me that storytelling is dying – Berkowitz’s piece on becoming storymakers rather than tellers is well-cited. It’s a very marketing-oriented perspective, and there’s lots to agree with, but I think it’s dead wrong for digital-native organisations.

Politics is an awful lot like marketing and product development in some key ways; in many ways, it actually resembles the market before software-as-a-service:

  • highly transactional nature (votes instead of money)
  • very seasonal sales periods, often years between sales
  • competitive marketplace for a commodity product
  • repeat customers very valuable, but profit function dependent on making new sales on the current product line

Not only that: engaging the “customer” (the voter) in an ongoing fashion is crucial, to ensure that the party is developing policies that it believes will be voted for. Interestingly, in blind tests, the Liberal Democrat and Green policies rate very highly – so we can see that while the product is important, market positioning is critical to ensure customers have a specific formed belief about your product.

Within continuous delivery thinking, the digital organisation is concerned primarily with conversations to drive the brand, rather than positional or story-oriented marketing. What was particularly interesting about the Brexit debate was that this conversational engagement was writ large across the whole Leave campaign.

Things we can note about the campaign:

  • meaningful engagement on social platforms like Facebook and Twitter. Of course, campaigns have done this before (Corbyn would be another example), but while others have been successful at deploying their message, Leave were highly successful in modifying their conversations quickly
  • a stunningly short period of campaigning. Who knows why this happened: the Scottish referendum on independence ran over a period of 18 months, while the UK Brexit debate was complete in four. There was no way a campaign could hammer home messages; each thing they said had to be well-chosen and timely
  • absolute control over the conversation. While Leave conversed freely with their own supporters, they meaningfully achieved air superiority in the debate. Their messages were the ones discussed; they created the national conversation. People are shocked by how “untrue” many of their statements were, yet people can recall them readily. I doubt many could recall anything Remain said other than vague threats about the economy.

The speed of the conversation here was crucial. They adapted in a truly agile fashion, and were able to execute their OODA loop significantly more quickly. In the end, it was a tight contest, but it really should not have been.

Storytelling is a blunt instrument in comparison. It’s unresponsive, it’s broadcast, and it’s not digital native. Its time is up.

Containing incestuousness

Having droned on a little the other day about duplication in Stackanetes (in hindsight, I wish I had made an “it’s turtles all the way down” type jibe), I’ve been delighted to read lots of other people spouting the same opinion – nothing quite so gratifying as confirmation bias.

Massimo has it absolutely right when he describes container scheduling as an incestuous orgy (actually, he didn’t, I just did, but I think that was roughly his point). What is most obvious is that while there is a lot of duplication, there isn’t much agreement about the hierarchy of abstraction: a number of projects have started laying claim to being the lowest level above containers.

It comes back to this: deploying a PaaS (such as Cloudfoundry, which I try so hard to like but which always seems to end up disappointing) is still way too hard. Even deploying IaaS is too hard – the OpenStack distros are still a complete mess. But while the higher-level abstractions are fighting it out for attention, the people writing tools at a lower level are busy making little incremental improvements and trying to subsume new functionality – witness Docker Swarm – spreading out horizontally instead of doing one thing well and creating a platform.

I don’t think it’s going to take five years to sort out, but I also don’t think the winner is playing the game yet. Someone is going to come along and make this stuff simple, and they’re going to spread like wildfire when they do it.

Stackanetes

There’s a great demo from the recent OpenStack Summit (wish I had been there) of OpenStack itself being deployed and run on top of Kubernetes.

OpenStack is notoriously a massive pain to get up and running, and having a reasonable set of containers that might be used to deploy it by default is really interesting to see. This is available in Quay as Stackanetes, which is a pretty awful name (as are Stackenetes and Stackernetes, both of which were googlewhacks earlier today) for some great work.

I’m entirely convinced that I would never actually run anything like this in production for most conceivable workloads because there’s so much duplication going on, but for those people with a specific need to make AWS-like capability available as a service within their organisation (who are you people?!) this makes a lot of sense.

I can’t help but feel there is a large amount of simplification in this space coming, though. While “Google for everyone else” is an interesting challenge, the truth is that everyone else is nothing like Google, and most challenges people face are actually relatively mundane:

  • how do I persist storage across physical hosts in a way that is relatively ACID?
  • how do I start resilient applications and distribute their configuration appropriately?
  • how do I implement my various ongoing administrative processes?

This is why I’m a big fan of projects like Deis for businesses operating up to quite substantial levels of scale: enforcing some very specific patterns on the application, as far as possible, is vastly preferable to maintaining a platform that has to encompass large amounts of functionality to support its applications. Every additional service and configuration is another thing that can go wrong, and while individual pieces can be made pretty bullet-proof, overall you have to expect the failure rate to increase (this is just a mathematical truth).
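
To put a rough number on that mathematical truth, here is a minimal sketch (the availability figures are purely illustrative, not measured from any real platform): if each service in the platform is independently available 99.9% of the time, the whole platform is only up when every one of them is, and the compound availability falls off quickly as services are added.

```python
# Illustrative only: compound availability of a platform made of many
# independently-failing services, each individually quite reliable.
def platform_availability(per_service: float, services: int) -> float:
    """Probability that *all* services are up at once, assuming independent failures."""
    return per_service ** services

for n in (5, 20, 50):
    print(f"{n:2d} services at 99.9% each -> {platform_availability(0.999, n):.2%} overall")

# 5 services  -> ~99.50%
# 20 services -> ~98.02%
# 50 services -> ~95.12%
```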

CoreOS in many ways is such a simplification: universal adoption of cloud-config, and strong opinions about systemd and etcd, for example. And while we’re not going to go all the way back to Mosix-like visions of cluster computing, it seems clear that many existing OS-level services are actually going to become cluster-level services by default – logging being a really obvious one – and that even at scale, OpenStack-type solutions are much more complicated than you actually want.

Some notes on Serverless design: “macro-function oriented architecture”

Over the past couple of days I’ve been engaged in a Twitter discussion about serverless. The trigger for this was Paul Johnston‘s rather excellent series of posts on his experiences with serverless, wrapped up in this decent overview.

First, what is serverless? You can go over and read Paul’s explanation; my take is that there isn’t really a great definition for it yet. Amazon’s Lambda is the canonical implementation, and as the name rather gives away, it’s very much a function-oriented environment: there are no EC2 instances to manage or anything like that. You upload some code, that code is executed on receipt of an event, and you pay only for the compute time used.

This is the “compute as a utility” concept taken more or less to its ultimate extreme: the problem that Amazon (and the others of that ilk) have in terms of provisioning sufficient compute is relatively well-known, and the price of EC2 is artificially quite high compared to where they would likely want to go: there just is not enough supply. The “end of Moore’s law” is partly to blame; we’re still building software like compute power is doubling every 18 months, and it just isn’t.

Fundamentally, efficiency is increasingly the name of the game, and in particular how to get hardware running closer to capacity. There are plenty of EC2 instances around doing very little, there are plenty doing way too much (noisy neighbour syndrome), and what Amazon have figured out is that they’re in a pretty decent place to be able to schedule this workload, so long as they can break it down into the right unit.

This is where serverless comes in. I think that’s a poor name for it: the lack of server management is a principal benefit, but it’s really a side-effect. I would probably prefer macro-function oriented architecture, as a similar but distinct practice to micro-service oriented architecture. Microservices have given rise to discovery and scheduling systems like Zookeeper and Kubernetes, and this form of thinking is probably primarily responsible for the popularity of Docker. Breaking monolithic designs into modular services, ensuring that they are loosely coupled with well-documented network-oriented APIs, is an entirely sensible practice, and in no small part responsible for the overall success Amazon have had following the famous Bezos edict.

Macrofunction and microservice architectures share many similarities; there is a hard limit on the appropriate scale of each function or service, and the limitation of both resources and capability for each feels like a restriction, but is actually a benefit: with the restrictions in place, more assumptions about the behaviour and requirement of such software can be made, and with more assumptions follow more powerful deployment practices – such as Docker. Indeed, Amazon Lambda can scale your macrofunction significantly – frankly, if you design the thing right, you don’t have to worry about scaling ever again.

However, one weakness Paul has rightly spotted is that this is early days: good practice is really yet to be defined, bad practice is inevitable and difficult to avoid, and people attempting to get the benefits now are also having to figure out the pain points.

It’s worth saying that this architecture will not be for everyone – in fact, if you don’t have some kind of request/response to hook into, frankly it won’t work at all – you’re going to find it very difficult to develop a VPN or other long-lived network service in this environment.

Many of the patterns that should be applied in this scenario are familiar to the twelve-factor aficionado. Functions should be written to be stateless, with persistent data discovered and recorded in other services; configuration is passed in externally; et cetera. Interestingly, no native code is supported – I suggest this is no surprise, given Amazon’s investment in Annapurna and their ARM server line. So, interpreted languages only.
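
As a concrete sketch of those patterns – and it is only a sketch, with made-up resource names – a Python Lambda handler might look like the following: stateless between invocations, configuration injected through environment variables, and all persistence pushed out to another service (DynamoDB here, purely as an example).

```python
import json
import os

import boto3  # available in the Lambda Python runtime

# Configuration arrives from the environment, not from local files.
TABLE_NAME = os.environ["ORDERS_TABLE"]        # hypothetical variable name
table = boto3.resource("dynamodb").Table(TABLE_NAME)


def handler(event, context):
    """Invoked once per event; nothing is assumed to survive between calls."""
    records = event.get("Records", [])
    for record in records:
        # Assuming the function is wired up to an SNS topic (see the sketch below).
        order = json.loads(record["Sns"]["Message"])
        table.put_item(Item=order)             # persistent state lives elsewhere
    return {"status": "ok", "processed": len(records)}
```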

A lot of this reminds me of the under-rated and largely unknown PHP framework, Photon. This is not immediately obvious – Photon’s raison d’être is more about being able to run long-lived server processes, which is diametrically opposed to Lambda – but the fundamental requirement to treat code as event-driven, and the resulting architecture, are very similar. In fact, it was very surprising to me that it doesn’t seem to be possible to subscribe a Lambda handler directly to an SQS queue – it’s possible to hack around this via SNS or polling, but there is no apparent direct mechanism.
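
For what it’s worth, the SNS workaround is only a few calls; a hedged sketch with boto3 (the ARNs below are placeholders, and the topic also needs permission to invoke the function):

```python
import boto3

sns = boto3.client("sns")
lam = boto3.client("lambda")

TOPIC_ARN = "arn:aws:sns:eu-west-1:123456789012:orders"                        # placeholder
FUNCTION_ARN = "arn:aws:lambda:eu-west-1:123456789012:function:process-order"  # placeholder

# Grant the SNS topic the right to invoke the function...
lam.add_permission(
    FunctionName=FUNCTION_ARN,
    StatementId="allow-sns-invoke",
    Action="lambda:InvokeFunction",
    Principal="sns.amazonaws.com",
    SourceArn=TOPIC_ARN,
)

# ...then subscribe the function to the topic, so each published message triggers it.
sns.subscribe(TopicArn=TOPIC_ARN, Protocol="lambda", Endpoint=FUNCTION_ARN)
```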

It’s difficult to disagree that this is the wave of the future: needing to manage fewer resources makes a lot of sense, being able to forget about security updates and the like is also a major win. It also seems unlikely to me that a Lambda-oriented architecture, if developed in the right way, could ever be much more expensive than a traditional one – and ought to be a lot less in practice.

On Hiring A-Players

“Steve Jobs has a saying that A players hire A players; B players hire C players; and C players hire D players. It doesn’t take long to get to Z players. This trickle-down effect causes bozo explosions in companies.” ― Guy Kawasaki

A few times recently I’ve bumped into what I call the “A-Player Theory”. This is a close relative of the “10x Engineer Theory”, and in its usual formulation states that only the best want to hire their equals or betters: everyone else, for whatever reason, hires down. If only you were brave enough to hire people better than you, you’d be creating great teams in no time!

Like a lot of ideas in the tech arena, it feels like this one has come from the world of elite sports, and this is no bad thing. When I want to explore ideas about peak performance, it’s natural to look to different types of people and try to apply their ideas to my own work. However, I don’t think that this theory is actually that helpful, and it’s simple to explain why.

First, I think there is a simple problem with the formulation in that hiring is a two-way street: not only does the hirer need to want to find the best people, but the best people will need to be attracted to the hirer. I think this second effect is actually the more important: the quality of the talent available to you, as a hirer, will be directly proportional to the quality of your culture. This is a difficult statement to support in this post, and is something I will explore further in the future, but I do believe that this is the case.

But second, and more importantly, I just don’t think the theory holds in elite sports. Now granted, things are rather different in American elite team sports: for example, team-building is crucial in American Football, but processes like “the draft” are deliberately designed to ensure that C-teams hire A-players (at least in the beginning), so that some degree of competitive parity is ensured.

I’d like to give some examples from the world of soccer, instead. I’ve chosen this sport for a few reasons – partly, I’m very familiar with it, but mainly because it’s a global sport with the requisite elite element, and sufficient money floating about that players move between teams, leagues and countries with extreme ease. If any teams were going to be calculating about hiring A-players, you’d see them in soccer.

To be fair, there are absolutely teams who do this. The most obvious example is the galáctico policy pursued by Real Madrid. However, this turns out to be pretty rare: teams will transfer players for a variety of different reasons, and rate them in different ways, and while there are large-money transfers for world-class players, these make up a small amount of activity at even the largest clubs as a rule.

The absolute best teams in soccer are not associated with world-class A-players; they’re actually a function of the manager. My favourite example right now is Leicester City: not least as I grew up in Leicester, but (correct at the time of writing this) because they’re flying high at the top of the Premiership with one of my favourite managers, Claudio Ranieri. Their leading scorers, with 36 goals between them, are those huge names “Jamie Vardy” and “Riyad Mahrez” (you will be forgiven if you don’t follow soccer for having never heard of the pair of them).

Vardy at least was a record signing: Leicester paid £1 million for him from a non-league club (and reportedly, the deal could be worth even more). Let’s be clear about this, though: for a Premier League player, that sum is an absolute pittance. Mahrez came from a league club, to be sure – Ligue 2 side Le Havre – for an undisclosed fee, which again will be small in the scheme of the league (and in fact, both players joined before the team were promoted to the Premiership). Both players now have a market value in the tens of millions.

Leicester did not sign A-players; in fact, they signed a bunch of low-league and non-league players in order to rise out of the Championship, and then made the very sensible decision not to attempt to parachute in Premiership stars, and instead stick with the team they had built. This work is largely attributable to their previous manager, Nigel Pearson, and does not particularly break the mold in many ways – this is a well-trodden path that has seen many clubs rise through the leagues at the hands of an excellent manager (e.g. Nottingham Forest in the Clough era, before which they were largely forgettable and underachieving, or Leeds United under Revie, who started as an awful side that struggled to attract youth players let alone professionals).

If anything, in fact, the record of teams within this league that have gone out to buy the best has been pretty awful – Chelsea have done well but at an absolutely enormous cost, and many teams inflated by A-players have quickly fallen once the money ran out (Newcastle, West Ham, as examples).

The manager with the most enviable record for team-building, of course, must be Alex Ferguson. While his best Manchester United teams were full of world-class players, many of whom had been bought in at some cost, his career was partly defined by the number of A-players he allowed to leave the club. Paul Ince was a huge loss to the club, as were Hughes and Kanchelskis, and (not for the first time in his career) Ferguson came under significant pressure to resign. Instead, he opted (having failed to sign some big names) to bring in members of the youth team – names that are now instantly recognisable, like the Nevilles, Beckham and Scholes. Pundit Alan Hansen characterised their opening-day loss with the immortal words, “You can’t win anything with kids”. He also let Cantona go, and Ronaldo (who left at the height of his powers) brought an almost £70 million profit to the club upon sale. Sure, Ferguson also spent money, and brought in players who didn’t perform – Taibi, Djemba-Djemba, Tosic, Zaha and Bebe, to name but a few – some of them real A-players, like Anderson. But Manchester United were one of the most sustainably successful clubs for a long period of time under his leadership.

All of this brings me back to my central point. It’s not the player that is important; it’s the team – and the decision about who to bring into a team, when, and how, is the responsibility of the manager. It’s very easy to say “I’m going to go and hire only A-players!”, but actually, it’s probably one of the worst things you can do for a team. In football terms, you need balance in a side – people with different strengths, abilities and points of view. Football is notorious for great players who leave one club for another and become totally anonymous shadows of their former selves: this is not because they’re B- or C-players in disguise, but because if a team doesn’t have an A-player-shaped hole, that player will not be able to perform like an A-player.

It is incumbent on the best team leaders to develop the team first and foremost; the aim is that based on the performance of the team others will look back and say, “That is a team of A-players”. It’s easy to state how to do this, but remarkably difficult in practice:

  1. set the right culture for the team, right from the off. This is a whole topic in and of itself, and the culture creates the environment for performance but importantly does not trigger performance itself
  2. where an existing team is in place, plan out how to adapt and grow the people within the team, guided by the culture
  3. for each new member being added to the team, think hard about the gap they’re filling and whether or not they are the right “shape” to fill the gap.

Think about the levers in the team that are used to create and maintain performance. Most managers will automatically turn to metrics and reports; these are amongst the least powerful tools. The crucial factors are ensuring clarity of purpose, adequacy of tools and resources, autonomy to perform and passion for the work. Passion burns most fiercely when fuelled by success, and as a team leader that is your end goal. Hiring is important – it’s best to start in the best place possible – but I don’t believe it’s anything more than a good start.

“The quality of a person’s life is most often a direct reflection of the expectations of their peer group.” ― Tony Robbins

Tech2020 followup

The various videos of the speakers from Tech2020 – including yours truly – are up and available for Skillsmatter members. Going back to my previous blog post, I can heartily recommend the speakers I was excited about, but I have to say I was blown away by the overall quality of the conference. Even those topics I didn’t think would hold much interest or news for me turned out to be incredibly interesting, and I daresay the next edition of this conference will be something to watch out for.

Tech2020

For a while now, I have been waxing lyrical (to those who will listen) about the variety of new tools and analyses available to people who want to prognosticate. If nothing else, the current craze for data within most businesses has resulted in people almost literally swimming around in the stuff without much of an idea of what to do with it, and while this has led to some unspeakably shambolic practices (those who know me will likely have heard me on my hobby horse about proving models with actual experimentation), it has also opened up new horizons for people like me.

So, I’m delighted to have been invited to give a talk I submitted to the Bartech Tech2020 conference, this coming week in London. For the first meeting of this particular group, there is a great line-up of speakers, all of whom are going to be reading the runes and describing their vision of the year 2020. Wonderfully, the various talks will be recorded and made available, so there will be significant opportunity come the year 2020 to look back and groan loudly at the errors and omissions that will have piled up nicely by then.

There are some brilliant speakers lined up, and I have to confess to being eager to particularly hear from this lot (in no particular order):

  • Zoe Cunningham – the old refrain, “culture eats strategy for breakfast”, has never been more true than now. It’s also one of the most difficult things to set right and predict;
  • David Wood – working in healthcare, I’m incredibly interested in David’s talk, and am certain that what we will call healthcare in another ten years’ time will in many ways bear little resemblance to what is practised now;
  • Simon Riggs – in all honesty, I’m hoping he’s going to be talking at least in part about homomorphic encryption because I just read the Gentry paper recently and it’s fascinating, but there is so much to come in this space – particularly now that data is so large and non-local that all sorts of new strategies are needed.

I’m going to attempt to tweet through most of the conference in low volume, probably on #tech2020, and look forward to putting a few more faces to names from Bartech.

What I realised I’m missing from Gnome

Not that long ago, I did a switch on my Android phone: against all the promises I made to myself beforehand, I switched on the Google account and allowed it to sync up to GCHQ/NSA the cloud. I did this for one main reason: I had just got an Android tablet, and I despised having to do the same stuff on each device, particularly since they weren’t running the same versions of Android, and one was a Nexus – so not all the UI was the same. The benefits, I have to say, were pretty much worth it: I don’t have too much sensitive data on there, but the ease of use is incredible. What was particularly good was that when I broke my phone, and had to have a new one, once the new one was linked up everything was basically back how it was. That’s tremendously powerful.

Now, I recently acquired a bit of Apple equipment and of course installed Fedora 19 on it. Just to digress briefly: installing Fedora 19 on any new Mac hardware, particularly if you want to keep Mac OS X around (I don’t much care for OS X, but keeping it for now seems handy), is tremendously difficult. I had wired ethernet (brilliant, because I was using the netinstall – which, I should note, is a truly wonderful experience in the new Anaconda) which was lucky, since the wifi doesn’t work by default. The disk partitioning is incredibly complex, and the installation documentation is not particularly good. At some point I might try and help update the documentation, but it would feel a little like the blind leading the blind at this stage: although I have Fedora booting, the Mac OS X grub entries don’t work.

Logging into my desktop, though, I realised everything was bare. This was not like the Android experience at all – everything, from my username to my dot config files, needed to be set up again. I rarely change hardware, and previously I saw this as a reason to make a fresh start of things: but actually, now I value the convenience more highly.

It’s not like things are totally bad:

  • Gnome’s account settings can pull in some limited information, from Google or OwnCloud or other similar systems
  • Apps like Firefox have excellent built-in secure synchronisation that’s not a complete pain to set up
  • you can use apps like SparkleShare to make specific directories available elsewhere.

However, this really isn’t the experience I want:

  1. I should be able to use some online “Gnome Account” in the same way I can set up Enterprise Login during install
  2. That “Gnome Account” should have all my key configuration, including the details of other accounts I have linked up (maybe not the passwords, but at least the settings)
  3. If I have online storage / backup somewhere, it should offer to sync that up
  4. I should be able to sync my entire home data, not just specific bits
  5. If the two machines are on, I should be able to access one from the other – even if there’s a firewall in the way

I realise point five above is particularly moon-on-a-stick territory.

Technically speaking, a lot of the basic bits are kind of there, one way or another. Most Gnome apps use the standard dconf settings system, and in theory it’s possible to synchronise that stuff where it makes sense (this is, of course, handwaving: whether or not you want all settings exactly the same on each machine is virtually an impossible question to answer). Discovering and syncing other data shouldn’t be that hard. Remote access to another machine is definitely much harder, but the various protocols and discovery mechanisms at least exist.
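
To illustrate how little plumbing the settings half of this needs, dconf can already serialise its whole tree from the command line; a deliberately naive sync (and it really is naive – blindly loading everything is exactly the hard policy question above) could be little more than shipping that dump between machines. The file path here is just a placeholder for wherever a “Gnome Account” would store it.

```python
import subprocess

DUMP_FILE = "/tmp/desktop-settings.ini"   # placeholder for some synced "Gnome Account" storage

# On machine A: serialise every dconf setting under / into a keyfile-style dump.
with open(DUMP_FILE, "w") as out:
    subprocess.run(["dconf", "dump", "/"], stdout=out, check=True)

# On machine B: load the dump straight back in (clobbering local settings).
with open(DUMP_FILE) as dump:
    subprocess.run(["dconf", "load", "/"], stdin=dump, check=True)
```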

Annoyingly, there doesn’t seem to be much development in this direction – not even prototypes. There are lots of nasty problems (syncing home directories is fraught with danger), and even if you were willing to buy into a simpler system to get the goodies, making it work in Gnome is probably an awful lot easier than dealing with the other apps that aren’t Gnome aware.

I’m certainly not in much of a position to start developing any of this stuff right now, but it would be interesting to at least attempt to draw out a believable architecture. A decent 70 or 80% solution might not even be too hard to prototype, given the tools available. I’d love to hear from anyone else who is working on this, has worked on it, or knows of relevant projects I’ve missed!


A first look at docker.io

In my previous post about virtualenv, I took a look at a way of making Python environments a little more generic so that they could be moved around and redeployed with ease. I mentioned docker.io as a new tool that uses a general concept of “containers” to do similar things, but more broadly. I’ve dug a bit into docker, and these are my initial thoughts. Unfortunately, it seems relatively Fedora un-friendly right now.

The first thing to examine is what, exactly, a “container” is. In essence, it’s just a file system: there’s pretty much nothing special about it. I was slightly surprised by this; given the claims on the website I assumed there was something rather more clever going on, but the only “special sauce” is the use of aufs to layer one file system upon another. So from the point of view of storage alone, there really isn’t much difference between a container and a basic virtual machine.

From the point of view of the runtime, there isn’t an awful lot of difference between a virtual machine and a container either. docker sells itself as a lightweight alternative to virtual machines, but of course there is no standard definition of a “virtual machine”. At one end of the spectrum are the minimal hardware OSen that can be used to assign different host resources, including CPU cores, to virtual machines, and those types of VM are effectively not much different to real hardware – the configuration is set on the fly, but basically it’s real metal. On the other end of the spectrum you have solutions like Xen, which make little to no use of the hardware to provide virtualisation, and instead rely on the underlying OS to provide the resources that they dish out. docker is just slightly further along the spectrum than Xen: instead of using a special guest kernel, you use the host kernel. Instead of paravirtualisation ops, you use a combination of cgroups and lxc containers. Without the direct virtualisation of hardware devices, you don’t need the various special drivers to get performance, but there are also fewer security guarantees.

There are a couple of touted benefits of docker that I’m not totally sold on. One specific claim is that containers are “hardware independent”, which is only true in a rather weak way. There is no specific hardware independence in containers that I can see – except that docker.io only runs on x86_64 hardware. If your container relies on having access to the NX bit, then it seems to me you’re relying on the underlying hardware having such a feature – docker doesn’t solve that problem.

The default container file system is set up to be copy-on-write, which makes it relatively cheap diskspace-wise. Once you have a base operating system file system, the different containers running on top of it are probably going to be pretty thin layers. This is where the general Fedora un-friendliness starts, though: in order to achieve this “layering” of file systems, docker uses aufs (“Another Union File System”), and right now this is not a part of the standard kernel. It looks unlikely to get into the kernel either, as it hooks into the VFS layer in some unseemly ways, but it’s possible some other file system with similar functionality could be used in the future. Requiring a patched kernel is a pretty big turn-off for me, though.

I’m also really unsure about the whole idea of stacking file systems. Effectively, this is creating a new class of dependency between containers, ones which the tools seem relatively powerless to sort out. Using a base Ubuntu image and then stacking a few different classes of daemon over it seems reasonable; having more than three layers begins to seem unreasonable. I had assumed that docker would “flatten out” images using some hardlinking magic or something, but that doesn’t appear to be the case. So if you update that underlying container, you potentially break the containers that use it as a base – it does seem to be possible to refer to images by a specific ID, but the dockerfile FROM directive doesn’t appear to be able to take those.

The net result of using dockerfiles appears to be to take various pieces of system configuration out of the realm of SCM and into the build system. As a result, it’s a slightly odd half-way house between a Kickstart file and (say) a Puppet manifest: it’s effectively being used to build an OS image like a Kickstart, but it’s got these hierarchical properties that stratify functionality into separate filesystem layers that look an awful lot like packages. Fundamentally, if all your container does is take a base and install a package, the filesystem is literally going to be that package, unpacked, and in a different file format.

The thing that particularly worries me about this stacking is memory usage – particularly since docker is supposed to be a lightweight alternative. I will preface this with the very plain words that I haven’t spent the time to measure this and am talking entirely theoretically. It would be nice to see some specific numbers, and if I get the time in the next week I will have a go at creating them.

Most operating systems spend a fair amount of time trying to be quite aggressive about memory usage, and one of the nice things about dynamic shared libraries is that they get loaded into process executable memory as a read-only mapping: that is, each shared library will only be loaded once and the contents shared across processes that use it.

There is a fundamental difference between using a slice of an existing file system – e.g., setting up a read-only bind mount – and using a new file system, like an aufs. My understanding of the latter approach is that it’s effectively generating new inodes, which would mean that libraries that are loaded through such a file system would not benefit from that memory mapping process.
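
If I do get around to measuring it, the basic check is simple enough, because the page cache keys on the (device, inode) pair: two containers only share a mapped library if stat reports the same identity for the file. A quick probe along these lines – the container root paths are entirely hypothetical – would settle it:

```python
import os

# Hypothetical paths to the same library as seen through two container root filesystems.
paths = [
    "/var/lib/containers/a/rootfs/usr/lib64/libc.so.6",
    "/var/lib/containers/b/rootfs/usr/lib64/libc.so.6",
]

identities = set()
for path in paths:
    st = os.stat(path)
    # The (device, inode) pair is what the kernel's page cache is keyed on.
    identities.add((st.st_dev, st.st_ino))
    print(f"{path}: dev={st.st_dev} ino={st.st_ino}")

if len(identities) == 1:
    print("Same underlying file: read-only mappings of this library can be shared.")
else:
    print("Different inodes: each container ends up with its own copy in memory.")
```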

My expectation, then, is that running a variety of different containers is going to be more memory intensive than a standard system. If the base containers are relatively light, then the amount of copying will be somewhat limited – the usual libraries like libc and friends – but noticeable. If the base container is quite fat, but has many minor variations, then I expect the memory usage to be much heavier than the equivalent.

This is a similar problem to the “real” virtual machine world, and there are solutions. For virtual machines, the kernel samepage merging subsystem (KSM) does an admirable job of figuring out which sections of a VM’s memory are shared between instances, and evicting duplicate copies from RAM. At the cost of doing more compute work, it does a better job than the dynamic loader: data can be shared too, not just binaries. This can make virtual machines very cheap to run (although, if the memory suddenly stops being shareable, memory requirements can blow up very quickly indeed!). I’m not sure this same machinery is applicable to docker containers, though, since KSM relies on advisory flagging of pages by applications – and there is no application in the docker system which owns all those pages in the way that (for example) qemu does.
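
The kernel does at least expose how much KSM is reclaiming, so the comparison is measurable; a small sketch reading the sysfs counters (only meaningful on a host with KSM running, and – as noted above – docker containers are unlikely to have marked their pages as mergeable in the first place):

```python
from pathlib import Path

KSM = Path("/sys/kernel/mm/ksm")

def ksm_counter(name: str) -> int:
    """Read one of the kernel samepage-merging counters from sysfs."""
    return int((KSM / name).read_text())

shared = ksm_counter("pages_shared")    # distinct pages kept after merging
sharing = ksm_counter("pages_sharing")  # page references deduplicated onto them

# Each deduplicated reference is a 4 KiB page that would otherwise be its own copy.
print(f"KSM currently saving roughly {sharing * 4 // 1024} MiB of RAM")
```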

So, enough with the critical analysis. For all that, I’m still quite interested in the container approach docker is taking. I think some of the choices – especially the idea about layering – are poor, and it would be really nice to see them implement systemd’s idea of containers (or at least some of those ideas – a lot of them should be quite uncontroversial). For now, though, I think I will keep watching rather than doing much actively: systemd’s approach is a better fit for me, I like the additional features like container socket activation, and I like that I don’t need a patched kernel to run it. It would be amazing to merge the two systems, or at least make them subset-compatible, and I might look into tools for doing that. Layering file systems, for example, is only really of interest if you care a lot about disk space, and disk space is pretty cheap. Converting layered containers into systemd-able containers should be straightforward, and potentially interesting.