The ongoing poverty of the Docker environment

I spent a few hours this weekend attempting to re-acquaint myself with the Docker system and best practices by diving in and updating a few applications I run. I wrote up an article no long after Docker’s release, saying that it looked pretty poor, and unfortunately things haven’t really changed – this doesn’t stop me using it, but it’s a shame that the ecosystem apparently has learnt nothing from those that came before.

There are, of course, certain things you can do to make your life easier: choosing Debian Jessie as a base apparently isn’t one of them. If you’re planning on launching node.js applications you have an entire world of pain to look forward to; my advice is to use ‘nodesource/jessie’ as a base and pretend all the stuff underneath doesn’t exist – attempting to use either the default Debian node or manage the installation of nodesource yourself just isn’t worth the hassle.

But anyway. The main concern is that the build process is so poor. You pretty much have a single environment which is going to serve has both build and runtime by default, and looking around at the various public containers you can see most of them are bloated with the flotsam and jetsam of the build process. Building still requires root by default too, and the vast majority of example Dockerfiles out there slap binaries together in this manner that is entirely reminiscent of wattle and daub plasterwork.

A great example of this is the build container or toolbox pattern. Most people don’t want an environment to serve for both build and runtime, so they separate the two and ensure that one container is used to create (build) the artefacts required at runtime, so as keeping the dependency chain separate and reducing the size of the output being pushed into production.

However, Docker gives you literally no tools to manage this cleanly. There is no build pipeline, so you have to create it yourself. How are you going to transfer the build artefacts from one container to another? You can create a shared volume between the containers, but that makes deployment unnecessarily complicated. You can also use docker cp to move artefacts out of a built image. My personal choice is to extract it using docker run, like this:

This works pretty great, except that you cannot integrate it into the actual Docker build process: you cannot express a dependency from the build container to the app container, and you cannot pull in the build artefacts. So, I end up with some external tool, like make. You don’t get the full benefit of dependency caching, though, because Docker only has a concept of caching layers. Also, with multiple builds happening, your working directory grows with the various build artefacts, and every time you start a new build it’s sending a bunch more data to the Docker daemon (bizarrely, even now, it’s common place to build apps as root and forward the data to the main daemon to produce an image – although bubblewrap is close to getting to a proper solution again it seems).

The Dockerfile build process itself is also super-simplistic. I suspect this is why people get started quickly with it – although I don’t think having a shallow learning curve in general precludes a process from being technically good. It’s essentially throwing together a packaged Linux system using a simple shell script, like the rc files of old. It’s like dpkg and rpm were never invented; and all the problems you would expect in terms of rolling your own are there: the functional atoms (copy/add a file, etc.) are basic.

A lot of Docker devotees claim that Docker removes the need for system configuration management. This is sadly untrue: it’s there, and you have to do it using the basic tools that Dockerfile gives you. You’ll often see a lot of sed and awk in Dockerfiles, or plain overwriting system files, because none of the finer-grained tools like ansible/augeus and friends are there by default (although I note you can now build Docker containers with ansible – which is something I intend to try, as it is a much more reasonable approach).

There are some similar systems out there which highlight the major difference in quality. flatpak and associated tooling is much more influenced by traditional package management and systems concerns, and it shows – it’s much better thought-through, with a full theory on how to divide up the applications and a build process to match.

I don’t personally understand why this tool has been built in this way, given all the good stuff already created and available with standard packaging tools. Linux packages are already very very close to the Docker concept of layers, and Dockerfiles are quite reminiscent of ancient versions of Kickstart. But, it misses all the stuff you’d take for granted as a Linux packager – clean build roots (the layer caching basically prevents this, but turning it off is an exercise in pain), user-mode building, build artefact and process management, etc. etc.

You might be asking at this point why I’m continuing to bother with Docker. Popularity is part of the answer – it’s a tool you have to know at this point, even though some of the time it feels like we’re back to the sysadmin stone age.

But that would be a bit trite. The real answer is that there are a couple of things that Docker has definitely got right. The main one isn’t due to Docker itself, but the concept of containers being largely stateless as a design process is powerful: I think Linux packaging could learn from this, rather than attempting to manage configuration in the way it does (and I think flatpak is a route forward here). Removing configuration and storage from scope allows for some interesting design choices, although these are largely for naught if you don’t have control over the software underlying (actually, it can make things more complex – but if you’re writing software, it encourages much better design practices, and passing secrets consistently through the environment is somewhat reasonable if not ideal).

The second one is that the contract or API between container(s) and the host is network ports (for the most part – you can of course do file system tricks and all sorts). This turns out to be a surprisingly good point of abstraction. I kinda disagree with the “one container, one process” dictum because it largely breaks this: for me, it should be “one container, two services”, where one of those services is a heartbeat. This is a good unit of composition, and frankly I don’t care whether service is one process or many, or threaded or whatever. And – to be clear – service doesn’t mean “self contained web application” to me, the various all-in-one containers for popular apps are common container disaster areas; my position is only a slightly nuanced version of the accepted best practice.

Now, the networking is far from perfect – years-old issues for basic functionality like being able to address the IP of the docker host go unsolved without hacks – and various “solutions” in general have been deprecated. Many orchestration tools, in the mean time, have gone forward and created their own solution to the networking conundrums people face, yet there is still no standard solid service discovery solution.

I personally think Docker is not long for this world. I do think, though, that a lot of the basic concepts are going to go forward – probably with smaller, more capable tools that better incorporate learning from traditional systems management – and like the world of Javascript, it’s pretty cool to have the orthodoxy turned upside down once in a while.

Security by Design For Everyone