Alex Hudson

Thoughts on Technology, Product, & Strategy

Month: April 2016

Containing incestuousness

Having droned on a little the other day about duplication in Stackanetes (in hindsight, I had intended to make a “it’s turtles all the way down” type jibe), I’ve been delighted to read lots of other people spouting the same opinion – nothing quite so gratifying as confirmation bias.

Massimo has it absolutely right when he describes container scheduling as an incestuous orgy (actually, he didn’t, I just did, but I think that was roughly his point). What is most specifically obvious is the fact that while there is a lot of duplication, there isn’t much agreement about the hierarchy of abstraction: a number of projects have started laying claim to be the lowest level above containers.

It comes back to this; deploying PaaS (such as Cloudfoundry, which I try to hard to like but seems to end up disappointing) is still way too hard. Even deploying IaaS is too hard – the OpenStack distros are still a complete mess. But while the higher level abstractions are fighting it out for attention, the people writing tools at a lower level are busy making little incremental improvements and trying to subsume new functionality – witness Docker Swarm – they’re spreading out horizontally instead of doing one thing well and creating a platform.

I don’t think it’s going to take five years to sort out, but I also don’t think the winner is playing the game yet. Someone is going to come along and make this stuff simple, and they’re going to spread like wildfire when they do it.

Stackanetes

There’s a great demo from the recent OpenStack Summit (wish I had been there):

OpenStack is a known massive pain to get up and running, and having it in a reasonable set of containers that might be used to deploy it by default is really interesting to see. This is available in Quay as Stackanetes, which is a pretty awful name (as is Stackenetes, and Stackernetes, both of which were googlewhacks earlier today) for some great work.

I’m entirely convinced that I would never actually run anything like this in production for most conceivable workloads because there’s so much duplication going on, but for those people with a specific need to make AWS-like capability available as a service within their organisation (who are you people?!) this makes a lot of sense.

I can’t help but feel there is a large amount of simplification in this space coming, though. While “Google for everyone else” is an interesting challenge, the truth is that everyone else is nothing like Google, and most challenges people face are actually relatively mundane:

  • how do I persist storage across physical hosts in a way that is relatively ACID?
  • how do I start resilient applications and distribute their configuration appropriately?
  • how do I implement my various ongoing administrative processes?

This is why I’m a big fan of projects like Deis for businesses operating up to quite substantial levels of scale: enforcing some very specific patterns on the application, as far as possible, is vastly preferable to maintaining a platform that has to encompass large amounts of functionality to support its applications. Every additional service and configuration is another thing to go wrong, and while things can made pretty bullet-proof, overall you have to expect the failure rate to increase (this is just a mathematical truth).

CoreOS in many ways is such a simplification: universal adoption of cloud-config, opinion about systemd and etcd, for example. And while we’re not going to go all the way back to Mosix-like visions of cluster computing, it seems clear that many existing OS-level services are actually going to become cluster-level services by default – logging being a really obvious one – and that even at scale, OpenStack-type solutions are much more complicated than you actually want.

 

 

Some notes on Serverless design: “macro-function oriented architecture”

Over the past couple of days I’ve been engaged in a Twitter discussion about serverless. The trigger for this was Paul Johnston‘s rather excellent series of posts on his experiences with serverless, wrapped up in this decent overview.

First, what is serverless? You can go over and read Paul’s explanation; my take is that there isn’t really a great definition for this yet. Amazon’s Lambda is the canonical implementation, and as the name kind of gives away, it’s very much a function-oriented environment: there are no EC2 instances to manage or anything like that, you upload some code and that code is executed on reception of an event – then you just pay for the compute time used.

This is the “compute as a utility” concept taken more or less to its ultimate extreme: the problem that Amazon (and the others of that ilk) have in terms of provisioning sufficient compute is relatively well-known, and the price of EC2 is artificially quite high compared to where they would likely want to go: there just is not enough supply. The “end of Moore’s law” is partly to blame; we’re still building software like compute power is doubling every 18 months, and it just isn’t.

Fundamentally, efficiency is increasingly the name of the game, and in particular how to get hardware running more at capacity. There are plenty of EC2 instances around doing very little, there are plenty doing way too much (noisy neighbour syndrome), and what Amazon have figured out is that they’re in a pretty decent place to be able to schedule this workload, so long as they can break it down into the right unit.

This is where serverless comes in. I think that’s a poor name for it, because the lack of server management is a principle benefit, but it’s a side-effect. I would probably prefer macro-function oriented architecture, as a similar but distinct practice to micro-service oriented architecture. Microservices have given rise to discovery and scheduling systems, like Zookeeper and Kubernetes, and this form of thinking is probably primarily responsible for the popularity of Docker. Breaking monolithic designs into modular services, ensuring that they are loosely coupled with well-documented network-oriented APIs, is an entirely sensible practice and not in some small part responsible for the overall success Amazon have had following the famous Bezos edict.

Macrofunction and microservice architectures share many similarities; there is a hard limit on the appropriate scale of each function or service, and the limitation of both resources and capability for each feels like a restriction, but is actually a benefit: with the restrictions in place, more assumptions about the behaviour and requirement of such software can be made, and with more assumptions follow more powerful deployment practices – such as Docker. Indeed, Amazon Lambda can scale your macrofunction significantly – frankly, if you design the thing right, you don’t have to worry about scaling ever again.

However, one weakness Paul has rightly spotted is that this is early days: good practice is really yet to be defined, bad practice is inevitable and difficult to avoid, and people attempting to get the benefits now are also having to figure out the pain points.

It’s worth saying that this architecture will not be for everyone – in fact, if you don’t have some kind of request/response to hook into, frankly it won’t work at all – you’re going to find it very difficult to develop a VPN or other long-lived network service in this environment.

Many of the patterns that should be applied in this scenario are familiar to the twelve-factor aficionado. Functions should be written to be stateless, with persistent data discovered and recorded in other services; configuration is passed in externally; et cetera. Interestingly, no native code is supported – I suggest this is no surprise, given Amazon’s investment in Annapurna and their ARM server line. So, interpreted languages only.

A lot of this reminds me a lot of the under-rated and largely unknown PHP framework, Photon. While this is not immediately obvious – Photon’s raison d’etre is more about being able to run long-lived server processes, which is diametrically different to Lambda – the fundamental requirement to treat code as event-driven and the resulting architecture is very similar. In fact, it was very surprising to me that it doesn’t seem to be possible to subscribe a Lambda handler to an SQS topic – it’s possible to hack this via SNS or polling, but there is no apparent direct mechanism.

It’s difficult to disagree that this is the wave of the future: needing to manage fewer resources makes a lot of sense, being able to forget about security updates and the like is also a major win. It also seems unlikely to me that a Lambda-oriented architecture, if developed in the right way, could ever be much more expensive than a traditional one – and ought to be a lot less in practice.