C[IT]O’s Guide to Serverless Costs

My estimable Twitter-pal Paul Johnson has put together a very reasonable thread about his thinking on serverless costs (i.e. AWS Lambda, in this case). He makes a great case for designing functions in a way that allows cost efficiency improvements, and I think the point on architecture is generally well-made. However, there are a few aspects of this which I think are generally not well understood, and Twitter is much too short a form to get them in. Hence this post.

How is Lambda priced?

This is a simple question with a relatively complex answer. We know that we pay per-request, and people often think that you pay for how much compute you use. That’s not quite true: technically, you’re paying for how long you’re using RAM for. So, the cost has three scales: number of requests, RAM allocated, and time spent handling the requests.

Just to make things complex, there’s a free tier for everyone, which gives you a certain capacity before you need to start paying: a certain number of requests and a certain amount of memory-time, measured in GB-s (gigabyte-seconds: a measure of how much RAM you use, for how long). The memory-time is charged per request, rounded up to the nearest 100ms, and the minimum RAM allocation is 128MB, so the minimum memory-time charge for a request is $0.000000208 – a very small amount indeed!
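To make the arithmetic concrete, here is a minimal sketch of the memory-time charge for a single request. The per-GB-s rate is an assumption: it is the value implied by the minimum cost quoted above, and the function names are mine. The sketch ignores the free tier and the separate per-request fee.

```python
import math

# Assumed rate: inferred from the $0.000000208 minimum quoted in the text.
PRICE_PER_GB_SECOND = 0.00001667
BILLING_INCREMENT_MS = 100  # durations are rounded up to this increment
MIN_MEMORY_MB = 128


def billed_ms(duration_ms: float) -> int:
    """Round a duration up to the next billing increment (at least one)."""
    increments = max(1, math.ceil(duration_ms / BILLING_INCREMENT_MS))
    return increments * BILLING_INCREMENT_MS


def memory_time_cost(duration_ms: float, memory_mb: int = MIN_MEMORY_MB) -> float:
    """Memory-time charge in USD for one request, excluding the request fee."""
    gigabytes = memory_mb / 1024
    seconds = billed_ms(duration_ms) / 1000
    return gigabytes * seconds * PRICE_PER_GB_SECOND


if __name__ == "__main__":
    # Cheapest possible request: 128MB for one 100ms increment.
    print(f"${memory_time_cost(100):.9f}")
    # A 101ms request is billed as 200ms, so it costs twice as much.
    print(f"${memory_time_cost(101):.9f}")
```

Note how all three scales show up: the number of requests multiplies this figure, and the RAM allocation and the rounded-up duration are the two factors inside it.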

How does this stack up to EC2? Well, an On-demand t2.micro instance costs $0.0116 per hour with a 1GB RAM allocation. To handle a similar 100ms/128MB RAM request, that’s 0.0116 / ((1024/128) * (60*60*10)) = $0.00000004028: the hourly price divided across eight 128MB slots and 36,000 100ms slices per hour. In other words, this is roughly one fifth of the Lambda cost.
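The same comparison as a short sketch, using only the figures quoted in this post. The function name and the idea of carving the instance into "slots" and "slices" are just my way of laying out the division; it assumes the instance is perfectly packed, which no real box is.

```python
EC2_HOURLY_USD = 0.0116            # on-demand t2.micro, from the text
EC2_RAM_MB = 1024                  # 1GB instance
LAMBDA_MIN_COST_USD = 0.000000208  # minimum Lambda cost, from the text


def ec2_slice_cost(memory_mb: float, duration_ms: float) -> float:
    """Cost of one memory/time slice of the instance, assuming perfect packing."""
    ram_slots = EC2_RAM_MB / memory_mb         # 8 slots of 128MB
    slices_per_hour = 3_600_000 / duration_ms  # 36,000 slices of 100ms
    return EC2_HOURLY_USD / (ram_slots * slices_per_hour)


if __name__ == "__main__":
    cost = ec2_slice_cost(128, 100)
    print(f"EC2 slice: ${cost:.11f}")
    print(f"Lambda premium: {LAMBDA_MIN_COST_USD / cost:.1f}x")
```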

(If you spot any problems in the numbers, please let me know on Twitter!)

The premium for Lambda varies as you change the size of either the Lambda allocation or the EC2 instance; 4x is probably a more useful rule-of-thumb. But “useful” is pushing it, because…

Comparing Apples & Oranges

I can already hear the Lambda fans in the audience screaming at me that I can’t compare the two services like that. That’s an entirely fair comment – EC2 and Lambda don’t function at all similarly. With EC2 I’m paying for at least an hour, and I’m paying for all the time I’m not using it. I’m also unlikely to be able to max it out – getting to 60% utilization would be pretty good, so the real premium on Lambda is smaller than the raw numbers suggest, even on a busy service.
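As a rough sketch of how utilization changes the picture: if the instance is only busy part of the time, each useful slice effectively costs more, and the Lambda premium shrinks accordingly. The figures are the ones from this post; the function name is my own, and the model is deliberately naive (it treats idle time as pure waste and nothing else).

```python
LAMBDA_COST_USD = 0.000000208  # minimum Lambda cost, from the text
# EC2 cost of a 100ms/128MB slice on a perfectly packed t2.micro.
EC2_SLICE_COST_USD = 0.0116 / ((1024 / 128) * (60 * 60 * 10))


def lambda_premium(utilization: float) -> float:
    """Lambda cost relative to EC2 when only `utilization` of the box is used."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    effective_ec2_cost = EC2_SLICE_COST_USD / utilization
    return LAMBDA_COST_USD / effective_ec2_cost


if __name__ == "__main__":
    for utilization in (1.0, 0.6, 0.2):
        print(f"{utilization:.0%} busy: Lambda costs {lambda_premium(utilization):.1f}x EC2")
```

At 60% the premium comes out at about three times rather than five, and at roughly 20% utilization the two are level.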

However, EC2 has two exceptionally useful properties here. First, this is a capped expense: I know how much I’m going to pay at the end of the month. With Lambda, I need to keep an eye on the meter to make sure something silly isn’t happening (and AWS’ natural limits here help out enormously). If this were a restaurant, EC2 would be the fixed-price menu and Lambda would be à la carte. The flip side of the fixed price is that if I hit the limit on EC2, my service begins to degrade.

The second useful property is that EC2 multi-tasks: my instance can be doing many things at once. Lambda cannot; it has a single request to service and any other work I manage to do is pretty much incidental. This means that EC2 is much better suited to some types of workload.

It’s often said with Serverless that you only pay for what you use. This is only true in-between requests: when a request is happening, you pay for what you’re not using. An obvious example would be a request to an external service like a DB or S3: I’m paying while I wait for that request to come back, even though my function is idle. For an I/O heavy application (by which I mean, “spends a lot of time waiting for I/O” – not an application like a database, which puts load on the I/O), we’re paying for a bunch of time we’re not using.

Serverless & Microservices

This point about I/O is important with microservice or microfunction-style architectures. If I have a lot of functions-calling-other-functions, I’m becoming more I/O heavy, and the cost of my functions increases. Depending on how deep my layers are, and whether I can issue the function calls asynchronously, I’m going to start incurring that I/O cost in more than one place.

As an example, let’s say I have a user account service which allows me to query information about the currently authenticated user. To get back their e-mail address and contact preferences takes 250ms in this example. If I have another service to upload files, it calls into this user service and spends 250ms waiting for the response. So, I’m paying for the 250ms on my User function, but I’m also paying for the 250ms wait in my Upload function.
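Here is a small sketch of that double-billing, generalised to a chain of any depth. It only adds up durations; the 50ms of "own work" in the Upload function is a made-up figure for illustration, and billing rounding is ignored to keep the point clear.

```python
from typing import Sequence, Tuple


def chain_billed_ms(own_work_ms: Sequence[float]) -> Tuple[float, float]:
    """Return (total billed ms, total useful ms) for a synchronous call chain.

    own_work_ms[0] is the outermost function; each function calls the next
    one and waits for it, so it is billed for everything downstream too.
    """
    total_billed = 0.0
    downstream_wall = 0.0
    for work in reversed(own_work_ms):
        wall = work + downstream_wall  # own work plus time spent waiting
        total_billed += wall
        downstream_wall = wall
    return total_billed, sum(own_work_ms)


if __name__ == "__main__":
    # Upload (hypothetical 50ms of its own work) calls User (250ms, from the text).
    billed, useful = chain_billed_ms([50, 250])
    print(f"billed {billed:.0f}ms for {useful:.0f}ms of work")
    # Add one more layer in front and the wait is paid for three times over.
    billed, useful = chain_billed_ms([50, 50, 250])
    print(f"billed {billed:.0f}ms for {useful:.0f}ms of work")
```

With two layers, 300ms of actual work is billed as 550ms; with three, 350ms of work is billed as 900ms.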

Chaining callbacks deeply into Lambda functions in a synchronous way like this is a bit of a cost anti-pattern: the more I use my Upload function, the more I’m paying for that duplication. In the EC2 example, I’m not paying for that I/O wait because my function/service literally isn’t running – in principle, I can stack many tens of thousands of requests on a single box, because waiting for I/O is an exceptionally low-overhead problem.

Scaling costs of Serverless

Proponents of Serverless like to talk about how simple the scaling is, and how clear the cost is – one request, one cost. As you become more successful and your application is doing more work, yes, it incurs more cost – and that should be fine, right? It’s linear, it’s predictable.

This is a major change to the cost structure of IT, and not necessarily for the better. For as long as the industry has been around, an increasingly successful service has borne additional costs, but usually they are sub-linear: bigger boxes are proportionately cheaper, I get bigger discounts on more traffic, etc. The unit cost per user comes down very quickly.

In the Lambda world, this isn’t true. My application has a unit cost per user (roughly speaking – obviously it’s not identical), and as my userbase grows the cost grows with it in lockstep. The cost starts at zero, which is a nice feature, but as a larger user I’m actually more interested in the cost at scale.

From an AWS point of view, I definitely understand the appeal of this pricing model, because it basically puts you in direct contact with RAM utilization – and encourages you to reduce it. And as Paul points out in his thread, once you separate functions out you can begin to RAM-optimize each of them. I would argue that it’s wrong to structure functions in order to create optimization boundaries, but clearly memory-intensive work can be hived off with relative ease.

From a user’s point of view, this is probably all still a win, assuming you have the right workload. For compute-heavy services it’s a no-brainer; and even if you have a lot of I/O-heavy services it may still make a lot of sense if they’re not regularly used. But, for hot I/O services, the Lambda pricing is going to grow at a much quicker rate.

For the most part, Lambda-based applications I’ve seen so far have tended to operate on a cheaper basis than EC2, but I think this is for two reasons:

Lambda is still relatively new mindset-wise, and therefore the applications written for it tend to be newer and smaller;
EC2 suffers from people re-implementing data centre practices: they use redundant instances, don’t use the elastic features, don’t use spot instances, etc. There’s a whole cottage industry in this area, and Lambda neatly side-steps the problem.

Best Practices

I don’t disagree with anything that Paul wrote in terms of the architectural upshot of the costs. However, as well as looking to optimize RAM within your functions, I would say you also want to optimize compute and I/O.

For example, if your functions take 25ms to complete, it makes sense to bundle them rather than pay four times the cost for each invocation, since each one is billed as a full 100ms. It makes sense to attempt to limit the requests that your functions make, and ideally to have them call into each other only when necessary. I’m not sure what level of aggregation is best (there probably isn’t a single answer), but I would argue that they should be a lot more complex than a function in your code would be. Aggregation also means you have fewer functions that are more frequently used, which helps keep them warm. A FaaS function should be somewhere in between an application and a function in code terms, and probably a lot closer to the function end of the scale.
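A quick sketch of where the "four times" comes from, assuming the 100ms billing increment described earlier in this post. It compares billed time only, and ignores any per-invocation fee or overhead from the bundling itself.

```python
import math

BILLING_INCREMENT_MS = 100  # durations are rounded up to this increment


def billed_ms(duration_ms: float) -> int:
    """Round a duration up to the next billing increment (at least one)."""
    increments = max(1, math.ceil(duration_ms / BILLING_INCREMENT_MS))
    return increments * BILLING_INCREMENT_MS


# Four separate 25ms invocations are each billed as a full 100ms...
separate = sum(billed_ms(25) for _ in range(4))
# ...whereas one invocation doing all four pieces of work fits in one increment.
bundled = billed_ms(4 * 25)

if __name__ == "__main__":
    print(f"separate: {separate}ms billed, bundled: {bundled}ms billed")
```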

If possible, if you’re making requests from a Lambda function, try to do them in a scatter/gather style: create them all in an asynchronous manner, and only wait for the slowest of them. This can be difficult to achieve – if you’re making decisions based on the responses, it means you likely get into ahead-of-time computation that never feels very natural – but for some functions this may be a significant part of the cost.
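A minimal sketch of the scatter/gather idea using Python’s asyncio. The downstream calls here are hypothetical stand-ins that just sleep; in a real function they would be requests to other services, and the names and delays are invented for illustration.

```python
import asyncio
import time


async def fake_call(name: str, delay_s: float) -> str:
    """Hypothetical stand-in for a request to a downstream service."""
    await asyncio.sleep(delay_s)
    return name


async def sequential(calls):
    """Issue each request in turn: billed for the sum of the waits."""
    return [await fake_call(name, delay) for name, delay in calls]


async def scatter_gather(calls):
    """Issue all requests at once: billed only for the slowest wait."""
    return list(await asyncio.gather(*(fake_call(n, d) for n, d in calls)))


def timed(coro_fn, calls):
    """Run a coroutine function and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn(calls))
    return result, time.perf_counter() - start


CALLS = [("user", 0.25), ("prefs", 0.10), ("quota", 0.15)]

if __name__ == "__main__":
    for fn in (sequential, scatter_gather):
        result, elapsed = timed(fn, CALLS)
        print(f"{fn.__name__}: {result} in {elapsed:.2f}s")
```

The sequential version waits for roughly the sum of the delays, while the gathered version waits for roughly the longest one. The catch mentioned above still applies: this only works when no request depends on the answer to another.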

This is all another reason why I think Lambda and containers will merge together in the future: containers look a lot like processes, as do functions, and the cost basis for them comes down to scheduling resources. Traditionally, that would be the role of the operating system: so in a very real sense, Kubernetes et al are the new OS, and Lambda is the new application server.

It would be very helpful to have within Lambda a request-termination setup that didn’t translate directly into runtime cost. Load balancing/request termination is generally a pretty small part of the cost, and being able to run an entire request via queues rather than requests would be extremely powerful. It wouldn’t remove the I/O cost of calling into external services, but it would remove the cost of calling internally. One of my favourite non-mainstream HTTP features is Mongrel2’s ability to extend the asynchronous processing outside of the webserver via ZeroMQ – enabling exactly this approach.
