Everyone is now talking about the CPU security problems that are being fully disclosed: they’re dubbed Meltdown and Spectre. Meltdown mainly or entirely affects Intel CPUs, while Spectre affects most, if not all, modern CPU designs.

I haven’t seen any “explain it like I’m 5” on the Spectre paper yet, so here’s my take. Sadly, it’s not 5-year-old level, but I’ve tried to make it a bit more accessible. If you want a lot more detail, the Google blog has code.

Let’s talk a bit about modern CPU architecture. For a while now, the internal processing of CPUs has been vastly faster than everything else in the computer – including the memory. Some tricks can be done – such as including very fast memory within the CPU (called a cache) to keep commonly needed data to hand. But for the most part, CPUs are now so fast that they often spend their time basically twiddling their thumbs waiting for the rest of the system to catch up.

Instead of leaving CPUs idle like that, most manufacturers have included – for many years now – “speculation”. This allows a CPU to start working on problems before it knows whether they’re needed or not.

Meltdown

There are a number of different types of speculation. “Out of order” execution allows a processor to execute instructions in a different order from the one in which they are listed. Sometimes, like in a recipe, it doesn’t matter the order you do things in. Other times, it does, and you have to start over.

Meltdown attacks this out-of-order execution. Combined with an actual bug in Intel CPUs, it executes code which the CPU processes in the wrong order. The CPU accesses memory that should not be available, realises this, and tries to go back and start over. Unfortunately, it leaves some of the results of the first attempt around – and this combination of speculation and inappropriate access apparently allows parts of the memory for the operating system to be read.

This is a direct attack, albeit one which is pretty slow to execute. This is the problem that most of the software patches are trying to address – the solution is to hold more of the operating system at “arm’s length”. This is going to result in less performance, though, affecting software which uses the operating system more extensively (particularly anything which handles a lot of data).

Spectre

This brings me to the “unpatchable” attack, Spectre. This supposedly affects most if not all modern CPUs from all vendors, although it’s also more difficult to exploit.

Spectre also attacks speculative features, this time “branch prediction”. This feature allows the CPU to start executing code which may not be correct to run, based on a decision it has yet to get an answer for. Given a decision to go one way or another, the CPU will essentially guess (based on previous results) the direction before it knows for sure which way to go.
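To make “guess based on previous results” a bit more concrete, here is a toy sketch of a two-bit saturating counter, a textbook prediction scheme. I’m assuming it purely for illustration: real CPUs use far more elaborate predictors, and the names TwoBitPredictor and count_mispredictions are my own inventions.

```python
class TwoBitPredictor:
    """Toy 2-bit saturating counter (an illustrative assumption, not any real CPU).

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    """

    def __init__(self, state: int = 2) -> None:
        self.state = state

    def predict(self) -> bool:
        # Guess the branch direction from what happened before.
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Nudge the counter towards the real outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes, predictor=None) -> int:
    """Replay a sequence of real branch outcomes and count the wrong guesses."""
    predictor = predictor or TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1  # speculative work would be thrown away here
        predictor.update(taken)
    return misses


if __name__ == "__main__":
    steady = [True] * 10
    alternating = [True, False] * 4
    print(count_mispredictions(steady), count_mispredictions(alternating))
```

A branch that nearly always goes the same way is guessed correctly almost every time, which is why speculation pays off; a branch that keeps flipping wastes work on roughly every other guess in this toy model.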

It’s as if I ask you what type of sandwich you want, and start making you a cheese sandwich before I get the answer. If you answer that you want a cheese sandwich, then it’s a win – I’ll get it finished quickly because I’m already making it. If you wanted ham on rye I’m going to have to throw away what I did so far, though.

From your perspective, there is an interesting information leak here. If you know I make sandwiches speculatively, then you can tell if I guessed right by how quickly I give you your sandwich. This type of difference when the code is running is usually called a side-effect.

The main approach described in the Spectre paper, as I understand it, is two-fold: first, get the processor to speculatively read some piece of information from memory (let’s call this value k), and then access some area of memory at an address derived from k (let’s call this M). The CPU realises that the speculation was wrong and throws the results away, but by this late stage the data at M has already been loaded into the CPU’s fast cache. The attacker can then try reading from a series of different addresses to find out which one is really “M” by seeing how quickly each read completes: if a read for an address comes back quickly, then it’s M (since the data must have been in the cache), and because the attacker now knows the address of M, it has effectively been given k. It’s a bit more complex than that, and there are some variants and requirements, but that’s the gist.
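The recovery step can be sketched as a toy simulation. To be clear, this is not an exploit and it touches no real cache: ToyCache, the made-up FAST and SLOW costs, the 4096-byte STRIDE and the function names are all assumptions of mine, chosen only to show why a single fast read gives k away.

```python
SLOW, FAST = 100, 1  # made-up access costs in arbitrary "cycles" (assumption)
STRIDE = 4096        # spacing between probe addresses (assumption)


class ToyCache:
    """A pretend cache: remembers which addresses have been touched."""

    def __init__(self) -> None:
        self.lines = set()

    def flush(self) -> None:
        self.lines.clear()

    def access(self, address: int) -> int:
        # Return the simulated cost of the read, then remember the address.
        cost = FAST if address in self.lines else SLOW
        self.lines.add(address)
        return cost


def probe_address(value: int) -> int:
    # The address "M" that corresponds to a candidate value of k.
    return value * STRIDE


def victim_speculates(cache: ToyCache, k: int) -> None:
    # Models the speculative read: the result is discarded,
    # but the cache still holds the line at M.
    cache.access(probe_address(k))


def attacker_recovers(cache: ToyCache) -> int:
    # Time a read for every possible byte value; the fast one reveals k.
    timings = {v: cache.access(probe_address(v)) for v in range(256)}
    return min(timings, key=timings.get)


if __name__ == "__main__":
    cache = ToyCache()
    leaked = []
    for k in b"secret":
        cache.flush()
        victim_speculates(cache, k)
        leaked.append(attacker_recovers(cache))
    print(bytes(leaked))
```

The attacker never reads k directly in this model. It only learns which of 256 reads was fast, and that is enough to recover one byte per round.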

I’ve tried to think of ways of simplifying how to explain this, and sadly I can’t. The basic problem, though, is this: the CPU has been tricked into retrieving information that the attacker shouldn’t be able to access, and the attacker can test the CPU cache in a few different ways to deduce what the information is. This is called a “side channel” – the information we want can’t be read directly, but it can be leaked out another way.

Here’s why it’s unpatchable

Software developers learned a long time ago that performance can give away information. String comparison was a favourite: I can see that “banana” and “sheep” are different words just by looking at the first character, but I have to get six characters in to see the difference between “planet” and “planed”. I shouldn’t test passwords like this, because if I quickly return an answer if the first character isn’t correct, it’s possible to guess the password. As I guess closer to the correct password, the result takes longer to compute – it’s like the computer’s saying “hotter” or “colder” each time as I get closer or not.
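Here’s a small sketch of that “hotter/colder” effect. Rather than measuring real time, which is noisy, the hypothetical leaky_equals below reports how many characters it examined, standing in for how long it took. The recover function is likewise my own illustration, and it assumes the attacker already knows the secret’s length and that every character is in the alphabet. The last function uses hmac.compare_digest from Python’s standard library, which is designed to reduce this kind of timing leak.

```python
import hmac
import string


def leaky_equals(guess: str, secret: str):
    """Early-exit comparison. Returns (equal, characters_examined)."""
    if len(guess) != len(secret):
        return False, 0
    steps = 0
    for g, s in zip(guess, secret):
        steps += 1
        if g != s:
            return False, steps  # bails out at the first difference
    return True, steps


def recover(secret: str, alphabet: str = string.ascii_lowercase) -> str:
    """Guess the secret one character at a time using only the step count."""
    known = ""
    for pos in range(len(secret)):
        for ch in alphabet:
            guess = (known + ch).ljust(len(secret), "\0")
            equal, steps = leaky_equals(guess, secret)
            # "Hotter": the comparison got past this position.
            if equal or steps > pos + 1:
                known += ch
                break
    return known


def safer_equals(guess: str, secret: str) -> bool:
    # Comparison intended to take the same time wherever the difference is.
    return hmac.compare_digest(guess.encode(), secret.encode())


if __name__ == "__main__":
    print(leaky_equals("banana", "planet"))
    print(leaky_equals("planet", "planed"))
    print(recover("planet"))
    print(safer_equals("planet", "planed"))
```

With the leaky version, a six-letter lowercase secret falls in at most 6 × 26 guesses instead of 26 to the power of 6, which is the whole problem in miniature.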

So it is with this problem. To make the CPU go fast, it has a variety of tricks which allow it to take short-cuts that sometimes work. Unfortunately, we can get information about what’s going wrong from timing and other side-effects of the processor. The information leaks because short-cuts are taken, but a significant amount of performance of modern CPUs comes from being able to often take the short-cuts.

It’s not a bug, it’s a design “flaw”. It’s not even a flaw, it’s just a logical conclusion of how CPUs work. It’s going to be very interesting to see how vendors of CPUs and operating systems eventually address this. The solution will likely involve better-specified CPUs: right now, various instructions don’t have a full list of side-effects. There are also no obvious ways to turn off speculation when it’s dangerous.

For the most part, this problem is only an issue at the interface between a program and its operating system.

Should we be worried?

Of the two attacks, Meltdown is the one that needs immediate attention. I’ve seen reports that the state of the random number generator can be read, for example – this is terrible for encryption. Ensuring that your systems are patched for this is important. Performance will be hit, but only a few workloads will be measurably impacted.

Spectre is much more difficult to execute, and fundamentally much more difficult to address. It’s going to be important to keep our eyes on progress here, though – often the first cracks in a system are quickly followed up by larger ones.