Locking revisited

Sorry – another slightly technical post. This is yet again another visit to the subject of locking, which I briefly described the problems with a while ago, and then wrote some new code to help alleviate the problems.

One thing I didn’t explain at all, looking back at those posts, is the different reasons for locking. We can simplify this down to two major reasons:

Internal database consistency. SQL databases are usually “ACID compliant“, and SQLite is no different. This means that every reader and writer gets a consistent view of the database; for example, if you’re writing data to the database, you can’t ever run a query which gets back partially-written data – you either see the data in its complete state, or you see nothing at all.
External store consistency. A bit like internal views, we don’t want clients accessing the store to see “inconsistent” data – we don’t want one client to be writing a mail to a user’s Inbox, and have another client read the half-written data and think it has the whole mail.

Those problems seem very similar, and indeed, our previous issues are mostly to do with the fact that those problems were conflated: the SQLite locking was used to ensure external consistency in the store. That’s actually a really common – and often desirable – way of doing things. The way it works is you wrap all your SQL operations in an SQL transaction, and whether or not your operation (e.g., writing a file) succeeds stands or falls on whether or not the transaction succeeds. The database guarantees that when you use a transaction, you can make many alterations at once but no other database client gets an intermediate view of the partially-written data.

However, at this point, we run into an issue with SQLite. “Proper” SQL servers give you the ability to lock single tables, or even rows within tables, to help ensure that transactions succeed where-ever possible. SQLite doesn’t give us that, it basically ensures consistency by locking the whole database. This makes the external consistency above suddenly very expensive to implement at the SQL level.

Now, surprisingly, there is also another set of locks within the store. At the moment, for example, collections are locked at a higher level when you read/write to them, because we also access other data – document contents on disk, the search index, etc. These locks are needed for the external consistency also, and almost make the SQL locks redundant.

So, my work on the store now is mainly to decouple the SQL-level locking from the provision of external consistency. We’ll effectively only use the locks at that level specifically when we’re reading/writing from the database to ensure that the data is written consistently when it matters (for example, allocating IDs) but not when it doesn’t (for example, creating a document and then later adding the correct metadata to it). External consistency will then be provided by the higher-level locks which we already use.

Security by Design For Everyone