Alex Hudson's Blog

Fri, 01 Aug 2008

New query builder

So, the efforts of the past few weeks have started to bear fruit. Part of this has been checked into SVN, some will be checked in later today, but we're definitely beginning to get there.

I thought it would be interesting to look at the problem from the point of view of what the code is doing now. If you recall, the basic problem is this: we want to run an SQL query against our data to return a list of documents, but we then want to be able to refine the data on a number of different criteria.

The basic SQL query looks like this:

SELECT 
	so.guid, so.collection_guid, so.imap_uid, so.filename, 
	so.type, so.flags, so.size, so.time_created, so.time_modified 
FROM
	storeobject so;

This is one of those times where I wish my blog did syntax highlighting. But, anyway: let's think about this as a non-programmer. The best analogy is that the above query is pizza dough. What happens is that the store client is telling us what toppings it wants, and the store is having to figure out the right order of doing things - you don't want the tomato sauce on top of everything else, the pizza wouldn't work.

So, what kinds of toppings do our clients want? Let's look at a few examples. First up, an easy one - looking for mail documents, which have a 'type' of '2'. The client will tell the store this requirement via the Bongo Query Language, which looks like this:

= nmap.type 2

Simple enough. It's just infix notation for "nmap.type = 2". This changes the SQL query to:

SELECT 
	so.guid, so.collection_guid, so.imap_uid, so.filename, 
	so.type, so.flags, so.size, so.time_created, so.time_modified 
FROM 
	storeobject so 
WHERE 
	(so.type = 2);

I've italicised the new part of the query. It's only a little extra WHERE clause, although it's had to figure out that "nmap.type" is actually the "so.type" column in our query. What if the client tests a piece of data which isn't in the standard query?

= nmap.conversation.sources 2

... which turns into this SQL:

SELECT
	so.guid, so.collection_guid, so.imap_uid, so.filename, 
	so.type, so.flags, so.size, so.time_created, so.time_modified 
FROM
	storeobject so 
	INNER JOIN conversation c ON so.guid=c.guid
WHERE
	(c.sources = 2);

It's beginning to get tougher here. We have to figure out which column in which table is being requested, and then how to join in that table to get that data so that it can test it. But wait, it can get even tougher still - what if we want extra data columns output from the query, as well as extra constraints? What about custom properties on documents? Check this out:

& & = { nmap.conversation.subject 5 "hello" = nmap.conversation.sources 2 | l to 6 l from 6"

This is "any conversation related to document GUID 0x6 whose subject starts 'hello' and is in my Inbox" (approximately). And let's also output the "nmap.conversation.subject" for each result, plus my "alex.custom" property which is something only my client uses. The SQL starts getting hairy:

SELECT 
	so.guid, so.collection_guid, so.imap_uid, so.filename, 
	so.type, so.flags, so.size, so.time_created, so.time_modified, 
	prop_0.value, c.subject 
FROM
	storeobject so 
	INNER JOIN conversation c ON so.guid=c.guid 
	INNER JOIN links link_0 ON so.guid=link_0.doc_guid 
	INNER JOIN links link_1 ON so.guid=link_1.related_guid 
	LEFT JOIN properties prop_0 ON so.guid=prop_0.guid
WHERE
	(
	((substr(c.subject,1,5) = "hello") AND (c.sources = 2))
	AND 
	((link_0.related_guid = 6) OR (link_1.doc_guid = 6))
	) 
	AND prop_0.name="alex.custom";

I didn't bother highlighting the changes from our pizza dough query; frankly, there's more different than there is the same. Things I would point out in the above:

This is actually considerably more powerful than Bongo had previously, for many reasons, and when we build back in the search it will be even more powerful.

posted at: 16:21 | path: /bongo/hacking | permanent link to this entry

Tue, 06 May 2008

Parsing queries

I recently wrote to bongo-devel about a problem in our store protocol: in the trunk version, some commands have a "query" parameter which is unspecified in our documentation. The reason it's unspecified is that it turns out that the parameter just gets passed through to CLucene, which as well as being something of a security issue also makes it more difficult to tell people how to use it.

In the experimental Store (coming soon to a server near you!) the CLucene index has been removed, and the query interface to it therefore doesn't exist any more. It's quite possible that CLucene will make a come-back, but the query interface is something I want to get rid of: for no better reason than it requires us to duplicate information from the SQL database to the Lucene index.

Because we need a parser for these queries which is simple, safe and isn't going to do nasty things with our memory allocator, I proposed a version of prefix notation / Polish notation, and have written some code which parses this into a tree structure stored in an array. The code is pretty straightforward, so I've documented the algorithm here - I'm sure it must be a known algorithm, but I've been unable to find any examples of anything similar.

In general, prefix notation isn't used as commonly as postfix notation. This is mainly because it's more efficient in terms of stack space to evaluate an expression in postfix notation, because the constants come before the operators and you can start calculating earlier. However, we're not evaluating an expression - we're going to turn it into SQL - so that isn't a benefit, and having the operators come first means we can find places to later store the constants simply.

The next step is to transform this parsed expression into something we can throw at SQLite. This is actually much easier said than done: for example, we have to be able to handle arbitrary properties on documents, which are stored in a separate table with a one-to-many relationship between the document and its properties.

The problem with that is that it's not that natural to say something like "give me rows from table A where there are rows in table B which relate to it and some of those rows contain X and Y". So, this may turn out to be something quite hard or even impossible to provide a general solution to if the logic is quite deeply nested - but ideally, I want all of that nastiness hidden from the client.

Some readers may be asking how the current Store does this. Effectively, it runs the query against the index with the properties it has at hand, and retrieves a list of Store guids of the documents which have those properties set. It then queries the SQL database for each individual guid to see if any of the properties there match the query also, and if so it outputs the match.

posted at: 16:12 | path: /bongo/hacking | permanent link to this entry

Fri, 25 Apr 2008

SQLite experiment; r800 reached....

First things, we just went past r800 yesterday - a commit I bagged myself, woo! Sadly no pretty graphs to show for it, but I think we're back to a decent pace of commits again.

The work on the experimental branch of the Store is a good way through. This branch of Bongo builds reasonably cleanly, and works somewhat - you can deliver mail into it, and get mail out of it via IMAP. It's at the point now where it needs some decent testing while the final pieces are put together.

Somewhat incredibly, the code size is on the way down again - it's already about 1000 lines smaller, and might be double that by the time it's done. Although this is extremely dammaging to our project valuation on Ohloh (which currently stands at some .5 million), it's beneficial for Bongo ;)

What's good is that there is a decent turn of speed already, especially when dealing with large amounts of mail, and things seem to be relatively stable for me copying large chunks around. And this is without doing any performance work at all: I know for sure there is plenty of really low hanging fruit on this; for example, we don't cache SQL statements in a parsed state ready for re-use, and we don't do simultaneous access to the store which should be quite easy now. I haven't timed anything, but Thunderbird feels a lot snappier.

So if you want to do some testing, please grab the /svn/bongo/experiments/store-sqlite branch of Bongo, and run your favourite software against it - trying using it with imap or pop3, or whatever, and see what happens.

posted at: 16:57 | path: /bongo/hacking | permanent link to this entry

Tue, 15 Apr 2008

Locking revisited

Sorry - another slightly technical post. This is yet again another visit to the subject of locking, which I briefly described the problems with a while ago, and then wrote some new code to help alleviate the problems.

One thing I didn't explain at all, looking back at those posts, is the different reasons for locking. We can simplify this down to two major reasons:

  1. Internal database consistency. SQL databases are usually "ACID compliant", and SQLite is no different. This means that every reader and writer gets a consistent view of the database; for example, if you're writing data to the database, you can't ever run a query which gets back partially-written data - you either see the data in its complete state, or you see nothing at all.
  2. External store consistency. A bit like internal views, we don't want clients accessing the store to see "inconsistent" data - we don't want one client to be writing a mail to a user's Inbox, and have another client read the half-written data and think it has the whole mail.

Those problems seem very similar, and indeed, our previous issues are mostly to do with the fact that those problems were conflated: the SQLite locking was used to ensure external consistency in the store. That's actually a really common - and often desirable - way of doing things. The way it works is you wrap all your SQL operations in an SQL transaction, and whether or not your operation (e.g., writing a file) succeeds stands or falls on whether or not the transaction succeeds. The database guarantees that when you use a transaction, you can make many alterations at once but no other database client gets an intermediate view of the partially-written data.

However, at this point, we run into an issue with SQLite. "Proper" SQL servers give you the ability to lock single tables, or even rows within tables, to help ensure that transactions succeed where-ever possible. SQLite doesn't give us that, it basically ensures consistency by locking the whole database. This makes the external consistency above suddenly very expensive to implement at the SQL level.

Now, surprisingly, there is also another set of locks within the store. At the moment, for example, collections are locked at a higher level when you read/write to them, because we also access other data - document contents on disk, the search index, etc. These locks are needed for the external consistency also, and almost make the SQL locks redundant.

So, my work on the store now is mainly to decouple the SQL-level locking from the provision of external consistency. We'll effectively only use the locks at that level specifically when we're reading/writing from the database to ensure that the data is written consistently when it matters (for example, allocating IDs) but not when it doesn't (for example, creating a document and then later adding the correct metadata to it). External consistency will then be provided by the higher-level locks which we already use.

posted at: 11:00 | path: /bongo/hacking | permanent link to this entry

Sun, 13 Apr 2008

Diagrams and stuff

I do love a good diagram, and I noticed that I have a number of Bongo-related diagrams stuck on my hard-drive. Most of them have seen action in a past blog post of mine, but my blog isn't really a Bongo development resource, and the ones which haven't yet been seen really ought to be a bit more widely available.

In general, though, programmers aren't great at drawing things and worse, updating diagrams in drawing programs is often tedious and time-consuming. So, I've committed some graphviz support into our build system which runs when you build the development documentation.

The first - and only - fruit of this endeavour is a UML-like diagram I've done to try to figure out what the Store's SQLite database structure should be. Behold (clicky for bigger):

I should at this point take a little bit of time to explain why I did this. I've decided that the SQLite usage in the store needs to be pretty much refactored, for a number of reasons. Amongst those is the fact that we store a binary blob within the tables which represents much of this data: some, like "mime_lines" in mail documents is completely hidden with this structure, other, "time_sent" appears in both this blob and as a column in a table.

So, job number one has basically been getting rid of this evil binary blob. This schema, then, isn't me really redesigning it per se - I'm simply pulling the structure out of this blob and putting it in tables, where it belongs.

I've also slightly denormalised the table a little. In our current database, event and mail attributes exist for all documents: and if the document isn't an event or mail, those fields are just empty. That's a bit of a waste, hence the one-to-one relations between "StoreObject" (which represents any object in a store - a mail item, a collection, etc.) and "MailDocument" et al.

The one big change I have made is the addition on the many-to-many "links" table. The purpose of this is simply to group together store objects. In our current store, we have a concept of "calendars" and "events", and events are linked into calendars rather than calendars being collections which hold events. Rather than having something specific to calendars, I've generalised this out. I'm not totally sure I like this as a way of representing calendars - I presume that it was done this way because collections can't be "typed" like a calendar document can - but it would also allow us to do stuff like linking conversations with events, so you could associate a meeting with the invitation conversation or something.

posted at: 12:48 | path: /bongo/hacking | permanent link to this entry

Sat, 08 Mar 2008

More work in the Store

Since my last techie post on locks was appreciated and since I'm still crook, here comes another one. Again, the focus on making Thunderbird work better with Bongo was the main aim - after my locking changes, I found that we ran into a whole new set of problems: mostly, that the IMAP agent wasn't responding in a timely manner.

So, my next step was to address the progress reporting with a small new piece of code which sits in the various command loops inside the IMAP agent. Basically, as we spin around doing work, it checks to see when it last spoke to the IMAP agent, and if it was too long ago it responds again. I also found out that there was some code already in place which did similar, but it wasn't working because the socket wasn't being flushed: in non-techie terms, the responses were waiting on the server because the OS hadn't sent the data to the client. So, in those cases, the data is now sent, and things are happier.

The Tbird problems aren't totally done with yet - I do occasionally get errors still which I'm trying to track down - but it is much better. Certainly, copying large amounts of mail around is much better, and setting read/unread status works much more quickly. It's getting there as being actually usable long-term: I now have some tens of thousands of mails in my system, and it works, which is good news.

Going further with the "actually usable" thing, one thing I wanted to get some better information on was the performance of the Store in various places. From having read the code, I knew there were some places in the code which were obviously poorly performing: the algorithms in use are very naive, and there are much better ways of doing some things. But in order to know where to look, I decided I needed to instrument the Store a little bit. I've written a patch which I'll send along to -devel, but let me show you a little bit of output:

Command COPY took 0.726920 seconds.
 - Out of band messages (0.000033)
 - Starting document copy (0.000295)
 - Files linked (0.033004)
 - Processed doc (0.024436)
 - Parse contents (0.602086)
 - Index contents (0.049533)
 - Commit transaction (0.016867)
 - Finished processing (0.000047)

What happens is that every time a command is run, and during the command run, it pockets away little bits of data about the time performance so far. With enough bits of logging inside you get a reasonably granular idea of what's going on.

This example is of a particularly poor-performing COPY command. You may think you don't copy mail very often, but actually a lot of operations turn into COPY: deleting a mail is copying it to the Trash, filing a mail is copying it to the new folder, etc. (IMAP doesn't know about moving mail). This command took 0.7 seconds: most of the time it's quicker than that, but that's close to a mail per second. That's horrible performance.

If you'd have asked me before where all the time was going, I would have pointed the finger directly at CLucene. However, looking at the timings above, we can see it's not the main problem: far and away, accounting for 83% of the runtime, is actually the "Parse Contents" stage. The code at that point looks like this:

    TCLog(client, "Processed doc");
    errmsg = StoreProcessDocument(client, &info, collection, dstpath, 0);
    if (errmsg) {
        ccode = ConnWriteStr(client->conn, errmsg);
        goto finish;
    }
    TCLog(client, "Parse contents");

(The log comes after the code it's measuring for technical reasons)

So, we can see that single StoreProcessDocument() call is actually the big deal. And what is that call doing? It's actually just filling in the SQLite database (but not writing to it - that happens later, and doesn't account for the speed). Re-generating data we already have is accounting for 4/5ths of the run-time cost. Ouch.

No, I don't have a patch which fixes COPY yet. But I'm thinking about it :)

posted at: 18:43 | path: /bongo/hacking | permanent link to this entry

Thu, 06 Mar 2008

Concentrate, here comes the science bit.

First post after being "away" - my server was down for a little while too :/ No matter, I wasn't really in a position to blog anyway. Real life has interrupted for a bit, although I've managed a bit of hacking to distract myself.

So, we've had various troubles with IMAP. Somewhat hilariously, although I wrote a message to -devel about this, I did it from my trunk Bongo, which doesn't actually send e-mail since the smtpc code was committed (which I think is a configuration issue over here rather than a bug of Pat's). So, d'oh to that. I'll blog about this instead, then.

The problem

In one word - "Thunderbird". Tbird is a pretty smart IMAP client in many ways, and one of the smart things it does is use multiple IMAP connections to do many things at once: for example, doing things like mail filtering on your Inbox in the background while you're reading mails in another folder. The issue with doing many things at once is that you then enter the world of concurrency, which is relatively Hard.

To prevent concurrent tasks from corrupting each other, what I call "locks" are put in place. You can think of these a bit like signals in a railway shunting yard: you don't want to risk collision between trains, so you put signals on the track in such a way that trains obeying the signals will never meet. We have locks around access to store collections, access to the database, access to the text index, and access to internal data structures: all these locks are used to ensure access works consistently.

Following this metaphor, locks are great, but there are problems you want to avoid - one of which is "starvation". If a train waits for the green signal but only gets it rarely, it's effectively "starved of track" - similarly, a process waiting on a lock and not getting in there is starved of a resource - usually, some piece of data.

Starvation was effectively happening in our IMAP. When the IMAP agent is starved of access to a store object for too long, it returns an error to the IMAP client, and the IMAP client assumes that everything has failed and starts again. Sadly, with Tbird, starvation wasn't a rare occurance - it was effectively guaranteed. If Tbird was doing something like moving mail in one IMAP connection, that IMAP connection would work the Store quite hard and would effectively prevent the other IMAP connection from getting access unless by fluke of timing it was able to grab a lock in the short time the other process didn't hold it.

The solution

So, if we care about making sure that starvation doesn't occur, you need some system other than just basic signalling. You want a system to ensure some kind of basic fairness. Now, in computer science, you can write about "fairness" until the cows come home, but in this case it's probably good enough that a queue be put in place - that ensures that everyone gets a turn at some point.

It turned out that our existing locking system wasn't able to really do this. In fact, it was pretty complex, for another good reason: it has both "shared" and "exclusive" locks. The semaphore system works like signals in the train yard: only one process is allowed at a time. Let's say the code was written to a "look, but don't touch" rule: if the code isn't altering anything, you can have many of them looking at the same time without fear of corruption, which is good for performance. However, you can't do that with basic semaphores alone, you need to build that on top. In fact, you can look at the previous code and see how complex it is - a lot of that is machinery to allow shared locks, and mediate access when someone comes along and does want to touch - you have to get rid of all the look-ers first.

What I've written is along the lines of the original. There is a concept of shared and exclusive locks, and it attempts to mediate between those two. However, when the lock in unavailable, where the previous version waited a while and then gave up, this version puts the process in a queue and waits until access is possible. You might wonder why it needed a ground-up new version of locking - well, you don't need to be a coder to compare the original code to the new hotness, and you'll see mine is half the size, better commented and (in theory) more featureful - I tried bolting it on top of the old, it was buggy and awful.

The science bit

How does it work? Well, it's relatively simple. Like the old system, there is a pool of locks, each of which is associated with a particular store resource that we want access to. Each lock is made up of two integers (readers, writer) and a queue. 'readers' is the number of processes who hold a shared lock, 'writer' is true if a process holds an exclusive lock. If 'writer' is false, you acquire and release a shared lock just by incrementing and decrementing 'readers'. If 'readers' is zero, 'writer' can be set to true to acquire a write lock.

What if those assumptions ('writer' is true or 'readers' is greater than zero) don't hold? That means someone else has the lock we want, and we're going to have to wait: we turn to the queue. We place a ticket in the queue, and wait for that ticket to be called. At the point the ticket is called, the lock is free and we can grab it for whatever purpose.

When does your ticket get called? Well, if you're a reader, you get called when either the writer holding the lock lets it go, or the reader in front of you in the queue got its shared lock. If you're a writer, it happens when all the readers have let go of their shared locks. Obviously, for the latter to happen, you have to ensure that readers don't skip in front of waiting writers: in fact, this turns out to be quite easy to code, because the logic is simple: when you come to acquire the lock (whether reader or writer), you look at the queue and if it's not empty, you join the back of it.

That's the idea of the code, anyway, and hopefully if you read through the code you'll find it follows that logic. It's already partly hooked into the store as of r680, and at least for me, I don't seem to get IMAP errors any more, which is great. I'm awaiting feedback from people like Pat who have huge mailboxes, though ;)

Like any locking system, it may well have serious bugs which cause the whole house of cards to fall down. Hopefully these will shake out pretty quickly: I'm using this revision on my system, so if there is a problem it should be obvious. Also, the interaction with the Store WATCH system is potentially dodgy at the moment. If this code works, there is a good amount of legacy code which can come away and again simplify things massively, which will be great - that was just a patch too far at this stage, though :)

posted at: 15:41 | path: /bongo/hacking | permanent link to this entry

Wed, 14 Nov 2007

On locking problems

Longish time no blog: not really getting enough hacking time at the moment :(

Recently we discovered a problem for IMAP users. IMAP clients can have multiple connections open at any one time, and some clients do that in practice to speed up certain operations: for example, Thunderbird's filtering system runs more than one connection to move multiple messages at once. Which is great.

Our problem was twofold: primarily our indexing takes a little while, but the main issue is that while the indexing was happening, the mailbox was essentially frozen to other changes for various technical reasons. Which isn't great.

It might be possible to lock and unlock mailboxes at clever points and do things in a certain order which would make everything ok, but that's complex and error prone. So, the approach I'm looking at is converting our store from mbox format files to maildir specification file layout. This relieves us of a couple of troubles:

All in all, it's looking like this new way is going to be a big win, and so far I'm removing more code than I'm adding (209 additions versus 234 removals), which is pretty good considering the featureset isn't really changing.

posted at: 23:01 | path: /bongo/hacking | permanent link to this entry

Wed, 25 Apr 2007

Hawkeye Status

Last weekend I started looking at Hawkeye, prompted mostly by the need for people to be able to change the config for Bongo while we're putting the new config system in place.

After a bit of hacking last night, it's finally at the stage where it does something useful! You can run it under bongo-standalone (previously this wasn't possible), you can log in as a user, and you can enable or disable agents from startup.

Ok, it's not incredible functionality, but it's a start :) One thing which is slightly interesting is that we rely completely on the store for authentication now: so, I can log into the admin tool as any user. However, without permissions to read the various files (permissions which are enforced by the store itself), you can't change anything - you can't even see anything!

I'm hoping to have this committed by the end of the week. That will then open up the following tasks for work:

posted at: 09:24 | path: /bongo/hacking | permanent link to this entry

Tue, 24 Apr 2007

Setting up Access Control Lists

A little while ago, we decided that we would put some system configuration into the store, and rely more or less on access controls to govern who could read/write that data, which is quite a nice flexible system. However, although I knew the ACL system existed, I didn't actually know anything about it, or even if it worked.

Like most things store related, the code is reasonably clear, and when you try it you find it basically works. I rarely leave the store code in Bongo feeling unimpressed, which I suppose is a good feeling!

So, I documented what I learned about the ACL system on the wiki, and that resulted in this patch for bongo-config (mostly). Before, without ACLs, even logging in as admin meant you couldn't read or write the Bongo configuration in the store - it just refused you access. Contrast with now:

= $ telnet localhost 689
  Trying 127.0.0.1...
  Connected to localhost (127.0.0.1).
  Escape character is '^]'.
  4242 NMAP <b7237b90laptop.alexhudson.com462e5b4d>
= AUTH USER admin ****
  1000 127.0.0.1
= STORE _system
  1000 OK
= COLLECTIONS
  4240 Permission denied
= READ /config/manager
  2001 nmap.document 654
...
= QUIT

Everything marked '= ' is something I typed in. So, you can see, we switch to the _system store, and try to look at the contents with a collections command - this fails, we don't have permission to do that. But we can read the bongo-manager configuration file.

Next stop, a basic but working Hawkeye :)

posted at: 20:44 | path: /bongo/hacking | permanent link to this entry