Since my last techie post on locks was appreciated, and since I’m still crook, here comes another one. Again, the main aim was making Thunderbird work better with Bongo – after my locking changes, I found we ran into a whole new set of problems: mostly, that the IMAP agent wasn’t responding in a timely manner.

So, my next step was to address the progress reporting with a small new piece of code which sits in the various command loops inside the IMAP agent. Basically, as we spin around doing work, it checks when the agent last spoke to the client, and if that was too long ago it sends another response. I also found there was already some code in place which did something similar, but it wasn’t working because the socket wasn’t being flushed: in non-techie terms, the responses were sitting on the server because the OS hadn’t sent the data to the client. So, in those cases, the data is now sent, and things are happier.
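The shape of the keepalive check is roughly this – a sketch only, with made-up names like send_untagged() and maybe_keepalive(); the real agent’s types and calls differ:

```c
#include <stdio.h>
#include <time.h>

/* Sketch of a keepalive check inside a command loop. The function
 * names here are illustrative, not Bongo's real API. */

#define KEEPALIVE_SECS 5

static time_t last_response;

static void send_untagged(FILE *conn, const char *msg)
{
    fprintf(conn, "* OK %s\r\n", msg);
    fflush(conn);   /* the crucial flush: push the bytes out to the client now */
}

/* Called periodically from inside the work loop. */
static void maybe_keepalive(FILE *conn)
{
    time_t now = time(NULL);

    if (now - last_response >= KEEPALIVE_SECS) {
        send_untagged(conn, "still working");
        last_response = now;
    }
}
```

The fflush() is the part that was missing in the existing code: without it, the untagged response can sit in a userspace buffer indefinitely while the client waits.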

The Tbird problems aren’t totally done with yet – I still occasionally get errors which I’m trying to track down – but things are much better. Certainly, copying large amounts of mail around is much faster, and setting read/unread status is much quicker. It’s getting to the point of being actually usable long-term: I now have some tens of thousands of mails in my system, and it works, which is good news.

Going further with the “actually usable” thing, one thing I wanted better information on was the performance of the Store in various places. From having read the code, I knew there were some places which obviously performed poorly: the algorithms in use are very naive, and there are much better ways of doing some things. But to know where to look first, I decided I needed to instrument the Store a little. I’ve written a patch which I’ll send along to -devel, but let me show you a little bit of output:

    Command COPY took 0.726920 seconds.
     - Out of band messages (0.000033)
     - Starting document copy (0.000295)
     - Files linked (0.033004)
     - Processed doc (0.024436)
     - Parse contents (0.602086)
     - Index contents (0.049533)
     - Commit transaction (0.016867)
     - Finished processing (0.000047)

What happens is that as each command runs, it pockets away little bits of timing data along the way. With enough logging points inside, you get a reasonably granular idea of what’s going on.

This example is of a particularly poorly-performing COPY command. You may think you don’t copy mail very often, but actually a lot of operations turn into COPY: deleting a mail is copying it to the Trash, filing a mail is copying it to the new folder, etc. (IMAP doesn’t know about moving mail). This command took 0.7 seconds: most of the time it’s quicker than that, but that’s close to a mail per second. That’s horrible performance.
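For the curious, a client “moving” a mail in IMAP typically looks something like this on the wire – a schematic session, with made-up tags and message numbers:

```
C: a1 UID COPY 42 Trash
S: a1 OK COPY completed
C: a2 UID STORE 42 +FLAGS (\Deleted)
S: a2 OK STORE completed
C: a3 EXPUNGE
S: a3 OK EXPUNGE completed
```

So every delete or filing operation pays the full cost of a COPY on the server side.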

If you’d asked me beforehand where all the time was going, I would have pointed the finger directly at CLucene. However, looking at the timings above, we can see it’s not the main problem: far and away the biggest cost, accounting for 83% of the runtime, is actually the “Parse contents” stage. The code at that point looks like this:

    TCLog(client, "Processed doc");

    errmsg = StoreProcessDocument(client, &info, collection, dstpath, 0);
    if (errmsg) {
        ccode = ConnWriteStr(client->conn, errmsg);
        goto finish;
    }

    TCLog(client, "Parse contents");

(For technical reasons, each log line comes after the code it’s measuring.)

So, we can see that the single StoreProcessDocument() call is the big deal. And what is that call doing? It’s just filling in the SQLite database (but not writing to it – that happens later, and doesn’t account for the time). Re-generating data we already have accounts for four-fifths of the run-time cost. Ouch.

No, I don’t have a patch which fixes COPY yet. But I’m thinking about it 🙂