So, the efforts of the past few weeks have started to bear fruit. Part of this has been checked into SVN, some will be checked in later today, but we’re definitely beginning to get there.

I thought it would be interesting to look at the problem from the point of view of what the code is doing now. If you recall, the basic problem is this: we want to run an SQL query against our data to return a list of documents, but we then want to be able to refine the data on a number of different criteria.

The basic SQL query looks like this:

SELECT 
	so.guid, so.collection_guid, so.imap_uid, so.filename, 
	so.type, so.flags, so.size, so.time_created, so.time_modified 
FROM
	storeobject so;

This is one of those times where I wish my blog did syntax highlighting. But, anyway: let’s think about this as a non-programmer. The best analogy is that the above query is pizza dough. What happens is that the store client is telling us what toppings it wants, and the store is having to figure out the right order of doing things – you don’t want the tomato sauce on top of everything else, the pizza wouldn’t work.

So, what kinds of toppings do our clients want? Let’s look at a few examples. First up, an easy one – looking for mail documents, which have a ‘type’ of ‘2’. The client will tell the store this requirement via the Bongo Query Language, which looks like this:

= nmap.type 2

Simple enough. It’s just infix notation for “nmap.type = 2”. This changes the SQL query to:

SELECT 
	so.guid, so.collection_guid, so.imap_uid, so.filename, 
	so.type, so.flags, so.size, so.time_created, so.time_modified 
FROM 
	storeobject so 
<i>WHERE 
	(so.type = 2)</i>;

I’ve italicised the new part of the query. It’s only a little extra WHERE clause, although it’s had to figure out that “nmap.type” is actually the “so.type” column in our query. What if the client tests a piece of data which isn’t in the standard query?

= nmap.conversation.sources 2

… which turns into this SQL:

SELECT
	so.guid, so.collection_guid, so.imap_uid, so.filename, 
	so.type, so.flags, so.size, so.time_created, so.time_modified 
FROM
	storeobject so 
	<i>INNER JOIN conversation c ON so.guid=c.guid
WHERE
	(c.sources = 2)</i>;

It’s beginning to get tougher here. We have to figure out which column in which table is being requested, and then how to join in that table to get that data so that it can test it. But wait, it can get even tougher still – what if we want extra data columns output from the query, as well as extra constraints? What about custom properties on documents? Check this out:

& & = { nmap.conversation.subject 5 "hello" = nmap.conversation.sources 2 | l to 6 l from 6"

This is “any conversation related to document GUID 0x6 whose subject starts ‘hello’ and is in my Inbox” (approximately). And let’s also output the “nmap.conversation.subject” for each result, plus my “alex.custom” property which is something only my client uses. The SQL starts getting hairy:

SELECT 
	so.guid, so.collection_guid, so.imap_uid, so.filename, 
	so.type, so.flags, so.size, so.time_created, so.time_modified, 
	prop_0.value, c.subject 
FROM
	storeobject so 
	INNER JOIN conversation c ON so.guid=c.guid 
	INNER JOIN links link_0 ON so.guid=link_0.doc_guid 
	INNER JOIN links link_1 ON so.guid=link_1.related_guid 
	LEFT JOIN properties prop_0 ON so.guid=prop_0.guid
WHERE
	(
	((substr(c.subject,1,5) = "hello") AND (c.sources = 2))
	AND 
	((link_0.related_guid = 6) OR (link_1.doc_guid = 6))
	) 
	AND prop_0.name="alex.custom";

I didn’t bother highlighting the changes from our pizza dough query; frankly, there’s more different than there is the same. Things I would point out in the above:

  • it gets the columns in the ‘links’ table correct, even though the relationship is going in both directions in the query;
  • the extra columns are coming out in the SELECT part of the query, but for custom properties we also need to specify a LEFT JOIN in the FROM and a constraint in the WHERE to get the right property where necessary.

This is actually considerably more powerful than Bongo had previously, for many reasons, and when we build back in the search it will be even more powerful.