The State of Content Management Systems

Over the past four years or so, I've reviewed various free software content management systems for different needs: to power my personal homepage or friends' websites, to provide e-commerce or similar facilities for business customers, or to run simple intranet systems. I've been consistently disappointed and frustrated by the quality of this software, and this essay looks a little deeper into the reasons.

This is unfinished: while this essay is a fairly complete read, it's not yet the finished product. I will be updating it soon.

Introduction

I like to think that my needs on the web are fairly simple. Most of the time I'm not doing much more than presenting text content - be it essays or blog entries - and what I'm most interested in is making that system as simple as possible. But I do have other needs and interests too. I consider the following a reasonably complete list of what I want and need from a system:

  1. be secure (this is a hard requirement);
  2. respond to users with reasonable speed;
  3. create clean output that implements web standards;
  4. use an intelligent and simple URI scheme for permanent links;
  5. be able to support a blog;
  6. work on a variety of hosting platforms.

Now, from that list of needs, there tend to be some technical implications. The following is more-or-less true:

But that's really about it. I don't think there are many other technical implications (you can argue that clean URIs require mod_rewrite or similar, and speed is somewhat in the eye of the beholder). However, if you survey the CMS market (I can recommend opensourcecms.com as a place to compare various CMSes), you'll see they all have a number of extra requirements which can conflict with the above.

Requiring an SQL database

I'm going to list this first, because for me, this is the biggest issue. I find the requirement for an SQL database just to run a simple site bizarre, to say the least. An SQL database isn't a great fit for the job: aside from full-text search, storing pages in a database brings few benefits. On the downside, it means you need a database in the first place, and every page has to be fetched from it, which isn't terribly quick.

Servers already have a great way of storing files: the file system. It's quick, and ideally suited to the task. Ironically, many of the systems which use an SQL server to store data end up implementing a caching system to increase performance - so the data ends up on the file system anyway!
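To make the file-system approach concrete, here is a minimal Python sketch of looking up a page directly on disk. Everything in it is an assumption made for illustration (the function name resolve_page, the content_root directory, the ".txt" extension and the "index" default); it isn't taken from any particular CMS. It also shows the one check this approach really does need: refusing paths that climb out of the content directory.

```python
from pathlib import Path
from typing import Optional


def resolve_page(content_root: Path, uri_path: str) -> Optional[Path]:
    """Map a request path such as '/essays/cms' to a file under content_root.

    Returns None when the path escapes the content root or no file exists.
    The '.txt' extension and 'index' default are illustrative assumptions.
    """
    root = content_root.resolve()
    # Drop surrounding slashes so the path is joined relative to the root.
    relative = uri_path.strip("/") or "index"
    candidate = (root / (relative + ".txt")).resolve()
    # Refuse anything that resolved outside the root (e.g. via '..').
    if root not in candidate.parents:
        return None
    return candidate if candidate.is_file() else None
```

A request for /essays/cms would then simply read essays/cms.txt under the content root, with no database round trip involved.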

Aside from the fact that I don't believe caching is a substitute for adequate baseline performance, putting in a database layer gives you other problems: you need more code, you need to abstract out the differences between different SQL servers for compatibility, and you need to guard against SQL injection security issues. Then you have the issues of managing the SQL database, ensuring backups can be taken, etc. These trade-offs are different for each CMS, but in general I don't think the benefits outweigh the disadvantages.
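As an example of the extra care a database layer demands, guarding against SQL injection usually means never pasting user input into the query text. The sketch below uses Python's standard sqlite3 module with a bound parameter; the pages table, its columns and the find_page helper are hypothetical names chosen purely for illustration.

```python
import sqlite3


def find_page(conn: sqlite3.Connection, slug: str):
    """Look up a page body by slug using a bound parameter."""
    # The '?' placeholder makes the driver treat slug purely as data,
    # so quotes or SQL keywords inside it cannot alter the statement.
    row = conn.execute(
        "SELECT body FROM pages WHERE slug = ?", (slug,)
    ).fetchone()
    return row[0] if row else None


# Hypothetical in-memory table, just to exercise the lookup.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (slug TEXT PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO pages VALUES (?, ?)", ("about", "About this site"))
```

A hostile slug such as x' OR '1'='1 is just compared as a literal string here and matches nothing. Note that placeholder syntax differs between database drivers, which is one more of the compatibility differences that has to be abstracted away.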

Should SQL servers be used by CMSes? Of course they should - SQL servers can play a really useful role, especially for highly structured data. Should everything be in the database? I don't see any compelling reason to do that, and I see plenty of reasons not to do it.

Requiring PHP 5

Sure, PHP 5 has some niceties that should have been in PHP 4 from the start. If you're writing a CMS, you probably want at least some level of object-orientation, and PHP 4 is glaringly lacking in some areas (e.g., no static class variables). On the other hand, ISPs find it very hard to keep up with the pace of PHP development.

Few ISPs I know are running PHP 5 right now. I'm sure over time plenty of them will, but by that point the people writing these CMSes will have moved on to PHP 6, which brings with it a variety of new incompatibilities.

This would be less of an issue if the PHP developers could keep their damned hands off the language for a little while, but that seems unlikely. In the meantime, I need to support a variety of PHP versions.

It's a framework!

Seeing this written on the homepage of a CMS makes me immediately click away. You'll also see "modular system with rich API for plugins", or similar claims. In general, you can read this as "we've added a whole load of extra cruft as a middleware/abstraction layer". Serving up web pages should be relatively simple fare; something which claims to be a complete web application development framework is prima facie over-engineered, and probably not designed for the needs of real users.

I have no problem with people providing a framework of sorts. Writing library code is generally a good thing for security and maintainability. But, this is a means to an end, not an end in and of itself.

Poor URI schemes

A number of CMSes think it would be a jolly jape to assign each page a number (say... the primary key from the database? yay!), and end up with a URI scheme along the lines of http://www.example.com/index.php/173726 or similar for each page.

They then go and "fix" it by assigning normal URIs to pages, so you have a couple of URIs accessing the same page, and then expect you to use mod_rewrite or something to do this name-to-number mapping.

This is pointless madness.
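By way of contrast, here is a rough Python sketch of deriving one readable, permanent URI straight from a page's section and title, so there is no number to map back to and no second URI for the same page. The slugify and permalink helpers are hypothetical names, and the exact rules (lowercase ASCII, hyphens between words) are only one reasonable choice among many.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a page title into a lowercase, hyphen-separated URI segment."""
    # Decompose accented characters and drop anything outside ASCII.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def permalink(section: str, title: str) -> str:
    """Build the single canonical path for a page, e.g. '/essays/my-title'."""
    return f"/{slugify(section)}/{slugify(title)}"
```

If the slug is computed once when the page is created and stored with it, the link stays stable even if the title is edited later.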

A fine-grained permissions scheme

Having users and logins and stuff is pretty cool. Being able to group up users for collective administration is also pretty cool. What is uncool is breaking down every single little operation and assigning it a permission, e.g., "permission to modify stylesheets". This is crazy nonsense. I've seen CMSes implement a set of mandatory access controls more locked-down than anything the military requires.

Of course, all those checkboxes on the user's profile impress visitors, so the fact that no-one actually uses all these things is of no interest.

Everything through my index.php!

Web servers are very good at some things. Sending verbatim, unaltered files is one of those things. In fact, some operating systems even have a special call which says "take this file off disk, and spew it down this network socket" - the web server just tells the OS which file to send where, and lets it get on with it.
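For the curious, on Linux that call is sendfile(2). Python exposes the same idea through socket.sendfile(), which hands the copy to the operating system where it can and quietly falls back to ordinary send() calls where it can't. The send_static_file wrapper below is a hypothetical helper written only to show the shape of it; a real web server does considerably more (headers, ranges, error handling).

```python
import socket


def send_static_file(sock: socket.socket, path: str) -> int:
    """Send a file's bytes over a connected socket; return the byte count."""
    # The file must be opened in binary mode for socket.sendfile().
    with open(path, "rb") as f:
        # Uses os.sendfile() when the platform offers it, so the data
        # need not pass through this process; otherwise uses send().
        return sock.sendfile(f)
```

The point is how little the application has to do: name the file, name the socket, and get out of the way.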

Of course, many systems want "more control" over output than this. So, every single file request goes through the main index.php - images, stylesheets, everything - so that access controls, etc., can be applied. The net result of this is that the CMS, which probably isn't terribly quick in the first place, is suddenly being hit ten or twenty times for each page view, and it doesn't take many users to bring it to its knees.

CMSes shouldn't do this: staying out of the way of the web server is a strategy for simpler code and better performance.

Going back to first principles

It's pretty clear to me that many of these systems are over-engineered. I've used a couple of them, on commercial projects, and regretted it eventually. Problems have ranged from getting the thing to install/work on a particular hosting package, to simply not getting sufficient performance out of the thing. One wiki-style CMS we tried was taking up to 10 seconds per page hit.

In particular, performance seems to be a real issue. Page hits tend to create high load for small periods of time: a server might be spiking to quite high loads, but unless you measure at a fine enough granularity, it averages out to a fairly low figure. That's not a problem until your server comes under any sort of sustained load, when the processes begin to back up in the run queue and suddenly your website runs like molasses.
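A little arithmetic shows how much a coarse average can hide. The Python sketch below uses entirely made-up numbers: one minute of per-second run-queue samples, idle except for three one-second bursts of 20 runnable processes. The peak_and_mean helper is a hypothetical name.

```python
def peak_and_mean(samples):
    """Return (peak, mean) for a list of per-second run-queue lengths."""
    return max(samples), sum(samples) / len(samples)


# Hypothetical minute: three one-second bursts of 20 runnable
# processes, and an idle server for the other 57 seconds.
minute = [0] * 60
for second in (5, 25, 45):
    minute[second] = 20

peak, mean = peak_and_mean(minute)
print(f"peak={peak}, one-minute mean={mean:.1f}")
# prints "peak=20, one-minute mean=1.0"
```

Measured per minute, this server looks almost idle; measured per second, each page hit is briefly demanding twenty times that. Once hits start arriving close together, those bursts overlap and the queue has no quiet seconds in which to drain.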

There is still no one system which stands out as being "the way": each one comes not just with problems, but design flaws that will cause you to re-assess your needs and find the best compromise. This is a real screaming shame.

Having read this, you may also want to read in more detail about the experiences which led me to write a system called Tocca.