Software architecture is failing

I doubt there has ever been a time when software architecture was seen as a raging success. The “three-tier architecture” of the web has held up extremely well and is an excellent place for many people to start. The “12 Factor App” approach has encouraged developers to adopt practices that make deployment and scaling much simpler. Over the last couple of years, though, I’ve noticed developers advocating for architectures I consider extreme and limited in utility, foisting highly complex systems onto startup environments at great cost. It appears to me to be getting worse.

First, I should say I’m going to talk about a few specific patterns and technologies in this post. I’m not against any of them; I don’t consider any of them to be bad ideas. I do think that some of them are limited in their applicability, though. I don’t believe in silver bullets: architecture is only “good” in the context of the problem it tries to solve, and it’s more important to design for adaptability (I’ve called this Pokémon Architecture in the past).

Through various conversations with technical teams and business owners over the last year (none of whom I’m going to name!), I keep hearing about the same types of problems again and again. “We’re not delivering quickly enough!” “Our systems are too complex to maintain!” “The application we delivered last year is completely legacy now, but it’s too difficult to replace!”

When talking about how technical teams make decisions, I often see a complete lack of understanding of how to relate the business issues their organisation faces to the technical strategy. Regularly, I don’t see a technical strategy at all. The team may have made a decision like “We’re 100% microservices!”, but when I ask why, they cannot give a good reason that relates directly to the business.

Mistakes I’m seeing

To give a flavour of some specific problems I’ve seen:

  • CQRS. I think this is the one I’m noticing most at the moment, because it’s everywhere right now. I saw an article on Twitter the other day about “How to develop your MVP with CQRS”. For those who don’t know, CQRS is essentially an RPC pattern that enables you to deal with highly complex and volatile data stores – most good introductory material on it will say “This is appropriate for large/complex systems”. (There’s a minimal sketch of the pattern after this list.)
    A lot of dewy-eyed write-ups have proclaimed it a good solution for dealing with transactions across microservices (it’s not, really), but most of the push I see for it comes from developers who don’t like ORMs.
    This latter category is particularly pernicious – the end result is a huge increase in the amount of code being written, a decrease in the DRYness of the code, and a system that is much more difficult to reason about, with no obvious benefits to offset these problems.
    I’ve seen a few systems now using a CQRS approach where a standard CRUD approach works fine. “Why?”, I ask. They burble about “responsibility” or “different domains”, but I haven’t yet had the “we tried both approaches and this one works better because <business reason>” response.
  • Event Sourcing. Related to the above, I guess, as they are often used in combination. Event Sourcing says you keep an immutable log of events and use that log to create an eventually-consistent view of your application – rather than saving state in an RDBMS or something. (Again, see the sketch after this list.)
    This is a classic “we didn’t consider the business requirements” type of technical choice. I’ve seen two different start-ups now that hold personal data about customers in their “immutable log”. “How are you planning to handle GDPR requirements and removal of data?” – turns out the answer is often “Er – we haven’t thought about that.” Cue a sad face when I tell them that if they don’t modify their immutable log, they’re automatically out of compliance.
  • Local storage. A tech team decided that everything needed to be a Single-Page Application on the front-end, and needed to handle offline capability – pretty reasonable. However, a good chunk of the data they were processing was extremely sensitive, and by running everything through their SPA framework they were effectively distributing that sensitive content across a variety of browser caches. Whoops. A difficult decision to reverse at a late stage. (Also sketched below.)
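
For readers who haven’t met the pattern, here’s roughly what the command/query split looks like in practice – a minimal TypeScript sketch with hypothetical names, not taken from any particular framework. The thing to notice is that even this trivial example needs two models and two code paths where a CRUD application needs one:

```typescript
// Write side: commands express intent and pass through domain logic.
interface RenameProduct {
  type: "RenameProduct";
  productId: string;
  newName: string;
}

class ProductCommandHandler {
  constructor(private products: Map<string, { name: string }>) {}

  handle(cmd: RenameProduct): void {
    const product = this.products.get(cmd.productId);
    if (!product) throw new Error(`unknown product ${cmd.productId}`);
    product.name = cmd.newName; // validation and domain rules live here
  }
}

// Read side: a separate, denormalised model shaped purely for display,
// kept up to date by some projection mechanism (not shown).
interface ProductListItem {
  id: string;
  name: string;
}

class ProductQueries {
  constructor(private readModel: ProductListItem[]) {}

  list(): ProductListItem[] {
    return this.readModel; // no domain logic on this side, just the view
  }
}
```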
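
Event Sourcing is similarly easy to sketch and easy to underestimate. Something like the following (again with hypothetical names; a real system would use an event store rather than an in-memory array) shows both the appeal and the GDPR trap described above – the customer’s email address is in the log forever:

```typescript
// State is never stored directly: it is a fold over an append-only log.
type CustomerEvent =
  | { type: "CustomerRegistered"; customerId: string; email: string }
  | { type: "EmailChanged"; customerId: string; email: string };

const log: CustomerEvent[] = []; // "immutable": you only ever append

function append(event: CustomerEvent): void {
  log.push(event);
}

// The current view is derived by replaying every event from the start.
function currentEmail(customerId: string): string | undefined {
  let email: string | undefined;
  for (const event of log) {
    if (event.customerId === customerId) email = event.email;
  }
  return email;
}

// Note the PII baked into the log above: a GDPR erasure request now
// means rewriting history, or a scheme such as per-customer encryption
// keys that can be destroyed. The immutability is itself the obstacle.
```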
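
And for the local-storage problem, the gap between the safe and unsafe versions is a one-line choice that is easy to make without thinking. A rough browser-side sketch, with made-up endpoint names:

```typescript
// An in-memory cache disappears when the tab closes; localStorage and
// the HTTP cache persist sensitive responses to disk on every machine
// the user ever logs in from.
const sessionCache = new Map<string, unknown>();

async function loadRecord(id: string): Promise<unknown> {
  if (sessionCache.has(id)) return sessionCache.get(id);

  const response = await fetch(`/api/records/${id}`, {
    cache: "no-store", // ask the browser not to write this to its HTTP cache
  });
  const record = await response.json();

  sessionCache.set(id, record); // NOT localStorage.setItem(...), which persists
  return record;
}
```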

More generally, I see little re-use of existing technology and/or a desire to use a lot of “new shiny” for very specious reasons. The number of developers working at an inappropriately low level is frightening.

What are we doing wrong?

I place the blame on technical leaders like myself. Those of us in tech who are not working at Facebook/Google/Amazon are simply not talking enough about what systems at smaller enterprises look like. We’re not talking about what is successful, what works well, and what patterns others might like to copy.

A lot of technical write-ups focus on scaling, performance, and large-scale systems. It’s definitely interesting to see what problems Netflix have, and how they respond to them. It’s important to understand why Google take decisions in the way they do. However, most of their problems don’t apply to anyone else, and so many of their solutions may not be appropriate elsewhere.

In this view of the world, it’s easy to see why teams focus on the solutions these companies talk about. They want to get ahead of the scaling curve. They want to deliver more robust services. They don’t want legacy.

I recall one specific conversation with an engineer about one of their core services. “It’s totally legacy, and no-one maintains it – it just sits there working, except for the occasions it doesn’t. The problem is that replacing it is so hard: it’s got great performance, and the business doesn’t want to spend time replacing something that works.” This is the problem with being ahead of the curve – the definition of “success” (it works great, it’s reliable, it’s performant, we don’t need to think about it) looks a hell of a lot like the definition of “legacy”.

A well-articulated technical vision should match the business vision directly, especially in a tech company. There are core parts of the technology that deliver most of the value and differentiation, and these are important to get right. There’s then usually a bunch of other software and services that is more like scaffolding: you have it around in order to get stuff done.

What should we do?

I think we’re often getting the build/buy decision wrong. Software development should be the tool of last resort: “we’re building this because it doesn’t exist in the form we need it”. I want to hear from more tech leaders about how they solved a problem without building the software, and tactics for avoiding development.

I think we worry too much about the future. We say “You aren’t gonna need it!” but we don’t live the YAGNI values. We talk too much about technical debt, and in the meantime complain that agile is “failing”. I want to hear more about projects that deferred decisions and put off architecting until much later in the process.

I want to hear more about delivery at real speed. Small pieces of software that are not necessarily interesting but deliver business value are the real heroes in our industry, and the developers who create them the real stars. To paraphrase Dan Ward, we can only deliver at high pace if we get started early – which means re-using existing work. “Mash-up” shouldn’t be a dirty hackfest concept.

I especially want to hear more about developers working with systems that have constraints. Too often I hear “We were using <product X|framework Y> and it had this one specific problem, so we solved it by rolling our own!”. Well, that’s great – and it might be the right choice. But it’s not the only one – it’s the proverbial sledgehammer to crack a nut.

I believe that if you don’t encounter the constraints of the layers underneath you, you’re not working at a high enough level. Developers should occasionally bump into obvious stupid stuff that the layer beneath causes. I want to hear from people pushing standard stuff beyond its limits. I think we grossly underestimate what off-the-shelf systems can do, and grossly overestimate the capabilities of the things we develop ourselves.

It’s time to talk much more about real-world, practical, medium-enterprise software architecture.

Postscript: you may be interested in reading about some of the reaction this post generated.


3 Comments

  1. The problem is not in these particular patterns, and it has very little to do with architecture. Architecture is a combination of higher-level governance, a set of non-functional requirements, and the power of teaching/mentoring/helping people. CQRS and Event Sourcing are not architecture.

    Cargo culting, on the other hand, is a problem, among both developers and architects. The “new tech” trend is endless and CV-driven development thrives. However, as Alberto Brandolini said, if developers do not have business challenges, they will entertain their brains with technological challenges.

    Developers keep working in a solution space, and they are fed “requirements” coming from elsewhere, leaving them almost no room to influence how systems work in business terms. The only choice they are left with is playing with tech. This could be new frameworks, new tools, or new (or newly discovered) patterns.

    The failure is not in architecture per se, but in the way business sees developers and vice versa. I actually want to deliver this message more often at conferences, but, apparently, tech conferences are more interested in selling “hype” tech.

    As for event sourcing and CQRS, I think the “you are not Google” argument is a straw man. You don’t need to be Google to use these patterns. Quite the opposite: when you know little about your domain, using events is a huge benefit. Even looking at your past event schemas can help you see how your domain knowledge has developed. In what you call “CRUD works fine” systems you only have state schemas, which have a tendency to be expanded based on current needs, leading to SRP violations and broken DRY at the database level. From there it is a straight road to hell (sorry, I should be saying “big ball of mud”). I am always surprised when people say “let’s get back to the old ways”, as if we do not see that virtually every system built in past years has become a big ball of mud. It is very naive to think this was because developers were any different then from who they are today. Same people, best intentions, ignorance of the business and the future of the systems they build, that’s all. Why do you think this will work better now?

  2. Niklas Lochschmidt

    Interesting read. I have worked on a CQRS/ES system for a couple of years. I wholeheartedly agree with you that the amount of software needed should be kept as small as possible and especially building custom frameworks should never be an immediate choice.

    However, I wonder which CQRS/ES projects you worked on, because your description “CQRS is essentially an RPC pattern that enables you to deal with highly complex and volatile data stores” does not seem right to me. As Martin Fowler writes[1], “At its heart is the notion that you can use a different model to update information than the model you use to read information”. He also immediately follows up with a warning: “beware that for most systems CQRS adds risky complexity”. RPC may be used together with CQRS, but not necessarily. The system might just as well consume files and answer GraphQL queries on the read side. It’s about building different models for the command and query sides. Agreed, if you don’t need different models, CQRS adds a lot of unnecessary complication, but please don’t add to the confusion around it (other people have proclaimed that CQRS is a framework. It’s not; it’s an architectural pattern).

    We actually had very distinct business reasons for using ES. We needed a complete audit log of the core system, and we wanted to be able to analyze certain metrics from the event stream after the fact. We made use of that quite often, and we also carried out multiple very successful system refactorings that would probably have been much harder in a purely state-driven system. It is really a different thing when you know you have captured the intent of the user instead of only the result of her actions. That said, I would say it only ever makes sense in your core domain, the place where you really want to know what is going on. Dealing with GDPR is an issue if not taken into account from the start; however, so is dealing with requests for erasure when you have database backups and WAL archives containing that information, correct?

    Again, I agree with the notion but please care for the details 😉

    [1]: https://www.martinfowler.com/bliki/CQRS.html

  3. Alex

    That’s a great, considered comment – thank you.

    I was trying to sum up CQRS in a simple way for non-developers: it felt to me that focussing on the command style of the pattern was more important, hence likening it to RPC. It’s clearly very different to REST, where you operate on resources rather than issue commands per se. But I take your point; I have examples of other patterns being problematic as well, and this example was CQRS simply because I’ve seen a lot of people trying CQRS (and failing, sadly) recently.

    I’m going to write up my experiences with both CQRS and ES another time, because I think that’s a separate conversation, and a lot more difficult to get the tone right. To me, it’s a little like the discussion about functional languages versus procedural ones: there are clear technical benefits to a LISP-like language, and people comfortable in that environment are extremely productive. But it’s not for everyone (or maybe even the majority) – and I think it’s interesting to explore why, because it’s not a fault of the tech.

    Totally agree about ES. A great example from my past would be using it for medical records: a radiology report, for example, is a direct example of event sourcing, even though such reports were originally done on paper. For time-oriented processing it’s obviously a reasonable choice. However, I would still argue that you need a good reason to choose a system like that, and I think a lot of developers have trouble reasoning about eventual consistency (in all its forms – from ES to AWS DynamoDB).

    GDPR probably also deserves a post on its own. Backups / WAL archives are an issue, but a different kind of issue to be sure – it’s a very complex subject, unfortunately. The issue I was identifying specifically was retaining PII in a live system, even if “beyond use” from a software perspective.
