February 2025 · 8 min read · Architecture, Regulated Systems

Designing for the Hard Part

Why the interesting architecture in regulated domains lives in the places everyone else routes around.

Most architecture diagrams are a confession. They show the parts the author understood well enough to draw cleanly: the API gateway, the queue, the read replica, the cache. What they leave out is the reconciliation job that runs at 02:00 because two systems disagree about money, the manual override that a compliance officer needs and can never be told they cannot have, the field that is nullable in the database and absolutely not nullable in the law. The hard part is rarely the part that makes the diagram look impressive. It is the part that makes the diagram embarrassing.

I have spent most of my career being brought in for that part. Rent rewards in the UK, casework and billing for a legal SaaS, a disaster-resilience platform in the Philippines, public-transport data for the British government. Different surfaces, same shape underneath: the domain is messy, the rules are externally imposed and occasionally contradictory, and the cost of being confidently wrong is not a bad sprint, it is a regulator's letter or a tenant who loses a reward they earned. You do not get to refactor your way out of that with a nicer framework.

In a regulated domain, the schema is the politics. Decide what counts as a valid record and you have decided who gets paid, who gets flagged, and who gets to argue.

The schema is the politics

The first thing I do on a regulated platform is fight about the data model, and I have learned to have that fight early and out loud, because in these domains the schema is not a technical artifact; it is a settlement between stakeholders who have not yet realised they disagree. What counts as a 'completed payment'? Is it the moment the open-banking event lands, the moment it clears, or the moment your ledger says it cleared? Three reasonable people will give you three answers, and each answer quietly redistributes risk to a different team.
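
One way to keep those three answers from collapsing into a single flag is to store each moment separately and give every definition of 'completed' its own named question. A minimal Python sketch; the class and field names are hypothetical illustrations, not taken from any real system:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class PaymentTimeline:
    """Hypothetical record: one timestamp per meaning of 'completed'."""
    event_received_at: Optional[datetime] = None  # open-banking event landed
    bank_cleared_at: Optional[datetime] = None    # funds actually cleared
    ledger_posted_at: Optional[datetime] = None   # our ledger says it cleared

    def is_reported(self) -> bool:
        return self.event_received_at is not None

    def is_cleared(self) -> bool:
        return self.bank_cleared_at is not None

    def is_posted(self) -> bool:
        return self.ledger_posted_at is not None


# A payment that has been reported but has neither cleared nor been posted.
timeline = PaymentTimeline(
    event_received_at=datetime(2025, 2, 3, 9, 0, tzinfo=timezone.utc)
)
```

Each team can then say which question it is asking, instead of arguing about what a single 'completed' flag was supposed to mean.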

The temptation is to defer this. Ship something flexible (a JSON blob, a status field with a dozen string values) and let the meaning settle later. It never settles later. It calcifies. Six months on, there are four services reading that status field, each interpreting 'pending' slightly differently, and now the ambiguity is load-bearing. Nobody can change it without breaking someone, so nobody changes it, and you have shipped your confusion into production permanently.

So I treat the model as the contract and I make it boring on purpose. Explicit states with explicit transitions. A separate, append-only record of what actually happened, distinct from the current best guess of the truth. When a regulator or a furious customer asks 'why does it say this', you want to answer with a history, not a reconstruction. The flexibility everyone asks for up front is almost always a request to avoid a conversation. Have the conversation.
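
As a rough illustration of 'explicit states with explicit transitions' plus an append-only history, here is a small Python sketch. The state names, the transition table, and the `PaymentHistory` class are all assumptions made up for this example; a real system would persist the events rather than hold them in memory.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class PaymentState(Enum):
    INITIATED = "initiated"
    CLEARED = "cleared"
    POSTED = "posted"
    FAILED = "failed"


# Every legal move is listed here; anything not listed is rejected.
ALLOWED = {
    PaymentState.INITIATED: {PaymentState.CLEARED, PaymentState.FAILED},
    PaymentState.CLEARED: {PaymentState.POSTED, PaymentState.FAILED},
    PaymentState.POSTED: set(),
    PaymentState.FAILED: set(),
}


@dataclass(frozen=True)
class Transition:
    """One immutable entry in the history of a payment."""
    payment_id: str
    from_state: PaymentState
    to_state: PaymentState
    reason: str
    at: datetime


class PaymentHistory:
    """Append-only log; the current state is derived from it, never stored."""

    def __init__(self, payment_id: str):
        self.payment_id = payment_id
        self._events: list[Transition] = []

    @property
    def current(self) -> PaymentState:
        if not self._events:
            return PaymentState.INITIATED
        return self._events[-1].to_state

    def apply(self, to_state: PaymentState, reason: str) -> Transition:
        from_state = self.current
        if to_state not in ALLOWED[from_state]:
            raise ValueError(
                f"illegal transition {from_state.value} -> {to_state.value}"
            )
        event = Transition(
            self.payment_id, from_state, to_state, reason,
            datetime.now(timezone.utc),
        )
        self._events.append(event)  # append only; nothing is ever rewritten
        return event

    def history(self) -> tuple[Transition, ...]:
        return tuple(self._events)
```

Calling `apply(PaymentState.CLEARED, "bank confirmed")` and then `apply(PaymentState.POSTED, "ledger posted")` leaves two entries in `history()`. A later attempt to move back to `CLEARED` raises `ValueError` and records nothing, so the answer to 'why does it say this' is the list itself.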

Build for the dispute, not the demo

Normal product software optimises for the happy path because the happy path is the product. Regulated software has to optimise for the dispute, because the dispute is where the system either holds or humiliates you. Every meaningful action needs to be reconstructable after the fact, by someone who was not there, possibly years later, possibly in front of someone with subpoena power.

In practice that means a few unglamorous commitments. Decisions get recorded with their inputs, not just their outputs: you want to know not only that something was rejected but what it was rejected against. State changes are events, not in-place mutations, because 'it used to be true and now it isn't' is a sentence you will need to say with evidence. And you draw a hard line between the system's opinion and the human's override, then you log the override as a first-class act with a name attached. The worst audit trail is the one that makes a person's deliberate, justified decision look like a bug.
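
A sketch of what those commitments might look like in Python. The reward rule, the threshold, and every name below are hypothetical and exist only to show the shape: the automated decision keeps a snapshot of its inputs and the rule it applied, and the human override is a separate record that refuses to exist without an actor and a justification.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Decision:
    """The system's opinion, stored with what it was decided against."""
    subject_id: str
    outcome: str   # "approved" or "rejected"
    rule: str      # the rule the inputs were checked against
    inputs: dict   # snapshot of the values used at decision time
    decided_at: datetime


@dataclass(frozen=True)
class Override:
    """A human's deliberate act, kept distinct from the system's opinion."""
    subject_id: str
    overridden_outcome: str
    new_outcome: str
    actor: str
    justification: str
    decided_at: datetime


def decide_reward(subject_id: str, on_time_payments: int, required: int = 3) -> Decision:
    # Hypothetical rule: a reward needs `required` on-time payments.
    outcome = "approved" if on_time_payments >= required else "rejected"
    return Decision(
        subject_id=subject_id,
        outcome=outcome,
        rule=f"on_time_payments >= {required}",
        inputs={"on_time_payments": on_time_payments, "required": required},
        decided_at=datetime.now(timezone.utc),
    )


def record_override(decision: Decision, actor: str, new_outcome: str,
                    justification: str) -> Override:
    # An anonymous or unexplained override is refused outright.
    if not actor.strip() or not justification.strip():
        raise ValueError("an override needs a named actor and a justification")
    return Override(
        subject_id=decision.subject_id,
        overridden_outcome=decision.outcome,
        new_outcome=new_outcome,
        actor=actor,
        justification=justification,
        decided_at=datetime.now(timezone.utc),
    )


def effective_outcome(decision: Decision, override: Optional[Override] = None) -> str:
    # The original decision is never mutated; the override sits on top of it.
    return override.new_outcome if override is not None else decision.outcome
```

Because the `Decision` is left untouched, an auditor reading both records can see that the system said 'rejected', against which rule and inputs, and that a named person then chose otherwise and why.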

This is also where I push back hardest on automation enthusiasm. In a messy regulated domain, the goal is not to remove the human, it is to make the human's job legible and defensible. The system should narrow the decision, surface the relevant facts, and then get out of the way and record what the human did. A platform that quietly decides on people's behalf and cannot explain itself is not advanced. It is a liability with good test coverage.

Earned trust beats clever code

The thing nobody tells you about senior work in these domains is how much of it is not code. It is convincing a compliance lead that the new flow is safer than the spreadsheet she trusts, because her spreadsheet, for all its faults, has never silently lost a row. It is sitting with a public-sector delivery team and accepting that 'we cannot break the existing consumers' is not legacy timidity, it is the actual requirement. The integrations you cannot see are the ones that matter most.

So I have stopped measuring my own work by elegance and started measuring it by how well it survives contact with reality, and with people who do not trust it yet. Does it degrade honestly when an upstream feed is late? Can a non-engineer understand why it did what it did? When it is wrong, and it will be wrong, does it fail in a way you can explain and correct, or in a way you have to apologise for? That is the hard part. It was always the hard part. Bring me into that one.