The current advice on data contracts is tidy. Write the schema down, put it in version control, fail the build when someone breaks it. Shift left, enforce in CI, sleep better. I have built the tooling that does exactly this, and I now think the schema is the least interesting part of the problem. A data contract is not a file. It is a negotiated settlement between two teams who have not yet admitted they disagree, and the schema is just the paperwork the settlement is written on.
I came to this the slow way. I maintain a handful of open-source data-contract tools — SEIP, metricspec, dataproduct-kit — and the embarrassing thing about building them was discovering how little the spec format mattered. YAML over JSON, which constraints to express, whether semantic types deserve first-class syntax: all of it was an afternoon's argument with myself, and all of it was beside the point. The hard design question was never how do we describe the data. It was who is allowed to change this, and who has to be told when it changes. That is not a serialisation problem. It is an org chart.
A contract without a named owner is just documentation; an over-specified one is just a liability with better tooling.
The schema is the easy 20 percent
Write down the shape of a dataset — columns, types, nullability, a few range checks — and you have done the part a junior engineer can finish in an hour and a linter can enforce forever. It feels like progress because it produces an artifact, and artifacts are reassuring. But a schema that passes CI can still be a lie. A column typed as a non-null integer tells you nothing about whether the producer means zero as "none" or as "not yet measured". Those are different facts. They will land in the same downstream sum and quietly corrupt it.
This is the lesson I learned holding NaPTAN, the UK's national public-transport access-node dataset — 435,029 records describing every place you can board a bus, train, tram or ferry in Great Britain. NaPTAN is the original contract you cannot renegotiate. It is public, it is free, and its consumers are uncountable: journey planners, timetabling systems, apps written years ago by people who have long since moved on. No schema field captures the real obligation. The obligation is that a stop marked active had better physically exist, and a decommissioned one had better be marked inactive before it poisons routing in a system I will never see. The schema is the floor. The contract is everything the schema cannot say and someone still has to guarantee.
So the 80 percent that matters is semantic and social. What does this field mean. Under what conditions is a row valid in the world, not just in the type system. Who decided that, and does the team consuming it know the decision was made? Treat the contract as a schema and you have automated the cheap part while leaving the expensive part exactly where it was — in people's heads, in tribal memory, in the gap between two teams.
A contract without an owner is just documentation
Here is the failure I have watched most often. A platform team, wanting to do the right thing, generates contracts for every dataset, checks them into a shared repository, wires up validation, and declares the migration done. Six months later half the contracts describe data that has drifted. Nobody updates them because nobody owns them. Engineers have learned to ignore the validation warnings the way you learn to ignore a smoke alarm that goes off every time someone makes toast. The contracts became documentation. Documentation rots, and rotted documentation is worse than none, because it is confidently wrong.
The thing that was missing was never tooling. It was a name. A data contract has to bind two specific roles: a producer who promises to hold a guarantee, and a consumer who is owed it. When I designed the ownership model in dataproduct-kit, the spec format took an afternoon and the producer/consumer boundary took the rest of the project — because that boundary is where the real engineering lives. A guarantee with no one accountable for it is not a guarantee. It is a hope with YAML syntax.
What a named owner buys you is not blame. It is a forcing function, and it works in three ways that the file alone never can.
It forces the disagreement early. Two teams almost always disagree about something — what "active" means, how late a daily feed can arrive, whether nulls belong in a field one side treats as required. Most of the time neither side knows they disagree until production tells them, expensively. A contract with owners drags that disagreement into a code review instead of an incident. It also makes a change cost the right person something: if the producer can alter a field's meaning without the consumer's owner signing off, there was never a contract — there was a schema the producer happened to publish. The signature is the contract. The schema is the attachment. And it survives turnover. Tribal knowledge leaves when people leave; a named owning team, with the obligation written down, is the only version of institutional memory that outlives the individuals who built it.
Strip the owner out and every one of those collapses back into documentation that describes what the data used to be.
Where the contract becomes a liability
The opposite mistake is just as expensive, and senior engineers fall into it more often, because it looks like rigour. You can over-specify a contract until it becomes a cage nobody can change without breaking. At that point teams route around it. They fork the dataset, add an undocumented sidecar field, or simply stop telling you what they actually depend on. An over-constrained contract does not produce safety. It produces shadow contracts you cannot see — the exact failure the contract was meant to prevent, now invisible.
I think about this most clearly because of a case where over-specification was not an option I had. During COVID I worked on a vaccination platform serving 5,000,000+ residents, and the contract between systems there was almost entirely implicit, held under a deadline that was not a sprint boundary but a physical queue of people in a car park. You do not get to write a forty-field schema with strict validation when the clock is a line of people arriving regardless. You specify the few things that must hold — identity, eligibility, the fact that a dose was recorded — and you leave slack everywhere else, deliberately, because a contract that fails closed on a non-essential field will stop the line. Stopping the line costs trust you do not get back. The discipline was knowing which guarantees were load-bearing and which were vanity.
That is the judgment the enforcement-tooling discourse keeps skipping. A contract is a coupling you are choosing to make explicit, and every clause you add is coupling you now have to maintain across an org boundary. So specify the guarantees a consumer would suffer without, and refuse to specify the rest — not because you are lazy, but because each unnecessary constraint is a future negotiation you have signed up to, a change someone else now has to ask permission for. Over-specification is how well-meaning teams turn a contract into the thing everyone works around.
This is the moment the industry is in. The latest ThoughtWorks Technology Radar frames it as a turn from experimentation toward hard-won maturity, and warns about the cognitive debt of code generated faster than anyone understands it. Data contracts sit squarely in that turn. The tooling has arrived. You can enforce a contract in CI today, and you should. But the tool will enforce whatever you specify, including the wrong things, including a cage, including a guarantee no one agreed to own. The maturity is not in the enforcement. It is in knowing that a data contract is an org chart in disguise: a map of who owes what to whom, and what happens when one side changes its mind.
Getting that map right — deciding which disagreements to surface, which guarantees are worth the coupling, and whose name goes on each one — is not schema work. It is the quiet, unglamorous negotiation that decides whether a data platform holds together or slowly comes apart at the seams nobody owns. That is the part I want to be brought into.