There is a comfortable fiction in the current wave of AI tooling: that meaning is something you extract. Point a model at a pile of documents, ask it to pull out the entities and relationships, and out comes structure: clean, queryable, ready for retrieval. I led the build of an ontology generator for a government context, turning sprawling written content into structured knowledge for discovery, governance, classification, and yes, the kind of retrieval that feeds AI systems. The single most useful thing I learned is that meaning is not extracted. It is decided. And deciding is the hard, human, contestable part that no amount of model capability removes.
Structure is a claim about the world
An ontology is a set of decisions about what is the same, what is different, and what is allowed to be true at once. That is not a data problem. That is an argument, written down precisely enough to execute.
When you build an ontology over real organisational content, you are not describing the world as it is. You are making claims about it. You are saying these two terms refer to the same thing, even though three different departments spell it three different ways and mean subtly different things by it. You are saying this concept is a kind of that concept, which sounds innocent until you realise someone's funding, or someone's accountability, depends on those two things being kept apart.
The naive version of this work treats it as deduplication and tagging. The real version is closer to diplomacy. Government content in particular is written by many hands over many years under many regimes of guidance, and the inconsistencies are not noise to be cleaned away. They are fossils of genuine disagreement about what things are. When the model 'helpfully' collapses two near-identical concepts, it is not tidying. It is taking a side in an argument it does not know it is having.
So the design principle I kept returning to was: the system proposes, but the structure has to be defensible. Every merge, every is-a relationship, every classification needs to be something a person can look at and say yes, that is a claim we are willing to stand behind, with the evidence that supports it sitting right there. An ontology you cannot defend is not knowledge. It is a confident guess wearing a schema.
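To make "the system proposes, but the structure has to be defensible" concrete, here is a minimal sketch of how a proposed claim might be represented. It is an illustration, not the system described here: the names `Claim`, `Evidence`, `Status` and `decide`, and the rule that acceptance requires at least one piece of evidence and a written rationale, are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    PROPOSED = "proposed"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass(frozen=True)
class Evidence:
    source_id: str  # hypothetical document identifier
    excerpt: str    # the passage a reviewer would actually read


@dataclass
class Claim:
    subject: str
    relation: str   # e.g. "same_as" or "is_a"
    obj: str
    evidence: list[Evidence] = field(default_factory=list)
    status: Status = Status.PROPOSED
    decided_by: Optional[str] = None
    rationale: Optional[str] = None


def decide(claim: Claim, reviewer: str, accept: bool, rationale: str) -> Claim:
    """Record a human decision on a model-proposed claim."""
    # Assumed rule: a claim with no supporting evidence cannot be accepted.
    if accept and not claim.evidence:
        raise ValueError("cannot accept a claim with no supporting evidence")
    # Assumed rule: every decision, including rejection, carries a reason.
    if not rationale.strip():
        raise ValueError("a decision needs a recorded rationale")
    claim.status = Status.ACCEPTED if accept else Status.REJECTED
    claim.decided_by = reviewer
    claim.rationale = rationale
    return claim
```

The point of the shape is that a claim starts life as `PROPOSED` and only changes state when a named person signs it, with the evidence attached to the same record.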
Generation is the easy 80 percent
Large models are genuinely good at the first pass. Given a corpus, they will surface candidate concepts, propose hierarchies, and find relationships a human reviewer would take weeks to spot. If I had been doing this work a decade earlier, that first pass would have been the bulk of the labour. Now it is close to free. That changes where the value is, and it is not where the demos point.
The value is in the disciplined back half: the review, the disambiguation, the boundary cases, the deliberate refusal to assert a relationship the evidence does not support. A generation step that produces a thousand plausible relationships is not a thousand units of progress. It is a thousand things that now need adjudication, and if you cannot adjudicate them at the rate you generate them, you have just built a faster way to accumulate unverified claims. I came to think of the model's output as a hypothesis queue, not a result.
The practical consequence is that I designed the pipeline around human throughput, not model throughput. The bottleneck was never how fast we could propose structure. It was how fast a knowledgeable person could confirm or reject it without losing the thread. So the interesting engineering went into making each decision cheap and well-framed: showing the proposed relationship next to the source text that justified it, clustering similar decisions so a reviewer could rule on a pattern rather than a thousand instances, and making rejection as easy and as recorded as acceptance. Build for the reviewer, because the reviewer is the actual scarce resource.
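As a rough sketch of "rule on a pattern rather than a thousand instances", the snippet below groups proposals that share a shape and applies one recorded ruling to the whole group. Everything in it is assumed for illustration: the `Proposal` fields, the grouping rule in `pattern_key` (same relation, same target), and the plain list used as a decision log. A real pipeline would need a far more careful notion of "similar".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    subject: str
    relation: str
    obj: str
    excerpt: str  # source text shown next to the proposal


def pattern_key(p: Proposal) -> tuple[str, str]:
    # Assumed grouping rule: same relation pointing at the same target.
    return (p.relation, p.obj.casefold())


def cluster(proposals: list[Proposal]) -> list[tuple[tuple[str, str], list[Proposal]]]:
    groups: dict[tuple[str, str], list[Proposal]] = defaultdict(list)
    for p in proposals:
        groups[pattern_key(p)].append(p)
    # Largest clusters first, so one ruling clears the most of the queue.
    return sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)


def rule_on_cluster(log: list[dict], members: list[Proposal],
                    reviewer: str, accepted: bool, reason: str) -> None:
    # Rejections are logged exactly like acceptances.
    for p in members:
        log.append({
            "subject": p.subject,
            "relation": p.relation,
            "obj": p.obj,
            "reviewer": reviewer,
            "accepted": accepted,
            "reason": reason,
        })
```

A reviewer would then walk the clusters in order, read the excerpts for a few members, and call `rule_on_cluster` once per pattern, with the log preserving what was rejected and why.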
Why this matters more in the RAG era, not less
There is a story going around that retrieval-augmented generation makes formal knowledge structure obsolete. Just embed everything, retrieve by similarity, let the model sort it out. I understand the appeal and I think it is exactly backwards for serious domains. Similarity search is wonderful at finding things that look related. It is indifferent to whether they should be treated as the same, and that indifference is precisely the thing that gets you in trouble when the stakes are governance and discovery rather than a chatbot's vibe.
A well-built ontology is what lets a retrieval system know that two differently-worded passages are about the same governed concept, or, just as importantly, that two similar-sounding passages are emphatically not. It encodes the distinctions that matter to the institution, the ones that no amount of vector proximity will recover because they were never about surface form in the first place. Structure is how you stop a system from being confidently, fluently wrong about what a thing is.
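A small sketch of the idea, under heavy assumptions: similarity search has already returned ranked hits, each hit carries the surface term it matched, and a reviewed mapping from surface terms to governed concepts exists. The terms, concept identifiers and the `filter_by_concept` helper are invented for the example and do not come from any real corpus or library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    passage_id: str
    term: str          # surface term found in the passage
    similarity: float  # score from some vector search, assumed given


# Hypothetical reviewed ontology: surface term -> governed concept id.
CONCEPT_OF = {
    "site licence": "C1",
    "site license": "C1",   # spelling variant, decided to be the same concept
    "site permit": "C2",    # sounds similar, decided to be a different concept
}


def filter_by_concept(hits: list[Hit], query_term: str) -> list[Hit]:
    """Keep only hits that a reviewer has mapped to the query's concept."""
    target = CONCEPT_OF.get(query_term.casefold())
    if target is None:
        # No governed concept known: fall back to similarity alone.
        return list(hits)
    return [h for h in hits if CONCEPT_OF.get(h.term.casefold()) == target]


hits = [
    Hit("p2", "site permit", 0.93),   # closest by similarity, wrong concept
    Hit("p1", "site licence", 0.91),
    Hit("p3", "Site License", 0.80),  # worded differently, same concept
]
kept = filter_by_concept(hits, "site licence")
```

In this toy case the highest-scoring passage is dropped and the lowest-scoring one is kept, which is the whole argument in miniature: the decision about sameness lives in the mapping, not in the vectors.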
The lesson generalises beyond government and beyond ontologies. As models get better at producing plausible structure, the scarce and valuable skill is judgement about whether that structure is true and defensible. The machine can draft the argument. Someone still has to be willing to sign it.