Context
Government publishes an enormous amount of content, and most of it is flat — pages and documents with no machine-readable sense of how concepts relate. That makes discovery, governance, and classification hard, and it makes AI retrieval worse: a model asked to answer from a pile of unstructured pages has nothing to navigate.
I tech-led an alpha that generates an ontology and knowledge graph from published government content — the structure beneath the prose, made explicit.
The hard part
An ontology is not a neutral diagram; it is an argument about how the world is carved up, and reasonable people disagree. Hand-authoring one at the scale of government is intractable; generating one risks encoding the wrong distinctions confidently. The work is in getting structure that is good enough to be useful and honest enough to be corrected.
Retrieval sharpens the problem: for RAG to ground answers well, the graph has to reflect real relationships, not surface co-occurrence.
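To make the distinction concrete, here is a minimal sketch contrasting the two kinds of edge. It is illustrative only and not the alpha's actual implementation: the pages, concept names, the `Relation` fields, and the confidence values are all assumptions invented for the example.

```python
from dataclasses import dataclass
from itertools import combinations

# Hypothetical mini-corpus: page id -> concepts mentioned on that page.
PAGES = {
    "page-1": ["passport", "driving licence"],  # listed together as accepted ID
    "page-2": ["passport", "passport office"],
}


def cooccurrence_edges(pages):
    """Untyped edges: two concepts are linked only because they share a page."""
    edges = set()
    for concepts in pages.values():
        for a, b in combinations(sorted(concepts), 2):
            edges.add((a, b))
    return edges


@dataclass(frozen=True)
class Relation:
    """A typed claim that records where it came from, so a reviewer can correct it."""
    subject: str
    predicate: str
    obj: str
    source: str        # page the claim was extracted from
    confidence: float  # assumed extractor score in [0, 1]


# Hypothetical extracted relations; only one of the co-occurring pairs
# turns out to be a real relationship.
RELATIONS = [
    Relation("passport office", "issues", "passport", "page-2", 0.9),
]


def neighbours(relations, concept, min_confidence=0.5):
    """Return typed relations touching a concept, filtered by confidence."""
    return [
        r for r in relations
        if concept in (r.subject, r.obj) and r.confidence >= min_confidence
    ]
```

In this toy data, co-occurrence links "passport" to "driving licence" simply because one page mentions both, while the typed graph only asserts that a passport office issues a passport, and says which page that claim came from. A retriever walking the second structure has something it can cite and a human can dispute.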
Architecture
Key decisions
Related writing
On why ontologies are arguments rather than diagrams, see Ontologies Are Arguments.