An ontology generator for the Government Digital Service

Context

Government publishes an enormous amount of content, and most of it is flat — pages and documents with no machine-readable sense of how concepts relate. That makes discovery, governance, and classification hard, and it makes AI retrieval worse: a model asked to answer from a pile of unstructured pages has nothing to navigate.

I tech-led an alpha that generates an ontology and knowledge graph from published government content — the structure beneath the prose, made explicit.

The hard part

An ontology is not a neutral diagram; it is an argument about how the world is carved up, and reasonable people disagree. Hand-authoring one at the scale of government is intractable; generating one risks encoding the wrong distinctions confidently. The work is in getting structure that is good enough to be useful and honest enough to be corrected.

The retrieval angle sharpens it: for RAG to ground answers well, the graph has to reflect real relationships, not surface co-occurrence.
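To make the distinction concrete, here is a minimal Python sketch. The pages, concept names, and relation labels are all invented for illustration and are not taken from the alpha. It only shows how co-occurrence links concepts that merely share a page, while a typed relation records why two concepts are linked.

```python
from itertools import combinations

# Hypothetical toy corpus: each page is reduced to the set of concepts it mentions.
pages = {
    "page-1": {"Passport", "Driving licence", "Proof of identity"},
    "page-2": {"Passport", "Travel abroad"},
}

def cooccurrence_edges(pages):
    """Link every pair of concepts that appear on the same page."""
    edges = set()
    for concepts in pages.values():
        for a, b in combinations(sorted(concepts), 2):
            edges.add((a, b))
    return edges

# Hypothetical typed relations a reviewer might accept for the same content.
typed_edges = {
    ("Passport", "is_accepted_as", "Proof of identity"),
    ("Driving licence", "is_accepted_as", "Proof of identity"),
    ("Passport", "is_required_for", "Travel abroad"),
}

cooc = cooccurrence_edges(pages)

# Reduce typed relations to unordered pairs so the two views can be compared.
typed_pairs = {tuple(sorted((s, o))) for s, _, o in typed_edges}

# Pairs linked only because they share a page, with no stated relationship.
surface_only = cooc - typed_pairs
print(surface_only)
```

In this toy data the only surface-only pair is the two identity documents: they sit on the same page, but nothing says how one relates to the other, which is exactly the kind of edge a retriever should not treat as grounding.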

Architecture

Bootstrap structure from the content, then curate it — and shape the graph so retrieval has something real to walk. The same move, made playable on open data: flip one modelling decision and watch what a retriever grounds on.
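A minimal Python sketch of what "flip one modelling decision" could look like, under stated assumptions: the graph, the node names, the `reify_eligibility` flag, and the breadth-first `walk` function are hypothetical stand-ins, not the alpha's implementation. The single decision here is whether eligibility is a direct edge or a node of its own.

```python
from collections import deque

def build_graph(reify_eligibility: bool) -> dict[str, list[tuple[str, str]]]:
    """Build a toy graph; the flag is the one modelling decision being flipped."""
    graph = {"Example grant": [("administered_by", "Example agency")]}
    if reify_eligibility:
        # Eligibility becomes a node that can carry its own conditions.
        graph["Example grant"].append(("has_rule", "Eligibility rule"))
        graph["Eligibility rule"] = [
            ("applies_to", "Applicant"),
            ("has_condition", "Resident for 12 months"),
        ]
    else:
        # Eligibility is collapsed into a single direct edge.
        graph["Example grant"].append(("available_to", "Applicant"))
    return graph

def walk(graph, seed, max_hops):
    """Breadth-first walk: return every node within max_hops of the seed."""
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for _predicate, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen

flat = walk(build_graph(reify_eligibility=False), "Example grant", max_hops=1)
reified_1 = walk(build_graph(reify_eligibility=True), "Example grant", max_hops=1)
reified_2 = walk(build_graph(reify_eligibility=True), "Example grant", max_hops=2)
```

With the direct edge, a one-hop walk reaches the applicant but can never reach the condition, because the condition is not in the graph. With eligibility as a node, the condition becomes reachable, but only if the retriever is allowed two hops. Neither choice is free, and that is the trade the sketch is meant to surface.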

Key decisions

Generate, then curate

Why: Hand-authoring an ontology at government scale is intractable; bootstrapping from the content makes the first 80% tractable and leaves humans to argue about the part that matters.

Trade-off: Generated structure can be confidently wrong, so curation and review are first-class, not an afterthought.
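One way to picture review as first-class is to treat every generated relation as a proposal that carries its own review state. The Python below is a sketch under that assumption; the `Relation` fields, the `Status` values, and the example relations are hypothetical and do not describe the alpha's data model.

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    PROPOSED = "proposed"   # generated, not yet reviewed
    ACCEPTED = "accepted"
    REJECTED = "rejected"

@dataclass
class Relation:
    subject: str
    predicate: str
    obj: str
    source: str                       # where the generator found its evidence
    status: Status = Status.PROPOSED  # nothing is trusted by default
    note: str = ""                    # the reviewer's reasoning, kept with the edge

def review(relation: Relation, accept: bool, note: str = "") -> Relation:
    """Record a human decision on a generated relation."""
    relation.status = Status.ACCEPTED if accept else Status.REJECTED
    relation.note = note
    return relation

def published(relations: list[Relation]) -> list[Relation]:
    """Only reviewed and accepted relations reach the graph that retrieval uses."""
    return [r for r in relations if r.status is Status.ACCEPTED]

# Hypothetical generator output.
candidates = [
    Relation("Example grant", "administered_by", "Example agency", source="page-1"),
    Relation("Example grant", "same_as", "Example loan", source="page-2"),
    Relation("Example grant", "available_to", "Applicant", source="page-3"),
]

review(candidates[0], accept=True)
review(candidates[1], accept=False, note="Related schemes, not the same thing.")
# candidates[2] is left unreviewed, so it stays out of the published graph.

live = published(candidates)
```

Keeping the rejection note alongside the edge means the disagreement is recorded rather than lost, so a later reviewer can see why a plausible-looking relation was turned down.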

Shape the graph for retrieval

Why: RAG only grounds well when the structure reflects real relationships. The graph is designed to be walked by a retriever, not just read by a human.

Trade-off: Optimising for retrieval constrains some modelling choices you might otherwise make for pure conceptual elegance.

Related writing

On why ontologies are arguments rather than diagrams, see Ontologies Are Arguments.