The NaPTAN change stream, made to misbehave on purpose.
A working model of a data-platform ingestion pipeline: validation, dedup, exactly-once apply, backpressure, and an atomic backfill cutover. Inject bad records and duplicates, push the rate past capacity, and trigger a backfill while the stream is live. The guarantee line at the bottom shows that nothing was lost or double-applied. Runs entirely in your browser.
Validation is the gate
Bad records — impossible coordinates, invalid status transitions — are rejected before anything downstream sees them. On NaPTAN, a bad record published to an unbounded consumer set is unrecallable, so the only safe place to be strict is before the commit.
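As a minimal sketch of such a gate, the function below returns the reasons a record would be rejected before commit. The record fields, the status names, and the transition table are all assumptions made for illustration; they are not the NaPTAN schema or the simulation's own code.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical transition table: which status may follow which.
# None stands for "stop not seen before".
ALLOWED_TRANSITIONS = {
    None: {"pending", "active"},
    "pending": {"active", "inactive"},
    "active": {"inactive"},
    "inactive": {"active"},
}


@dataclass(frozen=True)
class StopChange:
    key: str       # idempotency token (assumed field)
    stop_id: str   # identifier of the stop being changed (assumed field)
    lat: float
    lon: float
    status: str


def validate(change: StopChange, current_status: Optional[str]) -> List[str]:
    """Return rejection reasons; an empty list means the record may pass."""
    reasons = []
    # Chained comparisons are False for NaN, so NaN coordinates are rejected too.
    if not (-90.0 <= change.lat <= 90.0):
        reasons.append("latitude out of range")
    if not (-180.0 <= change.lon <= 180.0):
        reasons.append("longitude out of range")
    if change.status not in ALLOWED_TRANSITIONS.get(current_status, set()):
        reasons.append(
            f"invalid status transition {current_status!r} -> {change.status!r}"
        )
    return reasons
```

Returning every reason rather than stopping at the first makes rejected records easier to triage; only a record with an empty list would go on to the commit step.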
Exactly-once, not at-least-once
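Streams redeliver, so delivery is at-least-once and exactly-once has to come from how events are applied. The pipeline keys on an idempotency token: an event whose key has already been applied is counted as a duplicate and dropped, never applied twice. The guarantee line stays green because the applied count equals the number of unique valid keys.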
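The idea can be sketched in a few lines. This is an in-memory illustration with hypothetical names (`IdempotentApplier`, `guarantee_holds`), not the simulation's source; a durable version would need to record the key and write the state in the same transaction.

```python
class IdempotentApplier:
    """Applies each idempotency key at most once (in-memory sketch)."""

    def __init__(self):
        self.applied_keys = set()  # keys whose effect is already in state
        self.state = {}            # stop_id -> latest applied value
        self.applied = 0
        self.duplicates = 0

    def apply(self, key, stop_id, value):
        """Apply the event unless its key was seen; return True if applied."""
        if key in self.applied_keys:
            self.duplicates += 1   # redelivery: count it and drop it
            return False
        self.state[stop_id] = value
        self.applied_keys.add(key)
        self.applied += 1
        return True

    def guarantee_holds(self, valid_keys_seen):
        """Invariant from the text: applied == number of unique valid keys."""
        return self.applied == len(set(valid_keys_seen))


# Usage: the stream redelivers k1, and the second copy is dropped.
applier = IdempotentApplier()
delivered = [("k1", "stop-a", "active"), ("k2", "stop-b", "active"),
             ("k1", "stop-a", "active")]
for key, stop_id, value in delivered:
    applier.apply(key, stop_id, value)
```

After the loop, `applier.applied` is 2 and `applier.duplicates` is 1, so the invariant holds for the three deliveries.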
Backpressure is a feature
When arrivals outrun apply capacity, events buffer instead of dropping. The queue depth is the honest signal of how far behind you are — and the thing you alert on, long before data loss.
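A small discrete-tick sketch makes the queue-depth signal concrete. The function name, the per-tick arrival list, and the fixed capacity are assumptions for illustration; the buffer here is unbounded, whereas a real system would also cap it and push back on producers.

```python
from collections import deque


def simulate(arrivals_per_tick, capacity_per_tick):
    """Buffer arrivals, apply at most capacity_per_tick events per tick.

    Returns (queue depth after each tick, total events applied).
    """
    queue = deque()
    depths = []
    applied = 0
    next_id = 0
    for arriving in arrivals_per_tick:
        # Every arrival is buffered; nothing is dropped.
        for _ in range(arriving):
            queue.append(next_id)
            next_id += 1
        # Drain in arrival order, limited by apply capacity.
        for _ in range(min(capacity_per_tick, len(queue))):
            queue.popleft()
            applied += 1
        depths.append(len(queue))
    return depths, applied


# Three ticks of overload (5 arrivals against capacity 3), then two quiet ticks.
depths, applied = simulate([5, 5, 5, 0, 0], capacity_per_tick=3)
# depths == [2, 4, 6, 3, 0] and applied == 15
```

Depth climbs while arrivals exceed capacity and drains once they stop, and all 15 events are eventually applied. An alert on depth would fire during the climb, well before any buffer limit is reached.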
Backfills are migrations
Reprocessing history is the riskiest routine a platform runs. Here it's an atomic cutover: build the corrected version in a shadow copy, keep buffering live events, then swap the pointer in one step. Zero loss, zero double-apply. Read the essay.
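One way to picture the cutover is the single-threaded sketch below. The class and method names are hypothetical, and it assumes events that arrive during the backfill are buffered and replayed onto the shadow just before the swap; a concurrent implementation would need a lock or a transactional pointer update around that final step.

```python
class Store:
    """Live version plus an optional shadow built during a backfill."""

    def __init__(self, initial):
        self.live = dict(initial)  # the version readers see
        self.shadow = None         # corrected version under construction
        self.buffer = []           # live events held back during a backfill

    def begin_backfill(self, corrected_history):
        # Build the corrected version off to the side; readers are unaffected.
        self.shadow = dict(corrected_history)

    def on_event(self, stop_id, value):
        if self.shadow is not None:
            self.buffer.append((stop_id, value))  # hold until cutover
        else:
            self.live[stop_id] = value

    def cutover(self):
        new_version = self.shadow
        # Replay buffered events once, in order, onto the shadow.
        for stop_id, value in self.buffer:
            new_version[stop_id] = value
        self.live = new_version  # the swap: one reference assignment
        self.shadow = None
        self.buffer = []


store = Store({"a": "old", "b": "wrong"})
store.begin_backfill({"a": "old", "b": "fixed"})
store.on_event("c", "new")        # arrives mid-backfill, so it is buffered
seen_before_swap = dict(store.live)
store.cutover()
```

Before the swap, readers still see the old version; afterwards they see the corrected history with the buffered event applied exactly once.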
This is a faithful simulation of the mechanisms, running in your browser — a discrete-event model, not a production broker. The same patterns (idempotent apply keyed on a token, a validation gate before commit, queue-depth backpressure, shadow-build-then-atomic-swap backfills) are how I approach the real NaPTAN ingestion pipeline. A runnable docker-compose reference implementation is a separate project.