Turning a regulation into testable code

How to take a dense statute from primary source to an audit-ready detector — separating literal text from enforcement practice, and deriving tests from the regulation itself.

Engineers usually meet a regulation as a PDF and a deadline. The instinct is to read it once, translate the sentences literally into if statements, and move on. That produces software that is wrong in exactly the ways that matter, because the literal text and the way a rule is actually enforced routinely diverge.

Here is the method I use to take a regulation from primary source to code you can defend line by line in an audit.

Read the primary source, not a summary

Summaries are written by people who have already decided what matters, and a compliance engine fails on exactly the cases the summary skipped. So the first step is unglamorous: read the actual articles, the implementing decisions, the official guidance notes, and (this is the part most people miss) the enforcement practice.

Separate the rule from how it's enforced

The core data structure is one entry per clause:

statutory_rule — what the text literally says.
enforcement_overlay — how authorities actually apply it, which often differs.
divergence_note — where, and why, the two pull apart.

Mark the overlay as the binding behaviour and keep the statutory rule as a citation. The detector then has a single source of truth while the divergence stays auditable: when someone asks "why does the engine flag this?", you can point to both the text and the practice.
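One way to hold such an entry is a small immutable record. This is a minimal sketch, not a fixed schema: the three field names come from the list above, while `clause_id`, `binding_rule`, `explain`, and the example values are hypothetical and only there for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClauseEntry:
    """One entry per clause; the three rule fields follow the list above."""
    clause_id: str            # hypothetical citation key
    statutory_rule: str       # what the text literally says (kept as a citation)
    enforcement_overlay: str  # how authorities actually apply it (binding)
    divergence_note: str      # where, and why, the two pull apart

    def binding_rule(self) -> str:
        # The detector implements the overlay; the statute stays as evidence.
        return self.enforcement_overlay

    def explain(self) -> str:
        # Answers "why does the engine flag this?" with both text and practice.
        return (
            f"[{self.clause_id}] text: {self.statutory_rule} | "
            f"practice: {self.enforcement_overlay} | "
            f"divergence: {self.divergence_note}"
        )


# Illustrative values only; not taken from a real statute.
entry = ClauseEntry(
    clause_id="EXAMPLE-1",
    statutory_rule="limit applies per transaction",
    enforcement_overlay="limit applies to the 24-hour aggregate per counterparty",
    divergence_note="aggregation is applied in practice but absent from the text",
)
print(entry.explain())
```

Freezing the record is a deliberate choice in this sketch: a clause entry that can be mutated at runtime is hard to cite in an audit.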

A concrete example: a rule reads as 'per transaction', but enforcement applies it as a 24-hour aggregate per counterparty. Three transfers that each sit below the threshold pass a per-transaction check and still breach the aggregate. A literal implementation passes the text and fails the audit. The overlay is what you actually build.
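The per-transaction versus aggregate case can be sketched side by side. Everything specific here is an assumption made for illustration: the `LIMIT` value, the `Tx` type, a rolling (rather than calendar-day) window, and the choice to treat a transaction exactly 24 hours old as outside the window.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import datetime, timedelta

LIMIT = 10_000                # hypothetical threshold
WINDOW = timedelta(hours=24)  # assumed rolling window, half-open


@dataclass(frozen=True)
class Tx:
    counterparty: str
    at: datetime
    amount: int


def literal_breaches(txs):
    """Statutory reading: flag any single transaction over the limit."""
    return [t for t in txs if t.amount > LIMIT]


def overlay_breaches(txs):
    """Enforcement reading: flag a transaction when the counterparty's total
    over the preceding 24 hours, including it, exceeds the limit."""
    recent = defaultdict(deque)  # counterparty -> transactions still in window
    totals = defaultdict(int)    # counterparty -> running total in window
    flagged = []
    for t in sorted(txs, key=lambda t: t.at):
        q = recent[t.counterparty]
        # Drop transactions that have aged out of the window.
        while q and t.at - q[0].at >= WINDOW:
            totals[t.counterparty] -= q.popleft().amount
        q.append(t)
        totals[t.counterparty] += t.amount
        if totals[t.counterparty] > LIMIT:
            flagged.append(t)
    return flagged


start = datetime(2024, 1, 1, 9, 0)
txs = [Tx("acme", start + timedelta(hours=h), 4_000) for h in (0, 5, 10)]
print(len(literal_breaches(txs)), len(overlay_breaches(txs)))  # prints "0 1"
```

The literal detector sees three unremarkable transfers; the overlay detector flags the third, which is the finding an auditor would expect.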

Each rule becomes a precise algorithm

Once a clause is captured, translate it into an algorithm with five explicit parts:

Inputs — exactly which fields, in which units.
Reference window — daily, fortnightly, a rolling multi-week period. This is where most bugs live.
Threshold table — the numbers, and the bands between them.
Severity classification — how a breach is graded.
Edge cases and dedup — how this rule interacts with adjacent rules so one event does not fire three findings.

Writing this before any code forces the ambiguities into the open, which is exactly where you want them — in review, not in production.
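The five parts can be written down as data before any detection logic exists. The sketch below is one possible shape, assuming a breach means strictly exceeding a band limit; `RuleSpec`, `suppressed_by`, `dedup`, the rule names, and all numbers are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RuleSpec:
    rule_id: str
    inputs: dict               # field name -> unit
    window: timedelta          # reference window
    band_limits: tuple         # threshold table: ascending limits
    severities: tuple          # one label per band, starting with "none"
    suppressed_by: tuple = ()  # adjacent rules that take precedence (dedup)

    def __post_init__(self):
        if len(self.severities) != len(self.band_limits) + 1:
            raise ValueError("need exactly one more severity than band limits")
        if list(self.band_limits) != sorted(self.band_limits):
            raise ValueError("band limits must be ascending")

    def classify(self, measured) -> str:
        # bisect_left counts the limits strictly below the measured value,
        # so a value sitting exactly on a limit stays in the lower band.
        return self.severities[bisect_left(self.band_limits, measured)]


def dedup(findings, specs):
    """Drop a finding when a rule that takes precedence fired on the same event.

    findings: list of (event_id, rule_id) pairs.
    """
    fired = set(findings)
    return [
        (event_id, rule_id)
        for event_id, rule_id in findings
        if not any((event_id, b) in fired for b in specs[rule_id].suppressed_by)
    ]


rule_a = RuleSpec("rule_a", {"duration": "minutes"}, timedelta(days=1),
                  (100, 150, 200), ("none", "minor", "serious", "very_serious"))
rule_b = RuleSpec("rule_b", {"duration": "minutes"}, timedelta(weeks=2),
                  (1000,), ("none", "serious"), suppressed_by=("rule_a",))
specs = {"rule_a": rule_a, "rule_b": rule_b}

findings = [("evt-1", "rule_a"), ("evt-1", "rule_b"), ("evt-2", "rule_b")]
print(rule_a.classify(151), dedup(findings, specs))
```

Whether a value exactly on the limit counts as a breach is precisely the kind of ambiguity this step is meant to surface; here the choice is visible in one line and easy to challenge in review.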

Derive the tests from the regulation

What makes compliance code trustworthy without access to a private benchmark is that the tests come from the regulation, not from the implementation. Tests written from the code can only confirm that the code does what it already does.

Fixtures per rule — the canonical violation, the boundary cases one second over and under the threshold, and the explicit non-violations.
Property-based tests — invariants that must hold for any generated input. Monotonicity (making a breach worse never lowers its severity). Conservation (durations sum to the window minus annotated gaps). Locality (a derogation applied to one segment does not change detection elsewhere).
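A fixture table for a single rule might look like the sketch below. The limit, the detector `is_violation`, and the reading that the limit itself is still allowed are assumptions for illustration; in practice each row would also carry the citation it was derived from.

```python
LIMIT_SECONDS = 3600  # hypothetical limit


def is_violation(duration_seconds: int) -> bool:
    # Assumed reading: the limit itself is allowed; anything longer is a breach.
    return duration_seconds > LIMIT_SECONDS


# (name, input, expected) rows written from the rule, not from the code.
FIXTURES = [
    ("canonical violation", LIMIT_SECONDS + 1800, True),
    ("one second over", LIMIT_SECONDS + 1, True),
    ("exactly at limit", LIMIT_SECONDS, False),
    ("one second under", LIMIT_SECONDS - 1, False),
    ("explicit non-violation", 0, False),
]


def run_fixtures(detector):
    """Return the names of the fixtures the detector gets wrong."""
    return [name for name, value, expected in FIXTURES
            if detector(value) != expected]


print(run_fixtures(is_violation))  # prints []


# A detector with an off-by-one comparison is caught by the boundary row.
def off_by_one(duration_seconds: int) -> bool:
    return duration_seconds >= LIMIT_SECONDS


print(run_fixtures(off_by_one))  # prints ['exactly at limit']
```

Because the table never imports anything from the detector, it can be reviewed against the regulation on its own and reused unchanged if the implementation is rewritten.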

If the engine satisfies invariants derived from the regulation on adversarially generated input, the residual surface where it can still be wrong is small, and a failing invariant localises the fault quickly.
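Two of the invariants named above, monotonicity and conservation, can be checked with generated input as in this sketch. To stay dependency-free it uses a hand-rolled loop over the standard library's `random` module in place of a dedicated property-testing library; the band limits, `severity`, and `segment` are hypothetical stand-ins for the engine under test.

```python
import random
from bisect import bisect_left

BAND_LIMITS = (100, 150, 200)  # hypothetical threshold table


def severity(measured: int) -> int:
    """Severity grade 0..3: the number of limits strictly exceeded."""
    return bisect_left(BAND_LIMITS, measured)


def segment(window: int, cuts):
    """Split a window into (duration, is_gap) segments at the cut points.

    Labelling odd-numbered segments as gaps is purely illustrative.
    """
    points = [0] + sorted(cuts) + [window]
    return [(b - a, i % 2 == 1)
            for i, (a, b) in enumerate(zip(points, points[1:]))]


def check_monotonicity(rng, trials=1000):
    # Making a breach worse must never lower its severity.
    for _ in range(trials):
        base = rng.randint(0, 400)
        worse = base + rng.randint(0, 400)
        assert severity(worse) >= severity(base), (base, worse)


def check_conservation(rng, trials=1000):
    # Activity durations must sum to the window minus the annotated gaps.
    for _ in range(trials):
        window = rng.randint(1, 10_000)
        cuts = [rng.randint(0, window) for _ in range(rng.randint(0, 20))]
        segs = segment(window, cuts)
        activity = sum(d for d, is_gap in segs if not is_gap)
        gaps = sum(d for d, is_gap in segs if is_gap)
        assert activity == window - gaps, (window, cuts)
        assert all(d >= 0 for d, _ in segs), (window, cuts)


rng = random.Random(0)  # seeded so that any failure is reproducible
check_monotonicity(rng)
check_conservation(rng)
```

Each assertion carries the generated input as its message, so a failure reports the exact case that broke the invariant.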

Why this holds up

The output of this process is not just code. It is a citation-backed specification, a detector that maps to it, and a test suite whose assertions trace back to the text. That is what "audit-ready" actually means: when a regulator or a client's lawyer asks why the system did what it did, every answer has a paper trail.

I turn regulations into testable detection engines — GDPR, transport, AML, and others — in pure Go and Python. If you have a rulebook that needs to become software you can defend, get in touch.