Building a GDPR-compliant backend: an engineer's checklist

GDPR is usually treated as a legal checkbox bolted on at the end. The decisions that actually carry legal weight are architectural — and they are cheap early, expensive late.

Most teams treat GDPR as something the legal team handles and the engineers retrofit. That ordering is backwards, because the principles that carry real weight are decided in the architecture.

Here is the checklist I work through, translated from legal principle into system requirement.

1. Lawful basis is data, not a footnote

Every piece of personal data you process needs a lawful basis under Art. 6: consent, contract, legal obligation, vital interests, public task, or legitimate interests. Treat that basis as a first-class field attached to the data, not a paragraph in a policy doc. When a user withdraws consent, you need to answer "what processing was that consent holding up?" as a query, not an investigation.
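To make "as a query" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The processing_record table, its columns, and the sample purposes are hypothetical names chosen for illustration; the only part taken from the regulation is the list of six lawful bases in Art. 6(1).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE processing_record (
    id            INTEGER PRIMARY KEY,
    user_id       INTEGER NOT NULL,
    data_category TEXT NOT NULL,   -- hypothetical, e.g. 'email'
    purpose       TEXT NOT NULL,   -- hypothetical, e.g. 'newsletter'
    lawful_basis  TEXT NOT NULL CHECK (lawful_basis IN (
        'consent', 'contract', 'legal_obligation',
        'vital_interests', 'public_task', 'legitimate_interests'))
);
""")
conn.executemany(
    "INSERT INTO processing_record (user_id, data_category, purpose, lawful_basis) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, "email", "newsletter", "consent"),
        (1, "email", "order_receipts", "contract"),
        (1, "browsing_history", "recommendations", "consent"),
        (2, "email", "newsletter", "consent"),
    ],
)

def processing_held_up_by_consent(conn, user_id):
    """List (data_category, purpose) pairs that rest on this user's consent."""
    return conn.execute(
        "SELECT data_category, purpose FROM processing_record "
        "WHERE user_id = ? AND lawful_basis = 'consent' ORDER BY id",
        (user_id,),
    ).fetchall()

print(processing_held_up_by_consent(conn, 1))
# [('email', 'newsletter'), ('browsing_history', 'recommendations')]
```

The CHECK constraint means a row cannot exist without a recognised basis, so the question "what depends on this consent?" never requires reading code or policy documents.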

2. Data minimization at the schema level

Data minimization (Art. 5(1)(c)) and data protection by design (Art. 25) are not a vibe; they are a schema review. For every column ask: do we actually need this, for this purpose, for this long? The cheapest personal data to protect is the data you never collected. Minimization done at design time is a one-line decision; done later it is a migration, a backfill, and a risk assessment.
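One way to turn the schema review into something a CI job can run is to annotate each column with its purpose and lifetime and fail the build when either is missing. This is a sketch under assumptions: the Column class, the review_schema function, and the example users table are all hypothetical, not part of any library.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Column:
    name: str
    personal_data: bool
    purpose: Optional[str] = None         # why we collect it
    retention_days: Optional[int] = None  # how long we keep it

def review_schema(columns):
    """Return names of personal-data columns lacking a purpose or a retention period."""
    return [
        c.name
        for c in columns
        if c.personal_data and (c.purpose is None or c.retention_days is None)
    ]

# Hypothetical table definition used only to exercise the check.
USERS_TABLE = [
    Column("id", personal_data=False),
    Column("email", personal_data=True, purpose="account login", retention_days=730),
    Column("date_of_birth", personal_data=True),           # no purpose, no lifetime
    Column("phone", personal_data=True, purpose="2FA"),    # purpose but no lifetime
]

print(review_schema(USERS_TABLE))  # ['date_of_birth', 'phone']
```

A column that cannot answer "for what purpose, for how long" is flagged before it ships, which is when removing it is still a one-line decision.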

3. Retention as a first-class concern

The storage limitation principle (Art. 5(1)(e)) requires you not to keep personal data longer than necessary for the purpose it was collected for. In practice this means retention has to be enforced by the system, not by good intentions. Every personal-data store needs a defined lifetime and a mechanism that actually deletes: a TTL or a scheduled purge, with any exception documented. "We never delete anything" is a finding, not a strategy.
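A scheduled purge can be as small as the sketch below, which uses Python's sqlite3 module. The table names, the lifetimes in RETENTION, and the created_at column are assumptions for illustration; real values come from your retention policy, and a production job would run from a scheduler rather than inline.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical tables and lifetimes.
RETENTION = {
    "session_log": timedelta(days=30),
    "support_ticket": timedelta(days=365),
}

conn = sqlite3.connect(":memory:")
for table in RETENTION:
    conn.execute(
        f"CREATE TABLE {table} ("
        "id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, created_at TEXT NOT NULL)"
    )

def purge_expired(conn, now):
    """Delete rows older than their table's lifetime; return rows deleted per table."""
    deleted = {}
    for table, lifetime in RETENTION.items():
        # Table names come from the fixed RETENTION dict, never from user input.
        # Timestamps are stored as UTC ISO 8601 strings, so string order is time order.
        cutoff = (now - lifetime).isoformat()
        cur = conn.execute(f"DELETE FROM {table} WHERE created_at < ?", (cutoff,))
        deleted[table] = cur.rowcount
    conn.commit()
    return deleted

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
old, recent = now - timedelta(days=45), now - timedelta(days=5)
conn.execute("INSERT INTO session_log (user_id, created_at) VALUES (1, ?)", (old.isoformat(),))
conn.execute("INSERT INTO session_log (user_id, created_at) VALUES (1, ?)", (recent.isoformat(),))
conn.execute("INSERT INTO support_ticket (user_id, created_at) VALUES (1, ?)", (old.isoformat(),))

result = purge_expired(conn, now)
print(result)  # {'session_log': 1, 'support_ticket': 0}
```

Keeping the lifetimes in one dictionary means the retention policy is reviewable in a single place, and the per-table counts give you something to log each time the job runs.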

4. Right to erasure is a design constraint

Art. 17, the right to erasure (often called the right to be forgotten), is trivial if you designed for it and a nightmare if you did not. Can you actually delete one user's data across every table, cache, search index, backup, and downstream processor? If the honest answer is "not without a week of work," you have a latent compliance gap. Model deletion as a cascade from day one.
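As a sketch of what "deletion as a cascade" can look like, the example below combines ON DELETE CASCADE foreign keys for relational data with a list of eraser callbacks for everything else. The tables, the in-memory cache and search_index dictionaries, and the ERASERS list are hypothetical stand-ins; in a real system those callbacks would call your actual cache, search engine, and processor APIs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
CREATE TABLE orders (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE
);
CREATE TABLE addresses (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    street  TEXT NOT NULL
);
INSERT INTO users VALUES (1, 'alice@example.com'), (2, 'bob@example.com');
INSERT INTO orders VALUES (10, 1), (11, 2);
INSERT INTO addresses VALUES (20, 1, '1 Main St');
""")

# Stand-ins for stores the database cannot cascade into.
cache = {
    "user:1:profile": {"email": "alice@example.com"},
    "user:2:profile": {"email": "bob@example.com"},
}
search_index = {1: "alice example", 2: "bob example"}

# Every new store that holds personal data registers an eraser here.
ERASERS = [
    lambda user_id: cache.pop(f"user:{user_id}:profile", None),
    lambda user_id: search_index.pop(user_id, None),
]

def erase_user(conn, user_id):
    """Delete the user row (child rows cascade), then clear non-relational stores."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM users WHERE id = ?", (user_id,))
    for erase in ERASERS:
        erase(user_id)

erase_user(conn, 1)
remaining = conn.execute("SELECT COUNT(*) FROM orders WHERE user_id = 1").fetchone()[0]
print(remaining, sorted(cache), sorted(search_index))
# 0 ['user:2:profile'] [2]
```

The sketch does not cover backups or downstream processors, which usually need their own documented path, but the shape is the same: one entry point, and a registry that grows whenever a new store is added.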

5. Data residency and transfers

Where does the data physically live, and what crosses a border? The target is EU-hosted infrastructure, EU-region model providers, and an honest map of every third party that touches personal data. Each transfer outside the EEA needs a legal mechanism behind it, such as an adequacy decision or Standard Contractual Clauses. The engineering job is to know the map and keep it true.

6. Processor relationships are contracts plus controls

Every external service that processes personal data on your behalf needs a Data Processing Agreement (Art. 28) — and, increasingly relevant, your AI tooling counts. If you pipe user data through an LLM API, the terms under which that provider handles it are part of your compliance posture, not an implementation detail.
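The map of third parties and the agreements behind them can live in one machine-checkable register. The sketch below is an assumption about how such a register might look: the Processor class, the findings function, and the three example vendors are hypothetical, and the mechanism labels are free-form strings rather than a legal taxonomy.

```python
from dataclasses import dataclass
from typing import Optional

# ISO 3166-1 alpha-2 codes for the 27 EU member states plus Iceland, Liechtenstein, Norway.
EEA = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR",
    "DE", "GR", "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL",
    "PL", "PT", "RO", "SK", "SI", "ES", "SE",
    "IS", "LI", "NO",
})

@dataclass(frozen=True)
class Processor:
    name: str
    country: str                              # where the data is processed
    dpa_signed: bool                          # Art. 28 agreement on file
    transfer_mechanism: Optional[str] = None  # e.g. "adequacy_decision", "scc"

def findings(processors):
    """Return a human-readable list of gaps in the processor register."""
    out = []
    for p in processors:
        if not p.dpa_signed:
            out.append(f"{p.name}: no DPA on file")
        if p.country not in EEA and p.transfer_mechanism is None:
            out.append(f"{p.name}: transfer outside the EEA without a legal mechanism")
    return out

# Hypothetical vendors used only to exercise the check.
REGISTER = [
    Processor("eu-hosting", "DE", dpa_signed=True),
    Processor("llm-api", "US", dpa_signed=True),
    Processor("email-vendor", "US", dpa_signed=False, transfer_mechanism="scc"),
]

for line in findings(REGISTER):
    print(line)
# llm-api: transfer outside the EEA without a legal mechanism
# email-vendor: no DPA on file
```

Run as part of CI or a periodic job, a check like this keeps the map true: adding a vendor without filling in the register shows up as a finding instead of a surprise.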

7. The audit log is the difference between 'trust me' and 'here'

When something goes wrong — a breach, a complaint, a regulator's question — the organisations that survive cleanly are the ones who can show what happened. An append-only audit log of access to and processing of personal data turns "we think we're fine" into "here is the record."
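One way to make an audit log append-only in a way you can demonstrate is hash chaining: each entry stores the hash of the previous one, so editing or removing an earlier entry breaks verification. This is an illustrative technique I am adding here, not a GDPR requirement, and the AuditLog class and its field names are hypothetical. A real log would persist entries to durable storage with write-once permissions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry

def _digest(body):
    # sort_keys gives a stable serialisation, so the same body always hashes the same.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

class AuditLog:
    def __init__(self):
        self._entries = []

    def append(self, actor, action, subject_id, timestamp):
        """Record who did what to whose data, chained to the previous entry."""
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
        body = {
            "actor": actor,
            "action": action,
            "subject_id": subject_id,
            "timestamp": timestamp,
            "prev_hash": prev_hash,
        }
        self._entries.append({**body, "hash": _digest(body)})

    def verify(self):
        """Return True if no entry has been altered, removed, or reordered."""
        prev_hash = GENESIS
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True

log = AuditLog()
log.append("support-agent-7", "read_profile", 1, "2024-06-01T09:00:00+00:00")
log.append("purge-job", "delete_sessions", 1, "2024-06-01T10:00:00+00:00")
print(log.verify())  # True
```

If anyone later rewrites the first entry, verify() returns False, which is the property that lets you hand the record to a regulator with some confidence.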

The lens

Notice the pattern: each legal principle becomes a concrete system requirement. Lawful basis becomes a field. Minimization becomes a schema review. Retention becomes a TTL. Erasure becomes a cascade. That translation — from regulation to architecture — is the whole job, and doing it at design time is the difference between compliance that is structural and compliance that is theatre.

I build backends with this baked in from the schema up, and audit the ones that weren't. If GDPR is currently a checkbox you are nervous about, let's talk.