Sep 29, 2024 - 16 MIN READ

Regulation as Architecture: Turning the EU AI Act into Controls and Evidence

The EU AI Act isn’t a PDF you “comply with” — it’s a set of control objectives you design into your product: evaluation, documentation, monitoring, and provable safety boundaries.

Axel Domingues

Regulation is usually framed as paperwork.

But if you’ve shipped real systems, you know the truth:

regulation is architecture — with deadlines and penalties.

The EU AI Act is the first time many teams will be forced to operationalize that idea for AI:

classify your system by risk
put safety controls in the loop
prove you did it (documentation + logging + monitoring)
and keep proving it as the system evolves

This article is about turning that into something an engineering org can actually execute.

Not with vibes.

With controls and evidence.

I’m not a lawyer. This is engineering guidance: how to translate legal requirements into systems (controls, telemetry, processes, evidence).

If you’re shipping into regulated domains, involve legal/compliance early — but don’t outsource architecture to them.

The goal

Turn “EU AI Act compliance” into an implementable control map + evidence pipeline.

The mental model

Treat the Act like a set of control objectives — not a checklist.

The output

A living system: model registry, eval harness, audit logs, incident response.

The trap

If you add compliance after shipping, you’ll rebuild everything under pressure.

The EU AI Act in One Picture: Risk Drives Obligations

Most teams get stuck because they start in the wrong place: “What does the Act say?”

Start here instead:

What risk category are we in — and what obligations follow?

EU AI Act risk pyramid: prohibited, high-risk, transparency, minimal

At a high level, the mental buckets look like:

Unacceptable risk (prohibited practices): don’t build / don’t deploy
High-risk AI systems: strong requirements + conformity assessment + continuous obligations
Transparency obligations (certain systems): tell users it’s AI, label synthetic content, etc.
Minimal risk: largely unaffected — but still governed by general product safety + data protection

Even if you’re “not high-risk”, the Act will still shape expectations for:

how you document and monitor the system
how you label AI-generated content
how you respond to incidents Those practices become table stakes.

Timeline Reality: Compliance Is a Roadmap Problem

A regulation with phased application dates creates a new kind of technical debt:

time-bound debt.

If you treat this like “we’ll fix later”, “later” arrives on a calendar.

From the Act’s own entry-into-force and application section, key dates include:

2 Feb 2025: early application for core chapters (including the prohibited practices chapter)
2 Aug 2025: governance + general-purpose AI model obligations begin applying (with additional transitional rules)
2 Aug 2026: the Act applies broadly
2 Aug 2027: certain high-risk classification / obligations kick in later for specific provisions

These phased dates are why EU AI Act work should be a product roadmap track, not a “compliance sprint”.

Engineering takeaway

Your compliance program must be staged: inventory first, then controls, then evidence automation.

Product takeaway

If AI is on your critical path, “legal deadlines” become architecture milestones.

Regulation as Architecture: Controls + Evidence

Here’s the move that makes this tractable:

A regulation is a set of control objectives.
Compliance is the ability to show evidence that your controls are operating.

That’s it.

So the question is not:

“Are we compliant?”

It’s:

What controls do we have?
What evidence proves they are working?
What breaks when the system changes?

This is exactly the mindset shift we learned in 2022 (operational architecture):

you don’t “have reliability”
you have SLOs, alerts, runbooks, incident reviews, and change management

AI compliance is the same category of work.

Step 1: Build an AI System Inventory (Because You Can’t Control What You Can’t Name)

Most organizations fail compliance because they can’t answer basic questions:

What models are we using? Which versions?
Where do prompts live? Who changes them?
Where does user data go?
Which features are “AI-assisted” vs “AI-decides”?

So the first deliverable is boring — and essential:

an AI system registry.

Define your “AI system boundary”

Decide what you include in scope:

the model(s)
prompts/system instructions
retrieval + tool calls
post-processing + safety filters
human review steps
downstream consumers of the output

Create an inventory record per feature

For each AI feature, record:

purpose + user impact
model vendor or open-weights source
data inputs (PII? sensitive data?)
tool access / side effects (email? payments? deletion?)
deployment scope (EU users? internal-only?)
failure modes you already know about

Assign roles (provider / deployer mindset)

Even if you’re “just integrating” a model, you still own:

the product UX
the system behavior
the safety boundary

So assign internal owners:

product owner
engineering owner
risk/compliance owner

If you don’t build an inventory early, everything else becomes theater:

you won’t know what to document
you won’t know what to monitor
you won’t know which changes require re-evaluation

Step 2: Turn Legal Requirements Into Control Objectives

Once you have an inventory, the next move is to translate “requirements language” into engineering controls.

A practical translation for high-impact AI systems looks like this:

Control family A: Risk management + safety boundaries

Control objective: you identify harms, mitigate them, and verify mitigations.

Engineering controls:

risk register per system (hazards + mitigations + owners)
red teaming / adversarial tests (prompt injection, jailbreaks, policy bypass)
sandboxed tool execution (deny-by-default; scoped permissions)
fallback modes (no side effects; human approval; safe response)

Evidence:

risk assessments with dates + sign-off
test results from your eval harness
change logs that show mitigations weren’t silently removed

Control family B: Data governance

Control objective: your training/finetuning/evaluation data is lawful, relevant, and controlled.

Engineering controls:

dataset provenance + license tracking
data minimization (collect less, keep less)
PII handling: redaction, hashing, isolation, retention limits
“no-leak” tests (system prompts, secrets, customer data)

Evidence:

dataset registry entries (source, license, retention)
automated scans and audit logs for access
privacy impact assessments when applicable

Control family C: Transparency + user information

Control objective: users understand when they’re interacting with AI and what limitations exist.

Engineering controls:

UI labeling (“AI-generated”, “AI-assisted”)
confidence indicators that are earned (tied to eval outcomes)
“why this answer?” for RAG (citations + snippets)
clear escalation to a human when stakes are high

Evidence:

screenshots of UX patterns
release notes showing consistent labeling
telemetry showing users used escalation paths

Control family D: Human oversight

Control objective: humans can supervise, intervene, and override when required.

Engineering controls:

review queues for high-stakes outputs
approval gates before irreversible actions
“operator console” for investigation + rollback
kill switches (feature flag + model routing off switch)

Evidence:

audit logs of review actions
incident timelines showing kill switch usage
access control policies for who can override

Control family E: Robustness, security, and monitoring

Control objective: the system is resilient, secure, and monitored for drift and failures.

Engineering controls:

model + prompt versioning
runtime guardrails (policy filters, PII filters, tool ACLs)
monitoring for:
- abuse patterns
- hallucination proxies (e.g., citation mismatch rate)
- cost spikes / latency SLO breaches
- tool call anomalies
incident response playbooks for AI-specific failures

Evidence:

dashboards + alerts tied to thresholds
incident reports and postmortems
penetration tests / threat models

The Evidence Pipeline: Make Compliance a Byproduct of Operating the System

Here’s the engineering secret:

If your AI product is operable, compliance evidence becomes cheap.

If your AI product is not operable, compliance evidence becomes impossible.

So you want an architecture where:

every model/prompt/tool change is tracked
every deployment produces an eval report
every incident produces a timeline and corrective action
every “claim” (accuracy, safety, robustness) is tied to data

Think of this as “CI/CD for trust”.

Evidence pipeline: registry -> eval -> release -> monitoring -> incidents -> audit packet

A model registry

A single source of truth for model versions, prompts, policies, and tool permissions.

An eval harness

Repeatable tests that run on every change — with stored results.

Observability + audit logs

Telemetry for safety, drift, cost, and misuse — plus immutable audit trails.

An audit packet generator

One click: “show me the evidence” for this system, this version, this date range.

A Practical Reference Architecture for EU AI Act Readiness

You don’t need a mega-platform.

You need a few boring, durable primitives.

1) Model & Prompt Registry

Store:

model identifier + version (vendor or open-weights commit)
system prompt / policies (versioned)
tool list + permissions
risk tier classification
owner + escalation contact

Design note: treat prompts as code. PRs. Reviews. Rollback.

2) Evaluation Service

A service that can:

run curated test suites (functional, safety, adversarial)
run RAG-specific tests (retrieval quality + citation correctness)
run tool-use tests (permission boundaries, sandbox escape attempts)
produce a signed report artifact per run

Design note: evaluations are not just “accuracy”. They are controls verification.

3) Runtime Guardrails Layer

At inference time:

input filtering (PII, secrets)
policy enforcement (content policy, domain policy)
tool policy enforcement (deny-by-default, per-user scope)
output filtering / formatting (citations required, structured outputs)

Design note: guardrails must be measurable. If you can’t measure them, you can’t prove they operate.

4) Monitoring + Incident Response

You want:

safety metrics (refusal rate, escalation rate, policy violation attempts)
quality metrics (user feedback, correction rate, citation mismatch proxies)
security metrics (prompt injection attempts, tool call anomalies)
cost & latency metrics (budget enforcement)

And you want playbooks:

“prompt injection wave”
“model regression”
“unsafe tool behavior”
“sensitive data leakage report”

If your only safety mechanism is “prompting harder”, you will not pass the reality test.

A prompt is not a control.

A prompt is a policy suggestion.

Controls are things that fail safely when the model misbehaves.

Turning the EU AI Act Into Tickets: A Minimal, Executable Backlog

If you’re leading an engineering org, you need a plan that decomposes.

Here’s a minimal backlog that actually ships.

Ship the inventory + ownership map

registry table / service
one entry per AI feature
owners + escalation paths

Add versioning and change control

prompts in git
model versions pinned
feature flags for routing + rollback

Build the eval harness (start small)

50–200 “golden” cases per feature
adversarial cases for known failure modes
store results + diffs over time

Add runtime guardrails

tool ACLs + sandbox
PII redaction on input/output where needed
citations required for knowledge answers (when applicable)

Add monitoring + incident workflow

dashboards + alerts
an incident runbook template
a postmortem process that feeds back into eval cases

Add the audit packet generator

export registry + last eval results + deployment history + incident summaries
one PDF/zip per audit request window

Resources

EU AI Act — official text (Regulation (EU) 2024/1689)

The source of truth: risk categories, obligations, timelines, and definitions — start here before translating requirements into controls.

NIST AI RMF 1.0 (risk → controls → monitoring)

A practical engineering-friendly framework for turning “risk” into control objectives, metrics, and governance you can actually operate.

ISO/IEC 42001 (AI management system standard)

A management-system blueprint for AI: policies, roles, lifecycle controls, continual improvement — the “org spine” your evidence pipeline hangs on.

CEN-CENELEC JTC 21 (EU AI standardization workstream)

Where the harmonized standards work happens — useful for mapping “legal requirements” to “technical ways of meeting them.”

What’s Next

This month was about making regulation operational:

controls you can implement
evidence you can generate automatically
and a compliance spine that doesn’t collapse when you change models

Next month we go deeper into the runtime itself:

Reasoning Budgets.

Because once you’re operating within compliance constraints, the next question becomes:

How do you spend “thinking time” like a budget — and prove the system used it wisely?

Reasoning Budgets: fast/slow paths, verification, and when to “think longer”

“Think step by step” isn’t an architecture. In production you need budgets, routing, and verifiers so the system knows when to go fast, when to slow down, and when to refuse.

Tool Use with Open Models: function calling, sandboxes, and “capability boundaries”

Tool use is where LLMs stop being “text generators” and start being integration surfaces. With open weights, reliability isn’t a given — you have to engineer it with contracts, sandboxes, and explicit capability boundaries.