
This is the 2025 finale: a practical reference architecture for running fleets of agents with governance—connectors you can trust, traces you can debug, evals you can ship, and humans you can hand off to.
Axel Domingues
All year, the story of agents changed.
In January, an “agent” still meant: a model with tools.
By December, an “agentic product” means something else entirely:
a platform that runs many agents, with many connectors, across many teams — without turning into chaos.
This article is the synthesis.
I wrote Reference Architecture v1 as the finale of the 2022 “operational architecture” year: a defendable baseline for a modern cloud product (APIs, distributed data, async workflows, CI/CD, observability, security, and governance). v2 is that same spine, plus the agent-specific control plane: orchestration, connector/tool governance, evaluation in CI, and the reliability guardrails needed to run fleets of probabilistic workers.

If you’re jumping in here, treat v1 as the “non-AI platform” foundation, and v2 as the upgrade that makes it agent-native.
This is a Reference Architecture v2 for an operable agent platform. “Operable” means you can point to a diagram, name the invariants, and defend your decisions under pressure: security reviews, incident postmortems, cost spikes, compliance asks, and product deadlines.
- The goal: run fleets of agents safely, cheaply, and repeatably.
- The shift: agents stop being a feature and become a runtime.
- The constraint: connectors turn into production attack surface.
- The definition of done: you can operate it. Observe, debug, roll back, and audit.
“Operable” is not a buzzword. It’s a list of things you can do at 3AM when something breaks.
An operable agent platform can answer these questions quickly:
- Which user, tenant, agent, and connector were behind this action?
- Which policy version, model, and tools were in play for a given run?
- Can you replay the run, roll the change back, and produce the audit trail?
If you can’t do that, you don’t have an agent platform.
You have a demo that will eventually become an incident.
The cleanest mental model I’ve found is the same one we use for infrastructure platforms:

- Data plane: executes agent runs, tool calls, voice turns, and workflows.
- Control plane: defines policies, versions, approvals, routing, and audit requirements.
The mistake teams make is mixing these.
If your data plane “decides” policy at runtime with no versioning, approvals, or audit — you can’t govern change. And if your control plane tries to run the workload, you can’t scale execution cleanly.
So RA v2 keeps them separate.
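Here’s a minimal sketch of that separation, assuming an illustrative `PolicyStore` (not a prescribed API): the control plane publishes versioned, immutable policies; the data plane pins one version at run start, so nothing “decides” policy mid-run.

```python
from dataclasses import dataclass

# Control plane: policies are versioned and immutable once published.
@dataclass(frozen=True)
class Policy:
    version: int
    allowed_tools: frozenset[str]
    max_spend_usd: float

class PolicyStore:
    """Control plane store: owns policy definitions and versions."""
    def __init__(self) -> None:
        self._versions: dict[int, Policy] = {}
        self._latest = 0

    def publish(self, policy: Policy) -> None:
        self._versions[policy.version] = policy
        self._latest = max(self._latest, policy.version)

    def latest(self) -> Policy:
        return self._versions[self._latest]

def start_run(store: PolicyStore) -> Policy:
    """Data plane: pin the policy version at run start.

    The run executes against this snapshot; a mid-run policy change
    applies only to new runs, which keeps every decision auditable.
    """
    return store.latest()

store = PolicyStore()
store.publish(Policy(version=1,
                     allowed_tools=frozenset({"crm.send_email"}),
                     max_spend_usd=5.0))
policy = start_run(store)  # this run is pinned to version 1
```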
Here’s the platform decomposed into components you can actually build and assign teams to.

This is the surface area your users touch.
Rule: Experiences should be thin. They collect context, render outputs, and host a clean human handoff path. They should not contain business logic for agent orchestration.
The session engine is the heart of the data plane.
It owns session state, turn orchestration, and the rules of composition.
This is where multi-agent composition happens safely: supervisors, specialists, delegation, and structured handoffs.
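A minimal sketch of a structured handoff, with hypothetical names (`Handoff`, `delegate`): delegation becomes a typed, traceable record rather than a free-form prompt, so the session engine can log and replay it.

```python
from dataclasses import dataclass, field
import uuid

# Delegation as a typed record: who handed what to whom, under which run.
@dataclass
class Handoff:
    run_id: str
    from_agent: str
    to_agent: str
    task: str
    context: dict = field(default_factory=dict)

def delegate(run_id: str, supervisor: str, specialist: str,
             task: str, **context) -> Handoff:
    """Record the delegation before the specialist executes anything."""
    return Handoff(run_id=run_id, from_agent=supervisor,
                   to_agent=specialist, task=task, context=dict(context))

handoff = delegate(str(uuid.uuid4()), "supervisor", "billing-specialist",
                   "explain invoice discrepancy", invoice_id="inv_42")
```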
Agent work is not always interactive.
If your platform can’t:
- schedule and resume long-running work
- retry without duplicating side effects
- surface (rather than silently skip) failed steps
- replay a run when debugging
…then it’s not a platform.
This engine should feel familiar if you’ve built distributed systems: durable queues, the outbox pattern, sagas, and idempotency keys.
(If you’ve read my earlier outbox/sagas work, you’ll recognize the patterns.)
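Here’s the core idempotency pattern in miniature, assuming an in-memory store for brevity (production would use a durable table, as in the outbox pattern): the same key always returns the recorded result instead of re-running the side effect.

```python
import hashlib

# In-memory stand-in for a durable idempotency table.
_results: dict[str, object] = {}

def idempotency_key(run_id: str, step: str, payload: str) -> str:
    """Derive a stable key from the run, the step, and its inputs."""
    return hashlib.sha256(f"{run_id}:{step}:{payload}".encode()).hexdigest()

def execute_once(key: str, side_effect):
    """Retries with the same key return the recorded result
    instead of executing the side effect a second time."""
    if key in _results:
        return _results[key]
    result = side_effect()
    _results[key] = result
    return result

key = idempotency_key("run-1", "send_invoice", "inv_42")
execute_once(key, lambda: "sent")   # executes
execute_once(key, lambda: "sent")   # retry: returns recorded result
```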
Tools are “just functions”… until they aren’t.
The connector layer provides a registry of tools with typed schemas, scoped and time-boxed credentials, rate limits, and per-call auditing.
This is also where MCP-style connector ecosystems start to matter: a standard protocol for tool discovery, schemas, and execution boundaries.
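A sketch of what a registered tool might declare, MCP-style in spirit but not the actual MCP schema (`ToolManifest` and its fields are illustrative): schema and scopes are declared up front, so discovery and policy checks don’t depend on prose.

```python
from dataclasses import dataclass

# Illustrative connector manifest: the contract a tool must declare
# before the platform will route calls to it.
@dataclass(frozen=True)
class ToolManifest:
    name: str
    version: str
    input_schema: dict          # JSON-Schema-style description of arguments
    scopes: tuple[str, ...]     # least-privilege scopes the tool may use
    side_effects: bool          # drives idempotency and approval rules

send_email = ToolManifest(
    name="crm.send_email",
    version="1.2.0",
    input_schema={"type": "object",
                  "properties": {"to": {"type": "string"},
                                 "body": {"type": "string"}},
                  "required": ["to", "body"]},
    scopes=("crm:contacts.read", "email:send"),
    side_effects=True,
)
```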
The control plane is where you make safety and compliance real, not aspirational.
It owns policy definitions, versions, approvals, routing rules, and audit requirements: the things that must change deliberately, not improvisationally at runtime.
Agents are probabilistic — so you need observability that treats them as distributed systems.
At minimum: a trace for every run (model calls, tool calls, handoffs), metrics for latency and cost, and enough recorded state to replay a run during an incident.
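A minimal sketch of per-run trace events (field names are illustrative, and `print` stands in for a real trace exporter): every model call, tool call, and handoff emits one structured event keyed by `run_id`, so a run can be reassembled later.

```python
import json, time, uuid

def emit_span(run_id: str, kind: str, name: str, **attrs) -> dict:
    """Emit one structured trace event for a step in an agent run."""
    event = {
        "run_id": run_id,
        "span_id": str(uuid.uuid4()),
        "kind": kind,          # "model_call" | "tool_call" | "handoff"
        "name": name,
        "ts": time.time(),
        **attrs,
    }
    print(json.dumps(event))   # stand-in for a real exporter
    return event

run_id = str(uuid.uuid4())
emit_span(run_id, "model_call", "plan", tokens=412, cost_usd=0.003)
emit_span(run_id, "tool_call", "crm.send_email", policy_version=7)
```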
Reference architectures are useless if they don’t specify what must never be violated.
Here are the invariants I treat as non-negotiable for an operable agent platform:
- Identity & tenancy. Every action is attributable: user, tenant, agent, connector, run.
- Least privilege by default. Tools run with minimal scope, time-boxed credentials, and explicit grants.
- Deterministic envelope. Even if model output is probabilistic, the execution contract is deterministic.
- Auditable decisions. Policy version, model selection, tool choice, and overrides are recorded.
Under these invariants, “agents” become something you can safely operate.
The trick to building reliable agent systems is not pretending the model is deterministic.
It’s building a deterministic envelope around it.
That envelope is a contract.
A good envelope includes:
- budgets: tokens, time, and spend
- idempotency keys for every side effect
- typed contracts for tool inputs and outputs
- pinned policy and model versions
- replayable traces
Think of it like this:
The model is a creative proposal engine.
The platform is the execution authority.
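Here’s what that authority can look like in miniature (names like `Envelope` are illustrative): the model proposes tool calls, but the platform enforces budgets and performs the execution.

```python
from dataclasses import dataclass

# Illustrative deterministic envelope: the contract every run executes inside.
@dataclass(frozen=True)
class Envelope:
    run_id: str
    policy_version: int
    max_tool_calls: int
    max_spend_usd: float

class BudgetExceeded(Exception):
    pass

def execute(envelope: Envelope, proposed_calls: list[dict],
            spend_so_far: float, runner):
    """The model proposes; the platform checks budgets and executes."""
    if len(proposed_calls) > envelope.max_tool_calls:
        raise BudgetExceeded(f"run {envelope.run_id}: too many tool calls")
    if spend_so_far > envelope.max_spend_usd:
        raise BudgetExceeded(f"run {envelope.run_id}: over budget")
    # The platform, not the model, performs the side effects.
    return [runner(call) for call in proposed_calls]
```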
Let’s make the architecture concrete with three flows you’ll almost certainly run in production.
The failure mode: the agent improvises permissions and sends the wrong thing.
The RA v2 approach: the send goes through a registered connector with explicit, least-privilege scopes; policy is checked before dispatch, not after; and anything outside the granted scope routes to human approval instead of being improvised.
The failure mode: retries duplicate side effects or silently skip failed steps.
The RA v2 approach: every step carries an idempotency key, progress is persisted outbox-style, failures are retried or compensated explicitly (sagas), and the whole run is replayable.
The failure mode: latency spikes destroy UX, and misrecognitions become irreversible actions.
The RA v2 approach: hard latency budgets with graceful degradation, reversible actions by default, and an explicit confirmation turn before anything irreversible executes.
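One way to make that last rule mechanical (the tool names and the `IRREVERSIBLE` set are hypothetical): classify tools by reversibility and hold irreversible ones until a separate confirmation turn.

```python
# Hypothetical reversibility classification for the voice flow.
IRREVERSIBLE = {"payments.transfer", "crm.delete_contact"}

def gate(tool_name: str, confirmed: bool) -> bool:
    """Allow reversible actions immediately; hold irreversible ones
    until the user has confirmed in a separate turn."""
    if tool_name not in IRREVERSIBLE:
        return True
    return confirmed

assert gate("crm.search", confirmed=False)             # reversible: proceeds
assert not gate("payments.transfer", confirmed=False)  # held for confirmation
```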
By late 2025, most teams discover the same painful truth:
Connectors scale faster than trust.
You start with 5 tools. Then someone adds 15 more. Then teams copy/paste wrappers. Then the platform becomes an un-auditable jungle.
So governance must be built in.
A connector is registered with: an owner, a typed schema, declared scopes, and a version.
Automated checks run: schema validation and least-privilege linting of the requested scopes.
Publishing requires: an approving reviewer and an immutable, recorded version.
Runtime enforcement includes: scoped, time-boxed credentials, rate limits, and a per-call audit record.
A boring connector process means:
- easy to do the right thing
- hard to do the unsafe thing
- and impossible to ship changes with zero traceability
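A sketch of the publish gate, with illustrative rules: registration must be complete, checks must pass, an approver must be recorded, and every publish lands in the audit log.

```python
# Illustrative publish gate for connector versions.
audit_log: list[dict] = []

def publish(manifest: dict, checks_passed: bool, approver: str | None) -> bool:
    """Reject incomplete, unreviewed, or failing connectors;
    record every successful publish for traceability."""
    required = ("name", "version", "owner", "scopes")
    if not all(manifest.get(k) for k in required):
        return False                     # incomplete registration
    if not checks_passed or approver is None:
        return False                     # failing checks or no reviewer
    audit_log.append({"connector": manifest["name"],
                      "version": manifest["version"],
                      "approver": approver})
    return True

ok = publish({"name": "crm.send_email", "version": "1.2.0",
              "owner": "team-crm", "scopes": ["email:send"]},
             checks_passed=True, approver="alice")
```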
In 2023 I treated evaluation as the missing discipline for LLM features.
In 2025, evaluation becomes the missing discipline for agent platforms.
Because now failures aren’t just “wrong text.” They’re wrong tool calls, duplicated side effects, leaked scopes, and runaway costs.
So RA v2 requires an eval pipeline that looks like software engineering: versioned suites of golden cases, regression gates in CI, canaries for prompt and policy changes, and production traces feeding new cases back into the suite.
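A minimal sketch of a CI eval gate in pytest style (`run_agent` and the golden cases are stand-ins): a behavioral regression fails the suite and blocks the merge like any other failing test.

```python
# Golden cases pin expected agent behavior, not just expected text.
GOLDEN_CASES = [
    {"input": "refund order 123",
     "must_call": "orders.refund",
     "must_not_call": "orders.delete"},
]

def run_agent(prompt: str) -> list[str]:
    """Stand-in for invoking the agent and collecting its tool calls."""
    return ["orders.refund"]

def test_tool_choice_regressions():
    for case in GOLDEN_CASES:
        calls = run_agent(case["input"])
        assert case["must_call"] in calls
        assert case["must_not_call"] not in calls
```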
The strongest agent platforms don’t try to eliminate humans.
They treat humans as approvers for high-risk actions, escalation targets when confidence drops, and the authority an agent can hand work back to.
Handoff is not a button. It’s a workflow with guarantees: state is preserved, context travels with the transfer, and the agent cannot act again until control is explicitly returned.
RA v2 places handoff inside the deterministic envelope: a state transition, not a best-effort UX flourish.
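A sketch of handoff as a state machine (the states and events are illustrative): once control moves toward a human, the agent cannot act until control is explicitly returned.

```python
from enum import Enum, auto

class RunState(Enum):
    AGENT_ACTIVE = auto()
    HANDOFF_PENDING = auto()
    HUMAN_ACTIVE = auto()

# Legal transitions only; everything else is rejected.
TRANSITIONS = {
    (RunState.AGENT_ACTIVE, "request_handoff"): RunState.HANDOFF_PENDING,
    (RunState.HANDOFF_PENDING, "human_accepts"): RunState.HUMAN_ACTIVE,
    (RunState.HUMAN_ACTIVE, "return_control"): RunState.AGENT_ACTIVE,
}

def transition(state: RunState, event: str) -> RunState:
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state.name} + {event}")

state = transition(RunState.AGENT_ACTIVE, "request_handoff")
state = transition(state, "human_accepts")   # agent is now locked out
```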
You don’t implement this architecture by rewriting everything.
You implement it by turning chaos into contracts — gradually.
Do you need multi-agent systems before any of this pays off?
No. RA v2 is valuable even for a single agent, because the hard parts are the same: tool safety, observability, evaluation, and governance.
Multi-agent composition simply makes the need impossible to ignore.
Where does MCP fit?
Treat MCP-style protocols as connector plumbing: a way to standardize discovery and invocation.
But don’t confuse “standard protocol” with “safe platform.” Safety comes from policy enforcement, least privilege, versioning, and auditing — which RA v2 supplies.
Is this tied to a specific cloud or vendor?
No. The components map cleanly to most clouds and on-prem setups. The key idea is separation of concerns: data plane execution vs. control plane governance.
What’s the most common failure mode?
Teams ship tool integration without a deterministic envelope: no budgets, no idempotency, no replay, no policy versions.
It works until it doesn’t — and then it becomes un-debuggable.
In 2018, RL taught me that unstable training loops need stabilizers.
In 2025, agent platforms taught me the same lesson in a new form:
A tool-using system is a feedback loop.
Reliability comes from the stabilizers you design around it.
Reference Architecture v2 is my attempt to name those stabilizers clearly: the deterministic envelope, connector governance, evaluation in CI, observability you can replay, and human handoff as a state transition.
This closes the 2025 series: Agents become platforms (and platforms need governance).
In 2026, I’m switching modes again.
Back to a personal research journey — but this time inside the LLM/agent frontier: new protocols, emerging architectures, and the experimental edge where today’s “best practices” are still being invented.