
In 2025, “agents” stop being demos and start being products. This is the month you realize you don’t need a smarter model — you need a runtime: durable execution, safety gates, and traces you can debug.
Axel Domingues
March was the compliance wake-up call.
April is the engineering one.
Because once you decide an agent is allowed to act (call tools, click buttons, run workflows, touch data), the hard problem stops being “prompting” and becomes:
How do we run this thing reliably, safely, and debuggably?
That’s what agent runtimes are for.
Not a framework. Not a prompt library.
A runtime — the execution substrate that turns probabilistic reasoning into repeatable operations.
And every platform eventually needs the same three things:
- orchestration primitives (state, retries, timeouts)
- security boundaries (permissions, isolation, approvals)
- observability (traces, not vibes)
What changed in 2025
Agent capabilities improved — but the bigger shift is how we package execution: SDKs now ship with state, policies, and tracing.
The new unit of work
Not “a prompt”.
A run: multi-step, tool-heavy, budgeted, auditable.
The runtime job
Turn plans into operations: manage state, control tool access, handle retries, and surface traces humans can debug.
The failure mode
Without a runtime, “agent reliability” becomes folklore: invisible loops, silent tool errors, and zero accountability.
Think of it like the JVM or the browser runtime — not the language model, but the thing that executes safely.
An agent runtime owns these responsibilities:
If your system lacks any of these, you don’t have a runtime — you have a demo.
If your agent does more than one external side effect (emails, tickets, payments, deploys, UI clicks), you need durable execution + traceability.
In 2023–2024 we talked about tool use.
In 2025, the real product surface is execution.
So “agent SDKs” are evolving into something bigger:
This isn’t hype.
It’s a convergent architecture.
Every team that ships agents ends up reinventing the same primitives — until they standardize them.
Let’s name the primitives explicitly.
Because “agent orchestration” sounds mystical until you reduce it to runtime contracts.
Runs + checkpoints
A run has an ID, a current step, and a resumable checkpoint. If the worker dies, the run continues.
Deterministic state transitions
Treat run state as a state machine: explicit transitions, explicit failure states, no invisible “thinking”.
Tool contracts
Tools are APIs with schemas, auth scopes, and error contracts — not “functions the model can magically call”.
Budgets + timeouts
Cap tokens, wall time, and tool spend per run. Timeouts are not edge cases — they’re the default.
Retries + idempotency
Retries must be safe. Side effects need idempotency keys and “already done” handling.
Human gates
Approvals are first-class transitions. Humans are not “exceptions” — they’re part of the runtime.
If you’ve built distributed systems, none of this is new.
The only novelty is that the “business logic” is partly probabilistic — so you must compensate with stronger operational constraints.
If the agent can click “Confirm purchase” twice, you need idempotency and approvals — not a better prompt.
A healthy runtime doesn’t let the model directly perform irreversible actions.
It wraps action in structure:
This is how you make “agentic” feel like engineering again.

Agents fail in ways that are qualitatively different from normal services:
So you need tracing, not just logs.
You don’t need perfection. You need consistency.
agent.run
agent.step (plan | tool | verify | summarize)
llm.call
tool.call (tool_name)
retrieval.query (optional)
policy.check (optional)
approval.wait (optional)
And you need consistent attributes:
{
"run_id": "run_123",
"step_id": "step_07",
"model": "gpt-4.1",
"tool": "crm.create_ticket",
"budget_tokens_remaining": 1200,
"timeout_ms": 15000,
"retry_count": 1,
"policy_decision": "allow",
"human_gate": "required:false"
}
Logs and traces have a habit of turning into a shadow data lake. Redaction, retention, and access control are runtime features — not compliance paperwork.
Here’s a blueprint that survives contact with production.

This is also why “runtimes” are emerging as products: every serious team ends up building this stack anyway.
A runtime is not a rewrite. It’s a set of constraints you add until the system becomes operable.
Define what a “run” is:
If a worker dies, can you resume? If the answer is “no”, you’re still in demo territory.
Email, ticket, payment, deploy, UI click: every action must be safely repeatable.
“Allowed?” is a state transition, not a comment in code.
If you can’t debug it, you can’t ship it.
Teams that do it backwards end up with a smart agent they’re afraid to run.durability → safety → observability → intelligence
OpenTelemetry Traces
The baseline for distributed tracing — and the lingua franca for agent run observability.
OpenTelemetry Semantic Conventions
How to standardize span names and attributes so traces stay queryable at scale.
Temporal (Durable Execution)
A practical reference for retries, timeouts, and crash-proof workflows — the core of production orchestration.
A workflow engine is a big part of it — durable state, retries, timeouts.
But a runtime also includes:
If it has no side effects, you can often get away with a simpler “LLM app” architecture.
The moment it:
…you want runtime primitives, or you’ll reinvent them under pressure.
Buy the primitives you can standardize:
Build the parts that encode your domain:
The best strategy is usually: compose a runtime from strong building blocks.
April is where “agents” start looking like a platform layer.
Next month, we hit the economic reality shift that changes system design again:
The 1M-Token Era: how long context changes retrieval economics and system design
When context becomes cheap enough (or big enough), the architecture of memory, retrieval, and caching gets rewritten.
The 1M-Token Era: how long context changes retrieval economics and system design
Long context doesn’t kill RAG — it changes what’s cheap, what’s risky, and what needs architecture. This month is a practical guide to building “context-first” systems without shipping a cost bomb (or a data leak).
The Compliance Cliff: prohibited practices and governance controls that actually ship
Prohibited practices aren’t a legal footnote — they’re product constraints. This month is about turning “don’t do this” into guardrails you can deploy: policy gates, capability limits, audit trails, and incident-ready governance.