May 31, 2026 - 17 MIN READ

Search Becomes an Agent Runtime

Google’s Gemini Spark and AI Mode point to a bigger shift: Search is no longer just retrieval plus ranking. It is becoming a runtime for synthesis, generated UI, monitoring agents, and action. This post explains the architecture of actionable retrieval — and the reliability contracts needed when search starts doing things.

Axel Domingues

May’s hot topic was not simply “Google added more AI to Search.”

That story is too small.

The bigger shift is this:

Search is becoming an agent runtime.

For twenty-five years, Search mostly meant:

parse a query
retrieve documents
rank links
let the user decide what to trust and what to do next

Generative search changed the first half:

retrieve sources
synthesize an answer
cite or summarize evidence

But the May 2026 direction pushes further:

AI Mode turns search into a conversational, multimodal workspace.
Search agents turn queries into background monitors and task runners.
Gemini Spark points toward an always-on personal agent that can operate across a user’s digital life.
Generated UI turns search results from “links” into task-specific interfaces.

That is a new architecture.

Not “better autocomplete.”

Not “chatbot inside a search page.”

A runtime where retrieval, synthesis, planning, UI generation, and action share the same surface.

When Search starts acting, the risk profile changes.

A bad answer is one failure mode. A bad answer that schedules, buys, emails, books, or monitors on your behalf is a different system entirely.

The trend

Search is moving from retrieval interface to agentic action surface.

The architecture shift

The pipeline becomes: query → retrieve → synthesize → plan → generate UI → act → monitor.

The reliability problem

If answers become actions, source fidelity, permissions, and auditability stop being optional.

The engineering thesis

Actionable retrieval needs contracts: evidence packets, action gates, source scoring, and rollbackable workflows.

Search in one sentence (the old contract)

Classic Search was a ranking system.

It returned documents and made the user responsible for:

reading
comparing
judging credibility
deciding what to do next

That was imperfect, but it had a clean boundary:

Search helped you find information.
You performed the action.

Agentic Search blurs that boundary.

Now the system may:

summarize the web
choose which sources matter
decide when to ask follow-up questions
build a custom interface
run a background monitor
and trigger tools or transactions

That is why this topic belongs in an architecture blog.

The product surface changed, but the deeper change is the control flow.

What “actionable retrieval” means

Retrieval used to end at “here are relevant documents.”

Actionable retrieval ends at:

“Here is the answer, here is the evidence, here is the proposed action, and here is the control boundary before anything changes.”

That adds four new responsibilities to the search stack.

Evidence

The system must know where claims came from and whether cited sources actually support them.

Intent

The system must separate “I want information” from “I want you to do something.”

Authority

The system must know which actions it is allowed to take, under whose identity, and with what approval.

Memory

The system must decide what persists: monitors, preferences, tasks, reminders, and user-specific context.

This is where Search becomes much closer to an agent platform than a web page.

AI Mode: search becomes a workspace

AI Mode changes the shape of search from “query and links” to “conversation plus workspace.”

That matters because long, messy queries become normal:

“Compare these options”
“Use this file”
“Look at these tabs”
“Plan this task”
“Keep checking this for me”

The query is no longer a short keyword string. It becomes a task packet.

A task packet can include:

natural language intent
files or images
browsing context
user preferences
location or calendar context
constraints (“cheap”, “near me”, “family-friendly”, “don’t book yet”)

The hard part is not generating a fluent answer. The hard part is deciding:

what evidence is allowed in the context
what user state is relevant
what action is being proposed
and what must stay behind an approval gate
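As a rough illustration of the task-packet idea, here is a minimal sketch in Python. Everything in it is an assumption of mine: the `TaskPacket` fields, the `ACTION_VERBS` and `HOLD_CONSTRAINTS` vocabularies, and the keyword heuristic stand in for whatever a real system would use. The only point is the default: a packet is informational unless it explicitly asks for an action and no constraint holds that action back.

```python
from dataclasses import dataclass
from enum import Enum


class Intent(Enum):
    INFORM = "inform"
    ACT = "act"


# Hypothetical vocabularies, for illustration only.
ACTION_VERBS = ("book", "buy", "send", "schedule")
HOLD_CONSTRAINTS = ("don't book yet",)


@dataclass(frozen=True)
class TaskPacket:
    text: str                 # natural language intent
    attachments: tuple = ()   # files or images, referenced by name
    constraints: tuple = ()   # e.g. "cheap", "near me", "don't book yet"


def classify_intent(packet: TaskPacket) -> Intent:
    """Default to INFORM; return ACT only for an explicit, unheld action request."""
    words = packet.text.lower().split()
    wants_action = any(verb in words for verb in ACTION_VERBS)
    # Normalize typographic apostrophes so both spellings of "don't" match.
    held = any(
        c.lower().replace("\u2019", "'") in HOLD_CONSTRAINTS
        for c in packet.constraints
    )
    return Intent.ACT if wants_action and not held else Intent.INFORM
```

A real classifier would be a model, not a keyword list, but the contract is the same: constraints in the packet can veto an action that the text alone would seem to request.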

Gemini Spark: the always-on agent pattern

Gemini Spark is interesting because it points to the other side of agentic search:

background continuity.

Search traditionally answered now.

An always-on agent can:

monitor information over time
notice changes
summarize deltas
remind you when something matters
prepare actions for approval

This is not “one request.” It is a workflow.

A Spark-like agent needs a durable loop:

define the monitoring goal
retrieve periodically or subscribe to signals
compare against prior state
decide whether anything changed
notify the user
optionally propose action
log what happened

That looks less like Search and more like a tiny SRE system for your digital life.

Background agents are powerful because they continue after the tab closes. That means they need:

ownership
budgets
stop conditions
expiration dates
and a visible “what is this agent doing?” panel
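The durable loop above can be sketched as a single "tick" that a scheduler would call periodically. This is a toy under stated assumptions: `MonitorState`, `tick`, and the `fetch` and `notify` callables are hypothetical names, and hashing the whole snapshot is the crudest possible way to compare against prior state.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class MonitorState:
    goal: str                          # the monitoring goal
    last_digest: Optional[str] = None  # fingerprint of the prior snapshot
    log: List[str] = field(default_factory=list)


def tick(state: MonitorState,
         fetch: Callable[[], str],
         notify: Callable[[str], None]) -> bool:
    """Run one monitoring cycle; return True if a change was notified."""
    snapshot = fetch()  # retrieve
    digest = hashlib.sha256(snapshot.encode("utf-8")).hexdigest()
    changed = False
    if state.last_digest is None:
        state.log.append("baseline recorded")
    elif digest != state.last_digest:  # compare against prior state
        notify(f"Change detected for: {state.goal}")
        state.log.append("change notified")
        changed = True
    else:
        state.log.append("no change")
    state.last_digest = digest
    return changed
```

The first tick only records a baseline, so the user is not notified about a "change" that is really just the agent seeing the page for the first time. Ownership, budgets, and expiration would wrap this function rather than live inside it.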

The architecture blueprint: Search as an agent runtime

Here is the mental architecture I’d use for agentic search.

Figure: Search as an agent runtime (query understanding, retrieval, evidence scoring, synthesis, generated UI, action planner, tool gateway, monitoring loop, telemetry).

This is the difference between “AI search feature” and “search runtime.”

The hardest problem: evidence fidelity

Generative search has one uncomfortable property:

It collapses many sources into one answer.

That creates convenience, but also removes friction.

Users no longer inspect ten links. They read the synthesis.

So the system inherits editorial responsibility.

Source selection

Which pages were retrieved, ranked, included, ignored, or suppressed?

Claim support

Does each important claim actually follow from the cited source?

Conflict handling

When sources disagree, does the answer expose disagreement or average it away?

Publisher impact

If answers replace clicks, source ecosystems and incentives change.

A useful internal invariant:

A cited answer should be decomposable into claims, and each high-risk claim should map to supporting evidence.

This is not academic neatness. It is how you debug hallucination in a search product.
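To make that invariant concrete, here is a small sketch of how it could be checked. The `Claim` and `EvidencePacket` shapes and the `unsupported_claims` helper are my own illustrative names, not any product's schema. Note what the check does and does not do: it verifies that each high-risk claim points at a source that exists in the packet, not that the source actually supports the claim. That second check needs a separate verification step.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Claim:
    text: str
    high_risk: bool
    source_ids: Tuple[str, ...] = ()


@dataclass(frozen=True)
class EvidencePacket:
    sources: Dict[str, str]    # source ID -> URL (hypothetical layout)
    claims: Tuple[Claim, ...]


def unsupported_claims(packet: EvidencePacket) -> List[Claim]:
    """Return high-risk claims that map to no source present in the packet."""
    return [
        claim
        for claim in packet.claims
        if claim.high_risk
        and not any(sid in packet.sources for sid in claim.source_ids)
    ]
```

An answer whose `unsupported_claims` list is non-empty is a debugging lead: either retrieval missed the source, or synthesis invented the claim.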

Action gates: the line between answer and execution

Agentic Search must distinguish four levels of authority.

Level 1 — Inform

The system can answer, summarize, compare, and cite.

No side effects.

Level 2 — Prepare

The system can fill forms, draft messages, create plans, and stage actions.

Still no side effects.

Level 3 — Ask

The system can request explicit approval for a specific action.

The approval must show:

target
cost
recipient / vendor
data shared
rollback options

Level 4 — Act

The system performs the action through a governed tool gateway.

Every action gets logged. Irreversible actions require stronger gates.

The dangerous product shortcut is collapsing Level 2 and Level 4:

“I prepared this” becomes “I did this.”

That is how helpful agents become incident generators.
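One way to keep Level 2 and Level 4 apart is to make execution impossible without an approval bound to the specific staged action. The sketch below assumes hypothetical `StagedAction`, `Approval`, and `ActionGate` types; the fields on `StagedAction` mirror the approval checklist above, and the actual tool call is left out.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional, Tuple


class Authority(IntEnum):
    INFORM = 1
    PREPARE = 2
    ASK = 3
    ACT = 4


@dataclass(frozen=True)
class StagedAction:
    action_id: str
    target: str
    cost: float
    recipient: str
    data_shared: Tuple[str, ...]
    rollback: str


@dataclass(frozen=True)
class Approval:
    action_id: str   # approval is bound to one specific action
    approved_by: str


class ActionGate:
    def __init__(self) -> None:
        self.audit_log: List[Tuple[str, str]] = []

    def act(self, action: StagedAction, approval: Optional[Approval]) -> Authority:
        """Execute only with a matching approval; otherwise the action stays staged."""
        if approval is None or approval.action_id != action.action_id:
            self.audit_log.append((action.action_id, "blocked"))
            return Authority.PREPARE
        # A real gateway would invoke the governed tool here.
        self.audit_log.append((action.action_id, "executed"))
        return Authority.ACT
```

Because an approval carries the action ID, consent given for one booking cannot be replayed for a different one, and every attempt, blocked or executed, lands in the log.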

Generated UI: useful, but not a permission system

Generated UI is one of the most interesting parts of agentic search.

Instead of returning a static answer, the system can generate:

comparison tools
filters
timelines
calculators
travel planners
shopping flows
dashboards

That is powerful because the UI can fit the task.

But generated UI also creates a governance problem:

Who decides what this UI is allowed to do?

A generated booking interface must still respect:

consent
payment rules
cancellation policies
data sharing boundaries
accessibility
auditability

Generated UI should be treated like a view over governed capabilities.

The UI may be generated. The permissions must not be.
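A minimal sketch of "view over governed capabilities", assuming a made-up control format where each generated control names the capability it wants to invoke. The `bind_controls` function and the capability names are hypothetical. The grant set comes from policy, never from the model output.

```python
from typing import FrozenSet, Iterable, List, Mapping, Tuple


def bind_controls(
    generated_controls: Iterable[Mapping[str, str]],
    granted: FrozenSet[str],
) -> Tuple[List[str], List[str]]:
    """Split generated controls into those policy allows and those it rejects."""
    bound: List[str] = []
    rejected: List[str] = []
    for control in generated_controls:
        if control["capability"] in granted:
            bound.append(control["label"])
        else:
            rejected.append(control["label"])
    return bound, rejected


# The model may propose any control; policy decides which ones get wired up.
controls = [
    {"label": "Compare fares", "capability": "compare_prices"},
    {"label": "Pay now", "capability": "charge_card"},
]
bound, rejected = bind_controls(controls, frozenset({"compare_prices"}))
```

In this example the "Pay now" button is rejected because `charge_card` was never granted, regardless of how convincing the generated interface looks.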

Background agents: monitors need budgets and expiration

A search agent that watches the web for you is extremely useful.

It is also easy to forget.

That makes lifecycle management non-negotiable.

Every monitor should have:

owner
purpose
sources / scope
frequency
budget
expiration date
notification channel
stop button
audit history

Monitor contract

A background search agent should be visible, bounded, revocable, and explainable.

Without that, you get zombie agents:

still searching
still spending
still notifying
still holding permissions
long after the user forgot why they existed
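Here is how I would sketch the monitor contract as data plus a single pre-run check. The `MonitorContract` fields follow the list above (audit history is omitted for brevity); the field names, the integer budget unit, and the `may_run` helper are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Tuple


@dataclass
class MonitorContract:
    owner: str
    purpose: str
    scope: Tuple[str, ...]       # sources the monitor may read
    frequency: timedelta
    budget_remaining: int        # e.g. retrieval calls left (assumed unit)
    expires_at: datetime
    notification_channel: str
    stopped: bool = False        # the stop button


def may_run(monitor: MonitorContract, now: datetime) -> Tuple[bool, str]:
    """Check the contract before every run; return (allowed, reason)."""
    if monitor.stopped:
        return False, "stopped by owner"
    if now >= monitor.expires_at:
        return False, "expired"
    if monitor.budget_remaining <= 0:
        return False, "budget exhausted"
    return True, "ok"
```

The returned reason doubles as the explanation shown to the user, which is what makes the agent explainable as well as bounded.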

Failure modes I expect in agentic search

This is where the engineering discipline shows up.

What to measure

Agentic Search needs dashboards beyond click-through rate.

A minimal runtime dashboard should include:

Retrieval quality

Source diversity, freshness, authority, and query-source alignment.

Claim fidelity

Unsupported-claim rate, citation mismatch rate, and conflict handling quality.

Action safety

Approval rate, denied actions, risky tool calls, and rollback frequency.

Agent lifecycle

Active monitors, expired agents, budget usage, and notification usefulness.
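A few of these dashboard numbers are plain ratios over counters. The sketch below assumes a hypothetical `counts` dictionary produced by the telemetry layer; the key names are mine, and a real dashboard would also segment by query type and time window.

```python
from typing import Dict


def rate(numerator: int, denominator: int) -> float:
    """Ratio that returns 0.0 instead of failing when there is no traffic."""
    return numerator / denominator if denominator else 0.0


def dashboard(counts: Dict[str, int]) -> Dict[str, float]:
    return {
        "unsupported_claim_rate": rate(counts["unsupported_claims"], counts["claims"]),
        "citation_mismatch_rate": rate(counts["citation_mismatches"], counts["citations"]),
        "approval_rate": rate(counts["approved_actions"], counts["proposed_actions"]),
        "rollback_rate": rate(counts["rollbacks"], counts["executed_actions"]),
    }
```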

The old search metric was “did the user click?”

The agentic search metric is closer to: Did the system help the user complete the right task, with the right evidence, under the right authority?

If I were designing this system: my minimum viable contract

I would not start with “make Search more intelligent.”

I would start with contracts.

Separate answer mode from action mode

A query must be classified before the system can act. The default should be informational.

Emit evidence packets

Every synthesized answer should include source IDs, claim mappings, freshness, and confidence metadata.

Route all side effects through a tool gateway

Bookings, purchases, messages, calendar changes, and subscriptions must never be raw model actions.

Require explicit approval for irreversible actions

The user sees what will happen before it happens.

Add lifecycle management for background agents

Every monitor has owner, scope, budget, expiry, and kill switch.

Log the whole path

Query → sources → answer → UI → proposed action → approval → tool call.

If you can’t reconstruct it, you can’t operate it.
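As a sketch of what "reconstructable" could mean in practice, the check below takes the trace events recorded for one request and reports which earlier stages are missing. The stage names follow the path above; the event format and the `missing_stages` helper are assumptions. It also assumes every request passes through every stage up to the furthest one it reached, which a real system might relax for purely informational queries.

```python
from typing import Dict, Iterable, List

STAGES = ("query", "sources", "answer", "ui",
          "proposed_action", "approval", "tool_call")


def missing_stages(events: Iterable[Dict[str, str]]) -> List[str]:
    """List stages absent from the trace, up to the furthest stage recorded."""
    seen = {event["stage"] for event in events}
    unknown = seen - set(STAGES)
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    if not seen:
        return ["query"]  # an empty trace is missing at least the query
    furthest = max(STAGES.index(stage) for stage in seen)
    return [stage for stage in STAGES[: furthest + 1] if stage not in seen]
```

A trace that contains a `tool_call` but no `approval` is exactly the incident you want an alert for: something acted without a recorded decision.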

May takeaway

Search used to be an index.

Generative Search made it an answer engine.

Agentic Search turns it into a runtime.

When Search starts acting, retrieval quality is only the first problem.

The real architecture is actionable retrieval: evidence fidelity, intent boundaries, generated UI, tool gateways, background monitors, and auditability.

Resources

Google — A new era for AI Search

Google’s May 2026 framing of AI Search: advanced model capabilities, search agents, agentic coding, and a new AI-powered Search box.

Google — The Gemini app becomes more agentic

The Gemini Spark announcement: a 24/7 personal AI agent designed to manage tasks under user direction.

The Verge — Google Search AI update

Useful product-level coverage of AI Mode, agentic search capabilities, file/context attachments, and generated UI.

Measuring Google AI Overviews

A May 2026 measurement study on activation, source quality, claim fidelity, and publisher impact in AI Overviews.

How Generative AI Disrupts Search

An empirical study comparing traditional Google Search, AI Overviews, and Gemini; useful for thinking about source selection and robustness.

Answer Bubbles

A useful framing for source-selection bias, source-summary fidelity, and different information realities in AI-mediated search.
