
Performance isn’t a tuning phase — it’s an architecture property. This month I lay out an end-to-end mental model (browser → edge → app → data) and the practical playbook for improving both “fast on average” and “fast under load” without shipping fragile optimizations.
Axel Domingues
Most teams treat performance like this:
That’s not performance engineering.
That’s performance debt collection.
This month is about a different posture:
Performance is an end-to-end architecture property — and the “end-to-end” part is the trap.
Because your users don’t experience “the backend” or “the frontend”.
They experience the critical path from click → pixels → interaction → trust.
And the brutal part is:
Performance is the sum of that path, plus the queueing you created along the way.
Browser → Edge/CDN → Load balancer → App → Dependencies → Data → Back again.
The goal this month
Build a practical end-to-end performance model: measure → attribute → fix → validate.
The core shift
Stop optimizing components in isolation. Optimize the critical path and the tails.
What “good” looks like
Fast for real users, fast under load, and changes don’t regress silently.
The secret weapon
Budgets + guardrails (CI, dashboards, alerts) so performance stays a property, not a hero project.
When a page feels slow, the question is not “where is the bottleneck?”
The question is:
Where is time being spent on the critical path, and why does it get worse at p95/p99?
Here’s the map I use. It’s not perfect, but it’s actionable.

You can “feel” that something is slow, but you can only fix what you can locate on the path.
Don’t optimize what you can’t attribute.
Performance metrics are like financial metrics: easy to game, easy to misread.
If your p50 is great and your p99 is awful, your users will still tell you “it’s slow” — because the tail is where trust dies.
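To see the tail at all, you have to compute it. Here’s a minimal sketch (TypeScript; the samples are made up) that turns raw latency samples into p50/p99 instead of a single average:

```ts
// Nearest-rank percentile over raw latency samples (ms).
// Illustrative only: in production, use your APM's histograms instead of raw arrays.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

const latenciesMs = [32, 35, 36, 40, 41, 44, 47, 52, 900, 1200]; // hypothetical samples

console.log('p50:', percentile(latenciesMs, 50), 'ms'); // 41 ms: looks healthy
console.log('p99:', percentile(latenciesMs, 99), 'ms'); // 1200 ms: what users remember
```

The average of those samples is roughly 240 ms, a number that describes nobody’s actual experience.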
TTFB usually improves when you reduce:
Think of TTFB as:
“How quickly can the system produce the first byte when it’s not stuck waiting in line?”
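In the browser you can read TTFB straight off the Navigation Timing API. A minimal sketch (field libraries such as web-vitals handle edge cases this ignores):

```ts
// Rough TTFB: time from the start of the navigation until the first response byte.
// Simplified: ignores prerender/activation details that web-vitals handles for you.
const [nav] = performance.getEntriesByType('navigation') as PerformanceNavigationTiming[];
if (nav) {
  const ttfb = nav.responseStart - nav.startTime; // redirects + connect + server think time
  console.log('TTFB:', Math.round(ttfb), 'ms');
}
```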
Tail latency usually explodes when you introduce:
Think of tail latency as:
“What happens when the system is under pressure, and variance becomes your enemy?”
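A toy queueing model makes the “pressure” part concrete. Under textbook M/M/1 assumptions (single server, random arrivals), mean time in the system is 1 / (service rate − arrival rate), so waiting explodes as utilization approaches 100%. A sketch with made-up numbers:

```ts
// Toy M/M/1 model: mean time in system = 1 / (mu - lambda).
// Real systems aren't M/M/1, but the shape of this curve is the point.
function meanTimeInSystemMs(serviceRatePerSec: number, arrivalRatePerSec: number): number {
  return 1000 / (serviceRatePerSec - arrivalRatePerSec);
}

const mu = 100; // the server can handle 100 req/s (10 ms service time)
for (const utilization of [50, 70, 90, 95, 99]) {
  const lambda = (mu * utilization) / 100;
  console.log(`${utilization}% busy -> ~${meanTimeInSystemMs(mu, lambda).toFixed(0)} ms`);
}
// 50% -> 20 ms, 90% -> 100 ms, 99% -> 1000 ms. Same server, same code, just busier.
```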
Not all optimizations are equal.
Here’s the order I like — because it produces stable wins.
This is the reason “performance experts” start with architecture, not flame graphs:
Most systems are slow because of shape, not because of a slow for loop.
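A concrete example of “shape”: the classic N+1 call pattern. The fix isn’t a faster loop, it’s a different shape. (The data-access calls below are stand-ins.)

```ts
type Item = { id: string; price: number };

// Hypothetical data-access layer (stand-ins for real calls).
const catalog: Record<string, Item> = { a: { id: 'a', price: 10 }, b: { id: 'b', price: 25 } };
const fetchItem = async (id: string): Promise<Item> => catalog[id];
const fetchItems = async (ids: string[]): Promise<Item[]> => ids.map((id) => catalog[id]);

// Shape problem: N sequential round trips hidden in a loop.
// Each call may be fast; the shape guarantees the total gets worse as N grows.
async function slowCartTotal(itemIds: string[]): Promise<number> {
  let total = 0;
  for (const id of itemIds) {
    const item = await fetchItem(id); // N round trips, one at a time
    total += item.price;
  }
  return total;
}

// Same work, different shape: one batched round trip.
async function fastCartTotal(itemIds: string[]): Promise<number> {
  const items = await fetchItems(itemIds); // 1 round trip
  return items.reduce((sum, item) => sum + item.price, 0);
}
```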
I use the same loop for almost every performance project.
Pick targets that reflect user experience:
You want a breakdown like:
Use correlation IDs across logs/traces so “one request” stays one story.
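The mechanics are tiny; the discipline is attaching the ID everywhere. A minimal Express-style sketch (the header name is a convention you choose, not a standard):

```ts
import { randomUUID } from 'node:crypto';
import type { NextFunction, Request, Response } from 'express';

// Reuse an incoming correlation ID if a caller already set one; otherwise mint one.
export function correlationId(req: Request, res: Response, next: NextFunction) {
  const id = req.header('x-correlation-id') ?? randomUUID();
  res.locals.correlationId = id;         // available to handlers and loggers
  res.setHeader('x-correlation-id', id); // echoed back so clients can report it
  // Attach the same id to every log line and every downstream call for this request.
  next();
}
```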
Pick one:
Test in a way that matches reality:
If the improvement isn’t guarded, it’s temporary.
Most meaningful slowdowns happen on flows, not single calls.
A checkout flow might be:
If each step is “acceptable”, the flow can still feel slow.
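The arithmetic is unforgiving: even when every step hits its own p95, the journey often doesn’t. A quick sketch of how per-step tails compound across a flow (numbers illustrative):

```ts
// If each step independently stays under its p95 threshold 95% of the time,
// a 5-step checkout only has a 0.95^5 ≈ 77% chance of avoiding every tail.
const steps = 5;
const pFastPerStep = 0.95;
const pJourneyFast = Math.pow(pFastPerStep, steps);

console.log(`Journeys with zero slow steps: ${(pJourneyFast * 100).toFixed(0)}%`);            // ~77%
console.log(`Journeys that hit at least one tail: ${((1 - pJourneyFast) * 100).toFixed(0)}%`); // ~23%
// Independence is the optimistic case; shared dependencies usually make it worse.
```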
So I prefer this split:
Endpoint health
Latency and errors per API, per dependency, per region.
Journey health
Time across multi-step flows (critical business transactions).
A performance culture measures both.
Because your users live in journeys.
Why it works: less network, less parsing, less memory, less time everywhere.
Why it works: your p95/p99 stop paying for background work.
Why it works: edge caching removes entire backend hops from the path.
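For cacheable responses the lever is usually a header, not a code path. A minimal sketch of the kind of Cache-Control policy that lets a CDN absorb traffic (values are illustrative, not a recommendation):

```ts
import type { Request, Response } from 'express';

// Let the CDN serve this for 60s, then keep serving a stale copy for up to
// 5 more minutes while it revalidates in the background. Tune to your staleness tolerance.
export function productCatalog(_req: Request, res: Response) {
  res.setHeader('Cache-Control', 'public, s-maxage=60, stale-while-revalidate=300');
  res.json({ products: [] }); // placeholder payload
}
```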
The classic tail-latency killer is “call 6 services in parallel”.
Even if each is “fast enough”, the combined result is gated by the slowest.
Mitigations:
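One mitigation worth showing in code: give each non-critical dependency its own deadline and a degraded fallback, so the slowest call can’t hold the whole response hostage. A minimal sketch (service names and timeouts are made up):

```ts
// Stand-ins for real downstream services (hypothetical).
const fetchProduct = async (id: string) => ({ id, name: 'demo product' });
const fetchReviews = async (_id: string): Promise<string[]> => ['stub review'];
const fetchRecommendations = async (_id: string): Promise<string[]> => ['stub recommendation'];

// Resolve with a fallback if the call doesn't finish in time (or fails).
function withTimeout<T>(promise: Promise<T>, ms: number, fallback: T): Promise<T> {
  return new Promise<T>((resolve) => {
    const timer = setTimeout(() => resolve(fallback), ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      () => { clearTimeout(timer); resolve(fallback); }, // treat failure like a timeout
    );
  });
}

async function getProductPage(id: string) {
  // Without per-call deadlines, this response is exactly as slow as the slowest dependency.
  const [product, reviews, recommendations] = await Promise.all([
    fetchProduct(id),                               // critical path: no fallback
    withTimeout(fetchReviews(id), 150, []),         // degrade to "no reviews"
    withTimeout(fetchRecommendations(id), 150, []), // degrade to "no recommendations"
  ]);
  return { product, reviews, recommendations };
}
```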
Retries are not free.
Guardrails:
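A typical guardrail in code: cap the attempts, add jitter, and only retry what’s safe to retry. A minimal sketch (limits are placeholders; tune per dependency):

```ts
// Bounded retry with exponential backoff and full jitter.
// Only retry idempotent calls, and keep the cap small: retries multiply load
// on a dependency that is already struggling.
async function retry<T>(fn: () => Promise<T>, maxAttempts = 3, baseDelayMs = 50): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts) break;
      const backoff = baseDelayMs * 2 ** (attempt - 1);
      const jitterMs = Math.random() * backoff; // full jitter avoids synchronized retry storms
      await new Promise((resolve) => setTimeout(resolve, jitterMs));
    }
  }
  throw lastError;
}
```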
If you want a mature performance practice, this is the line you cross:
You optimize not for “fast when idle”, but for “predictable under load”.
That means learning to love:
Symptoms:
Fix:
Symptoms:
Fix:
Symptoms:
Fix:
Symptoms:
Fix:
Budgets are the difference between “we fixed it once” and “we stay fast”.
Examples:
Where budgets live:
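One way to make a budget executable is a tiny check that fails the build when a measured number crosses the line. A minimal sketch (the results file and budget values are hypothetical; feed it whatever your load test or RUM export produces):

```ts
import { readFileSync } from 'node:fs';

// Budgets live next to the code, so changing one is a reviewed decision, not a surprise.
const budgets: Record<string, number> = {
  api_p95_ms: 300,
  api_p99_ms: 800,
  lcp_p75_ms: 2500,
};

// Hypothetical output from a load test or RUM export, e.g. { "api_p95_ms": 612 }.
const measured: Record<string, number | undefined> = JSON.parse(
  readFileSync('perf-results.json', 'utf8'),
);

let failed = false;
for (const [metric, limit] of Object.entries(budgets)) {
  const value = measured[metric];
  if (value === undefined) continue; // metric not collected on this run
  const ok = value <= limit;
  console.log(`${ok ? 'PASS' : 'FAIL'} ${metric}: ${value} (budget ${limit})`);
  if (!ok) failed = true;
}

process.exit(failed ? 1 : 0); // a non-zero exit fails the CI job
```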
Start where the user pain is measurable.
If LCP/INP is bad, start with:
If TTFB and API latency are bad, start with:
If you care about p95/p99, yes — because tails often appear only under pressure.
But you don’t need a heroic “load test project.” Start with a small, repeatable test on your top flows and run it on every release.
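A small, repeatable version of that can be a single k6 script with thresholds, run against staging on every release (URL, load shape, and thresholds are placeholders):

```ts
// k6 script: a modest, repeatable load on one critical flow, with thresholds
// that fail the run (and the pipeline) when the tail degrades.
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 20,          // placeholder load; shape it after real traffic
  duration: '2m',
  thresholds: {
    http_req_duration: ['p(95)<300', 'p(99)<800'], // ms budgets for this flow
    http_req_failed: ['rate<0.01'],                // under 1% errors
  },
};

export default function () {
  const res = http.get('https://staging.example.com/api/checkout/summary'); // hypothetical endpoint
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}
```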
Treating performance fixes as one-off hero work.
Without budgets and guardrails, regressions are inevitable. A fast system stays fast because it’s defended by process, not memory.
This month was about performance as an end-to-end discipline:
Next month we go up a layer:
Cloud Infrastructure Without the Fanaticism: IaaS, PaaS, Serverless, Kubernetes
Because most performance discussions get stuck at “tune the app”…
…when the platform choices you made are often the largest constraint on latency, reliability, and cost.