
Caching is not “make it faster.” It’s a contract: what can be stale, for how long, for whom, and how you recover when it lies. This month is a practical architecture guide to caching layers that scale without corrupting truth.
Axel Domingues
Caching is the oldest performance trick in software.
It’s also one of the fastest ways to ship silent correctness bugs.
As of October 2021, I’ve seen the same failure pattern over and over.
The ghosts are always the same: stale reads that outlive the data, cross-user leaks that masquerade as random bugs, and miss storms that take the database down.
This post is about replacing folklore with a mental model you can design, operate, and defend.
The promise
Lower latency and higher throughput without breaking correctness.
The risk
Stale or cross-user data leaks that look like “random bugs”.
The real skill
Choosing what can be cached and what invariants must never be cached.
The outcome
A cache design that is observable, bounded, and recoverable.
People joke about “the two hard things” in computer science: cache invalidation and naming things.
Funny because it’s true — but also misleading.
The real two hard things (architecturally) are:
Deciding what is allowed to be stale, for how long, and for whom.
Designing cache keys that capture every dimension that changes the answer.
If you get those right, invalidation becomes a manageable engineering problem.
If you get those wrong, your cache becomes a bug amplifier.
If your policy is implicit, your bugs are implicit too.
Before you add Redis or a CDN, you already have caching behavior:
Browser cache
Great for static assets and safe GETs. Dangerous when personalized data isn’t keyed correctly.
CDN / edge cache
Eliminates geographic latency. Requires explicit cache-control and correct variation (“Vary” is your friend).
App / server memory
Fastest cache. Least shareable. Resets on deploy. Great for small hot sets and computed config.
Distributed cache (Redis, Memcached)
Shared across instances. Powerful. Also the easiest place to accidentally store “truth” with no guardrails.
A useful architectural stance:
Treat caching as a stack of contracts, not a single technology.

Databases store truth (as best as we can make it).
Caches store answers we’re willing to re-derive.
That sounds obvious, but teams constantly violate it by accident.
Here’s the simple rule: if losing the cached value costs you a recomputation, cache it; if losing it means losing truth, don’t.
That’s why these are usually safe to cache:
Static assets with content-hashed names.
Public, idempotent GET responses.
Computed views you can rebuild from the database.
And these are usually not safe to cache without extra design:
Personalized or permission-dependent data.
Anything an authorization decision depends on.
Values where “slightly stale” changes behavior (balances, quotas, inventory).
Most cache incidents are not “Redis failed.”
They’re key design failures.
A good cache key includes every dimension that changes the answer.
Include the things that change data visibility: user ID, tenant ID, role or permission set.
Include the things that change how the response is shaped: locale, API version, device class, pagination.
Include the things that change what “fresh” means: the entity’s version or updatedAt, the schema version of the cached shape.
If you can’t prove the cached value can be shared across users, don’t share it. Prefer per-user keys until proven otherwise.
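To make that concrete, here’s a minimal key-builder sketch in TypeScript. The dimensions shown (tenant, user, locale, schema version) are illustrative assumptions, not a complete list; the point is that every dimension is named explicitly instead of hoped away.

```typescript
// Sketch: a cache key that names every dimension that changes the answer.
// The dimensions below are illustrative; list yours explicitly.

interface KeyDimensions {
  resource: string;      // what is cached, e.g. "profile"
  tenantId: string;      // visibility: whose data this is
  userId?: string;       // visibility: per-user until sharing is proven safe
  locale: string;        // shape: the same data renders differently
  schemaVersion: number; // freshness: bump it to orphan every old entry
}

function cacheKey(d: KeyDimensions): string {
  // A dimension missing here is a cross-user leak waiting to happen,
  // so default to the narrowest scope (per-user) when in doubt.
  return [
    d.resource,
    `t:${d.tenantId}`,
    `u:${d.userId ?? "shared"}`,
    `l:${d.locale}`,
    `s:${d.schemaVersion}`,
  ].join(":");
}

// -> "profile:t:acme:u:123:l:en-GB:s:4"
console.log(
  cacheKey({ resource: "profile", tenantId: "acme", userId: "123", locale: "en-GB", schemaVersion: 4 }),
);
```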
If you do nothing, browsers and CDNs will still make decisions.
If you want reliability, you need to own the semantics.
Static assets (built JS/CSS/images)
Content-hash the filename and cache for a long time: when the content changes, the URL changes, so stale is impossible.
HTML / API responses
Set explicit Cache-Control with short, bounded TTLs, and mark personalized responses private or no-store (a header sketch follows below).
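Here’s a sketch of owning those semantics explicitly, assuming an Express app; the routes, filenames, and TTL values are placeholders.

```typescript
// Sketch (assuming Express): set cache semantics explicitly instead of
// inheriting whatever browsers and CDNs decide on their own.
import express from "express";

const app = express();

// Content-hashed asset: the URL changes when the content changes,
// so caching it "forever" is safe.
app.get("/assets/app.3f9c2b1a.js", (_req, res) => {
  res.set("Cache-Control", "public, max-age=31536000, immutable");
  res.type("application/javascript").send("/* built asset */");
});

// Public API response: bounded staleness at browser and edge.
app.get("/api/products", (_req, res) => {
  res.set("Cache-Control", "public, max-age=60");
  res.json({ products: [] });
});

// Personalized response: the strictest option is to store it nowhere.
// ("private" alone would still allow the browser to cache it.)
app.get("/api/me", (_req, res) => {
  res.set("Cache-Control", "no-store");
  res.json({ user: "123" });
});

app.listen(3000);
```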
CDNs are incredible when you can keep the rules simple:
The more dimensions your response varies on (cookies, headers, locale), the less effective shared caching becomes.
That’s not a failure — it’s reality.
Your architecture choice is: keep public responses free of per-user variation so shared caches stay effective, or accept that personalized responses get cached per-user (or not at all).
Teams love “just purge on update.”
It works… until it doesn’t.
Because:
Purges race with in-flight writes and concurrent cache fills.
Purges don’t reach every cache layer (browsers, intermediate proxies).
Purge calls fail silently, and nothing notices until a user does.
The set of affected URLs is always larger than you think.
Preferred pattern: use immutable URLs for assets and bounded TTLs for content, and treat purging as an optimization, not correctness.
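Here’s a sketch of the immutable-URL mechanic using Node’s crypto; in practice a bundler’s content hashing (e.g. webpack’s [contenthash]) does this for you at build time.

```typescript
// Sketch: derive an immutable asset URL from file content. New content
// gets a new URL; old URLs stay valid until they fall out of caches.
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

function hashedAssetName(filePath: string): string {
  const digest = createHash("sha256")
    .update(readFileSync(filePath))
    .digest("hex")
    .slice(0, 8); // a short prefix is enough to bust caches

  // "dist/app.js" -> "dist/app.3f9c2b1a.js"
  return filePath.replace(/(\.[^.]+)$/, `.${digest}$1`);
}

console.log(hashedAssetName("dist/app.js")); // assumes the file exists
```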
Redis is not one thing — it’s a toolbox.
Here are the patterns that matter for system design:
Cache-aside (lazy loading)
App checks cache → on miss fetch DB → populate cache. Simple. Miss storms are your main risk. (A sketch follows after this list.)
Read-through
Cache layer fetches on miss. Centralizes policy but can hide complexity if you’re not careful.
Write-through
Write goes to cache + DB together. Good for read-heavy keys. Adds write latency and coupling.
Write-behind (dangerous)
Writes land in cache and flush later. Great for throughput; terrifying for correctness. Use only with explicit invariants.
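Here’s a minimal cache-aside sketch, assuming ioredis and a loadFromDb function you supply; the key and TTL are placeholders.

```typescript
// Sketch of cache-aside (assuming ioredis): check the cache, fall back
// to the database on a miss, populate the cache on the way out.
import Redis from "ioredis";

const redis = new Redis(); // localhost:6379 by default

async function getCached<T>(
  key: string,
  ttlSeconds: number,
  loadFromDb: () => Promise<T>, // your source of truth
): Promise<T> {
  const hit = await redis.get(key);
  if (hit !== null) return JSON.parse(hit) as T;

  // Miss: re-derive the answer from truth, then cache it with a bounded TTL.
  const value = await loadFromDb();
  await redis.set(key, JSON.stringify(value), "EX", ttlSeconds);
  return value;
}
```

Note the shape: the cache stores an answer we’re willing to re-derive, and the TTL bounds how long a stale answer can live.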
My bias in 2021 production systems: default to cache-aside with bounded TTLs, reserve write-through for proven hot read paths, and treat write-behind as a last resort with its invariants written down.
The cache expires.
A thousand requests arrive.
All of them miss.
All of them hammer the DB.
Congratulations: your cache caused your outage.
Defenses that actually work:
A per-key lock or single-flight, so one caller rebuilds while the rest wait.
Jittered TTLs, so popular keys don’t expire in lockstep.
Serving the stale value while a background refresh runs.
(A single-flight sketch follows below.)
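Here’s a single-flight sketch, assuming ioredis; the lock TTL and poll interval are illustrative.

```typescript
// Sketch of stampede control (assuming ioredis): one caller rebuilds an
// expired key; everyone else re-polls the cache instead of hitting the DB.
import Redis from "ioredis";

const redis = new Redis();
const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

async function getSingleFlight<T>(
  key: string,
  ttlSeconds: number,
  loadFromDb: () => Promise<T>,
): Promise<T> {
  for (;;) {
    const hit = await redis.get(key);
    if (hit !== null) return JSON.parse(hit) as T;

    // Try to become the one rebuilder. NX: succeeds only if nobody holds it.
    // The lock's own TTL bounds the damage if the rebuilder crashes.
    const lock = await redis.set(`${key}:lock`, "1", "EX", 10, "NX");
    if (lock === "OK") {
      try {
        const value = await loadFromDb();
        await redis.set(key, JSON.stringify(value), "EX", ttlSeconds);
        return value;
      } finally {
        await redis.del(`${key}:lock`);
      }
    }

    // Someone else is rebuilding: wait briefly, then re-check the cache.
    await sleep(50);
  }
}
```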
Then there’s cache poisoning. Not security-poisoning (though that exists too).
I mean “we cached an answer under the wrong key” — and now everyone sees it.
Root causes:
Keys missing a visibility dimension (user, tenant, role).
Shared caches ignoring “Vary” on responses that actually vary.
Caching responses that carry Set-Cookie or other per-user headers.
Defenses:
Per-user keys until sharing is proven safe.
Explicit “Vary”, and as few varying dimensions as you can get away with.
Never caching authenticated responses at shared layers by default.
TTL is a bounded lie.
Sometimes that’s exactly what you want.
But TTL doesn’t answer:
What happens when the data changes one second after the cache fills?
How stale is too stale for this particular answer?
Who is allowed to see the stale value while it lives?
So treat TTL as one knob, not “the strategy.”
Cache invalidation becomes manageable when you do one of these:
Versioned keys
The key includes a version:
user:123:profile:v42
product:sku123:details:updatedAt:2021-10-12T10:03Z
When the entity changes, the key changes. No purge required. Old keys die by TTL.
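Here’s a sketch of the mechanic, assuming ioredis; the key names are illustrative.

```typescript
// Sketch of versioned keys (assuming ioredis): the entity's current version
// lives in one tiny key; a write bumps it, orphaning every cached answer.
import Redis from "ioredis";

const redis = new Redis();

async function profileKey(userId: string): Promise<string> {
  const version = (await redis.get(`user:${userId}:version`)) ?? "0";
  return `user:${userId}:profile:v${version}`; // e.g. user:123:profile:v42
}

async function onProfileUpdated(userId: string): Promise<void> {
  // No purge, no key enumeration: old keys stop being read
  // and quietly expire by TTL.
  await redis.incr(`user:${userId}:version`);
}
```

The cost is an extra round trip per read to fetch the version; in practice you can pipeline it with the data read, or hold the version in process memory for a few seconds.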
Event-driven invalidation
When a write succeeds, publish an event (or enqueue a job) that invalidates affected keys.
This is operationally harder than it sounds:
Events get lost, delayed, or delivered out of order.
You have to enumerate every key a write affects, and keep that mapping correct forever.
The invalidation races with concurrent reads repopulating the cache.
Bounded staleness
Use TTLs that bound harm and revalidate with conditional GET / background refresh. Correctness is achieved by bounded staleness, not perfect invalidation.
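Here’s a revalidation sketch, assuming Express; the payload and ETag derivation are placeholders.

```typescript
// Sketch of bounded staleness plus conditional GET (assuming Express):
// the TTL bounds the lie; If-None-Match makes re-checking nearly free.
import express from "express";
import { createHash } from "node:crypto";

const app = express();

app.get("/api/catalog", (req, res) => {
  const body = JSON.stringify({ items: ["a", "b"] }); // your real payload
  const etag = `"${createHash("sha256").update(body).digest("hex").slice(0, 16)}"`;

  res.set("Cache-Control", "public, max-age=60"); // the bounded lie
  res.set("ETag", etag);

  if (req.headers["if-none-match"] === etag) {
    res.status(304).end(); // still fresh: no body re-sent
    return;
  }
  res.type("application/json").send(body);
});

app.listen(3000);
```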
The goal is not the fanciest cache.
The goal is predictable behavior under load.
Here’s the stack I trust in most product systems:
Static assets
Content-hashed filenames + long-lived browser/CDN caching.
Public GET endpoints
CDN caching with conservative TTLs + “Vary” only where necessary.
Personalized data
Private caching at the browser (sometimes) and per-user caching at Redis (when safe).
Hot computed views
Redis cache-aside + stampede control + short TTLs + metrics.
This is “boring” because it avoids cleverness that’s hard to operate.
You cannot operate caching by vibes.
You need metrics that tell you when it’s helping and when it’s hurting.
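Here’s a sketch of the minimum telemetry worth having; emit() is a placeholder for your real metrics client (StatsD, a Prometheus library, and so on).

```typescript
// Sketch: count hits, misses, and fills per named cache so you can see
// hit ratio, miss bursts, and origin load instead of guessing.
const counters = new Map<string, number>();

function emit(metric: string): void {
  // Placeholder sink: swap in your real metrics client here.
  counters.set(metric, (counters.get(metric) ?? 0) + 1);
}

async function instrumented<T>(
  cacheName: string,
  lookup: () => Promise<T | null>, // e.g. a Redis GET
  rebuild: () => Promise<T>,       // e.g. the DB query
): Promise<T> {
  const hit = await lookup();
  if (hit !== null) {
    emit(`cache.${cacheName}.hit`);
    return hit;
  }
  emit(`cache.${cacheName}.miss`); // watch the rate and the bursts
  const value = await rebuild();
  emit(`cache.${cacheName}.fill`);
  return value;
}
```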
Write it down: for every cached value, what can be stale, for how long, for whom, and how you recover when it lies.
List dimensions: every input that changes the answer either appears in the key or the value doesn’t get shared.
Can I cache database rows in Redis?
Sometimes, but be explicit about ownership.
Caching “rows” often turns Redis into a shadow database with different consistency rules. Prefer caching responses or computed views that are safe to recompute, and use versioned keys or bounded TTLs to avoid “hidden truth” living in cache.
Should I cache API responses at the CDN?
Yes — for public, idempotent GETs with explicit cache-control rules and correct variation (“Vary”).
For authenticated or personalized APIs, treat CDN caching as opt-in: use “private” semantics or avoid shared caching unless you can prove it’s safe.
What’s the first cache worth adding?
Content-hashed static assets + a CDN.
It’s the biggest performance gain with the lowest correctness risk, and it reduces load everywhere (browser, edge, backend).
Why do cache bugs feel so random?
Because you’ve introduced state that is:
Invisible in most logs.
Shared across requests, and sometimes across users.
Free to drift from what the database says.
This is why cache observability and bounded staleness are not “nice to have.” They make the system debuggable.
Caching is the performance layer.
Now we move into the reliability layer — where asynchrony becomes unavoidable:
Next month:
Queues, Retries, and Idempotency: Engineering Reality in Async Systems
Because the moment you add retries, caching’s cousin shows up:
The system will do things more than once — unless you design it not to.