
Most “data problems” are actually truth problems. This month is a practical mental model for product teams: where truth lives, how it moves, when to stream, when to batch, and how to keep analytics useful without corrupting production.
Axel Domingues
Most product teams don’t need “a data platform.”
They need answers that don’t lie.
And the uncomfortable reality is this:
If you can’t agree on what is true, you will never agree on what to build.
Because every dashboard, experiment, alert, and ML feature is a vote about truth.
September is about a mental model that scales past “throw it in BigQuery/Snowflake and pray.”
OLTP vs OLAP is the starting point.
But the real point is: truth.
The goal this month
Give product teams a crisp model for operational vs analytical truth, and how data moves between them without breaking.
The punchline
Most teams don’t have a data problem.
They have a truth boundary problem.
The tools
OLTP, OLAP, event streams, CDC, batch, metrics layers — treated as tradeoffs, not ideology.
The deliverable
A practical playbook: decision rules, checklists, and the “don’t corrupt production truth” contract.
Before OLTP vs OLAP, separate truth domains. This is the root of most data pain.
Operational truth
What the product committed to users: orders, balances, access, permissions. Must be correct and auditable.
Analytical truth
What the business believes happened after processing: metrics, cohorts, revenue, retention. Must be consistent and explainable.
Observational truth
What we observed about the system: logs, traces, events, clickstreams. Noisy but invaluable for debugging and behavior.
Experimental truth
What happened under a treatment: exposure logs, assignment, attribution. Easy to poison if definitions drift.
If your analytics pipeline can change past “revenue” in a way finance can’t reconcile, you don’t have a pipeline — you have a story generator.
So the real question becomes:
How do we move operational truth into analytical truth without smearing it?
That’s where OLTP vs OLAP and streaming vs batch finally make sense.
People talk about OLTP/OLAP like a database debate.
It’s not.
It’s a workload + correctness contract debate.
OLTP systems optimize for many small, concurrent transactions: single-row reads and writes, strict correctness, low latency.
If OLTP lies, users feel it immediately.
OLAP systems optimize for big questions over history: large scans, aggregations, and flexible slicing across dimensions.
If OLAP lies, the company makes the wrong decisions — and you discover it months later.
The simplest rule:
OLTP is for decisions that affect users now.
OLAP is for decisions that affect the business later.
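To make the contrast concrete, here's a minimal sketch using SQLite and a hypothetical orders table: the OLTP shape touches one row inside a transaction, the OLAP shape scans history and aggregates.

```python
import sqlite3

# Hypothetical schema for illustration: one "orders" table serving both workload shapes.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_id   TEXT PRIMARY KEY,
        user_id    TEXT NOT NULL,
        status     TEXT NOT NULL,
        amount_eur REAL NOT NULL,
        created_at TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        ("o-1", "u-1", "paid",    42.0, "2025-09-01"),
        ("o-2", "u-2", "pending", 19.0, "2025-09-01"),
        ("o-3", "u-1", "paid",    99.0, "2025-09-02"),
    ],
)

# OLTP shape: one entity, right now, inside a transaction. If this lies, a user feels it.
with conn:
    conn.execute(
        "UPDATE orders SET status = 'shipped' WHERE order_id = ? AND status = 'paid'",
        ("o-1",),
    )

# OLAP shape: scan history, aggregate, answer a business question later.
daily_revenue = conn.execute(
    "SELECT created_at, SUM(amount_eur) FROM orders WHERE status != 'pending' GROUP BY created_at"
).fetchall()
print(daily_revenue)
```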
And then the hard part:
How does data get from one world to the other?
Most teams treat “streaming” like a status symbol.
In practice, you’re choosing between three movement primitives — and each has a cost.

Batch
Cheap and reliable. Great for truth and finance. Bad for “real-time” expectations.
Streaming events
Great for reactivity and pipelines. Harder correctness. Requires schema discipline and backpressure design.
CDC
Copies changes from OLTP. Powerful, but you must understand what you’re capturing: writes, not meaning.
Reality check
You often need two: streaming for immediacy, batch for reconciliation.
Batch wins when correctness and reconciliation matter more than latency, and nobody needs the answer this minute.
Batch is how finance stays sane.
Streaming wins when the system has to react in seconds: alerts, fraud checks, operational automation.
Streaming is how operations stay responsive.
CDC wins when you need a faithful copy of OLTP changes without touching application code.
CDC is how replication becomes practical — if you treat it like a sharp tool.
Production systems need a single place where the truth is committed.
That place is typically your OLTP database — not because it’s cool, but because it is the only thing allowed to say:
“This order exists.”
“This payment was captured.”
“This user is authorized.”
Analytics and pipelines must observe and derive, but they must not mutate the authoritative record.
Otherwise you have a second product team editing history.
So instead of writing analytics into production, we do the opposite: production publishes, and analytics derives.
That leads to the next core concept: events vs state.
If you only store state (“current status = shipped”), you lose history unless you build history explicitly.
If you store events (“order shipped at t=...”), you can rebuild state and also answer forensic questions.
In practice, you usually need both — but you must understand which is primary in each domain.
State is for serving
Fast reads, current truth, simple queries. But weak forensic capability without history.
Events are for explaining
You can replay, rebuild, audit, and debug. But you must design ordering, idempotency, and schemas.
Publish business events when the change matters to other systems or analysis.
Don’t publish “table_updated” — publish meaning.
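As a rough illustration (event names and fields are invented for the example), here's the difference between a row-change notification and a business event, plus how current state can be rebuilt by folding events:

```python
from dataclasses import dataclass

# Meaningless: says that something changed, not what it means.
table_updated = {"table": "orders", "op": "UPDATE", "pk": "o-42"}

# Meaningful: a business fact other systems and analysts can act on.
@dataclass(frozen=True)
class OrderShipped:
    event_id: str
    order_id: str
    shipped_at: str   # event time, not ingestion time
    carrier: str

events = [
    OrderShipped("e-1", "o-42", "2025-09-03T10:15:00Z", "DHL"),
    OrderShipped("e-2", "o-43", "2025-09-03T11:02:00Z", "UPS"),
]

# State is derived from events, not the other way around:
# replaying the log rebuilds "current status" at any point in time.
current_status = {}
for e in events:
    current_status[e.order_id] = ("shipped", e.shipped_at)

print(current_status)
```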
You rarely get global ordering. You get per-entity ordering at best.
Design around per-entity keys, idempotent consumers, and tolerance for out-of-order and late events.
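A minimal consumer sketch under those assumptions, with a made-up event shape: duplicates are dropped by event id, and a per-entity sequence number rejects stale updates.

```python
# Hypothetical event shape: (entity_id, sequence, event_id, payload).
seen_event_ids = set()   # "at least once" delivery means duplicates will arrive
latest_sequence = {}     # per-entity ordering: ignore anything older than what we applied
state = {}

def apply(event):
    entity_id, sequence, event_id, payload = event

    if event_id in seen_event_ids:   # duplicate delivery: safe to drop
        return
    seen_event_ids.add(event_id)

    if sequence <= latest_sequence.get(entity_id, -1):
        return                       # stale or out-of-order update for this entity
    latest_sequence[entity_id] = sequence
    state[entity_id] = payload

# Out of order and duplicated on purpose:
for ev in [
    ("o-1", 2, "e-b", "shipped"),
    ("o-1", 1, "e-a", "paid"),      # arrives late: ignored, "shipped" is newer
    ("o-1", 2, "e-b", "shipped"),   # duplicate: ignored
]:
    apply(ev)

print(state)   # {'o-1': 'shipped'}
```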
This month pairs with May’s distributed data reality:
Once data is distributed, correctness is a protocol.
Now we apply that to analytics.
Here’s the smallest architecture that doesn’t lie (and doesn’t melt production).

Pick the authoritative system for each business concept:
Write it down. Make it boring. Make it explicit.
Choose an extraction path: business events, CDC, batch snapshots, or a combination.
In many teams, events + CDC is the "grown-up" combination.
Create a landing zone (raw tables / raw topics) where data arrives as-is and is never edited in place.
Build curated tables (clean, typed, consistent) on top of the raw layer.
Define metrics once, not 400 times, in a metrics layer that everything else reads from.
Data quality checks are not optional.
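A compressed sketch of that spine, with invented table and field names: raw rows land untouched, a curated step types and deduplicates them, and a few quality checks gate publication.

```python
from datetime import date

# Landing zone: rows arrive as-is and are never edited in place.
raw_orders = [
    {"order_id": "o-1", "amount": "42.0", "status": "paid",    "day": "2025-09-01"},
    {"order_id": "o-2", "amount": "19.0", "status": "pending", "day": "2025-09-01"},
    {"order_id": "o-1", "amount": "42.0", "status": "paid",    "day": "2025-09-01"},  # duplicate
]

# Curated table: typed, deduplicated, consistent. Derived from raw, never written back to OLTP.
curated = {}
for row in raw_orders:
    curated[row["order_id"]] = {
        "order_id": row["order_id"],
        "amount_eur": float(row["amount"]),
        "status": row["status"],
        "day": date.fromisoformat(row["day"]),
    }
curated_orders = list(curated.values())

# Quality checks gate publication: fail loudly instead of shipping a quiet lie.
assert all(o["order_id"] for o in curated_orders), "null keys in curated orders"
assert all(o["amount_eur"] >= 0 for o in curated_orders), "negative amounts"
assert len(curated_orders) == len({o["order_id"] for o in curated_orders}), "duplicate orders"

print(curated_orders)
```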
If you stop here, you already beat most companies.
Because you now have a process that preserves truth boundaries.
Teams say “we need real-time” when they mean one of these:
Fast feedback
Dashboards within minutes. Usually satisfied by micro-batch.
Operational action
Alerts, fraud detection, automation. Needs streaming + reliability.
Product personalization
Recommendations, ranking, adaptive UX. Needs low-latency features + careful correctness.
Experiment readouts
Near-real-time experiment monitoring. Needs exposure logs + attribution discipline.
Micro-batch (every 1–5 minutes) solves “fast feedback” with far less pain than “true streaming.”
Reserve streaming for cases where the business decision actually needs it.
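A hedged sketch of the micro-batch idea, with placeholder extract and load functions: recompute an incremental aggregate every few minutes instead of operating a streaming pipeline.

```python
import time
from datetime import datetime, timezone

def load_new_orders(since):
    """Placeholder extraction step: in practice, a snapshot or CDC read since `since`."""
    return []  # hypothetical; wire this to your landing zone or warehouse

def refresh_dashboard(rows):
    """Placeholder load step: upsert the aggregate your dashboard reads."""
    print(f"{datetime.now(timezone.utc).isoformat()} refreshed with {len(rows)} new rows")

def run_micro_batch(interval_seconds=300, iterations=3):
    """Every few minutes is 'real time enough' for most dashboards."""
    watermark = datetime.now(timezone.utc)
    for _ in range(iterations):
        rows = load_new_orders(since=watermark)
        refresh_dashboard(rows)
        watermark = datetime.now(timezone.utc)
        time.sleep(interval_seconds)

if __name__ == "__main__":
    run_micro_batch(interval_seconds=1, iterations=2)  # shortened interval for the demo
```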
Every org eventually ends up with 17 definitions of “active user.”
That’s not a tooling failure — it’s a governance failure.
The metrics layer is the defense line that stops “analytics entropy.”
Name and intent
What question it answers. Not just a SQL snippet.
Eligibility rules
Which users/orders count — and why.
Time semantics
Event time vs ingestion time, windows, timezones, late data policy.
Ownership
Someone is accountable for changes. Metrics are products.
Revenue must reconcile with a ledger, and changes must have a paper trail.
A healthy org treats metrics like APIs: versioned, documented, owned, and deprecated rather than silently changed.
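One way to make that concrete is to define metrics as owned, versioned objects rather than scattered SQL snippets. The sketch below is illustrative, not the API of any specific metrics-layer tool.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class MetricDefinition:
    """One metric, defined once, owned like a product. Field values below are illustrative."""
    name: str
    version: int                 # metrics are APIs: breaking changes bump the version
    intent: str                  # the question it answers, not just a SQL snippet
    eligibility: str             # who or what counts, and why
    event_time_field: str        # event time, not ingestion time
    timezone: str
    late_data_window_days: int   # how long corrections are accepted
    owner: str                   # accountable for changes
    sql: str = field(repr=False)

active_users_v2 = MetricDefinition(
    name="active_users",
    version=2,
    intent="How many distinct users performed a core action in the window?",
    eligibility="Excludes internal/test accounts and deleted users.",
    event_time_field="event_time",
    timezone="UTC",
    late_data_window_days=3,
    owner="growth-analytics",
    sql="SELECT COUNT(DISTINCT user_id) FROM curated.events WHERE ...",
)

print(active_users_v2)
```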
Which brings us to a topic that looks “data” but is actually “distributed systems”:
schema evolution.
In June we talked about API evolution with consumer-driven contracts.
The same thing applies to data.
So you need compatibility rules.
Adopt a policy like: new fields are optional and additive, existing fields are never removed or repurposed, and breaking changes get a new version.
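A minimal compatibility check in that spirit (the schema format here is a plain dict for illustration, not a particular schema registry's API): new versions may add fields, but may not remove or retype existing ones.

```python
def is_backward_compatible(old_schema: dict, new_schema: dict) -> list[str]:
    """Return a list of violations; an empty list means consumers of old data keep working."""
    violations = []
    for field_name, field_type in old_schema.items():
        if field_name not in new_schema:
            violations.append(f"removed field: {field_name}")
        elif new_schema[field_name] != field_type:
            violations.append(f"retyped field: {field_name} {field_type} -> {new_schema[field_name]}")
    return violations

order_v1 = {"order_id": "string", "amount_eur": "double", "status": "string"}
order_v2 = {"order_id": "string", "amount_eur": "double", "status": "string", "carrier": "string"}  # additive: ok
order_v3 = {"order_id": "string", "amount_cents": "long", "status": "string"}                       # breaking

print(is_backward_compatible(order_v1, order_v2))   # []
print(is_backward_compatible(order_v1, order_v3))   # ['removed field: amount_eur']
```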
Streaming is not “faster ETL.” It’s a different correctness surface area.
Ordering
You rarely get global order. Design per-entity ordering and tolerate disorder.
Duplicates
“At least once” delivery means duplicates. Idempotency isn’t optional.
Late data
Event time ≠ ingestion time. Decide how long you allow corrections.
Backfills
If you can’t replay, you can’t recover. Plan reprocessing as a feature.
Exactly-once myths
Exactly-once is expensive and contextual. Prefer “effectively once” with dedupe.
Operational load
Consumers lag, partitions skew, and backpressure happens. Streaming is a system to operate.
If that list feels heavy, that’s the point.
Streaming buys time-to-signal. It costs you operational complexity.
Use it like a scalpel.
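To ground two of those realities, event time vs ingestion time and late data, here's a small sketch with an explicit lateness policy; the field names and the window are assumptions for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

LATE_DATA_WINDOW = timedelta(days=2)   # policy: how long we accept corrections

daily_orders = defaultdict(int)

def ingest(event_time: datetime, ingested_at: datetime):
    """Bucket by event time; reject anything older than the allowed lateness."""
    if ingested_at - event_time > LATE_DATA_WINDOW:
        print(f"dropped: {event_time.date()} arrived too late")  # or route to a corrections table
        return
    daily_orders[event_time.date()] += 1

now = datetime(2025, 9, 10, tzinfo=timezone.utc)
ingest(event_time=now - timedelta(hours=3), ingested_at=now)   # normal
ingest(event_time=now - timedelta(days=1), ingested_at=now)    # late, but inside the window
ingest(event_time=now - timedelta(days=5), ingested_at=now)    # too late: a backfill/reprocessing case

print(dict(daily_orders))
```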
Let’s make it real with a common pipeline: e-commerce.
Operational truth: payments and refunds live in a ledger-like model.
Analytical truth: finance wants revenue by day, channel, and cohort.
Observational truth: product wants conversion funnels and drop-offs.
A healthy architecture keeps the ledger authoritative, derives everything analytical from events and CDC, and never lets a pipeline rewrite operational truth.
You move fast with a protocol: events → curated models → versioned metrics → reconciled truth.
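The reconciliation step is worth showing, even as a toy: analytical revenue is compared against the ledger, and mismatches are surfaced rather than silently "fixed". Numbers and names below are invented.

```python
# Authoritative ledger totals (operational truth) vs derived analytics (analytical truth).
ledger_revenue_by_day = {"2025-09-01": 1250.00, "2025-09-02": 980.50}
warehouse_revenue_by_day = {"2025-09-01": 1250.00, "2025-09-02": 975.50}

TOLERANCE_EUR = 0.01

def reconcile(ledger: dict, warehouse: dict) -> list[str]:
    """Report mismatches; never 'correct' the ledger from analytics."""
    issues = []
    for day, ledger_total in ledger.items():
        warehouse_total = warehouse.get(day, 0.0)
        if abs(ledger_total - warehouse_total) > TOLERANCE_EUR:
            issues.append(f"{day}: ledger {ledger_total:.2f} vs warehouse {warehouse_total:.2f}")
    return issues

for issue in reconcile(ledger_revenue_by_day, warehouse_revenue_by_day):
    print("RECONCILIATION MISMATCH:", issue)
```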
When deciding OLTP vs OLAP vs streaming vs batch, don’t start with tools.
Start with questions.
If the answer affects a user immediately, it belongs in OLTP.
Examples: can this order be placed, is this payment authorized, does this user have access.
Aggregations across time belong in OLAP.
Examples: revenue by day and channel, retention by cohort, funnel conversion over time.
Streaming is justified when the business decision cannot wait for batch: fraud checks, operational alerts, automation that reacts to events.
If it’s just dashboards, micro-batch is usually enough.
CDC is best when you need a faithful, low-effort replica of OLTP changes in the analytical world without modifying application code.
But CDC sees writes, not meaning. Use business events to express intent.
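An illustrative contrast, simplified and not tied to any specific CDC tool's format: the CDC record describes a row change, the business event states what it meant.

```python
# What CDC gives you: a row changed. Faithful, but intent-free.
cdc_record = {
    "table": "orders",
    "op": "u",
    "before": {"order_id": "o-42", "status": "paid"},
    "after":  {"order_id": "o-42", "status": "refunded", "refund_amount_eur": 42.0},
    "ts_ms": 1757500000000,
}

# What downstream consumers actually need: the business fact, named explicitly.
def to_business_event(record: dict):
    before, after = record["before"], record["after"]
    if before.get("status") != "refunded" and after.get("status") == "refunded":
        return {
            "type": "OrderRefunded",
            "order_id": after["order_id"],
            "amount_eur": after["refund_amount_eur"],
            "occurred_at_ms": record["ts_ms"],
        }
    return None   # not every write is a business event worth publishing

print(to_business_event(cdc_record))
```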
This is the “walk into a new system and stabilize it” checklist.
For each critical business concept, walk the steps above: name the authoritative system, choose the extraction path, land raw data, curate it, define the metric once, add quality checks, and reconcile against the source of truth.
If you do these seven steps, you've built the foundation for dashboards, experiments, alerts, and ML features you can actually trust.
The Data Warehouse Toolkit
Dimensional modeling is still one of the best mental models for building curated analytical truth that stays usable.
Change Data Capture (CDC)
A quick overview of the concept: CDC is replication of changes, which is powerful — but captures writes, not meaning.
Most product teams should start with: warehouse + disciplined curated models + a metrics layer.
Lakehouse architecture can be great, but it doesn’t solve truth boundaries by itself. If your definitions drift, your lakehouse just stores drift faster.
No, you don't need to event-source everything. Model meaning as events where it matters, and keep serving state where it's useful.
A practical rule: treat metrics like APIs, with versions, owners, and explicit deprecation instead of silent redefinition.
If your org can’t do that socially, no tool will save you.
The most common mistake: querying OLTP directly for analytics until production becomes fragile.
It works… until it doesn’t. Build an extraction path early, even if it’s simple batch snapshots.
This month was about truth boundaries: where truth lives, how it moves between the operational and analytical worlds, and how to keep analytics useful without corrupting production.
Next month is the part architects avoid until the bill arrives:
Cost as a First-Class Constraint
Because systems that scale don’t just break on correctness and latency.
They break on invoices.