Feb 27, 2022 - 14 MIN READ
CI/CD as Architecture: Testing Pyramids, Pipelines, and Rollout Safety

CI/CD isn’t a DevOps checkbox — it’s the architecture that makes change safe. This month is about test economics, pipeline design, and rollout strategies that turn “deploy” into a reversible decision.

Axel Domingues

Most teams treat CI/CD as plumbing.

Something you wire once, complain about forever, and blame when production burns.

But in mature systems, CI/CD is something else entirely:

It’s the architecture of change.

It determines:

  • how fast you can ship without gambling
  • how often “green” still breaks prod
  • whether a bad deploy is an incident… or a non-event
  • whether teams can move independently without stepping on each other

This month is about turning CI/CD into a safety system — with crisp mental models you can teach, review, and enforce.

This is the start of the 2022 arc: operational architecture.

Not “how to use a tool.”
How to design a system that remains safe while it changes.

The goal this month

Make change safe by design: tests that pay rent, pipelines that produce trust, and deployments that can be reversed.

The mindset shift

CI/CD is not automation.

It’s the contract between code and production.

The real output

Not “passing builds.”

A system where every deploy is boring.

The failure you’re avoiding

“Green pipeline, broken prod” — and no safe way back.


CI/CD Is an Architecture Layer (Whether You Admit It or Not)

If you design your runtime architecture carefully but ignore your delivery architecture, you end up with a paradox:

  • your system is “well-designed”…
  • but change is chaos

The delivery architecture answers questions like:

  • What is the deployable unit? (service, module, monolith, frontend bundle)
  • What is an artifact? (container image, package, immutable bundle)
  • How do we promote it? (dev → staging → prod)
  • What can block a release? (tests, policy checks, SLO burn, approvals)
  • How do we roll back safely? (revert the commit, redeploy the previous artifact, disable features, drain traffic)
  • What’s the blast radius? (one service, one region, one tenant, everyone)

If these aren’t explicit, they still exist — they’re just encoded as tribal knowledge and late-night heroics.

A useful definition:

CI/CD is the mechanism that converts change into a controlled experiment.

If you can’t control the experiment, you don’t have CI/CD — you have a build server.

The Testing Pyramid Is an Economic Model (Not a Diagram)

People quote the testing pyramid like a rule.

It’s better understood as a budget.

Every test has a cost profile:

  • Runtime cost (minutes in pipeline, compute, parallelization limits)
  • Maintenance cost (flake rate, brittleness, data setup burden)
  • Debug cost (how fast you get to root cause)
  • Coverage value (what kinds of bugs it can actually catch)
  • Confidence value (how much it should block a release)

When you treat tests as economics, the pyramid becomes obvious:

Unit tests

Cheap, fast, stable. Great at logic + edge cases. Poor at integration truth.

Integration tests

Medium cost. Catch contract issues and wiring bugs. Must be curated to avoid slow creep.

End-to-end tests

Expensive and fragile. Validate critical paths only. Treat as “smoke alarms,” not a net.

Production validation

The only environment with real traffic. Requires safe rollout + observability to be useful.
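
To make the economics concrete, here is a minimal sketch of a cost-tiered portfolio using pytest markers. The URLs and fixtures (`http_client`, `browser`) are illustrative assumptions, not a prescribed layout:

```python
# Sketch of a cost-tiered test portfolio using pytest markers, so the
# pipeline can run the cheap tier first and fail fast. The service URL
# and the http_client / browser fixtures are placeholders.
import pytest

@pytest.mark.unit
def test_discount_rounding():
    # Pure logic: cheap, fast, stable. The bulk of the portfolio lives here.
    assert round(19.999, 2) == 20.0

@pytest.mark.integration
def test_orders_api_contract(http_client):          # fixture assumed to exist
    # Wiring/contract check against a real dependency; kept curated.
    response = http_client.get("/orders/o-1")
    assert response.status_code in (200, 404)

@pytest.mark.e2e
def test_checkout_happy_path(browser):               # fixture assumed to exist
    # Business-critical flow only: a smoke alarm, not a safety net.
    browser.get("https://staging.example.test/checkout")
    assert "Order confirmed" in browser.page_source
```

The pipeline can then run `pytest -m unit` on every commit, `-m integration` against the built artifact, and `-m e2e` only on the promotion path, so the cheap tier fails fast and the expensive tier stays small.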

A common anti-pattern:

E2E as the primary safety net.

It produces slow pipelines, flaky builds, and low trust.
If your only confidence is “run the whole system,” you’ve already lost speed and safety.

The Pipeline Is a State Machine That Produces Trust

A good pipeline is not “steps.”

It’s a state machine with two outputs:

  1. a promoted artifact
  2. a credibility score (implicit): Do we trust this enough to release?

That credibility score comes from layers of evidence.

The core pipeline states

Build once, produce an immutable artifact

  • build from source → produce a single artifact (e.g., container image)
  • stamp it (version, commit SHA)
  • push to an artifact registry
  • never rebuild the same version differently
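
A minimal sketch of the "build once" step, assuming the Docker and git CLIs are available; the registry and image name are placeholders:

```python
# "Build once" sketch: build a single image, stamp it with the commit SHA,
# and push it to a registry. Registry and image names are placeholders.
import subprocess

def sh(*cmd: str) -> str:
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout.strip()

def build_and_publish(image: str = "registry.example.com/shop/api") -> str:
    sha = sh("git", "rev-parse", "--short", "HEAD")
    tag = f"{image}:{sha}"
    sh("docker", "build", "-t", tag, ".")   # build exactly once
    sh("docker", "push", tag)               # publish the immutable artifact
    return tag                              # everything downstream references this tag

if __name__ == "__main__":
    print(build_and_publish())
```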

Verify the artifact, not the repo

  • run unit tests + static checks during CI
  • run integration checks against the built artifact
  • treat “it passes locally” as irrelevant
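
As a sketch, "verify the artifact" can be as simple as booting the image you just built and smoke-testing it. This assumes the service exposes a `/health` endpoint on port 8080:

```python
# Smoke-test the *built artifact* (the image), not the source tree.
# Assumes the image serves /health on port 8080; adapt to your service.
import subprocess, time, urllib.request

def verify_artifact(image_tag: str) -> None:
    container_id = subprocess.run(
        ["docker", "run", "-d", "-p", "8080:8080", image_tag],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    try:
        time.sleep(3)  # crude wait; a real check would poll with a deadline
        with urllib.request.urlopen("http://localhost:8080/health", timeout=5) as resp:
            assert resp.status == 200, "artifact failed its own health check"
    finally:
        subprocess.run(["docker", "rm", "-f", container_id], check=True)
```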

Promote through environments, don’t “redeploy from source”

  • dev → staging → prod should be promotion
  • the artifact is identical; only config changes
  • environment differences are exposed early
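
A sketch of what promotion looks like when only configuration changes. `deploy` stands in for whatever actually applies the release (Helm, kubectl, a platform API), and the config values are invented:

```python
# Promotion sketch: the artifact never changes between environments,
# only the configuration does. Values here are illustrative.
ENV_CONFIG = {
    "dev":     {"replicas": 1, "db_secret": "dev-db"},
    "staging": {"replicas": 2, "db_secret": "staging-db"},
    "prod":    {"replicas": 6, "db_secret": "prod-db"},
}

def promote(image_tag: str, env: str, deploy) -> None:
    # `deploy` is assumed: whatever applies the release in your platform.
    # The key property: image_tag is byte-for-byte identical everywhere.
    deploy(image=image_tag, **ENV_CONFIG[env])

# promote("registry.example.com/shop/api:ab12cd3", "staging", deploy=apply_release)
```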

Enforce policy as code

  • dependency scanning, SBOM, licenses, secrets detection
  • infrastructure checks (terraform plan, policy compliance)
  • drift detection and change approvals where needed
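
Real pipelines lean on dedicated scanners for this, but a toy example shows the shape of a policy gate: failing the build when changed files look like they contain hardcoded credentials (patterns are illustrative):

```python
# Toy policy-as-code gate: fail the pipeline when changed files appear to
# contain hardcoded credentials. Real pipelines add SBOM, license, and
# IaC checks via dedicated tools; these patterns are only illustrative.
import re, subprocess, sys

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                                   # AWS access key shape
    re.compile(r"(?i)(api[_-]?key|password)\s*=\s*['\"][^'\"]{8,}"),   # inline credentials
]

def changed_files() -> list[str]:
    out = subprocess.run(
        ["git", "diff", "--name-only", "origin/main...HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [path for path in out.splitlines() if path]

def main() -> int:
    violations = []
    for path in changed_files():
        try:
            text = open(path, encoding="utf-8", errors="ignore").read()
        except OSError:
            continue
        if any(p.search(text) for p in SECRET_PATTERNS):
            violations.append(path)
    if violations:
        print("Policy violation: possible secrets in", ", ".join(violations))
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```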

Gate prod on signals that matter

  • critical tests
  • required approvals for risky changes
  • release windows (if you must)
  • SLO burn alerts (if you’re already degrading, don’t deploy)
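
A sketch of an SLO-based gate: refuse to promote while the error budget is already burning. `current_burn_rate` stands in for whatever query your monitoring system exposes, and the threshold is illustrative:

```python
# Deploy gate sketch: block promotion to prod while the error budget is
# already burning. The burn-rate source and threshold are placeholders.
class DeployBlocked(Exception):
    pass

def gate_prod_deploy(current_burn_rate: float, threshold: float = 2.0) -> None:
    # A burn rate above 1.0 means the SLO budget is being consumed faster
    # than planned; above the threshold, shipping more change is reckless.
    if current_burn_rate > threshold:
        raise DeployBlocked(
            f"error budget burning at {current_burn_rate:.1f}x "
            f"(threshold {threshold:.1f}x): fix the degradation before deploying"
        )

# In a pipeline step (query function assumed):
# gate_prod_deploy(current_burn_rate=query_burn_rate("checkout-availability"))
```
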
A subtle but crucial rule:

A pipeline should fail fast on problems that won’t fix themselves.
And it should retry automatically on problems that might.

Flaky tests are not “just annoying.”
They are a credibility leak.
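
One way to encode that rule without masking real flakes is to retry only errors that can plausibly fix themselves, and to fail immediately on everything else. A sketch:

```python
# Retry only failures that might fix themselves (transient infrastructure
# errors); fail fast on failures that won't (assertions, real test failures).
import time

TRANSIENT = (ConnectionError, TimeoutError)

def run_with_retry(step, attempts: int = 3, backoff_s: float = 2.0):
    for attempt in range(1, attempts + 1):
        try:
            return step()
        except TRANSIENT:
            if attempt == attempts:
                raise
            time.sleep(backoff_s * attempt)  # simple linear backoff
        except AssertionError:
            raise  # a genuine failure: surface it immediately, never mask it
```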

“Green But Broken” Usually Means You’re Measuring the Wrong Things

When a pipeline is green but prod fails, it’s rarely mysterious.

It’s typically one of these:

  • you verified the repo, not the artifact that actually shipped
  • environment or configuration differences that staging never exercised
  • data shapes, volumes, or traffic that only exist in production
  • gates that measure the wrong signals (coverage numbers, not contracts or SLOs)

Rollout Safety: Deployments as Reversible Decisions

A deployment strategy is the thing that decides whether prod is a cliff.

When you say “rollout safety,” you’re really asking:

  • Can we limit blast radius?
  • Can we observe correctness quickly?
  • Can we reverse without heroics?

The practical rollout toolbox

Blue/Green

Two environments. Flip traffic. Fast rollback. Requires careful handling of data migrations.

Canary / progressive delivery

Route a small % of traffic to the new version. Roll forward or back based on signals.
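
A sketch of the decision a canary step makes, comparing the canary against the baseline. The thresholds and metric source are illustrative; tools like Argo Rollouts automate exactly this loop:

```python
# Canary analysis sketch: compare canary vs. baseline error rates and decide
# whether to keep shifting traffic or roll back. Thresholds are illustrative.
def canary_verdict(canary_error_rate: float,
                   baseline_error_rate: float,
                   max_absolute: float = 0.01,
                   max_relative: float = 1.5) -> str:
    if canary_error_rate > max_absolute:
        return "rollback"   # failing outright, regardless of baseline
    if baseline_error_rate > 0 and canary_error_rate > baseline_error_rate * max_relative:
        return "rollback"   # meaningfully worse than the current version
    return "promote"

# Increase the canary's traffic share only while the verdict stays "promote".
```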

Feature flags

Separate deploy from release. Ship code dark, enable gradually, kill quickly when needed.
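
A minimal flag sketch, assuming a percentage rollout keyed on user id. Real systems use a flag service, but the property that matters is visible here: turning the flag off removes the behavior without a deploy.

```python
# Minimal feature flag sketch: the code ships dark, the flag releases it.
# Hashing the user id keeps each user's cohort stable as the rollout grows.
import hashlib

FLAGS = {"new_checkout": 5}   # percent of users enabled; 0 = dark, 100 = released

def is_enabled(flag: str, user_id: str) -> bool:
    rollout = FLAGS.get(flag, 0)
    bucket = int(hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest(), 16) % 100
    return bucket < rollout

# Kill switch: set FLAGS["new_checkout"] = 0 and the behavior is gone
# without a deploy, which is exactly what makes flags a rollback tool.
```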

Ring deployments

Promote by cohort: internal users → beta → small region → full fleet. Great for large orgs.

The “two hard things” from the caching article still apply here:

  • naming things (what is a release? what is a version? what is an environment?)
  • invalidating things (what does rollback actually mean when data changed?)

Rollout safety is where teams learn that “undo” is an architecture property.

The Rollback Truth: Not Everything Rolls Back

Teams say “we can roll back” and then discover the trap:

  • the binary rolls back
  • the data does not

So rollout safety depends on migration discipline and compatibility discipline.

The minimum viable migration discipline

  • Use expand/contract migrations:
    • expand schema in a backward-compatible way
    • deploy code that can read both forms
    • migrate data
    • contract later (remove old columns/paths)
  • Never ship a change where:
    • new code requires new schema immediately and
    • old code cannot run on new schema
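
A sketch of that expand/contract sequence for a column rename; the table, columns, and `db` migration runner are illustrative:

```python
# Expand/contract sketch for renaming users.full_name -> users.display_name.
# Each step ships on its own and stays safe to roll back across.
def step_1_expand(db):
    # Additive only: old code keeps running untouched.
    db.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

def step_2_backfill(db):
    # Runs after the application release that writes both columns and
    # reads display_name with a fallback to full_name.
    db.execute("UPDATE users SET display_name = full_name WHERE display_name IS NULL")

def step_3_contract(db):
    # Only once every deployed version reads display_name exclusively.
    db.execute("ALTER TABLE users DROP COLUMN full_name")
```
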
If you want fewer outages, make this a review rule:

A deploy must be safe in both directions for at least one release window.

That single constraint forces architecture maturity.

Architecture Checklists

These are the checklists I want teams to print and argue about.

Pipeline design checklist

  • Build once → produce immutable artifact (image/package)
  • Artifact promotion (not rebuilding) across environments
  • Secrets handled via runtime injection (not baked into artifacts)
  • Parallelization strategy (fast feedback for main branch)
  • Policy as code (security + compliance gates where needed)
  • Clear ownership for pipeline reliability (it’s a product)

Test portfolio checklist

  • Unit tests cover domain logic and edge cases
  • Integration tests cover contracts and critical dependency wiring
  • E2E tests are limited to true business-critical flows
  • Flake budget is enforced (flake is treated as a defect)
  • Tests produce actionable failure output (fast diagnosis)

Release and rollout checklist

  • Deployment strategy chosen intentionally (blue/green, canary, ring)
  • Rollback plan is explicit and tested
  • Compatibility policy for API/schema changes
  • Feature flags for high-risk or UI/behavior changes
  • Observable success criteria for canary (not vibes)

Common Anti-Patterns That Kill Delivery

  • E2E tests as the primary safety net (slow pipelines, flaky builds, low trust)
  • Rebuilding from source per environment instead of promoting one immutable artifact
  • Tolerated flaky tests: every “retry until green” erodes the pipeline’s credibility
  • Rollback plans that exist only on paper and have never been exercised
  • Schema changes that new code requires immediately and old code cannot run on
  • Treating a green pipeline as proof of safety, without validating against production signals

Resources

Google SRE Book — Release Engineering

A classic explanation of why release processes are reliability mechanisms — and why automation is a means, not the goal.

DORA / Accelerate — Metrics That Matter

The most useful vocabulary for delivery performance: lead time, deployment frequency, change fail rate, and time to restore.

Martin Fowler — Feature Toggles

A practical pattern for decoupling deployment from release — and for making rollback possible when data can’t rewind.

Argo Rollouts (progressive delivery patterns)

An approachable way to operationalize canary/blue-green and tie rollout decisions to metrics and analysis.



What’s Next

If CI/CD is the architecture of change, then observability is the architecture of truth.

Next month:

Observability that Works: Logs, Metrics, Traces, and SLO Thinking

Because safe rollouts only work if you can see reality quickly — and decide based on signals, not hope.
