Aug 28, 2022 - 16 MIN READ

Cloud Infrastructure Without the Fanaticism: IaaS, PaaS, Serverless, Kubernetes

A practical mental model for choosing cloud primitives without ideology—based on responsibility boundaries, scaling, reliability, cost, and team operating capacity.

Axel Domingues

There’s a phase every team hits where “cloud choice” becomes a personality trait.

“We’re Kubernetes people.”
“Serverless is the future.”
“Just give me VMs; everything else is hype.”
“Managed PaaS or nothing.”

None of these are wrong.

What’s wrong is treating them as identity.

Because cloud infrastructure isn’t ideology — it’s a responsibility trade:

What parts of the system do you want to own at 3 a.m.?

This post is a practical mental model for choosing between:

IaaS (VMs + networks + disks)
PaaS (managed runtimes / app platforms)
Serverless (FaaS + managed workflows)
Kubernetes (a platform you build and operate)

Not by vibes — by constraints.

What you’re really choosing

A responsibility boundary — what you own vs what the provider owns.

The “adult” question

When incidents happen, do you have the operational capacity to handle what you chose?

The clean mental model: “Compute is cheap, operations are not”

Every cloud option is a different answer to the same question:

How much platform work do you want to build and maintain?

The common trap is evaluating options by their features. Evaluate them instead by the ongoing obligations they create:

patching, upgrades, CVEs
scaling behavior
deployment safety
observability hooks
networking complexity
identity and access boundaries
cost predictability
incident blast radius
change management / governance

The less of that work you want to do yourself, the more you want managed services.

The more you must control those things, the more you accept ownership.

That’s it. Everything else is commentary.

The four models (and what you actually buy)

Let’s stop using buzzwords and instead describe the contracts.

IaaS

You rent machines (virtual or bare metal) and build everything else.

PaaS

You deploy an app; the platform handles runtime, scaling, patching, routing.

Serverless

You run functions + managed workflows; you trade control for elasticity.

Kubernetes

You build your own platform on top of containers (and then operate it).

A more honest way to compare:

IaaS (VMs)

You own:

OS patching and hardening
runtime installation and upgrades
capacity planning (or at least autoscaling policies)
process supervision, service discovery, load balancing
backups, failover wiring, DR posture
all the “platform glue”

You get:

maximum control
familiar debugging (SSH is a hell of a drug)
predictable performance characteristics
fewer abstraction surprises

PaaS (managed app platforms)

You own:

the app and its build
scaling settings and constraints
config and secrets practices
runtime-level metrics/logs integration
your dependencies (DB, cache, queues)

The provider owns:

most patching and runtime lifecycle
routing/load balancing
horizontal scaling primitives
basic operational scaffolding

You get:

speed and safety for the “common case”
fewer moving parts
a platform that resists footguns

Serverless (FaaS + managed workflows)

You own:

function code, packaging, dependency risk
event triggers, retry semantics, idempotency
timeouts, memory limits, cold start constraints
observability stitching (especially across async boundaries)

The provider owns:

fleet management
scaling to zero and burst scaling
most of the runtime and infra

You get:

elasticity without capacity planning
very fast iteration for event-driven systems
strong alignment with async workloads

Kubernetes (containers + orchestration)

You own:

the cluster lifecycle (or you pay to partially offload it)
networking, ingress, service mesh (if you choose)
deployment tooling, policies, and governance
security posture, RBAC, admission control
platform SRE workload, on-call reality

You get:

a consistent deployment abstraction
portability (sometimes)
more control than PaaS, more structure than IaaS
a foundation for multi-team scale (if you operate it well)

Kubernetes is not “just a runtime.” It’s an organizational choice: you are committing to building and owning a platform.

The responsibility spectrum (the diagram I wish teams used)

Instead of arguing about tools, map where the work goes.

[Figure: responsibility spectrum from IaaS to PaaS to Serverless to Kubernetes]

Two quick heuristics:

If your team can’t comfortably operate a distributed system, don’t volunteer to operate the platform too.
If your compliance, networking, or runtime constraints demand control, don’t pretend PaaS abstractions will magically fit.

Decision axes that matter in real life

Most “cloud comparisons” skip the only questions that consistently decide outcomes.

Axis 1: Team operating capacity (the underrated one)

Ask:

Do we have SRE/platform engineers, or are we a product team?
Is on-call already heavy?
Do we have time for platform upgrades, cluster security, and incident drills?
Do we have reliable automation and testing for infrastructure changes?

If the honest answer to most of these is “no” (or “we’re a product team”), you need more managed services, not fewer.

Axis 2: Change velocity vs change safety

PaaS and serverless tend to enforce safer defaults.
Kubernetes gives you more knobs — and therefore more ways to break things.
IaaS gives you complete freedom — including freedom to create bespoke, fragile snowflakes.

Axis 3: Workload shape

Spiky, event-driven, async: serverless wins (if latency constraints permit).
Steady, long-running services: PaaS or Kubernetes.
Special runtimes, stateful oddities: IaaS or Kubernetes (sometimes PaaS if supported).

Axis 4: Latency and tail behavior

Cold starts and per-request isolation can hurt p95/p99 latency in serverless.
Kubernetes and IaaS let you reserve capacity and tune performance.
PaaS can be excellent here, depending on the platform and configuration.
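To make the tail-latency point concrete, here is a minimal sketch with made-up numbers: a toy workload where 2% of requests pay a cold start. The latencies (40 ms warm, 900 ms cold) and the helper names are assumptions for illustration, not measurements from any platform.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def simulate_latencies(total_requests, cold_every, warm_ms, cold_ms):
    """Deterministic toy traffic: every `cold_every`-th request pays a cold start."""
    return [
        cold_ms if i % cold_every == 0 else warm_ms
        for i in range(total_requests)
    ]


# Hypothetical numbers: 2% of requests hit a cold start.
latencies = simulate_latencies(total_requests=1000, cold_every=50, warm_ms=40, cold_ms=900)

p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)
p99 = percentile(latencies, 99)
print(f"p50={p50}ms p95={p95}ms p99={p99}ms")  # p50=40ms p95=40ms p99=900ms
```

The median and p95 look perfect while p99 is more than twenty times worse. That is why a latency SLO stated at p99 can rule out an option that looks fine on an average-latency dashboard.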

Axis 5: Compliance / isolation / network control

Some orgs need VPC controls, custom networking, strict egress policies, dedicated hardware, or specific OS baselines.
That can push you toward Kubernetes or IaaS — but don’t let “compliance” become a blanket excuse for “DIY everything.”

Axis 6: Cost predictability

Serverless can be very cheap at low traffic and very expensive under sustained load, because you typically pay per invocation and per unit of execution time rather than for provisioned capacity.
Kubernetes can be cost-efficient at scale if you run it well; it can also be a cost leak machine.
PaaS often trades a premium for simplicity.
IaaS costs can look cheap until you price in people-time and operational risk.
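The serverless line above is easier to reason about with a break-even calculation. The sketch below assumes a simplified pay-per-use pricing shape (a per-request fee plus a fee proportional to duration and memory) and compares it with a fixed monthly cost. Every price and workload figure is hypothetical, not any provider's rate card.

```python
def cost_per_request(avg_duration_s, memory_gb, price_per_million_requests, price_per_gb_second):
    """Pay-per-use cost of one invocation: a request fee plus a duration-times-memory fee."""
    request_fee = price_per_million_requests / 1_000_000
    compute_fee = avg_duration_s * memory_gb * price_per_gb_second
    return request_fee + compute_fee


def serverless_monthly_cost(requests, **pricing):
    """Monthly invoice for `requests` invocations under the assumed pricing shape."""
    return requests * cost_per_request(**pricing)


def break_even_requests(fixed_monthly_cost, **pricing):
    """Monthly request count at which pay-per-use equals an always-on fixed cost."""
    return fixed_monthly_cost / cost_per_request(**pricing)


# Hypothetical prices and workload, chosen only to show the shape of the curve.
pricing = dict(
    avg_duration_s=0.1,
    memory_gb=0.5,
    price_per_million_requests=0.20,
    price_per_gb_second=0.00001,
)
fixed_monthly_cost = 70.0  # e.g. a small always-on instance

threshold = break_even_requests(fixed_monthly_cost, **pricing)
print(f"break-even at about {threshold:,.0f} requests/month")
```

With these made-up numbers the crossover lands around 100 million requests per month. Below it, pay-per-use wins on the invoice; above it, the fixed option does. The invoice is still only half of the comparison, since it ignores the people-time needed to run the fixed option.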

If you’re choosing “the cheapest compute,” you’re optimizing the wrong line item.
The biggest cloud cost is usually: engineering time + risk.

A decision matrix you can actually use

This isn’t a scorecard — it’s a prompt to think clearly.

The “platform stack” reality: most good systems are hybrids

The healthiest architectures I’ve seen are not “one choice.” They’re a layered blend:

PaaS for the mainstream stateless services
Serverless for event handling, cron, glue, low-frequency tasks
Managed databases, caches, and queues (almost always)
Kubernetes only when it solves real multi-team scale problems
IaaS for the few weird snowflakes you can’t avoid (and you keep them contained)

The pragmatic goal

Use the most managed option that still meets your constraints.
Only “graduate” to more control when you can prove the constraints require it.
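One way to keep yourself honest about that rule is to write it as a lookup: list the constraints you can actually prove, then walk the options from most managed to least managed and stop at the first one that satisfies them. The ordering and the capability tags below are simplifying assumptions for this sketch. Real platforms differ, so replace them with what your candidates actually support.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    capabilities: frozenset


# Ordered from most managed to least managed (an assumption of this sketch).
# Capability tags are hypothetical and deliberately coarse.
OPTIONS = [
    Option("serverless", frozenset({"scale_to_zero", "burst_scaling"})),
    Option("paas", frozenset({"long_running", "burst_scaling"})),
    Option("kubernetes", frozenset({"long_running", "custom_networking", "reserved_capacity"})),
    Option("iaas", frozenset({"long_running", "custom_networking", "reserved_capacity", "custom_os"})),
]


def most_managed_option(required):
    """Return the first (most managed) option whose capabilities cover every requirement."""
    for option in OPTIONS:
        if required <= option.capabilities:
            return option.name
    return None  # nothing fits: revisit the constraints before building something bespoke


print(most_managed_option({"long_running"}))                       # paas
print(most_managed_option({"long_running", "custom_networking"}))  # kubernetes
print(most_managed_option({"custom_os"}))                          # iaas
```

The useful part is not the function. It is that every entry in `required` has to be a constraint you can defend, which is exactly the “prove it” step.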

Patterns that win in practice

Pattern 1: “PaaS core, serverless edges”

Put the product API and core services on PaaS (boring, scalable, observable).
Use serverless for:
- webhooks
- cron-driven workflows
- file/queue processing
- integration glue
- scheduled batch jobs
Keep shared concerns consistent: auth, logging, tracing, config, secrets.

This gives you a simple operational center with elastic edges.

Pattern 2: “Kubernetes as a product, not a cluster”

If you do Kubernetes, treat it like an internal product with:

a platform roadmap
golden paths (templates, paved roads)
opinionated defaults
security policy as code
cluster upgrade automation
service onboarding that is faster than DIY

If your Kubernetes experience is “every team invents its own ingress and its own Helm chart,” you’re not gaining control — you’re multiplying chaos.

Pattern 3: “Serverless workflows, not serverless spaghetti”

Serverless becomes a mess when:

every function triggers 3 other functions
retries are implicit and invisible
state is scattered across logs

The fix is to introduce explicit orchestration:

workflow engines (managed where possible)
explicit state machines
explicit idempotency keys
event schemas and versioning
trace correlation IDs from the start
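Of the items above, explicit idempotency keys are the easiest to show in code. This is a minimal sketch: the in-memory dictionary stands in for a durable store, and the event shape (`idempotency_key`, `amount_cents`) and the `charge_customer` handler are hypothetical. A real implementation would need an atomic "insert if absent" in shared storage so that two concurrent deliveries cannot both run the side effect.

```python
class IdempotentConsumer:
    """Wraps a handler so redelivered events with the same key run the side effect once."""

    def __init__(self, handler):
        self._handler = handler
        self._results = {}  # stand-in for a durable store shared by all instances

    def handle(self, event):
        key = event["idempotency_key"]
        if key in self._results:
            return self._results[key]  # duplicate delivery: return the recorded result
        result = self._handler(event)
        self._results[key] = result
        return result


charges = []  # records side effects so duplicates are visible


def charge_customer(event):
    """Hypothetical side-effecting handler."""
    charges.append(event["amount_cents"])
    return {"status": "charged", "amount_cents": event["amount_cents"]}


consumer = IdempotentConsumer(charge_customer)
event = {"idempotency_key": "order-42-charge", "amount_cents": 1250}

first = consumer.handle(event)
second = consumer.handle(event)  # simulated retry of the same event

print(first == second, charges)  # True [1250]
```

The key is chosen by the producer and describes the business operation, not the delivery attempt. That is what lets a retry be recognized as the same work.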

Pattern 4: “IaaS as containment for the weird stuff”

If you must run weird workloads on VMs:

isolate them (network boundaries, blast radius)
put a strong deployment discipline around them (immutable images, rollouts)
keep them out of your main development path
measure and plan their retirement

The fastest way to infect an organization with complexity is to let “one special case” define the entire platform.

The cost model you should use (and the one teams forget)

There are two cost models:

1. Bill cost (cloud invoice)
2. Operating cost (people + risk + time + incidents)

Teams obsess over #1 and quietly drown in #2.

Here’s the reality:

Kubernetes can lower bill cost if you run high utilization and standardize workloads.
Kubernetes can also increase operating cost massively if you don’t have platform maturity.
PaaS often increases bill cost but reduces operating cost dramatically.
Serverless can be almost free or shockingly expensive depending on traffic shape and execution time.

Decision rule: Choose the option that minimizes total cost of ownership for your team size and maturity — not the one that wins a benchmark.
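A rough way to apply that decision rule is to put both cost models into one number. The sketch below adds the invoice, the people-time spent operating the platform, and an expected incident cost. All figures (bills, hours, hourly rate, incident cost) are hypothetical placeholders. The point is the shape of the comparison, not the values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformOption:
    name: str
    monthly_bill: float                        # cloud invoice
    ops_hours_per_month: float                 # upgrades, patching, pipeline and platform work
    expected_incident_hours_per_month: float   # rough expectation, not a guarantee


def total_monthly_cost(option, loaded_hourly_rate, incident_hour_cost):
    """Bill cost plus operating cost (people-time plus expected incident impact)."""
    people = option.ops_hours_per_month * loaded_hourly_rate
    risk = option.expected_incident_hours_per_month * incident_hour_cost
    return option.monthly_bill + people + risk


def cheapest(options, loaded_hourly_rate, incident_hour_cost):
    return min(options, key=lambda o: total_monthly_cost(o, loaded_hourly_rate, incident_hour_cost))


# Hypothetical small team without a dedicated platform group.
options = [
    PlatformOption("paas", monthly_bill=3000, ops_hours_per_month=10, expected_incident_hours_per_month=1),
    PlatformOption("kubernetes", monthly_bill=1800, ops_hours_per_month=80, expected_incident_hours_per_month=3),
]

for option in options:
    print(option.name, total_monthly_cost(option, loaded_hourly_rate=100, incident_hour_cost=500))
# paas 4500
# kubernetes 11300
```

In this made-up case Kubernetes wins on the invoice and loses badly on the total. A team with mature platform automation would plug in far fewer operating hours and could see the opposite result.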

Reliability posture by model (where incidents come from)

You don’t get reliability “for free” anywhere — you just shift where failure happens.

IaaS failures look like

OS patching gaps, snowflake config drift, manual deployments, fragile failover.

PaaS failures look like

Platform limits, noisy neighbors, deployment misconfig, opaque platform incidents.

Serverless failures look like

Retry storms, idempotency bugs, cold starts, event version mismatches.

Kubernetes failures look like

Cluster upgrades, networking/ingress policy mistakes, runaway resource usage, platform drift.

The key is to build guardrails where your chosen model tends to fail.
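As one example of such a guardrail, retry storms are usually tamed with a bounded number of attempts and capped exponential backoff with jitter. The sketch below uses the "full jitter" variant, where each delay is drawn uniformly between zero and the capped exponential ceiling. The function names, the `flaky` operation, and the parameter values are hypothetical.

```python
import random


def backoff_delays(max_attempts, base_s, cap_s, rng):
    """Capped exponential backoff with full jitter: each delay is uniform in [0, min(cap, base * 2^n)]."""
    delays = []
    for attempt in range(max_attempts):
        ceiling = min(cap_s, base_s * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling))
    return delays


def call_with_retries(operation, max_attempts, base_s, cap_s, rng, sleep):
    """Retry `operation` a bounded number of times, then give up and re-raise."""
    delays = backoff_delays(max_attempts, base_s, cap_s, rng)
    for attempt, delay in enumerate(delays, start=1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # retry budget spent: surface the failure instead of hammering the dependency
            sleep(delay)


calls = {"count": 0}


def flaky():
    """Hypothetical dependency that fails twice, then recovers."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


slept = []  # capture delays instead of really sleeping
result = call_with_retries(
    flaky, max_attempts=5, base_s=0.5, cap_s=5.0, rng=random.Random(0), sleep=slept.append
)
print(result, len(slept))  # ok 2
```

Passing `sleep` and `rng` in makes the policy testable. In a real consumer you would pass `time.sleep` and pair this with idempotent handlers, since every retry is a potential duplicate.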

A “boring but correct” checklist before you pick

Define your non-negotiables

latency SLOs (p95/p99)
compliance constraints (network isolation, data residency)
runtime constraints (languages, system deps)
expected traffic shape (steady vs spiky)
team operating capacity (on-call maturity, platform skills)

Establish your baseline building blocks

No matter what you pick, you need:

CI/CD with safe rollouts and fast rollback
secrets management
centralized logs + metrics + traces (and correlation IDs)
a clear data strategy (managed DB, backups, DR)
a queue/event backbone for async work
cost visibility (tags/labels, budgets, alerts)
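Correlation IDs are the cheapest item on that list to get right early. Here is a minimal sketch using only the standard library: a context variable holds the ID for the current request, and a logging filter stamps it on every log line. The logger name, the log format, and `handle_request` are assumptions for illustration. A real service would read the ID from an incoming header and forward it on outgoing calls.

```python
import contextvars
import io
import logging
import uuid

# Holds the correlation ID for whatever request is currently being handled.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationIdFilter(logging.Filter):
    """Attach the current correlation ID to every record passing through the handler."""

    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True


def build_logger(name, stream):
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s cid=%(correlation_id)s %(message)s"))
    handler.addFilter(CorrelationIdFilter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


def handle_request(logger, incoming_id=None):
    """Reuse the caller's ID when present; otherwise mint one at the edge."""
    token = correlation_id.set(incoming_id or uuid.uuid4().hex)
    try:
        logger.info("request received")
        logger.info("request finished")
        return correlation_id.get()
    finally:
        correlation_id.reset(token)  # do not leak the ID into the next request


stream = io.StringIO()
logger = build_logger("orders", stream)
handle_request(logger, incoming_id="abc123")
output = stream.getvalue()
print(output, end="")
```

Both log lines carry `cid=abc123`, so a search for one ID reconstructs the request across services, as long as every hop forwards it.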

Choose the simplest platform that satisfies constraints

Start managed and only add control when you can justify it.

Make the decision reversible (as much as possible)

package your app cleanly (12-factor-ish)
keep dependencies explicit
avoid coupling business logic to platform-specific semantics without benefit
version your events and APIs
treat infrastructure as code
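For "version your events and APIs", one common approach is to carry an explicit schema version on each event and upgrade old payloads on read, so consumers only ever see the current shape. The sketch below assumes a hypothetical change between v1 and v2 (a decimal string `amount` replaced by integer `amount_cents`). The field names and the envelope are invented for illustration.

```python
from decimal import Decimal

CURRENT_VERSION = 2


def upcast_v1_to_v2(payload):
    """Hypothetical change: v1 carried a decimal string, v2 carries integer cents."""
    upgraded = {k: v for k, v in payload.items() if k != "amount"}
    upgraded["amount_cents"] = int(Decimal(payload["amount"]) * 100)
    return upgraded


# One upcaster per old version; each moves a payload exactly one version forward.
UPCASTERS = {1: upcast_v1_to_v2}


def normalize(event):
    """Return the event in the current schema, upgrading step by step if needed."""
    version = event["schema_version"]
    payload = event["payload"]
    if version > CURRENT_VERSION:
        raise ValueError(f"unsupported schema_version {version}")
    while version < CURRENT_VERSION:
        payload = UPCASTERS[version](payload)
        version += 1
    return {"schema_version": version, "payload": payload}


old_event = {"schema_version": 1, "payload": {"order_id": "o-1", "amount": "12.50"}}
new_event = {"schema_version": 2, "payload": {"order_id": "o-2", "amount_cents": 999}}

print(normalize(old_event))
# {'schema_version': 2, 'payload': {'order_id': 'o-1', 'amount_cents': 1250}}
```

Because business logic only handles the current version, moving the consumer to a different platform does not require replaying or rewriting old events.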

“Graduation paths” that don’t destroy your team

The worst path is a big-bang platform migration. The best path is incremental, measured, and reversible.

Path A: Start PaaS → adopt serverless for edges → consider Kubernetes later

This is the most common successful path for product companies.

Path B: Start IaaS → enforce immutable images + pipelines → move to PaaS/Kubernetes

This is common in enterprises migrating legacy systems.

Path C: Start serverless → introduce workflows + contracts → keep a core service layer stable

This works well for event-heavy integration products — but you must invest early in correctness (idempotency, contracts).

Kubernetes is often a destination after you’ve proven platform needs — not a starting point for a small team.

Anti-patterns (the ones I keep seeing)

A reference “cloud choice” decision log template

If you want this to become architecture (not opinion), write it down.

Use this as your decision record:

Context: what are we building, what’s our team size, what’s our on-call maturity?
Constraints: latency, compliance, runtime, traffic shape
Options considered: IaaS / PaaS / serverless / Kubernetes
Decision: what we choose now
Why: mapped to constraints and operating capacity
Tradeoffs accepted: what pain we accept
Exit strategy: what would force us to change this decision?
Guardrails: what policies and automation we put in place
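If you keep decision records next to your code, the template above can be a small data structure with a completeness check, so a record with an empty "Exit strategy" fails review instead of slipping through. This is only a sketch. The class name, field names, and example text are assumptions that mirror the template.

```python
from dataclasses import dataclass, fields


@dataclass
class CloudDecisionRecord:
    context: str = ""
    constraints: str = ""
    options_considered: str = ""
    decision: str = ""
    why: str = ""
    tradeoffs_accepted: str = ""
    exit_strategy: str = ""
    guardrails: str = ""


def missing_sections(record):
    """Names of template sections that are still empty or whitespace-only."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


# Hypothetical draft record with two sections filled in.
draft = CloudDecisionRecord(
    context="Product team of six, one on-call rotation, no platform engineers.",
    decision="PaaS for the core API, serverless for webhooks and scheduled jobs.",
)

print(missing_sections(draft))
# ['constraints', 'options_considered', 'why', 'tradeoffs_accepted', 'exit_strategy', 'guardrails']
```

A check like this can run in CI against the record file, which turns "write it down" from a good intention into a gate.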

Architecture is the ability to explain your decisions

If you can’t explain it in a page, you probably chose it as identity.

Resources (worth bookmarking)

AWS Well-Architected Framework

A practical set of questions and tradeoffs across reliability, security, cost, operations, and performance.

Google SRE Book

The canonical “operate like an adult” reference: SLOs, error budgets, incident response, and capacity thinking.

CNCF Cloud Native Landscape

A map of the ecosystem — useful for discovering categories, dangerous for shopping indiscriminately.

Twelve-Factor App

Still relevant: build and run disciplines that make migrations and operations easier.

What’s Next

This month was about choosing infrastructure without ideology — by understanding responsibility boundaries and operational capacity.

Next month the series shifts from “where do we run it?” to “what data are we producing?”

Data Engineering for Product Teams

Because if you mix operational truth with analytics convenience, you will eventually ship lies — and then make decisions based on them.
