Sep 25, 2016 - 17 MIN READ
Exercise 5 - Debugging ML (Bias/Variance, Learning Curves, and What to Try Next)


The most practical assignment so far - diagnose bias vs variance using learning curves, tune lambda with validation curves, and build a repeatable “next action” playbook.

Axel Domingues



Up to Exercise 4, my workflow was mostly:

  • implement the algorithm
  • train it
  • celebrate when accuracy goes up

Exercise 5 changes the question.

Instead of “can I train a model?”, the question becomes:

My model isn’t performing. What do I try next?

This is the assignment that feels the most like real engineering:

  • you train a model
  • results look bad
  • you diagnose the failure mode
  • you apply the right fix (not random tweaks)

The core idea is bias vs variance, and the main tool is learning curves.

  • The question this exercise answers: “My model performs badly — what should I try next?”
  • The diagnostic tool: learning curves (training vs validation error as data grows).
  • The decision framework: bias vs variance → apply the right fix instead of random tweaks.


What’s inside the Exercise 5 bundle

The files this post actually touches:

  • ex5data1.mat: the small water-level dataset, already split into training, validation, and test sets
  • linearRegCostFunction.m: regularized cost and gradient (you implement this)
  • trainLinearReg.m: wraps the optimizer so models can be retrained over and over
  • learningCurve.m: training and validation error as the training set grows
  • polyFeatures.m: maps the input to polynomial features
  • featureNormalize.m: feature scaling (provided)
  • validationCurve.m: trains across a grid of lambda values

Why this exercise matters

In production, the painful part of ML is rarely “writing the model.”

The painful part is:

  • why is it underperforming?
  • am I underfitting or overfitting?
  • should I add features, add data, or tune regularization?

Exercise 5 gives you a structured answer.


The dataset (small on purpose)

The first dataset is intentionally tiny.

That’s not a mistake — it’s the point.

Small datasets exaggerate failure modes, which is exactly why they’re useful here: they make underfitting vs overfitting visually and numerically obvious.

In particular, a small dataset makes it easier to see:

  • what underfitting looks like
  • what overfitting looks like
  • how adding more data changes the outcome

You start with a simple relationship:

  • input: the change in water level in a reservoir
  • output: the amount of water flowing out of the dam

This is “regression,” but the focus is debugging behavior, not the domain.


Step 1 — Regularized linear regression cost + gradient

You implement linearRegCostFunction.m.

It returns:

  • J: squared error cost (plus regularization when lambda > 0)
  • grad: gradient vector (plus regularization for weights)

Key rule (again):

  • do not regularize the bias term (theta(1))

If you do, every diagnosis after this becomes unreliable.

This is a silent bug: nothing crashes, results just look “off”.

Implementation pattern:

function [J, grad] = linearRegCostFunction(X, y, theta, lambda)
  m = length(y);

  % Predictions for all examples
  h = X * theta;

  % Copy of theta with the bias term zeroed, so it is never regularized
  theta_reg = theta;
  theta_reg(1) = 0;

  % Squared-error cost plus the L2 penalty (bias excluded via theta_reg)
  J = (1/(2*m)) * sum((h - y) .^ 2) + (lambda/(2*m)) * sum(theta_reg .^ 2);

  % Gradient: unregularized term plus the penalty's contribution
  grad = (1/m) * (X' * (h - y)) + (lambda/m) * theta_reg;
end

If your gradients are wrong, everything after this becomes noise. Validate this function before moving on.
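
Before trusting it, a quick numerical check catches most mistakes. This is my own sanity-check sketch (not part of the assignment); it compares the analytic gradient against centered finite differences on a tiny synthetic problem:

% Sketch: verify the analytic gradient numerically on random data
X = [ones(5, 1) rand(5, 1)];      % tiny design matrix with a bias column
y = rand(5, 1);
theta = rand(2, 1);
lambda = 1;

[~, grad] = linearRegCostFunction(X, y, theta, lambda);

epsilon = 1e-4;
numgrad = zeros(size(theta));
for i = 1:numel(theta)
  e = zeros(size(theta));
  e(i) = epsilon;
  numgrad(i) = (linearRegCostFunction(X, y, theta + e, lambda) - ...
                linearRegCostFunction(X, y, theta - e, lambda)) / (2 * epsilon);
end

disp([grad numgrad]);             % the two columns should match closely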


Step 2 — Train with trainLinearReg.m

This helper wraps optimization so the script can repeatedly train models for:

  • different training sizes (learning curves)
  • different lambda values (validation curve)

The function is simple: it calls an optimizer with your cost function.

This is an important pattern:

  • cost + gradients live in one place
  • training becomes a reusable “tool”
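
For reference, the helper is roughly this shape (a sketch from memory of the provided file, which uses the fmincg optimizer shipped with the assignment):

function [theta] = trainLinearReg(X, y, lambda)
  % Start from zeros; the optimizer does the rest
  initial_theta = zeros(size(X, 2), 1);

  % Handle where only theta varies; data and lambda stay fixed
  costFunction = @(t) linearRegCostFunction(X, y, t, lambda);

  % 'GradObj', 'on' tells fmincg that we supply the gradient ourselves
  options = optimset('MaxIter', 200, 'GradObj', 'on');
  theta = fmincg(costFunction, initial_theta, options);
end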

Step 3 — Learning curves: the best debugging graph in this course

Learning curves plot two errors as you increase the number of training examples m:

  • training error
  • cross-validation error

The goal is not pretty graphs.

The goal is diagnosis.

How learning curves are computed (important detail)

When building learning curves in this assignment:

  • you train using a chosen lambda (often lambda = 0 for the curve)
  • but you compute training and validation errors with lambda set to 0

Why?

Because we want to measure pure fit error, not the extra penalty term.

This detail matters: get it wrong and your curves become misleading.

Training still uses regularization to keep parameters from going crazy, but error reporting uses lambda = 0 so you’re comparing real prediction error.
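
In code, the loop ends up looking roughly like this (my sketch of learningCurve.m; the important part is the lambda = 0 in the two error calls):

function [error_train, error_val] = learningCurve(X, y, Xval, yval, lambda)
  m = size(X, 1);
  error_train = zeros(m, 1);
  error_val   = zeros(m, 1);

  for i = 1:m
    % Train on the first i examples only, WITH regularization
    theta = trainLinearReg(X(1:i, :), y(1:i), lambda);

    % Report both errors WITHOUT the penalty term
    error_train(i) = linearRegCostFunction(X(1:i, :), y(1:i), theta, 0);

    % Validation error always uses the full validation set
    error_val(i) = linearRegCostFunction(Xval, yval, theta, 0);
  end
end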

Learning curves don’t tell you how to fix the model. They tell you which category of fix makes sense:
  • model too simple?
  • model too flexible?
  • more data likely to help or not?

Bias vs variance (the simple decision table)

  • High bias (underfitting): training error is high, validation error is high, and the gap between them is small. More data won’t help much; the model needs more power or a lower lambda.
  • High variance (overfitting): training error is low, validation error is much higher, and the gap is large. More data, a higher lambda, or fewer features will help.

Step 4 — Polynomial regression (a controlled way to add power)

Next, the assignment has you build polynomial features.

This is a practical move:

  • linear regression can underfit if the relationship is curved
  • polynomial features make the model more expressive

But as soon as you add polynomial features, you create a new risk:

  • overfitting

So the exercise makes you do the full workflow:

  • map features
  • normalize features
  • train
  • visualize fit
  • diagnose
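
The “map features” step from that list is just a column of powers. A sketch of polyFeatures.m, assuming the single input feature used in this exercise:

function [X_poly] = polyFeatures(X, p)
  % Column j of X_poly holds the j-th power of the (single) input feature
  X_poly = zeros(numel(X), p);
  for j = 1:p
    X_poly(:, j) = X(:) .^ j;
  end
end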

Feature normalization is not optional here

Polynomial features explode in magnitude:

  • x, x^2, x^3, …

Without normalization:

  • optimization can become unstable
  • parameters become hard to compare
  • learning curves become harder to interpret

The provided featureNormalize.m is part of the “real workflow” habit.
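
Putting mapping and normalization together looks roughly like this (p = 8 as in the assignment; X, Xval, and m are the usual dataset variables, and mu / sigma come from featureNormalize and must be reused on the validation set):

p = 8;

% Map and normalize the training set, keeping mu and sigma
X_poly = polyFeatures(X, p);
[X_poly, mu, sigma] = featureNormalize(X_poly);
X_poly = [ones(m, 1), X_poly];                  % add the bias column after normalizing

% Normalize the validation set with the TRAINING statistics
X_poly_val = polyFeatures(Xval, p);
X_poly_val = bsxfun(@rdivide, bsxfun(@minus, X_poly_val, mu), sigma);
X_poly_val = [ones(size(X_poly_val, 1), 1), X_poly_val];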


Step 5 — Validation curve: choosing lambda like an engineer

Once you have polynomial features, lambda becomes the main knob.

The correct way to pick lambda is:

  • choose a set of candidate values
  • train a model for each
  • evaluate on validation set
  • select the lambda with lowest validation error

Typical grid:

lambda in {0, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10}

This assignment formalizes that into validationCurve.m.
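
In practice validationCurve.m is a small loop over that grid, with the same “report errors at lambda = 0” rule as the learning curve (a sketch of my implementation):

function [lambda_vec, error_train, error_val] = validationCurve(X, y, Xval, yval)
  % Candidate regularization strengths
  lambda_vec = [0 0.001 0.003 0.01 0.03 0.1 0.3 1 3 10]';

  error_train = zeros(length(lambda_vec), 1);
  error_val   = zeros(length(lambda_vec), 1);

  for i = 1:length(lambda_vec)
    lambda = lambda_vec(i);

    % Train WITH this lambda...
    theta = trainLinearReg(X, y, lambda);

    % ...but report both errors WITHOUT the penalty term
    error_train(i) = linearRegCostFunction(X, y, theta, 0);
    error_val(i)   = linearRegCostFunction(Xval, yval, theta, 0);
  end
end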

Do not use the test set to choose lambda. The test set is the final exam.
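
Concretely, the selection step is one argmin on the validation error, and the test set gets evaluated exactly once at the very end. A sketch, where X_poly, X_poly_val, and X_poly_test are the mapped-and-normalized sets and the names are mine:

% Pick the lambda with the lowest validation error
[lambda_vec, error_train, error_val] = validationCurve(X_poly, y, X_poly_val, yval);
[~, best] = min(error_val);
best_lambda = lambda_vec(best);

% Retrain with that lambda, then report the final number on the test set, once
theta = trainLinearReg(X_poly, y, best_lambda);
test_error = linearRegCostFunction(X_poly_test, ytest, theta, 0);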


The “What to Try Next” playbook (the actual value)

This is the practical output of Exercise 5.

When performance is bad, I now ask:

1) Is this high bias or high variance?

Use learning curves.

2) If high bias, try:

  • add more features (polynomial, interactions)
  • reduce lambda
  • choose a more expressive model

3) If high variance, try:

  • increase lambda
  • reduce the feature mapping (e.g. a lower polynomial degree)
  • add more training data

4) If both are bad, try:

  • better features (domain-informed)
  • check data quality (noise, mislabeled examples)
  • confirm splits (training/validation/test)

In real projects, “get more data” isn’t always possible. Regularization and better features are often the first real levers.


Steps I followed (the workflow I want to repeat)

Implement linearRegCostFunction

Validate cost and gradient shapes and ensure bias is not regularized.

Train a baseline linear model

Plot the fit and get an initial sense of underfit/overfit.

Generate learning curves

Compute training and validation error as the training set grows.

Add polynomial features + normalization

Increase model power in a controlled way.

Tune lambda using a validation curve

Select lambda based on validation error, not intuition.

Re-check learning curves with the tuned lambda

Confirm the diagnosis improves (gap shrinks for variance, errors drop for bias).


Debugging checklist (common failure points)

  • Regularizing theta(1): nothing crashes, every diagnosis afterwards is just quietly wrong.
  • Reporting learning-curve errors with lambda > 0: the curves measure the penalty, not the fit.
  • Skipping feature normalization after the polynomial mapping: unstable optimization, hard-to-read curves.
  • Not validating the cost and gradient before building curves: everything downstream becomes noise.
  • Using the test set to choose lambda: that’s the validation set’s job; the test set is the final exam.

What I’m keeping from Exercise 5

  • Learning curves turn “ML debugging” into an actual process.
  • Bias vs variance is a decision tool, not a philosophy.
  • Lambda is best chosen via validation curves.
  • Feature normalization is part of the pipeline, not a bonus.
  • “What to try next” is answerable if you measure the right things.

What’s Next

Next up, I build my first spam classifier using Support Vector Machines.

I’ll learn how C and sigma shape decision boundaries, why tuning them feels similar to lambda, and why linear SVMs scale surprisingly well for high-dimensional text data.

Axel Domingues - 2026