
The most practical assignment so far - diagnose bias vs variance using learning curves, tune lambda with validation curves, and build a repeatable “next action” playbook.
Axel Domingues
Up to Exercise 4, my workflow was mostly "implement, train, check that it works."
Exercise 5 changes the question.
Instead of “can I train a model?”, the question becomes:
My model isn’t performing. What do I try next?
This is the assignment that feels the most like real engineering.
The core idea is bias vs variance, and the main tool is learning curves.
The question this exercise answers
“My model performs badly — what should I try next?”
The diagnostic tool
Learning curves: training vs validation error as data grows.
The decision framework
Bias vs variance → apply the right fix instead of random tweaks.
ex5.m — main guided script
ex5data1.mat — water outflow dataset (intentionally small)
featureNormalize.m — mean/std normalization
polyFeatures.m — polynomial feature expansion
plotFit.m — visualize polynomial fits
linearRegCostFunction.m — cost + gradient (regularized)
trainLinearReg.m — reusable training wrapper
learningCurve.m — training vs validation error
validationCurve.m — select lambda via validation

In production, the painful part of ML is rarely “writing the model.”
The painful part is figuring out why a trained model underperforms and deciding what to change next.
Exercise 5 gives you a structured answer.
The first dataset is intentionally tiny.
That’s not a mistake — it’s the point.
A small dataset makes both failure modes easy to provoke and easy to see. That's exactly why it's useful here: underfitting vs overfitting become visually and numerically obvious.
You start with a simple relationship: predicting the water flowing out of a dam from the change in water level.
This is “regression,” but the focus is debugging behavior, not the domain.
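For orientation, loading and plotting the raw data takes a couple of lines (assuming the usual variable names inside ex5data1.mat: X, y, Xval, yval, Xtest, ytest):

% Training, cross-validation and test splits.
load('ex5data1.mat');

% Training data: change in water level vs. water flowing out of the dam.
plot(X, y, 'rx', 'MarkerSize', 8);
xlabel('Change in water level (x)');
ylabel('Water flowing out of the dam (y)');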
You implement linearRegCostFunction.m.
It returns:
J: squared error cost (plus regularization when lambda > 0)
grad: gradient vector (plus regularization for weights)

Key rule (again):
Do not regularize the bias term (theta(1)). If theta(1) gets regularized, every diagnosis after this becomes unreliable. This is a silent bug: nothing crashes, results just look “off”.
Implementation pattern:
function [J, grad] = linearRegCostFunction(X, y, theta, lambda)
  m = length(y);        % number of training examples
  h = X * theta;        % current predictions

  theta_reg = theta;
  theta_reg(1) = 0;     % never penalize the bias term

  % Regularized squared-error cost
  J = (1/(2*m)) * sum((h - y) .^ 2) + (lambda/(2*m)) * sum(theta_reg .^ 2);

  % Gradient, with regularization skipping theta(1)
  grad = (1/m) * (X' * (h - y)) + (lambda/m) * theta_reg;
end
If your gradients are wrong, everything after this becomes noise. Validate this function before moving on.
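One cheap way to validate it is a finite-difference check on a tiny random problem. This snippet is my own addition, not one of the assignment files:

% Compare the analytic gradient against central finite differences.
X_check = [ones(5, 1), randn(5, 2)];
y_check = randn(5, 1);
theta   = randn(3, 1);
lambda  = 1;

[~, grad] = linearRegCostFunction(X_check, y_check, theta, lambda);

eps_fd  = 1e-4;
numgrad = zeros(size(theta));
for j = 1:numel(theta)
  e = zeros(size(theta));
  e(j) = eps_fd;
  numgrad(j) = (linearRegCostFunction(X_check, y_check, theta + e, lambda) - ...
                linearRegCostFunction(X_check, y_check, theta - e, lambda)) / (2 * eps_fd);
end

disp([grad numgrad]);   % the two columns should agree to many decimal places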
This helper wraps optimization so the script can repeatedly train models: once per training-set size for the learning curves, and once per candidate lambda for the validation curve.
The function is simple: it calls an optimizer with your cost function.
This is an important pattern: keep "train a model" behind one reusable function, so the diagnostics can call it in a loop without duplicating optimizer code.
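For reference, the provided trainLinearReg.m boils down to roughly this (a sketch, assuming the course's fmincg optimizer is on the path; fminunc would work too):

function [theta] = trainLinearReg(X, y, lambda)
  % Start from zeros; the optimizer does the rest.
  initial_theta = zeros(size(X, 2), 1);

  % The optimizer only needs a function of theta.
  costFunction = @(t) linearRegCostFunction(X, y, t, lambda);

  % Minimize the regularized cost.
  options = optimset('MaxIter', 200, 'GradObj', 'on');
  theta = fmincg(costFunction, initial_theta, options);
end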
Learning curves plot two errors as you increase the number of training examples m:

Training error: the cost of the trained model on the examples it was trained on.
Validation error: the cost of the same model on the full cross-validation set.
The goal is not pretty graphs.
The goal is diagnosis.
When building learning curves in this assignment, train with the chosen lambda, but compute both reported errors with lambda = 0.
Why?
Because we want to measure pure fit error, not the extra penalty term.
Get this detail wrong and your curves become misleading.
Training uses regularization to prevent crazy parameters. But error reporting uses lambda = 0 so you’re comparing real prediction error.
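Putting those rules together, learningCurve.m ends up as a loop along these lines (a sketch, not the official solution):

function [error_train, error_val] = learningCurve(X, y, Xval, yval, lambda)
  m = size(X, 1);
  error_train = zeros(m, 1);
  error_val   = zeros(m, 1);

  for i = 1:m
    % Train on the first i examples only, with regularization.
    theta = trainLinearReg(X(1:i, :), y(1:i), lambda);

    % Report both errors with lambda = 0: pure fit error, no penalty term.
    error_train(i) = linearRegCostFunction(X(1:i, :), y(1:i), theta, 0);
    error_val(i)   = linearRegCostFunction(Xval, yval, theta, 0);
  end
end

Note that the validation error is always measured on the full validation set, even though training only sees the first i examples.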

High bias (underfitting)
Signs: training and validation error are both high and converge close together as m grows.
What to try: add features (e.g. polynomial terms), decrease lambda. More data alone won't help.

High variance (overfitting)
Signs: training error stays low while validation error stays high, leaving a persistent gap between the curves.
What to try: get more data, increase lambda, or reduce model complexity.
Next, the assignment has you build polynomial features.
This is a practical move: the straight-line fit underfits, so you add capacity by mapping x to its powers (up to x^8 in this assignment).
But as soon as you add polynomial features, you create a new risk: overfitting.
So the exercise makes you do the full workflow: expand the features, normalize them, retrain, re-plot the learning curves, and tune lambda.
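The expansion itself is tiny. A sketch of what polyFeatures.m needs to do:

function [X_poly] = polyFeatures(X, p)
  % Map a column vector X to its first p powers: [X, X.^2, ..., X.^p].
  X_poly = zeros(numel(X), p);
  for j = 1:p
    X_poly(:, j) = X(:) .^ j;
  end
end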
Polynomial features explode in magnitude: if x is around 40, x^8 is already in the trillions.
Without normalization, the features sit on wildly different scales, which leads to unstable optimization and misleading learning curves.
The provided featureNormalize.m is part of the “real workflow” habit.
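A sketch of the usual mean/std recipe, plus the part that is easy to forget: the validation and test sets must be normalized with the training set's mu and sigma, not their own (variable names in the comment are illustrative).

function [X_norm, mu, sigma] = featureNormalize(X)
  % Zero-mean, unit-variance scaling, column by column.
  mu     = mean(X);
  sigma  = std(X);
  X_norm = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);
end

% Later, reuse the SAME mu and sigma for the other splits, e.g.:
% Xval_norm = bsxfun(@rdivide, bsxfun(@minus, Xval_poly, mu), sigma);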
Once you have polynomial features, lambda becomes the main knob.
The correct way to pick lambda is to train on the training set for each candidate value, measure the error on the validation set (again with lambda = 0), and keep the lambda with the lowest validation error.
Typical grid:
lambda in {0, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10}
This assignment formalizes that into validationCurve.m.
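In sketch form, reusing trainLinearReg and linearRegCostFunction from above, the loop looks like this:

function [lambda_vec, error_train, error_val] = validationCurve(X, y, Xval, yval)
  % Candidate values for lambda (the grid above).
  lambda_vec = [0 0.001 0.003 0.01 0.03 0.1 0.3 1 3 10]';

  error_train = zeros(length(lambda_vec), 1);
  error_val   = zeros(length(lambda_vec), 1);

  for i = 1:length(lambda_vec)
    lambda = lambda_vec(i);

    % Train with this lambda, then report unregularized errors.
    theta = trainLinearReg(X, y, lambda);
    error_train(i) = linearRegCostFunction(X, y, theta, 0);
    error_val(i)   = linearRegCostFunction(Xval, yval, theta, 0);
  end
end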
Do not use the test set to choose lambda. The test set is the final exam.
This is the practical output of Exercise 5.
When performance is bad, I now ask: is this high bias or high variance? And how do I tell?
Use learning curves.
In real projects, “get more data” isn’t always possible. Regularization and better features are often the first real levers.
1. Validate cost and gradient shapes and ensure bias is not regularized.
2. Plot the fit and get an initial sense of underfit/overfit.
3. Compute training and validation error as the training set grows.
4. Increase model power in a controlled way.
5. Select lambda based on validation error, not intuition.
6. Confirm the diagnosis improves (gap shrinks for variance, errors drop for bias).
If the diagnostics themselves look wrong, the usual suspects are:
A bug in learningCurve.m: the model may be retrained incorrectly or reused across subset sizes.
Reported errors not computed with lambda = 0, as required.
Normalization applied inconsistently across training / validation / test sets.
The bias term regularized by mistake.
Next up, I build my first spam classifier using Support Vector Machines.
I'll learn how C and sigma shape decision boundaries, why tuning them feels a lot like tuning lambda, and why linear SVMs scale surprisingly well for high-dimensional text data.