
Notes from Andrew Ng’s ML course — plot the food-truck dataset, implement computeCost + gradientDescent in Octave, and build intuition with J(theta) visualizations.
Axel Domingues
Exercise 1 is where Machine Learning stopped feeling like a set of lecture slides and started feeling like a system I can debug.
The assignment story is simple: a food-truck chain wants to decide which cities to expand to. We have historical data and we want to predict profit from population.
But what you’re really learning is the core loop you’ll repeat for most ML models:
What you’ll build
A working linear regression loop in Octave: load data → build X → compute cost → run gradient descent → visualize fit & convergence.
Skills you practice
Vectorization, shape sanity-checks, training telemetry (J_history), and building intuition with the cost “bowl”.
Checkpoints (don’t skip)
- theta = [0; 0], expect J ≈ 32.07
- J_history should decrease (not explode)

The starter bundle is structured around a script that guides you step-by-step and a few functions you complete.
- ex1.m — main script (single-variable linear regression)
- ex1data1.txt — dataset (population vs profit)
- warmUpExercise.m — Octave warm-up
- plotData.m — scatter plot helper
- computeCost.m — compute cost for linear regression
- gradientDescent.m — run batch gradient descent
- submit.m — Coursera submit script

Skim the “Dataset” section once, then follow the walkthrough step-by-step. If anything breaks, jump to the “Debugging checklist” at the end.
ex1data1.txt has two columns:
- x: population of the city (in 10,000s)
- y: profit (in $10,000s)
- x and y are both in 10,000s (population and dollars)
- Negative profit means the truck is losing money
Before coding anything: open the .txt and sanity-check a few rows. If you misread units, you can build a “perfect” model that answers the wrong question.
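If you'd rather sanity-check from Octave than from a text editor, loading and splitting the file takes a few lines (a sketch that mirrors the top of ex1.m):

data = load('ex1data1.txt');   % two columns: population, profit
x = data(:, 1);                % population of a city, in 10,000s
y = data(:, 2);                % profit of a food truck there, in $10,000s
m = length(y);                 % number of training examples
data(1:5, :)                   % eyeball the first few rows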
The warm-up is intentionally tiny: return a 5×5 identity matrix.
A = eye(5);
I like this because it confirms the full edit → run → submit loop works before you touch ML logic.
If you only do one thing before implementing learning, do this.
In plotData.m, plot x vs y as a scatter plot.
plot(x, y, 'rx', 'MarkerSize', 10);
ylabel('Profit in $10,000s');
xlabel('Population of City in 10,000s');
What I’m looking for: does the point cloud look roughly like a line? Exercise 1’s data looks “linear enough” that a straight line is a reasonable first model.
In this course, you learn to think in matrices early.
For single-variable linear regression, you construct X like this:
m = length(y);
X = [ones(m, 1), x];
theta = zeros(2, 1);
From now on, predictions are just:
predictions = X * theta;
No loops. Cleaner code. Fewer bugs.
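To see what that one-liner replaces, here is the same prediction written as an explicit loop (only for contrast, not something you'd keep):

% Loop version of X * theta, shown only for comparison.
predictions_loop = zeros(m, 1);
for i = 1:m
  predictions_loop(i) = theta(1) * X(i, 1) + theta(2) * X(i, 2);
end
% predictions_loop matches X * theta up to floating-point noise.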
computeCost returns a single number: how wrong your current parameters are.
In plain terms: average the squared gap between what your current theta predicts and what actually happened, over all m training examples (the 1/2 factor just keeps the gradient tidy).
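Written out, this is the squared-error cost from the lectures, with the hypothesis h_theta(x) = theta_0 + theta_1 x:

J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2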
A clean vectorized implementation:
function J = computeCost(X, y, theta)
  m = length(y);                            % number of training examples
  predictions = X * theta;                  % m x 1 vector of predicted profits
  errors = predictions - y;                 % how far off each prediction is
  J = (1 / (2 * m)) * (errors' * errors);   % errors' * errors == sum(errors .^ 2)
end
With theta = [0; 0], the exercise expects the cost to be approximately:
J ≈ 32.07

If you don’t get close to that, stop and fix computeCost.m first.
Gradient descent depends on computeCost. If the cost is wrong, “learning” will look like random parameter motion.
Now we update theta iteratively.
The provided script uses:
- alpha = 0.01
- iterations = 1500

In gradientDescent.m, you repeatedly update theta using the gradient of the cost and record the cost of every iteration in J_history.

Vectorized implementation:
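For reference, the batch update the loop below implements, simultaneously for j = 0 and j = 1:

\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}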
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
  m = length(y);
  J_history = zeros(num_iters, 1);                % cost after each iteration (telemetry)

  for iter = 1:num_iters
    errors = (X * theta) - y;                     % m x 1 prediction errors
    theta = theta - (alpha / m) * (X' * errors);  % simultaneous update of both parameters
    J_history(iter) = computeCost(X, y, theta);   % record the cost for this step
  end
end
Two things to check after it runs:

- J_history decreases over time
- the learned theta values look plausible

Print the cost every ~50 iterations while debugging. This is your training telemetry.
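One low-tech way to get that telemetry without touching the loop (my own snippet; it only assumes J_history exists after the call above):

% Sample the recorded cost every 50 iterations after training.
for it = 50:50:numel(J_history)
  fprintf('iter %4d: J = %.4f\n', it, J_history(it));
end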
Once theta is learned, ex1.m plots the learned line over the scatter plot.
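The overlay itself is just one more plot call on top of the scatter. A sketch close to what ex1.m does, reusing X and theta from above:

% Draw the learned line over the training data
% (assumes the scatter from plotData(x, y) is still the current figure).
hold on;
plot(X(:, 2), X * theta, 'b-', 'LineWidth', 2);   % X(:,2) is population, X*theta the predicted profit
legend('Training data', 'Linear regression');
hold off;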
This is the first “wow” moment: the line actually tracks the data, and you never picked its slope or intercept by hand.
The script then explores a grid of (theta_0, theta_1) values and plots the cost over that grid, both as a 3-D surface and as a contour plot.

The key intuition you should walk away with: for linear regression, J(theta) is a convex bowl, and gradient descent is just walking downhill on that bowl until it settles at the bottom (the best theta).
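A sketch of that exploration, close to what the course script does (the grid ranges here are the ones ex1.m uses; it reuses computeCost and the learned theta from above):

% Evaluate the cost over a grid of parameter values.
theta0_vals = linspace(-10, 10, 100);
theta1_vals = linspace(-1, 4, 100);
J_vals = zeros(length(theta0_vals), length(theta1_vals));

for i = 1:length(theta0_vals)
  for j = 1:length(theta1_vals)
    t = [theta0_vals(i); theta1_vals(j)];
    J_vals(i, j) = computeCost(X, y, t);
  end
end

J_vals = J_vals';   % surf/contour expect rows to follow the y-axis (theta_1)

figure;
surf(theta0_vals, theta1_vals, J_vals);   % the cost "bowl"
xlabel('\theta_0'); ylabel('\theta_1');

figure;
contour(theta0_vals, theta1_vals, J_vals, logspace(-2, 3, 20));
xlabel('\theta_0'); ylabel('\theta_1');
hold on;
plot(theta(1), theta(2), 'rx', 'MarkerSize', 10, 'LineWidth', 2);   % the learned minimum
hold off;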
This visualization is why I like Exercise 1: it forces intuition, not just passing tests.
Print sizes:
- size(X)
- size(theta)
- size(y)

You want X * theta to produce an m x 1 vector.
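In practice I just dump the shapes right after building X (nothing course-specific here; m is whatever length(y) returned):

% Shape sanity check before touching gradient descent.
disp(size(X));           % m x 2
disp(size(theta));       % 2 x 1
disp(size(y));           % m x 1
disp(size(X * theta));   % m x 1, so predictions line up with y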
In Octave, * is matrix multiplication and .* is element-wise multiplication.
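A two-line illustration with throwaway vectors:

a = [1; 2; 3];  b = [4; 5; 6];
a' * b    % matrix multiplication: (1x3)*(3x1) gives the scalar 32
a .* b    % element-wise product: [4; 10; 18]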
- theta(1) is the intercept term
- theta(2) is the slope
- If the cost shoots off to Inf / NaN, suspect the learning rate or a sign error in the update
- Use J_history as your telemetry, not vibes
- If computeCost is wrong, gradient descent can “run” but learn nonsense
Use the checkpoint J ≈ 32.07 early.
Debug with evidence (J_history), not hope.

A few experiments worth trying (a sketch of the learning-rate comparison follows this list):

- Bump alpha to 0.1 and watch what happens to J_history.
- Drop alpha to 0.001 and compare how fast the cost drops.
- Print theta every ~200 iterations and see how it stabilizes.
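My own snippet for the alpha comparison; it reuses gradientDescent from above, and only the alpha values come from the list:

% Compare convergence for three learning rates on the same data.
alphas = [0.001, 0.01, 0.1];
figure; hold on;
for k = 1:numel(alphas)
  [~, J_hist] = gradientDescent(X, y, zeros(2, 1), alphas(k), 1500);
  plot(1:numel(J_hist), J_hist, 'LineWidth', 2);
end
hold off;
xlabel('Iteration'); ylabel('Cost J(\theta)');
legend('alpha = 0.001', 'alpha = 0.01', 'alpha = 0.1');
% Expect alpha = 0.1 to blow the cost up (Inf/NaN) on this unscaled data,
% which is exactly the behavior the experiment is meant to show.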
In the next post, we level up from one input to many: Multivariate Linear Regression. You’ll extend linear regression to multiple features, learn feature scaling and mean normalization to make gradient descent behave, and stop writing slow loops by vectorizing everything.