
I wrapped Andrew Ng’s ML course by building anomaly detection and a simple movie recommender—two patterns that show up everywhere in real systems.
Axel Domingues
This is the final stop in my 2016 run through Andrew Ng’s Machine Learning course.
It shows how many of the ideas from the entire course reappear together: probability modeling, optimization, regularization, metrics, and system thinking.
Exercise 8 is a fun combination because it covers two problems that feel very “real”:
- Anomaly detection (flagging unusual examples when you have few or no labeled failures)
- Movie recommendations (collaborative filtering)
Both problems forced me to stop thinking like “I’m fitting a line” and start thinking like “I’m building a system that makes decisions.”
Goal (anomaly detection): learn what “normal” looks like, then flag rare deviations.
Implemented:
- estimateGaussian.m
- multivariateGaussian.m
- selectThreshold.m (F1-based)

Goal (the movie recommender): predict ratings by learning movie features and user preferences.
Implemented:
- cofiCostFunc.m
- normalizeRatings.m

First, anomaly detection. The question it answers is not “which known class does this example belong to?” but: “Does this look like the data I usually see?”
Anomaly detection is what you use when you don’t have labels for “bad cases”, or when bad cases are rare.
Think: fraudulent transactions, servers misbehaving in a data center, defective engines coming off a production line.
The approach in the course is intentionally simple:
1. Fit a Gaussian to each feature of the training data.
2. Compute the probability of each example under that model.
3. Flag anything whose probability falls below a threshold epsilon.
The first dataset is 2D, so you can visualize it.
In code, the learning step is almost embarrassingly simple:
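A minimal sketch of what estimateGaussian.m computes (assuming the exercise’s 1/m variance normalization):

```matlab
% Per-feature mean and variance of the training set.
% X is m x n (examples x features); mu and sigma2 come back as n x 1 vectors.
mu = mean(X)';          % feature means
sigma2 = var(X, 1)';    % variances normalized by m (not m - 1)
```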
Then for each example, compute its probability under the model.
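For the diagonal-covariance case the exercise starts with, the density is just a product of per-feature Gaussians. A sketch, relying on Octave’s automatic broadcasting:

```matlab
% p is m x 1: the probability of each example under independent per-feature Gaussians.
% mu and sigma2 are the n x 1 vectors estimated above.
p = prod( exp(-((X - mu') .^ 2) ./ (2 * sigma2')) ./ sqrt(2 * pi * sigma2'), 2 );
```

The exercise’s multivariateGaussian.m handles a full covariance matrix, but with a diagonal covariance it reduces to exactly this product.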
Even if you later use fancier models, this exercise is great because it builds the habit of asking: “What does normal look like, statistically?”
The hard part isn’t computing probabilities.
The hard part is deciding: how small is “too small”?
The exercise uses a cross-validation set with labels (normal vs anomaly) only for tuning the threshold. The method:
- Sweep over a range of candidate epsilon values.
- Flag every example with p < epsilon as an anomaly.
- Compute precision, recall, and F1 against the CV labels, and keep the epsilon with the best F1.

In the provided exercise, the selected epsilon ends up being extremely small (on the order of 1e-18), and the model identifies 117 anomalies in the higher-dimensional dataset.
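A sketch of that search, roughly how selectThreshold.m goes about it (here pval holds the CV-set probabilities and yval the CV labels):

```matlab
bestEpsilon = 0;
bestF1 = 0;
stepsize = (max(pval) - min(pval)) / 1000;
for epsilon = min(pval):stepsize:max(pval)
    predictions = (pval < epsilon);               % flag low-probability examples
    tp = sum((predictions == 1) & (yval == 1));   % true positives
    fp = sum((predictions == 1) & (yval == 0));   % false positives
    fn = sum((predictions == 0) & (yval == 1));   % false negatives
    prec = tp / (tp + fp);
    rec  = tp / (tp + fn);
    F1 = (2 * prec * rec) / (prec + rec);         % NaN when nothing is flagged; never beats bestF1
    if F1 > bestF1
        bestF1 = F1;
        bestEpsilon = epsilon;
    end
end
```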
F1 is the right metric here because anomalies are rare: a model that labels everything as normal can look “99% accurate” and still be useless.
Then comes the recommender. This is the part that made me grin because the output looks like something a product could ship.
The dataset is a big matrix of movie ratings:
- Y(i, j) is the rating that user j gave to movie i
- R(i, j) is 1 if user j rated movie i, else 0

That R matrix is the key. It tells the algorithm which errors to care about.
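A toy illustration with made-up numbers (3 movies, 2 users), just to make the shapes concrete:

```matlab
% Y holds ratings (0 where none was given); R marks which entries are real.
Y = [5 0;    % movie 1: user 1 gave 5 stars, user 2 never rated it
     0 3;    % movie 2: only user 2 rated it
     4 1];   % movie 3: both users rated it
R = [1 0;
     0 1;
     1 1];
```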
Forgetting to mask by R is equivalent to training on hallucinated data: the model will confidently optimize errors that don’t exist, punishing itself for “not predicting ratings that were never provided.”
Instead of hand-crafting movie features (genre, cast, etc.), we learn them.
- Each movie i gets a feature vector x(i)
- Each user j gets a preference vector theta(j)

That’s it.
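The model’s guess for how user j would rate movie i is then just the dot product theta(j)' * x(i).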
The collaborative filtering objective is:
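Reconstructed in the course’s notation (with lambda as the regularization strength):

$$
J = \frac{1}{2} \sum_{(i,j):\,R(i,j)=1} \left( \theta^{(j)\top} x^{(i)} - Y(i,j) \right)^2
  + \frac{\lambda}{2} \sum_{i} \lVert x^{(i)} \rVert^2
  + \frac{\lambda}{2} \sum_{j} \lVert \theta^{(j)} \rVert^2
$$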
The sum only runs over pairs where R(i, j) = 1. Then we compute gradients for both sets of parameters:
- X (the movie features)
- Theta (the user preferences)

This is the second time in the course where vectorization stops being “nice to have” and becomes “you can’t finish otherwise.”
A nested loop over movies x users is slow and messy. The exercise pushes you into expressing the logic as matrix operations.
Here’s the mental model I kept:
- X * Theta' produces every predicted rating at once
- (X * Theta' - Y) produces every error
- Multiplying elementwise by R zeroes out the errors for ratings that were never given

Once you do that, the cost and gradients fall out cleanly.
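One way the vectorized cost and gradients can look (a sketch, not necessarily the exact code in cofiCostFunc.m):

```matlab
% X: num_movies x num_features, Theta: num_users x num_features,
% Y and R: num_movies x num_users, lambda: regularization strength.
E = (X * Theta' - Y) .* R;     % errors, zeroed wherever no rating exists
J = 0.5 * sum(sum(E .^ 2)) ...
    + (lambda / 2) * (sum(sum(Theta .^ 2)) + sum(sum(X .^ 2)));
X_grad     = E * Theta + lambda * X;       % gradient w.r.t. movie features
Theta_grad = E' * X + lambda * Theta;      % gradient w.r.t. user preferences
```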
The exercise includes a script where you add your own ratings (a “new user”), train the model, and then print top recommendations.
The PDF includes an example output (a ranked list of top recommended movies with their predicted ratings), and it also shows the kind of “seed ratings” the new user provided (a handful of 2–5 star ratings).
The important note is that your results may differ because the model starts from random initialization.
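If I remember the driver script right, X and Theta start as small random values before the optimizer takes over, which is why two runs can surface different titles:

```matlab
% Random starting point for the learned features and preferences
% (hence the run-to-run variation in recommendations).
X = randn(num_movies, num_features);
Theta = randn(num_users, num_features);
```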
The first time you see a model recommend movies you didn’t rate, it’s a good reminder that “learning” is often just optimization + data.
It’s tempting to treat anomaly detection and recommenders as special topics.
But as an engineer, what I’m really taking away is:
- Model what “normal” looks like before trying to catch anomalies
- Pick metrics that match the problem (F1, not accuracy, when the interesting cases are rare)
- Only optimize against data you actually have (masking by R is basically “don’t hallucinate data”)

In 2016, it’s impossible to ignore what’s happening with deep learning.
My plan after this course: go deeper into neural networks and deep learning.
I’m happy I started with the fundamentals first. Deep learning is powerful, but without the basics (optimization, regularization, diagnostics), it’s too easy to treat it like magic.
This closes my 2016 journey through Andrew Ng’s Machine Learning course.
In 2017, I shift focus to Neural Networks and Deep Learning — the wave that is reshaping the field at this time.
My next phase is about understanding the ideas that powered breakthroughs like AlexNet and kicked off the modern deep learning revolution.
The fundamentals from this course — optimization, regularization, diagnostics, and system thinking — turned out to be the difference between using deep learning and actually understanding why it works.