
After finishing Andrew Ng’s Machine Learning course, I start my deep learning journey by revisiting the perceptron and realizing neural networks begin with ideas I already understand.
Axel Domingues
I finished Andrew Ng’s Machine Learning course at the end of 2016 feeling confident about the fundamentals: linear models, logistic regression, regularization, diagnostics, and how to debug models like an engineer.
So when I decided to shift focus to Neural Networks and Deep Learning in 2017, I expected a sharp break — new math, new mental models, maybe even a bit of magic.
That didn’t happen.
Instead, the first thing I ran into was the perceptron, and my immediate reaction was:
“Wait… I already know this.”
This post is about that moment — and why it mattered more than I expected.
What you’ll learn in this post
How the perceptron connects to logistic regression, and why that makes deep learning feel like a continuation—not a reset.
Best read if you already know
Logistic regression basics, gradients, and why “smooth” functions matter for training.
Mental model to keep
A “neuron” is a building block you can stack, not a totally new kind of magic model.
The perceptron is often presented as the “origin story” of neural networks. A historical artifact. Something primitive before the real ideas arrive.
But starting here turned out to be important — not because it’s powerful, but because it’s familiar.
The perceptron forced me to ask a very grounding question:
What is actually new about neural networks, compared to the models I already understand?
A perceptron takes inputs, multiplies them by weights, sums them up, and passes the result through an activation.
That description already sounded suspiciously close to logistic regression.
In logistic regression you already compute the same ingredients, just framed differently:
weighted_sum = w1*x1 + w2*x2 + ... + b
output = activation(weighted_sum)
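To see the overlap concretely, here is a minimal sketch of that unit (my own illustration, assuming NumPy; the names x, w, b and the sigmoid choice are just for the example). With a sigmoid plugged in as the activation, it is exactly the logistic regression prediction from 2016:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, activation=sigmoid):
    # weighted sum of the inputs, then a nonlinearity
    return activation(np.dot(w, x) + b)

x = np.array([0.5, -1.2, 3.0])   # input features
w = np.array([0.8, 0.1, -0.4])   # weights
b = 0.2                          # bias

print(neuron(x, w, b))           # same formula logistic regression uses for P(y=1 | x)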
The math doesn’t feel new; what’s new is that this unit becomes something you can stack and compose.
The difference wasn’t what it computes — it was how it’s framed.
Instead of thinking in terms of “a model”, I was now thinking in terms of neurons as building blocks.
Historically, the perceptron uses a hard threshold: output 0 or 1 depending on which side of a boundary you land on.
That immediately rang alarm bells.
I already knew from 2016 ML that gradient-based training needs smooth, differentiable functions; a hard 0/1 threshold gives gradient descent nothing useful to work with.
This is where modern neural networks quietly diverge from the original perceptron idea.
They keep the structure, but replace the hard decision with smooth activations that gradients can flow through.
That distinction — conceptually correct but practically limited — would come up again and again throughout deep learning.
With a hard threshold, tiny changes to weights often produce no change in output until you cross the boundary. That makes “small corrective updates” unreliable—learning becomes unstable or stalls.
A smooth activation changes gradually, so small weight updates produce small output changes. That’s what makes iterative learning feel like steering instead of jumping off cliffs.
When you see training fail later, one of the first questions is: “Did I accidentally create a system where gradients can’t meaningfully flow?”
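Here’s a quick sketch of that failure mode (my own toy example, with made-up numbers): nudge one weight by a tiny amount and compare how a hard threshold and a sigmoid respond.

import numpy as np

def step(z):
    return float(z >= 0)          # original perceptron: hard 0/1 decision

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0])
w = np.array([0.30, -0.20])
b = 0.05
eps = 1e-3                        # a tiny corrective weight update

for activation in (step, sigmoid):
    before = activation(np.dot(w, x) + b)
    w_nudged = w.copy()
    w_nudged[0] += eps
    after = activation(np.dot(w_nudged, x) + b)
    print(activation.__name__, "output change:", after - before)

# step: the change is exactly 0 until the boundary is crossed, so there is no learning signal
# sigmoid: the change is small but nonzero, which is exactly what gradient descent needs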

This month helped me separate continuity from novelty.
Not new (continuity): weighted sums, activations, and gradient-based training.
New (novelty): treating that unit as a building block you can stack, and thinking about the representations an architecture can learn.
The perceptron didn’t replace my ML intuition — it anchored it.
In classical ML, I thought in terms of:
“What hypothesis class am I choosing?”
With neural networks, I started thinking:
“What representations can this architecture build?”
That’s a very different question.
It’s not about drawing a boundary directly — it’s about learning the features that make the boundary simple.
And that idea didn’t fully land yet — but the perceptron was the first crack in the door.
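To make the “building blocks” framing concrete, here is a rough sketch (random weights standing in for learned ones, purely illustrative): the hidden layer re-describes the input as new features, and the output layer is just a logistic-regression-like unit sitting on top of them.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer(x, W, b):
    # each row of W is one neuron: the same weighted-sum-plus-activation unit, several at once
    return sigmoid(W @ x + b)

rng = np.random.default_rng(0)
x = np.array([0.5, -1.2, 3.0])

W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)   # hidden layer: 4 "feature" neurons
W2, b2 = rng.standard_normal((1, 4)), np.zeros(1)   # output layer: one unit on those features

h = layer(x, W1, b1)   # the learned representation (here random, just to show the shape of the idea)
y = layer(h, W2, b2)   # a simple boundary drawn in feature space
print(h, y)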
A few habits from 2016 carried over immediately: run diagnostics before blaming the model, debug like an engineer, and test assumptions instead of trusting them.
That mindset would become essential very quickly.
Before this month, I subconsciously thought of neural networks as “a different category of models”.
After revisiting the perceptron, I realized something important:
Neural networks didn’t replace classical machine learning.
They reused it, then scaled it through composition.
That insight made the rest of deep learning feel less intimidating — and more like a continuation of a story I was already part of.
Next, I need to understand the mechanism that makes stacking neurons actually work:
Backpropagation.
I know gradient descent. I know the chain rule.
But applying them together, across layers, is where most explanations get hand-wavy.
That’s what February is for.