
A year after finishing Andrew Ng’s classical ML course, I’m trying to separate enduring principles from deep learning-specific techniques—and decide where to go next.
Axel Domingues
This closes my 2017 learning sprint.
In 2016, I went through Andrew Ng’s Machine Learning course to remove the magic: implement the algorithms, understand the diagnostics, and learn how to reason about models as systems.
In 2017, I stepped into Neural Networks and Deep Learning—not to chase buzzwords, but to understand why deep nets suddenly started working in practice, and what was genuinely new versus what was the same old ML with bigger compute.
After twelve months of building, breaking, and re-building intuition, I think I can finally answer the question I kept asking all year:
What actually changed… and what didn’t?

Deep learning felt “new” in 2017 because:
But under the hood, the backbone stayed very familiar:
Even after CNNs and LSTMs, the workflow was still:
The “new” part wasn’t the loop.
The new part was how hard it became to keep the loop stable at scale.
I expected deep nets to break this framework.
They didn’t.
If anything, deep learning made it more visible:
My 2016 instincts still worked:
The biggest continuity from 2016 to 2017 is this:
The model is not the product. The training process is part of the system.
I kept leaning on the same engineering habits:
This was the biggest conceptual shift.
2016 - Bias lived in features
I shaped the input: transforms, kernels, PCA, manual feature design.
2017 - Bias lived in structure
The model structure carried assumptions: locality (CNNs), time (RNNs), stable memory paths (LSTMs).
In 2016, a lot of performance came from features:
In 2017, I watched architectures bake in assumptions:
So “good modeling” became less about manually crafting inputs and more about choosing a structure that matches reality.
Architecture became a form of prior knowledge.
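To make that concrete, here is a minimal NumPy sketch of my own (not code from the course or from my projects this year) of why convolution is a prior: one small kernel slides across the whole image, so the layer assumes that local patterns matter and that they mean the same thing wherever they appear.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1).

    Locality: each output pixel only sees a small neighbourhood.
    Weight sharing: the same kernel parameters are reused at every position.
    """
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kH, j:j + kW]
            out[i, j] = np.sum(patch * kernel)
    return out

# A fully connected layer on a 28x28 image needs 784 weights per output unit;
# this convolution expresses "nearby pixels form edges" with only 9 shared weights.
image = np.random.rand(28, 28)
edge_kernel = np.array([[-1.0, 0.0, 1.0],
                        [-2.0, 0.0, 2.0],
                        [-1.0, 0.0, 1.0]])  # a hand-picked example; a CNN would learn this
print(conv2d_valid(image, edge_kernel).shape)  # (26, 26)
```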
In 2016, gradient descent felt like a tool.
In 2017, it felt like the boss.
The “deep learning trick bag” wasn’t superficial—it was survival gear:
The uncomfortable truth I learned:
A model can be theoretically expressive and still be practically untrainable.
And “trainable” is a property you design for.
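One toy experiment made this vivid for me. The sketch below is my own illustration, with made-up layer counts and widths: push random data through a deep ReLU stack and compare a naive small-variance initialization with He initialization. One collapses, the other keeps a usable signal, and no amount of theoretical expressiveness rescues the first.

```python
import numpy as np

def forward_relu_stack(x, n_layers=20, width=256, he_init=True, seed=0):
    """Forward pass through a deep ReLU stack, returning the activation std per layer."""
    rng = np.random.default_rng(seed)
    scales = []
    for _ in range(n_layers):
        if he_init:
            W = rng.normal(0.0, np.sqrt(2.0 / x.shape[1]), size=(x.shape[1], width))
        else:
            W = rng.normal(0.0, 0.01, size=(x.shape[1], width))  # naive small init
        x = np.maximum(0.0, x @ W)  # ReLU
        scales.append(x.std())
    return scales

x = np.random.default_rng(1).normal(size=(64, 256))
print("naive init, layer 20 std:", forward_relu_stack(x, he_init=False)[-1])  # shrinks toward 0
print("He init,    layer 20 std:", forward_relu_stack(x, he_init=True)[-1])   # stays around 1
```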
CNNs were the turning point for me.
Before CNNs, I thought “features” were something I engineered.
After CNNs, it became obvious that:
That shifted how I think about building ML systems:
In 2016, regularization was mostly:
In 2017, regularization expanded:
So regularization became something I applied at multiple levels of the system, not a single knob.
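As a sketch of what those levels look like in code (my own toy example; the specific techniques here, weight decay, dropout, and early stopping, are the ones I have in mind rather than an exhaustive list): one acts on the loss, one on the activations, and one on the training loop itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_penalty(weights, lam=1e-4):
    """Weight decay: regularization applied to the loss."""
    return lam * sum(np.sum(W ** 2) for W in weights)

def dropout(activations, p=0.5, training=True):
    """Inverted dropout: regularization applied to the activations during training."""
    if not training:
        return activations
    mask = (rng.random(activations.shape) > p) / (1.0 - p)
    return activations * mask

def should_stop(val_losses, patience=5):
    """Early stopping: regularization applied to the training loop."""
    best = int(np.argmin(val_losses))
    return len(val_losses) - 1 - best >= patience

# Toy usage: three separate knobs living in three different places.
weights = [rng.normal(size=(4, 4))]
acts = rng.normal(size=(2, 4))
print(l2_penalty(weights), dropout(acts).shape,
      should_stop([1.0, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95]))
```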
This year made me rewrite my internal checklist.
Instead of thinking “choose algorithm → tune it”, I now think:
Architecture is not just implementation detail — it’s the inductive bias.
If loss doesn’t go down reliably, stop. Fix initialization, learning rates, gradients, batch sizes, clipping.
Don’t guess. Instrument. Keep baselines. Change one thing at a time.
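"Instrument, don't guess" eventually condensed into small helpers like the sketch below (my own, with assumed shapes, not any framework's API): log the loss and the global gradient norm every step, and clip when the norm explodes.

```python
import numpy as np

def global_grad_norm(grads):
    """L2 norm over all parameter gradients: the single number I watch most."""
    return float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))

def clip_grads(grads, max_norm=5.0):
    """Rescale all gradients if the global norm exceeds max_norm; return the pre-clip norm."""
    norm = global_grad_norm(grads)
    if norm > max_norm:
        grads = [g * (max_norm / norm) for g in grads]
    return grads, norm

# Inside the training loop: one log line per step, so a stalled or exploding
# run is visible immediately instead of after an hour of silence.
grads = [np.random.default_rng(0).normal(size=(100, 100)) * 3.0]
grads, norm = clip_grads(grads)
print(f"step 0  grad_norm={norm:.2f}  clipped={norm > 5.0}")
```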
My one-page checklist (the version I actually reuse)
I used to treat “optimization” as a math topic.
Now I treat it as an engineering reality.
Deep learning success is often the result of making optimization possible at scale.
CNNs and LSTMs made this concrete:
So the deep learning revolution (as I experienced it) was not magic.
It was a stack of design decisions that finally aligned:
Foundations (Jan–Mar)
Perceptrons and backprop were “old ideas” that became real only when I implemented and debugged them.
Trainability (Apr–Jun)
ReLU, initialization, momentum: the practical gear that made deep nets stop behaving like fragile math experiments.
Vision (Jul–Aug)
CNNs taught me the power of inductive bias and why representation learning beats manual feature design.
Sequences (Sep–Nov)
RNNs broke my assumptions. LSTMs showed how architecture can solve optimization problems directly.
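The clearest way I can show what "architecture solving an optimization problem" means is the cell-state update itself. Below is a bare single-step sketch of a standard LSTM cell (my own minimal version, with arbitrary sizes): the new cell state is a gated, almost-additive function of the old one, so the gradient has a stable path back through time instead of being squashed through a matrix at every step.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_prev, x] to the four gate pre-activations."""
    z = np.concatenate([h_prev, x]) @ W + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # forget, input, output gates
    g = np.tanh(g)                                # candidate memory
    c = f * c_prev + i * g   # additive memory path: the gradient highway
    h = o * np.tanh(c)
    return h, c

hidden, inputs = 8, 4
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.1, size=(hidden + inputs, 4 * hidden))
b = np.zeros(4 * hidden)
h = c = np.zeros(hidden)
for t in range(100):  # a long sequence; the state stays bounded and usable
    h, c = lstm_step(rng.normal(size=inputs), h, c, W, b)
print(h.shape, c.shape)  # (8,) (8,)
```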
I’m ending 2017 with a new kind of excitement.
Deep learning taught me that learning systems can be engineered, not just studied.
And once you start thinking that way, the next question becomes unavoidable:
What happens when the model doesn’t just predict… but acts?
In 2018, I want to shift focus to reinforcement learning.
What pulled me in is the same thing that pulled me into CNNs and LSTMs:
And yes—the recent results in games (especially Go) make it hard not to be curious.
My goal isn’t to chase headlines.
My goal is the same as it was in 2016: remove the magic.
Did deep learning make the classical ML foundations obsolete?
Not in my experience. Deep learning expanded what’s practical for images and sequences, but the classical fundamentals—optimization, regularization, diagnostics—are still the backbone. Deep learning felt understandable precisely because it connected back to those principles.
When would I still reach for classical ML?
When the data is small, the problem is tabular/structured, interpretability matters, or you need fast iteration with tight constraints.
Deep learning shines when structure is rich (images, audio, text) and you can afford the training + debugging loop.
What surprised me most this year?
How often “the model isn’t learning” was not a data issue or a conceptual issue, but a trainability issue. ReLU, initialization, momentum, clipping—these weren’t optional tricks, they were enabling conditions.
Why reinforcement learning next?
Because it extends everything I learned about optimization and systems into a setting where actions change the data you get back. It feels like the next logical step after spending a year learning how to train deep models reliably.