
I train my first alpha predictor on BitMEX order-book features, and learn why ‘it trains’ is not the same as ‘it works’.
Axel Domingues
In 2018 I spent a full year learning Reinforcement Learning (RL) and Deep RL with an instrumentation-first mindset. By December, I had a mental model I trusted:
So when I started supervised alpha detection in mid-2019, I thought it would feel… easier. No credit assignment over long horizons. No reward hacking. No policy collapse.
What I got instead was a different kind of pain:
My model trained fine, my loss went down, and my results were still useless.
This post is about that first month where I trained “real” alpha models on BitMEX order-book features — and got humbled by the gap between “it trains” and “it works”.
By August, the system has a shape. It’s not “production,” but it’s a real pipeline with real failure modes.
Now the question is brutally simple:
Given my features at time t, can a model predict a useful “alpha outcome ahead”?
In the repo, this month centers on:
alpha_detection/train_alpha_model.py (training + evaluation loop)
…and it depends on the label and feature pipelines created in earlier months.
In Deep RL, baselines protect you from storytelling. In trading, they protect you from a worse sin:
building complexity to compensate for a target that isn’t learnable (or isn’t worth learning).
So I keep a strict rule:
In August, my baselines were:
And then… I accidentally built something that looked like a baseline but wasn’t:
That tension (baseline vs architecture) becomes important later.
I’m using a look-ahead label (from July) that tries to answer:
In the repo this is produced by:
alpha_detection/produce_alpha_outcome_ahead.py
And the big constraints I keep repeating to myself:
This month’s uncomfortable discovery is that even with those rules…
The label distribution itself can quietly kill you.
Because the market is mostly “nothing happens,” with occasional violent bursts.
So the first real lesson wasn’t about neural nets. It was about rare events.
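A quick way to see it is to stare at the label’s quantiles before training anything. A minimal sketch with pandas, where the file path and the column name are illustrative placeholders, not the repo’s actual names:

```python
import pandas as pd

# Hypothetical file/column names; adjust to whatever your label builder wrote out.
df = pd.read_hdf("features_with_outcome.h5")
label = df["alpha_outcome_ahead"]

print(label.describe(percentiles=[0.01, 0.05, 0.50, 0.95, 0.99]))

# Heavy tails show up as a huge gap between the bulk and the extremes:
print("share of labels within 1 std of the mean:",
      ((label - label.mean()).abs() < label.std()).mean())
```

If almost every row sits in a tiny band around zero and a handful of rows are an order of magnitude larger, you are in rare-event territory before you pick a single architecture.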
The training loop lives in alpha_detection/train_alpha_model.py.
The structure is intentionally boring:
The reason to keep it boring is simple:
In trading ML, every extra feature is a new place to accidentally cheat.
The script expects data files that already include:
And it follows an explicit train/valid/test directory split.
Even if you don’t leak features, you leak distribution.
The script computes mean and sigma from the training set only and then reuses them everywhere else, treating inf and NaN as zero after normalization.
This matches June’s theme:
Normalization is not “preprocessing.” It’s part of your deployment contract.
If you normalize differently between training and inference, you don’t have a model. You have two unrelated functions.
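Here’s a minimal sketch of that contract, assuming plain NumPy arrays; the function names are illustrative, not the repo’s actual API:

```python
import numpy as np

def fit_normalizer(train_features: np.ndarray):
    """Compute mean and sigma on the training set ONLY."""
    mean = np.nanmean(train_features, axis=0)
    sigma = np.nanstd(train_features, axis=0)
    sigma[sigma == 0] = 1.0  # constant columns: avoid division by zero
    return mean, sigma

def apply_normalizer(features: np.ndarray, mean: np.ndarray, sigma: np.ndarray):
    """Reuse the SAME stats for valid, test, and live inference."""
    z = (features - mean) / sigma
    # inf and NaN become zero after normalization, matching the training script's rule
    return np.nan_to_num(z, nan=0.0, posinf=0.0, neginf=0.0)
```

Persisting the (mean, sigma) pair next to the model weights is what keeps the backtest and the live bot looking at the same inputs.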
Here’s what my first training runs looked like:
The most painful part:
It looked like progress.
I could have written a victory post. Instead I did what 2018 taught me:
…and the story changed.
When the label distribution is heavy-tailed, the safest move (for MSE-style training) is often to predict something very close to the unconditional mean: the constant that minimizes squared error is exactly that mean.
That gives you a nice stable loss. And it gives you nothing actionable.
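To make that concrete, here is a tiny synthetic demo, with Student-t noise standing in for a heavy-tailed outcome; nothing here comes from the actual BitMEX data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Heavy-tailed stand-in for the look-ahead outcome: mostly tiny, occasionally violent.
y = rng.standard_t(df=3, size=100_000) * 0.0005

mean_predictor_mse = np.mean((y - y.mean()) ** 2)
print(f"MSE of 'always predict the mean': {mean_predictor_mse:.3e}")
```

A model whose validation loss merely matches that constant-predictor number has learned nothing you can trade.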
In train_alpha_model.py, I even had a small “threshold check” that prints counts for cases like:
The humbling part:
So the model was not learning “signal.” It was learning “how not to be embarrassed by the average.”
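The check itself is simple. Something along these lines, where the thresholds and names are illustrative rather than the repo’s exact code:

```python
import numpy as np

def threshold_report(y_true: np.ndarray, y_pred: np.ndarray,
                     thresholds=(0.0005, 0.001, 0.002)):
    """Count how often the label vs the model exceeds each move size."""
    for t in thresholds:
        actual = int(np.sum(np.abs(y_true) >= t))
        predicted = int(np.sum(np.abs(y_pred) >= t))
        print(f"|move| >= {t}: actual={actual}, predicted={predicted}")

# A mean-collapsed model prints near-zero 'predicted' counts at every threshold.
```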
In Deep RL, I log rewards, values, entropy, KL, advantage stats.
In supervised alpha, I log different lies.
Here’s what I consider the minimum dashboard for this month:
Run the label builder on top of your already-generated feature dataset.
The goal is: one HDF5 file that contains both features and a column for the look-ahead outcome.
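Assuming both pipelines produce pandas DataFrames indexed by timestamp, the merge is roughly this; paths and column handling are illustrative:

```python
import pandas as pd

features = pd.read_hdf("features_2019.h5")            # output of the feature pipeline
outcome = pd.read_hdf("alpha_outcome_ahead_2019.h5")  # output of the label builder

# Join on the shared timestamp index; rows without a full look-ahead window are dropped.
dataset = features.join(outcome, how="inner").dropna(subset=[outcome.columns[0]])
dataset.to_hdf("train_ready_2019.h5", key="data", mode="w")
```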
Create explicit train/valid/test ranges. My default is:
No shuffling across time boundaries.
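A minimal way to enforce that, assuming a timestamp-indexed DataFrame; the date boundaries below are placeholders, not my actual split:

```python
import pandas as pd

dataset = pd.read_hdf("train_ready_2019.h5")

# Contiguous, non-overlapping time ranges; never shuffle across these boundaries.
train = dataset.loc[:"2019-05-31"]
valid = dataset.loc["2019-06-01":"2019-06-30"]
test = dataset.loc["2019-07-01":]

train.to_hdf("train/data.h5", key="data", mode="w")
valid.to_hdf("valid/data.h5", key="data", mode="w")
test.to_hdf("test/data.h5", key="data", mode="w")
```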
Run the training script:
alpha_detection/train_alpha_model.py
Keep the config minimal:
The point is not performance. The point is: does anything learnable exist?
Don’t just read the final loss. Open the output artifacts:
If the model collapses to predicting the mean, treat it as a diagnosis, not a failure.
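One cheap way to diagnose the collapse from whatever predictions your run saved; the file names here are illustrative:

```python
import numpy as np

preds = np.load("valid_predictions.npy")   # whatever your run wrote out
labels = np.load("valid_labels.npy")

print("label std:", labels.std())
print("pred  std:", preds.std())           # near zero => the model is hugging the mean
print("corr     :", np.corrcoef(preds, labels)[0, 1])
```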
This month is where I stop thinking of supervised learning as “easier than RL.”
In RL, the challenge is credit assignment.
In supervised trading, the challenge is:
A model can be “correct” and still be untradeable.
And worse:
A model can be “correct” in backtests because the backtest is lying.
So August becomes a month of humility and discipline. Not because I failed to train. Because I learned what training success is allowed to hide.
Alpha training loop
The training + evaluation script: reads HDF5, trains a model, and writes results.
Alpha label generation
Builds the look-ahead outcome label. This is where leakage traps start if you’re careless.
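For intuition, a generic leakage-safe look-ahead label looks something like this; a sketch of the idea, not the repo’s actual label definition:

```python
import pandas as pd

def forward_return(close: pd.Series, horizon: int) -> pd.Series:
    """Outcome over the NEXT `horizon` steps, aligned to decision time t.

    The trap: features at time t must never see close[t+1:]; only the label may.
    """
    label = close.shift(-horizon) / close - 1.0
    return label  # the last `horizon` rows are NaN and must be dropped, not filled
```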
Regression or classification? I tried both ways. Regression is useful early because it forces me to understand the label distribution and the scale of errors. Classification can look “cleaner,” but it often hides the fact that the model is learning the majority class (market boredom).
The most common leakage trap? Shuffling rows before splitting. You won’t leak future features, but you leak future regimes and volatility structure. Your validation becomes a mirror of your training set.
Why save the normalization stats? Because deployment uses them. If training and inference disagree on normalization, your model’s inputs change meaning. The backtest you trusted and the bot you run are no longer the same system.
In RL I learned to distrust reward curves. In supervised trading I learned to distrust loss curves. Different plot, same problem: a single scalar can hide a lot of lies.
This month I proved the pipeline can train models, but I also learned that “baseline MLP” is not enough.
The next step is architectural.
I want a model that:
Because in trading, debuggability is not optional. It’s survival.
Deep Silos - Representation Learning That Respects Feature Families
My first serious attempt at making the model “see” microstructure features the way I designed them — grouped, compressed, and only then fused.
Defining Alpha Without Cheating - Look-Ahead Labels and Leakage Traps
Before I train anything, I need a label that doesn’t smuggle the future into my dataset.