
My first serious attempt at making the model “see” microstructure features the way I designed them — grouped, compressed, and only then fused.
Axel Domingues
In August, I got my first real supervised “alpha” curves. They were… humbling.
Not because the model was useless — but because it was too easy to get a model that looked smart on the training set and weirdly fragile on validation.
This month I stop treating “add more layers” as the answer and start treating architecture as a way to encode how I believe the features should behave.
I’m calling the approach Deep Silos: each feature family gets its own small subnetwork, is compressed into a tiny embedding, and only then fused with everything else.
This is the same mindset shift I had in deep RL: stability isn’t luck — it’s engineering.
By now my dataset is a pile of signals with very different “physics”: price levels, bid/ask queue volumes, added and removed liquidity, trade-flow buckets.
If I feed all of that into one big MLP, the network is free to mix everything with everything from the very first layer.
In other words: it can learn shortcuts.
And in microstructure, shortcuts are everywhere.
A “feature family” is a set of features that come from the same microstructure mechanism and live on a similar scale.
So instead of a single network learning the representation from scratch, I force the structure: each family goes through its own small MLP, gets compressed to a tiny embedding, and only those embeddings are fused downstream.
That’s it.
It’s not a fancy paper trick. It’s an inductive bias.
The core design choice is the silo list — what gets grouped together.
In alpha_detection/deep_silos.py I encode this directly as lists of feature names.
The file builds a SILOS_LIST and a FEATURES_NAMES_LIST. The silos are patterns like:
A simplified excerpt (structure only, not the full list):
# alpha_detection/deep_silos.py (excerpt)
SILOS_LIST = [
    ['level0', 'level1', 'level2', ...],
    ['ask_volume_level0', 'ask_volume_level1', ...],
    ['bid_volume_level0', 'bid_volume_level1', ...],
    ['ask_added_liquidity_level0', 'ask_removed_liquidity_level0', ...],
    ['buy_volume_1.500000', 'buy_volume_2.500000', ...],
    ...
]
The big win here is not that grouping is “correct”.
The win is: it’s explicit and it’s versionable.
If I change features, I have to change the silo contract too — and that forces me to think.
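One cheap way to enforce that contract (my own guardrail, not something I’m claiming is in the repo): assert that every silo feature actually exists in FEATURES_NAMES_LIST and that no feature lands in two silos.

# Contract check to run whenever the silo definitions change.
# My own sketch; the names SILOS_LIST / FEATURES_NAMES_LIST come from the repo,
# the function itself does not.
def check_silo_contract(silos_list, features_names_list):
    all_names = set(features_names_list)
    seen = set()
    for silo in silos_list:
        for name in silo:
            assert name in all_names, "unknown feature in a silo: %s" % name
            assert name not in seen, "feature assigned to two silos: %s" % name
            seen.add(name)
    # whatever is left over goes through the non-silo path
    return sorted(all_names - seen)

check_silo_contract(SILOS_LIST, FEATURES_NAMES_LIST)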
Inside the deep_silos() network, each silo does this: gather its own columns, run them through a small MLP, and compress them down to a tiny embedding.
From the repo:
# alpha_detection/deep_silos.py (excerpt)
# Not shown above: `silos` is initialized as an empty list, and `cols` holds the
# column indices of this silo's feature names within the full input tensor x.
for feature_list in SILOS_LIST:
    silo_x = tf.gather(x, indices=cols, axis=1)
    out = mlp(silo_x, hidden_sizes=(10, 10), output_size=2, keep_prob=keep_prob)
    silos.append(out)
Two things matter here: the embedding is tiny (output_size=2), so each silo has to summarize its family rather than memorize it, and each silo only sees its own columns, so classic within-family quantities (think imbalances or ratios between book levels) can only be expressed inside the silo.
Even though I’m not writing those formulas down, the architecture nudges the network to discover them.
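The mlp() helper itself isn’t shown in the excerpt. Here is a minimal sketch of what a function with that call signature could look like in the same TF 1.x style; it’s my reconstruction from the call site, not the repo’s code.

import tensorflow as tf  # TF 1.x era, matching the placeholder/session style of the rest of the code

def mlp(x, hidden_sizes=(10, 10), output_size=2, keep_prob=1.0):
    # Small fully connected stack: ReLU hidden layers with dropout, then a linear head.
    out = x
    for size in hidden_sizes:
        out = tf.contrib.layers.fully_connected(out, num_outputs=size)
        out = tf.nn.dropout(out, keep_prob)
    # Linear output: this is the per-silo embedding that gets concatenated later.
    return tf.contrib.layers.fully_connected(out, num_outputs=output_size, activation_fn=None)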
After the per-silo embeddings, everything is concatenated (the silo outputs plus the features that don’t belong to any silo) and pushed through one last small MLP.
Again, from the repo:
# alpha_detection/deep_silos.py (excerpt)
silos_concat = tf.concat(silos, axis=1)
x_concat = tf.concat([silos_concat, non_silo_features], axis=1)
out = tf.contrib.layers.fully_connected(x_concat, num_outputs=10)
out = tf.nn.dropout(out, keep_prob)
out = tf.contrib.layers.fully_connected(out, num_outputs=1, activation_fn=None)
This is the “fusion MLP”. It’s where cross-family interactions are allowed — but only after each family is summarized.
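One thing the excerpt doesn’t show is where non_silo_features comes from. My reading is that it’s simply the columns not claimed by any silo, gathered the same way; a hedged sketch (the variable names are mine):

# Columns not claimed by any silo bypass the per-silo MLPs and
# feed the fusion MLP directly. Sketch only, not the repo's exact code.
silo_cols = {FEATURES_NAMES_LIST.index(name)
             for silo in SILOS_LIST for name in silo}
non_silo_cols = [i for i in range(len(FEATURES_NAMES_LIST)) if i not in silo_cols]
non_silo_features = tf.gather(x, indices=non_silo_cols, axis=1)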
The training script is alpha_detection/train_alpha_model.py.
It’s classic TensorFlow-era code (placeholders + sessions, with keep_prob fed in explicitly), but the interesting part is the instrumentation discipline: the script contains a big display_results() function that prints the training and validation diagnostics.
This looks crude… but it is exactly the kind of “manual dashboard” I need early: I want to see, run by run, whether the model is doing anything beyond memorizing the training set.
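For context, this is roughly the shape of a placeholder-and-session loop with that kind of manual dashboard. The tensor names and the printed metrics are illustrative placeholders, not a claim about what train_alpha_model.py or display_results() actually reports.

# Illustrative TF 1.x training loop with a per-epoch "manual dashboard".
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(num_epochs):
        # dropout active during the update, disabled for the diagnostics
        sess.run(train_op, feed_dict={x: x_train, y: y_train, keep_prob: 0.8})
        train_loss = sess.run(loss, feed_dict={x: x_train, y: y_train, keep_prob: 1.0})
        valid_loss = sess.run(loss, feed_dict={x: x_valid, y: y_valid, keep_prob: 1.0})
        print("epoch %3d  train %.5f  valid %.5f" % (epoch, train_loss, valid_loss))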
I used to think “feature engineering” ends when you build the feature columns.
Now I think:
Feature engineering continues into the model architecture.
Deep Silos is feature engineering with weights.
It’s me telling the network: these features belong together, summarize them before you mix them with anything else.
Here’s the stuff I keep getting wrong — and the checks that save me:
If FEATURES_NAMES_LIST changes but the dataset column order doesn’t, the model silently trains on garbage (the sketch after this list shows the check I lean on).
A typo in a feature name in FEATURES_NAMES_LIST makes a silo smaller than expected.
If I compute mean/sigma across train+valid I leak information.
Replacing NaNs with 0 is a pragmatic band-aid, not a solution.
With microstructure, overfitting can happen fast.
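A minimal sketch of the checks that catch the first and third of these for me. The dataframe name and split variables are assumptions on my side, not the repo’s code.

import numpy as np

# 1) Column-order check: the dataset columns must match FEATURES_NAMES_LIST exactly,
#    otherwise tf.gather silently picks up the wrong features.
assert list(train_df.columns) == FEATURES_NAMES_LIST, "column order drifted from the feature contract"

# 2) Normalization stats from the training split only, never train+valid together.
mu = train_df.values.mean(axis=0)
sigma = train_df.values.std(axis=0) + 1e-8

def normalize(df):
    z = (df.values - mu) / sigma
    return np.nan_to_num(z)  # NaN -> 0 stays a band-aid, as noted above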
This month’s output isn’t a “production model”. It’s a reproducible experiment:
alpha_detection/deep_silos.py — the architecture definition
alpha_detection/train_alpha_model.py — training loop + diagnostics
Expected result (qualitative, no fake precision): a model that fits the training set less aggressively, but is less fragile on validation than the unconstrained MLP.
Why not just regularize harder? I tried. Dropout and weight decay help, but they don’t encode meaning. Deep Silos is me saying “these features are related; these aren’t” — and forcing the network to respect that early.
How do I decide what goes into a silo? I start from microstructure first: what features come from the same mechanism? Then I sanity-check by statistics (scale, missingness, correlation). If I can’t justify a grouping, I keep the features separate.
Why compress each silo to just two outputs? It’s a deliberate constraint. If it’s too small, I’ll raise it (2 → 4 → 8) and treat that like a controlled experiment. The point is to prevent the silo from becoming a full model on its own.
Does this solve overfitting? No. It reduces one type of overfitting (feature mixing shortcuts). Regime shift is still the bigger enemy — and maybe I’ll get punished by it later.
Next month I take this model out of the quiet offline world and into something closer to a conversation with the market.
If the model is real, it should behave sensibly in real time: the features should compute on the live stream, the normalization should still hold, and the predictions should not fall apart the moment the data stops being a tidy offline file.
I’m not expecting magic.
I’m expecting feedback — and probably pain.
Next post: Live Alpha Monitoring - When the Market Talks Back.