Sep 29, 2019 - 12 MIN READ
Deep Silos - Representation Learning That Respects Feature Families

My first serious attempt at making the model “see” microstructure features the way I designed them — grouped, compressed, and only then fused.

Axel Domingues

In August, I got my first real supervised “alpha” curves. They were… humbling.

Not because the model was useless — but because it was too easy to get a model that looked smart on the training set and weirdly fragile on validation.

This month I stop treating “add more layers” as the answer and start treating architecture as a way to encode how I believe the features should behave.

I’m calling the approach Deep Silos:

  • group features by microstructure meaning (“feature families”)
  • learn a compact representation per family
  • only then fuse the representations into a shared network

This is the same mindset shift I had in deep RL: stability isn’t luck — it’s engineering.

Not financial advice. This is research engineering work in 2019. I’m documenting what I built, what broke, and what I learned — not telling anyone how to trade.

The problem: my features are not one blob

By now my dataset is a pile of signals with very different “physics”:

  • book shape (depth levels, bid/ask volumes)
  • book flow (liquidity created/removed per side)
  • trade pressure (buy/sell volume in windows)
  • reference signals (index diff, funding-like context)
  • “plumbing” features that exist mainly to sanity-check reality

If I feed all of that into one big MLP, the network is free to do this:

  • grab whichever feature is temporarily predictive in the training slice
  • memorize quirky interactions between unrelated families
  • overfit on regime-specific artifacts (bullish months, low-vol periods)
  • treat missingness and imputation patterns as signal

In other words: it can learn shortcuts.

And in microstructure, shortcuts are everywhere.

A model can look “accurate” while actually learning a data collection artifact:
  • a clock drift signature
  • a missing-data pattern
  • a schema default value
  • a volume feed glitch
  • a dataset split bug

The market doesn’t care that my pipeline is messy. It will happily reward my mistakes in backtests and punish them live.

The idea: feature families deserve their own representation

A “feature family” is a set of features that:

  • come from the same measurement process
  • share scale / statistical behavior
  • should be interpreted together

So instead of a single network learning the representation from scratch, I force the structure:

  1. Silo networks: one small network per family
  2. Embeddings: each silo outputs a tiny vector (a learned summary)
  3. Fusion network: concatenated embeddings → shared layers → prediction

That’s it.

It’s not a fancy paper trick. It’s an inductive bias.


Repo anchors (what I actually built)

deep_silos.py

Defines feature families (silos) and the TensorFlow network function that builds per-silo embeddings + a fusion MLP.

train_alpha_model.py

Training script: loads HDF5 datasets, applies mean/sigma normalization, trains the model, and prints threshold-based diagnostics.


Step 1: define the silos (the “feature families” contract)

The core design choice is the silo list — what gets grouped together.

In alpha_detection/deep_silos.py I encode this directly as lists of feature names.

The file builds a SILOS_LIST and a FEATURES_NAMES_LIST. The silos are patterns like:

  • order book “level i” groups
  • bid and ask volume per depth level
  • created/removed liquidity features (from May’s work)
  • multi-window volume features
  • other microstructure “families” that should be processed together

A simplified excerpt (structure only, not the full list):

# alpha_detection/deep_silos.py (excerpt)
SILOS_LIST = [
  ['level0', 'level1', 'level2', ...],
  ['ask_volume_level0', 'ask_volume_level1', ...],
  ['bid_volume_level0', 'bid_volume_level1', ...],
  ['ask_added_liquidity_level0', 'ask_removed_liquidity_level0', ...],
  ['buy_volume_1.500000', 'buy_volume_2.500000', ...],
  ...
]

The big win here is not that grouping is “correct”.

The win is: it’s explicit and it’s versionable.

If I change features, I have to change the silo contract too — and that forces me to think.

If you can’t explain why two features are in the same family, they probably shouldn’t be fused early.
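
To keep that contract honest, I want the name-to-column mapping to fail loudly when a silo references a feature that doesn’t exist. A minimal sketch of that check (the helper is mine, written for this post; SILOS_LIST and FEATURES_NAMES_LIST are the lists from the repo):

# hypothetical contract check - a sketch for this post, not code from the repo
def silo_column_indices(silos_list, feature_names):
    """Map each silo's feature names to column indices; fail loudly on typos."""
    name_to_col = {name: i for i, name in enumerate(feature_names)}
    silo_indices = []
    for silo in silos_list:
        missing = [name for name in silo if name not in name_to_col]
        if missing:
            raise ValueError("silo references unknown features: %s" % missing)
        silo_indices.append([name_to_col[name] for name in silo])
    return silo_indices

# e.g. SILO_INDICES = silo_column_indices(SILOS_LIST, FEATURES_NAMES_LIST)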

Step 2: learn a compact embedding per silo

Inside the deep_silos() network, each silo does this:

  • select the silo’s features out of the full feature vector
  • apply a small MLP
  • output a tiny embedding (in my code: 2 units per silo)
  • apply dropout during training

From the repo:

# alpha_detection/deep_silos.py (excerpt)
silos = []
for feature_list in SILOS_LIST:
    # map the silo's feature names to column indices in the full feature vector
    cols = [FEATURES_NAMES_LIST.index(name) for name in feature_list]
    silo_x = tf.gather(x, indices=cols, axis=1)
    out = mlp(silo_x, hidden_sizes=(10, 10), output_size=2, keep_prob=keep_prob)
    silos.append(out)
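
The mlp() helper itself isn’t in the excerpt. Here’s a minimal sketch of what a per-silo MLP with dropout can look like in the same TF 1.x style (the internals are my assumption for this post, not the repo’s exact implementation):

# hypothetical sketch of the mlp() helper - not the repo's exact implementation
import tensorflow as tf

def mlp(x, hidden_sizes=(10, 10), output_size=2, keep_prob=1.0):
    out = x
    for size in hidden_sizes:
        # hidden layers: fully connected with the default ReLU activation
        out = tf.contrib.layers.fully_connected(out, num_outputs=size)
        out = tf.nn.dropout(out, keep_prob)
    # linear head: the silo embedding itself, no activation
    return tf.contrib.layers.fully_connected(out, num_outputs=output_size,
                                             activation_fn=None)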

Two things matter here:

  1. Compression is a regularizer
    If each silo gets only a tiny output, it can’t memorize everything.
  2. Family-level invariances can form
    The silo can learn patterns like “imbalance across levels” or “liquidity draining” inside the family.

Even though I’m not writing those formulas down, the architecture nudges the network to discover them.


Step 3: fuse embeddings + “other features” into a shared network

After the per-silo embeddings:

  • concatenate embeddings
  • optionally concatenate any remaining features not covered by silos
  • apply shared dense layers
  • output the prediction

Again, from the repo:

# alpha_detection/deep_silos.py (excerpt)
# non_silo_features: columns not assigned to any silo, gathered from x separately
silos_concat = tf.concat(silos, axis=1)
x_concat = tf.concat([silos_concat, non_silo_features], axis=1)

out = tf.contrib.layers.fully_connected(x_concat, num_outputs=10)
out = tf.nn.dropout(out, keep_prob)
out = tf.contrib.layers.fully_connected(out, num_outputs=1, activation_fn=None)

This is the “fusion MLP”. It’s where cross-family interactions are allowed — but only after each family is summarized.


Training loop: what I log when I don’t trust myself

The training script is alpha_detection/train_alpha_model.py.

It’s classic TensorFlow-era code (placeholders + sessions), but the interesting part is the instrumentation discipline:

  • mean/sigma computed from training only
  • inf and NaN handled explicitly
  • dropout controlled by keep_prob
  • validation cost checked repeatedly
  • predictions sanity-checked with threshold counters
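
The loop itself is nothing exotic. A skeleton of the placeholder/session pattern, with dropout on for training and off for validation (names like train_op, cost and keep_prob are illustrative, not necessarily the repo’s exact variables):

# hypothetical TF 1.x training-loop skeleton - variable names are illustrative
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(num_epochs):
        # dropout active during training (keep_prob < 1.0)
        _, train_cost = sess.run([train_op, cost],
                                 feed_dict={x: x_train_n, y: y_train, keep_prob: 0.8})
        # dropout disabled when measuring validation cost
        valid_cost = sess.run(cost,
                              feed_dict={x: x_valid_n, y: y_valid, keep_prob: 1.0})
        print("epoch %d  train=%.6f  valid=%.6f" % (epoch, train_cost, valid_cost))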

The script contains a big display_results() function that prints:

  • output mean/std/min/max
  • label mean/std/min/max
  • how many predictions exceed certain thresholds (0.25, 0.5, 1.0, etc.)
  • what the true labels were when the model was confident
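
A rough sketch of what those threshold counters amount to (a minimal numpy version for this post; the real display_results() in the repo is bigger):

# hypothetical threshold diagnostics - a sketch, not the repo's display_results()
import numpy as np

def threshold_report(predictions, labels, thresholds=(0.25, 0.5, 1.0)):
    print("pred  mean=%.4f std=%.4f min=%.4f max=%.4f"
          % (predictions.mean(), predictions.std(), predictions.min(), predictions.max()))
    print("label mean=%.4f std=%.4f min=%.4f max=%.4f"
          % (labels.mean(), labels.std(), labels.min(), labels.max()))
    for t in thresholds:
        mask = np.abs(predictions) > t
        n = int(mask.sum())
        print("|pred| > %.2f: %d samples" % (t, n))
        if n > 0:
            # what the labels actually did when the model was "confident"
            print("  labels there: mean=%.4f std=%.4f"
                  % (labels[mask].mean(), labels[mask].std()))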

This looks crude… but it is exactly the kind of “manual dashboard” I need early:

  • it catches exploding outputs
  • it catches collapsed outputs (everything near 0)
  • it reveals if “confidence” correlates with actual bigger moves
  • it tells me if I’m just learning average behavior

Yes, later I’ll want proper metrics: calibration curves, hit-rate by quantile, and eventually PnL-aware evaluation. But right now in 2019, the first battle is: is the model learning anything real or just noise?

What changed in my thinking

I used to think “feature engineering” ends when you build the feature columns.

Now I think:

Feature engineering continues into the model architecture.

Deep Silos is feature engineering with weights.

It’s me telling the network:

  • “these features belong together”
  • “summarize them first”
  • “don’t mix everything too early”
  • “earn the right to learn cross-interactions”

The deep-silos checklist (what I keep breaking)

Here’s the stuff I keep getting wrong — and the checks that save me:

Verify feature ordering is identical everywhere

If FEATURES_NAMES_LIST changes but the dataset column order doesn’t, the model silently trains on garbage.

  • Compare dataset column names to FEATURES_NAMES_LIST
  • Assert shapes and indices match

Confirm silos cover what you think they cover

A typo in a feature name makes a silo smaller than expected.

  • Log the number of columns per silo
  • Fail fast if a silo is missing columns

Normalize on training only

If I compute mean/sigma across train+valid I leak information.

  • mean/sigma from training slice only
  • reuse for valid/test
  • store them as artifacts
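
A minimal sketch of that discipline (array names and artifact paths are illustrative, not the repo’s exact code):

# hypothetical normalization step - a sketch, not the repo's exact code
import numpy as np

# treat inf as missing before computing stats, so it is handled explicitly
x_train = np.where(np.isfinite(x_train), x_train, np.nan)

mean = np.nanmean(x_train, axis=0)           # training slice only
sigma = np.nanstd(x_train, axis=0) + 1e-8    # avoid division by zero

x_train_n = (x_train - mean) / sigma
x_valid_n = (x_valid - mean) / sigma         # reuse training stats, never refit

np.save("artifacts/mean.npy", mean)          # store as artifacts for later runs
np.save("artifacts/sigma.npy", sigma)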

Treat NaN/inf as pipeline bugs first

Replacing NaNs with 0 is a pragmatic band-aid, not a solution.

  • count NaNs per feature
  • track NaN rate over time
  • inspect where they originate
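
A tiny sketch of the per-feature accounting (x_raw is the un-normalized feature matrix; names are illustrative):

# hypothetical NaN audit - x_raw and the name list are illustrative
import numpy as np

nan_counts = np.isnan(x_raw).sum(axis=0)
for name, count in zip(FEATURES_NAMES_LIST, nan_counts):
    if count > 0:
        print("%-40s NaN rate: %.2f%%" % (name, 100.0 * count / len(x_raw)))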

Watch validation early, not at the end

With microstructure, overfitting can happen fast.

  • plot train vs valid cost per epoch
  • stop when valid cost worsens consistently

Common failure modes (symptom → likely cause → first check)

  • Validation improves then collapses → overfitting to regime/microstructure artifact → compare by time slice, not random shuffle (sketched after this list)
  • Outputs explode (very large values) → normalization bug or NaN/inf propagation → print feature stats after normalization
  • Outputs collapse near 0 → learning rate too low or label scale mismatch → print label distribution and baseline predictor
  • Model “predicts” mostly during missing-data moments → imputation pattern leakage → plot predictions vs missingness counters
  • Training is unstable across runs → small dataset + non-stationary market → fix seeds and log dataset hashes
  • Silo embeddings don’t help at all → silo grouping is wrong or too small → try bigger embedding size (2 → 8) and re-evaluate
  • Good MSE, bad “useful” predictions → optimizing wrong objective → inspect threshold hit rates and later map to trading decisions
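
On that first row: “compare by time slice” just means the split respects time. A minimal sketch, assuming rows are already sorted by timestamp (the helper is mine, written for this post):

# hypothetical time-ordered split - assumes rows are sorted by timestamp
def time_split(x, y, train_frac=0.70, valid_frac=0.15):
    n = len(x)
    i_train = int(n * train_frac)
    i_valid = int(n * (train_frac + valid_frac))
    # no shuffling: validation and test always come after training in time
    return ((x[:i_train], y[:i_train]),
            (x[i_train:i_valid], y[i_train:i_valid]),
            (x[i_valid:], y[i_valid:]))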

Field notes (what surprised me)

  • “More structure” can outperform “more capacity” even with the same data.
  • Some feature families are simply unreliable (volume can lie more than I expected).
  • The most dangerous bugs are the silent ones: wrong column ordering, wrong normalization, wrong split.
  • Deep Silos doesn’t magically make the model smart — but it makes failures easier to interpret.

Deliverables (what I can reproduce right now)

This month’s output isn’t a “production model”. It’s a reproducible experiment:

  • alpha_detection/deep_silos.py — the architecture definition
  • alpha_detection/train_alpha_model.py — training loop + diagnostics
  • Saved artifacts (local, for now):
    • mean/sigma vectors used for normalization
    • plots of training vs validation cost
    • text dumps of “high-confidence” predictions with timestamps

Expected result (qualitative, no fake precision):

  • training curve still improves quickly (it always does)
  • validation curve is less jumpy than the naive MLP baseline
  • “confident predictions” are rarer, but less random
  • debugging becomes easier because I can isolate which silo contributes to instability

Even if validation looks better, this is still not “real-world alpha”. Until I can monitor the model live and observe predictions in context (spread, liquidity, outages), I’m still in the lab.


What’s next

Next month I take this model out of the quiet offline world and into something closer to a conversation with the market.

If the model is real, it should behave sensibly in real time:

  • predictions should cluster around microstructure events that make sense
  • confidence shouldn’t spike on data glitches
  • and I should finally see whether the market is “speaking” in the features I engineered

I’m not expecting magic.

I’m expecting feedback — and probably pain.
