
If RL taught me “the state is the contract,” then trading is where that contract becomes painful. This month I map order book microstructure into concrete feature families my models can actually learn from.
Axel Domingues
In 2018, RL taught me something I didn’t fully appreciate at the time:
the state representation is the real interface.
If your agent sees the wrong state, it learns the wrong world.
Now I’m aiming at BitMEX (Bitcoin Mercantile Exchange) and suddenly that lesson is unavoidable.
Because in markets, “state” is not a grid world.
It’s a living order book — and it’s very easy to lie to yourself about what it’s saying.
This month is about making a promise:
whatever I train on, I must be able to compute the same way in live trading.
That means defining feature families grounded in microstructure, and implementing them in code that can run both in the collector and in the live loop.
Repo grounding (this post’s center of gravity):
- `BitmexPythonDataCollector/SnapshotManager.py`
- `BitmexPythonChappie/SnapshotManager.py`

The mental model
The model doesn’t “see the market”.
It sees a vector I choose to construct.
- Feature families: price, depth, flow, volatility, liquidity creation. Each one is a different microstructure story.
- Multi-scale windows: everything is computed over time windows (seconds → minutes → hour) because regimes live at multiple speeds.
- Train/live parity: the same feature computation exists in both the collector and Chappie, so “research” can’t drift from “reality”.
I’m documenting how I build systems, how they fail, and how I debug them — not suggesting anyone should trade.
In Gym environments, the observation space is a spec.
In trading, it’s a trap.
You can always invent a feature that looks predictive in a backtest — especially if it accidentally smuggles the future in.
So I’m approaching features like I approached RL instrumentation in 2018:
The implementation lives in SnapshotManager.__create_current_snapshot(...).
The point is not “clever features”.
The point is: features that represent microstructure effects I can explain.
These are the obvious ones, but they anchor everything else:
- the spread between best bid and best ask (`quotes_diff`)
- seconds since the last price move (`seconds_last_move`)

This is where I start because it gives me basic sanity checks:
I don’t start with a full book image yet.
I start with something I can reason about:
- sizes at the top levels of the book (`RELEVANT_NUM_QUOTES = 5`)
- summed depth further into the book (`bid_depth_l10`, `ask_depth_l10`, `bid_depth_l25`, `ask_depth_l25`)

This is microstructure in plain terms:
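As a sketch of what these keys could look like in code (my own illustration with a hypothetical helper, not the repo's implementation; the key names mirror the snapshot keys above):

```python
# Hypothetical sketch: depth and top-of-book features from sorted levels.
RELEVANT_NUM_QUOTES = 5

def depth_features(bids, asks):
    """bids/asks: lists of (price, size), best level first."""
    feats = {}
    # summed depth over the top 10 and top 25 levels of each side
    for n in (10, 25):
        feats[f"bid_depth_l{n}"] = sum(size for _, size in bids[:n])
        feats[f"ask_depth_l{n}"] = sum(size for _, size in asks[:n])
    # individual sizes for the first few levels
    for i in range(RELEVANT_NUM_QUOTES):
        feats[f"bid_size_{i + 1}"] = bids[i][1] if i < len(bids) else 0
        feats[f"ask_size_{i + 1}"] = asks[i][1] if i < len(asks) else 0
    return feats
```

One invariant falls out immediately: `bid_depth_l10` can never be smaller than any single `bid_size_i`, which makes this family easy to test.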
Instead of only looking at current best bid/ask, I compute how much they moved over different time windows.
In code these are:
- `bid_change_<window>`
- `ask_change_<window>`

Each window is in seconds, and the system uses many windows (from ~1 second up to an hour).
This gives the model a chance to learn:
without me hardcoding “trend”.
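A minimal sketch of how such window deltas could be computed (my own illustration, assuming a history of `(timestamp, best_bid, best_ask)` samples; the repo's implementation may differ):

```python
from bisect import bisect_left

def quote_changes(history, now, windows):
    """history: list of (timestamp, best_bid, best_ask), oldest first.
    Returns best-quote deltas over each lookback window (in seconds)."""
    times = [t for t, _, _ in history]
    _, cur_bid, cur_ask = history[-1]
    feats = {}
    for w in windows:
        # index of the first sample at or after the window's left edge
        i = min(bisect_left(times, now - w), len(history) - 1)
        _, old_bid, old_ask = history[i]
        feats[f"bid_change_{w}"] = cur_bid - old_bid
        feats[f"ask_change_{w}"] = cur_ask - old_ask
    return feats
```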
This is one of the first features that made me feel like I was doing microstructure and not just “technical indicators”.
For each window (2 min → 1 hour), I compute:
- `cdf_<window>`: where the current mid price sits inside a normal approximation of recent mid prices
- `std_pct_<window>`: volatility proxy as std/mean over that window

This becomes my early “regime sensor”:
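In code, a normal-approximation CDF plus a std/mean ratio could look like this (a sketch under my own assumptions about the sampling, not the repo's exact formula):

```python
import statistics
from math import erf, sqrt

def regime_features(mid_prices, window_label):
    """mid_prices: mid prices sampled inside one window, oldest first."""
    mean = statistics.fmean(mid_prices)
    std = statistics.pstdev(mid_prices)
    current = mid_prices[-1]
    # CDF of the current mid under a normal fit to the window's mids:
    # ~0 near the window's lows, ~1 near its highs, 0.5 in the middle
    cdf = 0.5 if std == 0 else 0.5 * (1 + erf((current - mean) / (std * sqrt(2))))
    return {
        f"cdf_{window_label}": cdf,
        f"std_pct_{window_label}": std / mean,  # volatility proxy: std / mean
    }
```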
A limit order is intention.
A market order is conviction.
So I compute traded volume separated by aggressor side:
- `buy_volume_<window>`
- `sell_volume_<window>`

Not because volume is magic, but because in an order book world, flow is often the first thing that changes before price does.
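Aggressor-side volume is conceptually simple; a sketch (my own illustration, assuming the trade feed reports the taker side the way BitMEX's does):

```python
def flow_features(trades, now, windows):
    """trades: list of (timestamp, side, size), where side is the taker
    ('Buy' or 'Sell') as reported by the exchange's trade feed."""
    feats = {}
    for w in windows:
        recent = [t for t in trades if t[0] >= now - w]
        feats[f"buy_volume_{w}"] = sum(size for _, side, size in recent if side == "Buy")
        feats[f"sell_volume_{w}"] = sum(size for _, side, size in recent if side == "Sell")
    return feats
```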
This one is very “order-book-native” and becomes important later when maker behavior matters.
Over short windows I track created liquidity:
- `best_bid_created_liq_<window>`, `best_ask_created_liq_<window>`
- `bids_created_liq_<window>`, `asks_created_liq_<window>`

In plain words: how much fresh liquidity showed up at the best quotes, and across the book, over each window.
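One way to measure created liquidity is to diff consecutive book snapshots and sum the positive size deltas; this is my own hypothetical sketch, not necessarily the repo's exact definition:

```python
def created_liquidity(prev_levels, cur_levels, side="bid"):
    """One side of the book as {price: size}. Returns (at_best, total):
    size added at the current best level and across all levels."""
    total = 0.0
    for price, size in cur_levels.items():
        delta = size - prev_levels.get(price, 0.0)
        if delta > 0:  # only count liquidity that was added, not removed
            total += delta
    best_price = max(cur_levels) if side == "bid" else min(cur_levels)
    at_best = max(cur_levels[best_price] - prev_levels.get(best_price, 0.0), 0.0)
    return at_best, total
```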
I don’t trust single timescale features.
Markets have multiple clocks:
So the SnapshotManager computes many features across multiple time windows.
These constants are defined in both SnapshotManagers:
- `RELEVANT_TRADED_VOLUME_WINDOW` (seconds): `[1.5, 2.5, 5.5, 15.5, 30.5, 60.5, 120.5, 300.5, 900.5, 1800.5, 3600.5]`
- `RELEVANT_QUOTES_WINDOW` (seconds): `[120, 300, 900, 1800, 3600]`
- `RELEVANT_CREATED_LIQUIDITY_WINDOW` (seconds): `[1.5, 2.5, 5.5, 15.5, 30.5, 60.5, 120.5, 300.5]`

The “.5” is not a typo: it’s a practical trick to avoid edge effects when sampling around exact seconds.
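A toy illustration of the edge effect (my own example, assuming events tend to be timestamped on or near whole seconds):

```python
def in_window(event_ts, now, window):
    # an event belongs to the window iff it happened in the last `window` seconds
    return event_ts >= now - window

now = 10.0      # snapshot taken on a whole second
event = 8.0     # trade also stamped on a whole second
on_boundary = in_window(event, now, 2.0)   # True only by exact float equality
with_margin = in_window(event, now, 2.5)   # True with half a second of slack
outside = in_window(event, now, 1.5)       # unambiguously False
```

With integer windows, an event can sit exactly on the cutoff and flip in or out depending on float rounding; the half-second offset keeps whole-second timestamps safely away from the boundary.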
One of my rules: feature names are part of the API.
If I rename a feature, I’m changing the world my model lives in.
So here are the core keys (as implemented in __create_current_snapshot):
Instrument + contract:

- `indicative_settle_price`, `mark_price`, `fair_price`
- `contract_symbol`, `contract_expiry_date`

Quotes + spread:

- `bitmex_best_bid`, `bitmex_best_ask`
- `best_bid`, `best_ask`, `quotes_diff`

Trade constraints:

- `min_trade`, `max_trade`

Top-of-book sizes (5 levels):

- `bid_size_1..bid_size_5`
- `ask_size_1..ask_size_5`

Depth:

- `bid_depth_l10`, `ask_depth_l10`
- `bid_depth_l25`, `ask_depth_l25`

Quote movement:

- `bid_change_<seconds>`
- `ask_change_<seconds>`

Distribution / regime:

- `cdf_<seconds>`
- `std_pct_<seconds>`

Aggressive flow:

- `buy_volume_<seconds>`
- `sell_volume_<seconds>`

Liquidity creation:

- `best_bid_created_liq_<seconds>`, `bids_created_liq_<seconds>`
- `best_ask_created_liq_<seconds>`, `asks_created_liq_<seconds>`

Meta:

- `timestamp`, `seconds_last_move`, `moved_up`

This is deliberate.
I want both to match because I’ve seen this failure pattern too many times:
It won’t fail loudly. It will fail by “almost working”.
So I’m treating SnapshotManager like a shared contract, not a helper script.
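Because the names are the API, I like being able to enumerate the window-suffixed families deterministically. A sketch (hypothetical helper; the window list mirrors the constants quoted above):

```python
RELEVANT_QUOTES_WINDOW = [120, 300, 900, 1800, 3600]

def window_keys(prefixes, windows):
    """Expand feature-name prefixes into the full window-suffixed key family."""
    return [f"{prefix}_{w}" for prefix in prefixes for w in windows]

regime_keys = window_keys(["cdf", "std_pct"], RELEVANT_QUOTES_WINDOW)
# ['cdf_120', 'cdf_300', ..., 'std_pct_1800', 'std_pct_3600']
```

Comparing such a generated list against the keys both SnapshotManagers actually emit is a cheap parity test: any drift between collector and live loop shows up as a missing or extra key.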
I don’t start training first.
I start by trying to break my own features.
- `best_ask > best_bid`
- `quotes_diff >= 0`

Pick one feature family and force a scenario: aggressive trades should show up in `buy_volume_*`.

If `bid_size_1..5` are large, `bid_depth_l10` should usually be larger than any single level.
If it isn’t, I probably messed up indexing or sorting.
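The invariants above are cheap enough to run on every snapshot; a sketch of what such a check could look like (my own illustration, keyed to the snapshot names listed earlier):

```python
def check_snapshot(snap):
    """Cheap invariants to run on a snapshot before trusting it."""
    assert snap["best_ask"] > snap["best_bid"], "crossed book"
    assert snap["quotes_diff"] >= 0, "negative spread"
    # depth over 10 levels must dominate any single top-of-book level
    top = max(snap[f"bid_size_{i}"] for i in range(1, 6))
    assert snap["bid_depth_l10"] >= top, "depth < single level: indexing/sorting bug?"
    return True
```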
I want at least a few timestamps where I can say:
I’m not trying to solve the entire order book in February.
No L3 queue modeling.
No full book images into a CNN.
No “alpha signals” yet.
This month is about a foundation I can trust.
Start with what you can debug.
Only then scale the representation.
My research repo
This series is tied to code. Collector → datasets → features → alpha models → Gym envs → live loop (“Chappie”).
Why not richer representations (full book images, learned features) from the start?
I will — later.
But right now I need an observation space I can debug in plain English.
These feature families are a stepping stone: they encode microstructure explicitly, and they give me invariants I can test.
Once the pipeline is stable, I can try richer representations without losing my ability to reason about failure.
Why compute everything over multiple time windows?
Because markets move at multiple speeds.
If I use only one window, I force the model to treat “burst noise” and “slow drift” as the same phenomenon.
Multiple windows let the model learn multi-scale structure without me hardcoding “trend indicators”.
What failure mode worries me most?
Silent mismatch.
Feature code that almost matches between collection and live trading is the fastest way to produce beautiful backtests and broken reality.
That’s why SnapshotManager is treated as a contract, and why March is about the collector’s correctness.
Now that I know what the model will see, I need to build the thing that collects it reliably.
Next month is where the market stops being theory and becomes a hostile networked system.
The Collector - Websockets, Clock Drift, and the First Clean Snapshot
In March 2019 I stop “talking about microstructure” and start collecting it. Websockets drop messages, clocks drift, and the only thing that matters is producing a snapshot I can trust.
Order Books Are the Battlefield - Matching Engines in Plain English
In 2018 I learned RL inside clean Gym worlds. In 2019 I’m pointing that mindset at BitMEX — where the “environment” is a matching engine and the rewards come with slippage, fees, queue priority, and outages.