
In 2018 I learned RL inside clean Gym worlds. In 2019 I’m pointing that mindset at BitMEX — where the “environment” is a matching engine and the rewards come with slippage, fees, queue priority, and outages.
Axel Domingues
In 2018 I learned Reinforcement Learning inside clean Gym worlds.
Now I’m taking that mindset somewhere less forgiving.
BitMEX (Bitcoin Mercantile Exchange) isn’t a benchmark. It’s a machine: a matching engine, a queue, a fee model, a risk system, and a firehose of events.
And if you want to build an automated trader, you don’t start with PPO.
You start by understanding the battlefield.
What this post gives you
- A plain-English mental model of order books + matching engines (no mystique, just mechanics).
- **The 4 microstructure facts.** Spread, depth, queue priority, and fees: the stuff that turns “good ideas” into bad fills.
- **Why RL intuition matters here.** The market is an environment that reacts. Your actions change what you see next.
- **Setup for next month.** We end by asking: what can a model actually observe from this system?
An order book is not “data.”
An order book is the mechanism every participant must go through to get filled.
That means everything downstream inherits its rules: interfaces, data, evaluation traps, and the reality of building a system that doesn’t lie to you.
Think of the order book as two stacks:

- Bids: resting buy orders, sorted from the highest price down.
- Asks: resting sell orders, sorted from the lowest price up.
The “top” of each stack matters most: the best bid (the highest price anyone will pay) and the best ask (the lowest price anyone will sell at). The distance between them is the spread.
When people say “liquidity,” the first thing I translate it into is:
How much size is available at the prices I can actually reach?
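To make that question concrete, here is a small sketch of summing the size within reach of the best price. The list-of-levels book and the tick size are toy assumptions of mine, not any real API:

```python
def reachable_size(levels, best_price, max_ticks, tick=0.5):
    """Sum the size resting within max_ticks of the best price.

    levels: list of (price, size) tuples for one side of the book.
    tick=0.5 is a toy assumption, not a statement about any venue.
    """
    band = max_ticks * tick
    return sum(size for price, size in levels if abs(price - best_price) <= band)

asks = [(101.0, 2), (102.0, 3), (105.0, 10)]
# Within 2.0 of the best ask: the 101 and 102 levels count, 105 does not.
print(reachable_size(asks, best_price=101.0, max_ticks=4))  # 5
```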
A limit order says: “I will buy/sell, but only at this price or better.”
A market order says: “Buy/sell now.”
The trade-off: limit orders buy you price control; market orders buy you certainty.
A matching engine is just a rulebook for: “who trades with whom, and at what price.”
Most exchanges implement some version of price-time priority: better prices match first, and at the same price, whoever arrived first matches first.
That second part is where “HFT vibes” start to appear.
If you join a price level late, you’re at the back of the line.
Even if you’re “right” about the market, you can still get: partial fills, worse prices than you modeled, or no fill at all.
And that’s before we even talk about outages.
Imagine the book looks like this (asks on top, bids below):
| Side | Price | Size |
|---|---|---|
| Ask | 101 | 2 |
| Ask | 102 | 3 |
| Bid | 100 | 1 |
| Bid | 99 | 4 |
Now place a market buy of size 4: it consumes the 2 contracts at 101, then 2 of the 3 at 102, for an average fill price of 101.5, even though the best ask was 101.
That’s slippage in its simplest form:
Market orders walk the book.
And the deeper the book, the more expensive “now” becomes.
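The walk can be sketched in a few lines. This is a toy fill simulator over a static book, not a claim about how BitMEX’s engine is implemented:

```python
def walk_the_book(asks, qty):
    """Simulate a market buy sweeping ask levels [(price, size), ...], best first.

    Returns (filled_qty, average_fill_price). Toy model: static book,
    no hidden orders, no latency.
    """
    filled = 0.0
    cost = 0.0
    for price, size in asks:
        take = min(size, qty - filled)  # take what this level offers
        filled += take
        cost += take * price
        if filled >= qty:
            break
    return filled, cost / filled

# The book from the example above: 2 @ 101, then 3 @ 102.
filled, avg_px = walk_the_book([(101, 2), (102, 3)], qty=4)
print(filled, avg_px)  # 4.0 101.5 -> half a point worse than the best ask
```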
BitMEX (and many venues) distinguish between makers (resting orders that add liquidity to the book) and takers (orders that remove liquidity by matching immediately).
Even without quoting exact fee numbers, the important part is structural: takers pay more than makers, and on some venues makers receive a rebate.
That matters for learning systems because fees are not noise. Fees are a consistent reward component.
If you ignore them, you’ll train policies that “win” in a fake world.
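A sketch of what “fees are a reward component” means in numbers. The fee rates below are illustrative placeholders, not current BitMEX fees, and the taker-in / maker-out split of the round trip is my assumption:

```python
def round_trip_pnl(entry_px, exit_px, qty, taker_fee=0.00075, maker_fee=-0.00025):
    """PnL of a long round trip, fees included.

    Fee rates are illustrative placeholders, NOT real BitMEX numbers.
    A negative maker fee models a rebate. Assumes we enter as a taker
    (crossing the spread) and exit as a maker (resting order).
    """
    gross = (exit_px - entry_px) * qty
    fees = entry_px * qty * taker_fee + exit_px * qty * maker_fee
    return gross - fees

# A 10-basis-point "win" on paper: roughly half the edge survives the fees.
print(round_trip_pnl(100.0, 100.1, qty=1))
```

Delete the fee terms and the same trade looks twice as good, which is exactly how a policy learns to “win” in a fake world.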
At a single price level, orders form a queue.
If you place a limit order at the best bid: you join the back of that queue, and you only get filled after the size ahead of you trades or cancels.
This is the first place where my 2018 RL brain kicks in:
Your action isn’t “buy.”
Your action is “choose how to buy” — and that changes your transition dynamics.
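A crude way to see why “choose how to buy” matters: a limit order’s chance of filling depends on the size resting ahead of it. This is a toy proxy of my own (it ignores cancels and level deletion entirely):

```python
def fill_probability_proxy(queue_ahead, expected_flow):
    """Crude proxy: chance the queue ahead of our order clears.

    queue_ahead: size resting at our price before we arrived.
    expected_flow: size we expect to trade at this level while it exists.
    A real model needs cancellations and level-deletion dynamics.
    """
    if queue_ahead <= 0:
        return 1.0
    return min(1.0, expected_flow / queue_ahead)

# Join a level late, behind 50 contracts, expecting 10 to trade:
print(fill_probability_proxy(queue_ahead=50, expected_flow=10))  # 0.2
```

Same price, same side, different queue position: different transition dynamics.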
In Gym, environments are polite.
In markets, the environment is adversarial by default.
I’m going to keep these terms informal for now (we’ll formalize features next month), but these are the ideas: the spread, the depth behind it, your position in the queue, and the fees on every fill.
These are not “indicators.”
These are the mechanism expressing itself.
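To show these are mechanical quantities rather than indicators, here is a sketch that reads them straight off a toy snapshot. The function names, and the imbalance ratio at the end, are my own choices, not a standard API:

```python
def book_stats(bids, asks):
    """Spread, mid, and top-of-book imbalance from (price, size) levels,
    best price first on each side."""
    best_bid_px, best_bid_sz = bids[0]
    best_ask_px, best_ask_sz = asks[0]
    spread = best_ask_px - best_bid_px
    mid = (best_ask_px + best_bid_px) / 2
    # Share of top-of-book size sitting on the bid side.
    imbalance = best_bid_sz / (best_bid_sz + best_ask_sz)
    return spread, mid, imbalance

bids = [(100, 1), (99, 4)]
asks = [(101, 2), (102, 3)]
print(book_stats(bids, asks))  # spread=1, mid=100.5, imbalance=1/3
```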
In 2018 I learned to obsess over interfaces: what the environment is, what the agent observes, what its actions mean, and what the reward actually measures.
Markets force the same thinking.
- **Environment:** the market + matching engine + fees + latency + outages.
- **Observations:** order book snapshots, trades, funding/index context, timing.
- **Actions:** not “buy/sell” — but “place/cancel/modify, choose price level, choose size, choose patience.”
- **Reward:** not “price went up.” Reward is PnL after fees, under realistic fills, with risk constraints.
Get that contract wrong and the agent won’t fail loudly. It will succeed at the wrong game.
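To pin down the interface (not the dynamics), here is a deliberately tiny Gym-style sketch. Everything in it is a stand-in: scripted mid prices instead of a live book, a placeholder fee rate, instant market-style fills. The only point is the shape of reset/step and a reward that already subtracts fees:

```python
class ToyExchangeEnv:
    """Gym-style interface sketch. All names and dynamics are stand-ins;
    this shows the contract, not the market."""

    def __init__(self, prices, fee_rate=0.00075):
        self.prices = prices      # scripted mid prices, not a real feed
        self.fee_rate = fee_rate  # placeholder taker fee, not BitMEX's number
        self.t = 0
        self.position = 0.0

    def reset(self):
        self.t, self.position = 0, 0.0
        return {"mid": self.prices[0]}  # observation: only what we can see

    def step(self, action):
        # action: signed size traded immediately at the current price
        px = self.prices[self.t]
        fee = abs(action) * px * self.fee_rate
        self.position += action
        self.t += 1
        next_px = self.prices[self.t]
        done = self.t >= len(self.prices) - 1
        # reward: mark-to-market PnL after fees, not "price went up"
        reward = self.position * (next_px - px) - fee
        return {"mid": next_px}, reward, done, {}
```

Even this toy makes the failure mode visible: delete the fee term and any policy that churns positions looks better than it is.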
This year isn’t going to be “run algorithms.”
It’s going to be “build a system that tells the truth.”
Here’s the instrumentation I already know I’ll need (even before models):
- **Data integrity.** Sequence gaps, timestamp drift, duplicate events, and “silent missingness.”
- **Microstructure sanity.** Spread distribution, depth distribution, event rates, and burst detection.
- **Fill realism.** Partial fills, queue position assumptions, slippage, and fee accounting.
- **“Don’t lie” rules.** No peeking at the future, no perfect fills, no ignoring outages, no fake latency.
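The data-integrity bucket, for instance, is mostly bookkeeping. A sketch of the kind of scan I mean, assuming the feed gives us (sequence, timestamp) pairs (the field names are my assumption; adapt to the real API):

```python
def integrity_report(events):
    """Scan a stream of (sequence, timestamp) pairs for gaps, duplicates,
    and timestamps that go backwards. A sketch, not a production monitor."""
    gaps, dups, backwards = [], [], []
    prev_seq, prev_ts = None, None
    for seq, ts in events:
        if prev_seq is not None:
            if seq == prev_seq:
                dups.append(seq)
            elif seq > prev_seq + 1:
                gaps.append((prev_seq, seq))
            if ts < prev_ts:
                backwards.append((prev_ts, ts))
        prev_seq, prev_ts = seq, ts
    return {"gaps": gaps, "duplicates": dups, "timestamp_regressions": backwards}

events = [(1, 0.0), (2, 0.1), (2, 0.1), (5, 0.3), (6, 0.2)]
print(integrity_report(events))
# {'gaps': [(2, 5)], 'duplicates': [2], 'timestamp_regressions': [(0.3, 0.2)]}
```

None of this is modeling. All of it decides whether the modeling later can be trusted.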
People often say “order book” and mean aggregated depth (Level 2). That’s useful, but it hides queue detail. Some venues expose richer detail (sometimes called Level 3 / full depth), but what you can get depends on the API and the venue. For this project, I’m going to be explicit each time about what I actually have in the data stream.
Why start with the mechanism instead of jumping to price prediction? Because “price” is the output of the matching engine. If I skip the mechanism, I’ll build predictors that look good but don’t survive slippage, fees, and fill uncertainty. I want the system to be honest before it is clever.
Now that the order book has shape, the next question is unavoidable:
What can the model actually see?
Next month I’ll take the messy reality above and turn it into concrete feature families a model can actually learn from.
**From Microstructure to Features - What the Model Will See**
If RL taught me “the state is the contract,” then trading is where that contract becomes painful. This month I map order book microstructure into concrete feature families my models can actually learn from.

**Imitation Learning - GAIL and the Strange Feeling of Learning From Experts**
I ended 2018 in a weird place - less “reinforcement,” more “copying.” GAIL taught me that sometimes the fastest path to competence is to borrow behavior first — and ask questions later.