
In 2018 I learned RL inside clean Gym worlds. In 2019 I’m pointing that mindset at BitMEX — where the “environment” is a matching engine and the rewards come with slippage, fees, queue priority, and outages.
Axel Domingues
In 2018 I learned Reinforcement Learning inside clean Gym worlds.
Now I’m taking that mindset somewhere less forgiving.
BitMEX (Bitcoin Mercantile Exchange) isn’t a benchmark. It’s a machine: a matching engine, a queue, a fee model, a risk system, and a firehose of events.
And if you want to build an automated trader, you don’t start with PPO.
You start by understanding the battlefield.
What this post gives you
- A plain-English mental model of order books + matching engines (no mystique, just mechanics).
- **The 4 microstructure facts.** Spread, depth, queue priority, and fees: the stuff that turns “good ideas” into bad fills.
- **Why RL intuition matters here.** The market is an environment that reacts. Your actions change what you see next.
- **Setup for next month.** We end by asking: what can a model actually observe from this system?
An order book is not “data.”
An order book is the mechanism every participant must go through to get filled.
That means everything downstream inherits its rules: interfaces, data, evaluation traps, and the reality of building a system that doesn’t lie to you.
Think of the order book as two stacks:

- Bids: resting buy orders, sorted from the highest price down.
- Asks: resting sell orders, sorted from the lowest price up.
The “top” of each stack matters most: the best bid (the highest price anyone will pay) and the best ask (the lowest price anyone will sell at). The distance between them is the spread.
When people say “liquidity,” the first thing I translate it into is:
How much size is available at the prices I can actually reach?
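To make that question concrete, here is a small sketch of summing the size within reach of the best price. The list-of-levels book and the tick size are toy assumptions of mine, not any real API:

```python
def reachable_size(levels, best_price, max_ticks, tick=0.5):
    """Sum the size resting within max_ticks of the best price.

    levels: list of (price, size) tuples for one side of the book.
    tick=0.5 is a toy assumption, not a statement about any venue.
    """
    band = max_ticks * tick
    return sum(size for price, size in levels if abs(price - best_price) <= band)

asks = [(101.0, 2), (102.0, 3), (105.0, 10)]
# Within 2.0 of the best ask: the 101 and 102 levels count, 105 does not.
print(reachable_size(asks, best_price=101.0, max_ticks=4))  # 5
```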
A limit order says: “I will buy/sell, but only at this price or better.”
A market order says: “Buy/sell now.”
The trade-off: limit orders buy you price control; market orders buy you certainty.
A matching engine is just a rulebook for: “who trades with whom, and at what price.”
Most exchanges implement some version of price-time priority: better prices match first, and at the same price, whoever arrived first matches first.
That second part is where “HFT vibes” start to appear.
If you join a price level late, you’re at the back of the line.
Even if you’re “right” about the market, you can still get: partial fills, worse prices than you modeled, or no fill at all.
And that’s before we even talk about outages.
Imagine the book looks like this (asks on top, bids below):
| Side | Price | Size |
|---|---|---|
| Ask | 101 | 2 |
| Ask | 102 | 3 |
| Bid | 100 | 1 |
| Bid | 99 | 4 |
Now place a market buy of size 4: it consumes the 2 contracts at 101, then 2 of the 3 at 102, for an average fill price of 101.5, even though the best ask was 101.
That’s slippage in its simplest form:
Market orders walk the book.
And the deeper the book, the more expensive “now” becomes.
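The walk can be sketched in a few lines. This is a toy fill simulator over a static book, not a claim about how BitMEX’s engine is implemented:

```python
def walk_the_book(asks, qty):
    """Simulate a market buy sweeping ask levels [(price, size), ...], best first.

    Returns (filled_qty, average_fill_price). Toy model: static book,
    no hidden orders, no latency.
    """
    filled = 0.0
    cost = 0.0
    for price, size in asks:
        take = min(size, qty - filled)  # take what this level offers
        filled += take
        cost += take * price
        if filled >= qty:
            break
    return filled, cost / filled

# The book from the example above: 2 @ 101, then 3 @ 102.
filled, avg_px = walk_the_book([(101, 2), (102, 3)], qty=4)
print(filled, avg_px)  # 4.0 101.5 -> half a point worse than the best ask
```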
BitMEX (and many venues) distinguish between makers (resting orders that add liquidity to the book) and takers (orders that remove liquidity by matching immediately).
Even without quoting exact fee numbers, the important part is structural: takers pay more than makers, and on some venues makers receive a rebate.
That matters for learning systems because fees are not noise. Fees are a consistent reward component.
If you ignore them, you’ll train policies that “win” in a fake world.
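A sketch of what “fees are a reward component” means in numbers. The fee rates below are illustrative placeholders, not current BitMEX fees, and the taker-in / maker-out split of the round trip is my assumption:

```python
def round_trip_pnl(entry_px, exit_px, qty, taker_fee=0.00075, maker_fee=-0.00025):
    """PnL of a long round trip, fees included.

    Fee rates are illustrative placeholders, NOT real BitMEX numbers.
    A negative maker fee models a rebate. Assumes we enter as a taker
    (crossing the spread) and exit as a maker (resting order).
    """
    gross = (exit_px - entry_px) * qty
    fees = entry_px * qty * taker_fee + exit_px * qty * maker_fee
    return gross - fees

# A 10-basis-point "win" on paper: roughly half the edge survives the fees.
print(round_trip_pnl(100.0, 100.1, qty=1))
```

Delete the fee terms and the same trade looks twice as good, which is exactly how a policy learns to “win” in a fake world.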
At a single price level, orders form a queue.
If you place a limit order at the best bid: you join the back of that queue, and you only get filled after the size ahead of you trades or cancels.
This is the first place where my 2018 RL brain kicks in:
Your action isn’t “buy.”
Your action is “choose how to buy” — and that changes your transition dynamics.
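A crude way to see why “choose how to buy” matters: a limit order’s chance of filling depends on the size resting ahead of it. This is a toy proxy of my own (it ignores cancels and level deletion entirely):

```python
def fill_probability_proxy(queue_ahead, expected_flow):
    """Crude proxy: chance the queue ahead of our order clears.

    queue_ahead: size resting at our price before we arrived.
    expected_flow: size we expect to trade at this level while it exists.
    A real model needs cancellations and level-deletion dynamics.
    """
    if queue_ahead <= 0:
        return 1.0
    return min(1.0, expected_flow / queue_ahead)

# Join a level late, behind 50 contracts, expecting 10 to trade:
print(fill_probability_proxy(queue_ahead=50, expected_flow=10))  # 0.2
```

Same price, same side, different queue position: different transition dynamics.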
In Gym, environments are polite.
In markets, the environment is adversarial by default.
I’m going to keep these terms informal for now (we’ll formalize features next month), but these are the ideas: the spread, the depth behind it, your position in the queue, and the fees on every fill.
These are not “indicators.”
These are the mechanism expressing itself.
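To show these are mechanical quantities rather than indicators, here is a sketch that reads them straight off a toy snapshot. The function names, and the imbalance ratio at the end, are my own choices, not a standard API:

```python
def book_stats(bids, asks):
    """Spread, mid, and top-of-book imbalance from (price, size) levels,
    best price first on each side."""
    best_bid_px, best_bid_sz = bids[0]
    best_ask_px, best_ask_sz = asks[0]
    spread = best_ask_px - best_bid_px
    mid = (best_ask_px + best_bid_px) / 2
    # Share of top-of-book size sitting on the bid side.
    imbalance = best_bid_sz / (best_bid_sz + best_ask_sz)
    return spread, mid, imbalance

bids = [(100, 1), (99, 4)]
asks = [(101, 2), (102, 3)]
print(book_stats(bids, asks))  # spread=1, mid=100.5, imbalance=1/3
```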
In 2018 I learned to obsess over interfaces: what the environment is, what the agent observes, what its actions mean, and what the reward actually measures.
Markets force the same thinking.
- **Environment:** the market + matching engine + fees + latency + outages.
- **Observations:** order book snapshots, trades, funding/index context, timing.
- **Actions:** not “buy/sell” — but “place/cancel/modify, choose price level, choose size, choose patience.”
- **Reward:** not “price went up.” Reward is PnL after fees, under realistic fills, with risk constraints.
Get that contract wrong and the agent won’t fail loudly. It will succeed at the wrong game.
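To pin down the interface (not the dynamics), here is a deliberately tiny Gym-style sketch. Everything in it is a stand-in: scripted mid prices instead of a live book, a placeholder fee rate, instant market-style fills. The only point is the shape of reset/step and a reward that already subtracts fees:

```python
class ToyExchangeEnv:
    """Gym-style interface sketch. All names and dynamics are stand-ins;
    this shows the contract, not the market."""

    def __init__(self, prices, fee_rate=0.00075):
        self.prices = prices      # scripted mid prices, not a real feed
        self.fee_rate = fee_rate  # placeholder taker fee, not BitMEX's number
        self.t = 0
        self.position = 0.0

    def reset(self):
        self.t, self.position = 0, 0.0
        return {"mid": self.prices[0]}  # observation: only what we can see

    def step(self, action):
        # action: signed size traded immediately at the current price
        px = self.prices[self.t]
        fee = abs(action) * px * self.fee_rate
        self.position += action
        self.t += 1
        next_px = self.prices[self.t]
        done = self.t >= len(self.prices) - 1
        # reward: mark-to-market PnL after fees, not "price went up"
        reward = self.position * (next_px - px) - fee
        return {"mid": next_px}, reward, done, {}
```

Even this toy makes the failure mode visible: delete the fee term and any policy that churns positions looks better than it is.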
This year isn’t going to be “run algorithms.”
It’s going to be “build a system that tells the truth.”
Here’s the instrumentation I already know I’ll need (even before models):
- **Data integrity.** Sequence gaps, timestamp drift, duplicate events, and “silent missingness.”
- **Microstructure sanity.** Spread distribution, depth distribution, event rates, and burst detection.
- **Fill realism.** Partial fills, queue position assumptions, slippage, and fee accounting.
- **“Don’t lie” rules.** No peeking at the future, no perfect fills, no ignoring outages, no fake latency.
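The data-integrity bucket, for instance, is mostly bookkeeping. A sketch of the kind of scan I mean, assuming the feed gives us (sequence, timestamp) pairs (the field names are my assumption; adapt to the real API):

```python
def integrity_report(events):
    """Scan a stream of (sequence, timestamp) pairs for gaps, duplicates,
    and timestamps that go backwards. A sketch, not a production monitor."""
    gaps, dups, backwards = [], [], []
    prev_seq, prev_ts = None, None
    for seq, ts in events:
        if prev_seq is not None:
            if seq == prev_seq:
                dups.append(seq)
            elif seq > prev_seq + 1:
                gaps.append((prev_seq, seq))
            if ts < prev_ts:
                backwards.append((prev_ts, ts))
        prev_seq, prev_ts = seq, ts
    return {"gaps": gaps, "duplicates": dups, "timestamp_regressions": backwards}

events = [(1, 0.0), (2, 0.1), (2, 0.1), (5, 0.3), (6, 0.2)]
print(integrity_report(events))
# {'gaps': [(2, 5)], 'duplicates': [2], 'timestamp_regressions': [(0.3, 0.2)]}
```

None of this is modeling. All of it decides whether the modeling later can be trusted.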
People often say “order book” and mean aggregated depth (Level 2). That’s useful, but it hides queue detail. Some venues expose richer detail (sometimes called Level 3 / full depth), but what you can get depends on the API and the venue. For this project, I’m going to be explicit each time about what I actually have in the data stream.
Why start with the mechanism instead of jumping to price prediction? Because “price” is the output of the matching engine. If I skip the mechanism, I’ll build predictors that look good but don’t survive slippage, fees, and fill uncertainty. I want the system to be honest before it is clever.
Now that the order book has shape, the next question is unavoidable:
What can the model actually see?
Next month I’ll take the messy reality above and turn it into concrete feature families a model can actually learn from.
**From Microstructure to Features - What the Model Will See**
If RL taught me “the state is the contract,” then trading is where that contract becomes painful. This month I map order book microstructure into concrete feature families my models can actually learn from.

**Imitation Learning - GAIL and the Strange Feeling of Learning From Experts**
I ended 2018 in a weird place - less “reinforcement,” more “copying.” GAIL taught me that sometimes the fastest path to competence is to borrow behavior first — and ask questions later.