Jan 27, 2019 - 12 MIN READ
Order Books Are the Battlefield - Matching Engines in Plain English

In 2018 I learned RL inside clean Gym worlds. In 2019 I’m pointing that mindset at BitMEX — where the “environment” is a matching engine and the rewards come with slippage, fees, queue priority, and outages.

Axel Domingues

In 2018 I learned Reinforcement Learning inside clean Gym worlds:

  • the loop matters more than the algorithm
  • your metrics lie unless you instrument them
  • and the fastest way to fool yourself is to “optimize” inside a broken interface

Now I’m taking that mindset somewhere less forgiving.

BitMEX (Bitcoin Mercantile Exchange) isn’t a benchmark. It’s a machine: a matching engine, a queue, a fee model, a risk system, and a firehose of events.

And if you want to build an automated trader, you don’t start with PPO.

You start by understanding the battlefield.

What this post gives you

A plain-English mental model of order books + matching engines (no mystique, just mechanics).

The 4 microstructure facts

Spread, depth, queue priority, and fees: the stuff that turns “good ideas” into bad fills.

Why RL intuition matters here

The market is an environment that reacts. Your actions change what you see next.

Setup for next month

We end by asking: what can a model actually observe from this system?


The Core Claim

An order book is not “data.”

An order book is the mechanism every participant must go through to get filled.

That means:

  • prices move because orders arrive and get matched
  • your fill quality depends on how you interact (maker vs taker, timing, queue position)
  • a strategy that ignores microstructure can look brilliant in backtests and die in production

I’m not giving financial advice here. This series is about research engineering: interfaces, data, evaluation traps, and the reality of building a system that doesn’t lie to you.


The Order Book (in the smallest honest words)

Think of the order book as two stacks:

  • Bids: people willing to buy (prices below the current midpoint)
  • Asks: people willing to sell (prices above the current midpoint)

The “top” of each stack matters most:

  • Best bid = highest buy price available now
  • Best ask = lowest sell price available now
  • Spread = best ask minus best bid (the gap between them)
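To make the vocabulary concrete, here is a minimal Python sketch. It assumes each side of the book is a plain dict mapping price to resting size (a hypothetical representation; real feeds deliver incremental updates you would have to apply first) and uses the same numbers as the example book later in this post.

```python
from typing import Dict

# Hypothetical representation: each side maps price -> total resting size.
Book = Dict[float, float]


def top_of_book(bids: Book, asks: Book) -> dict:
    """Return best bid, best ask, spread and midpoint for a non-crossed book."""
    if not bids or not asks:
        raise ValueError("need at least one bid and one ask")
    best_bid = max(bids)  # highest price someone is willing to buy at
    best_ask = min(asks)  # lowest price someone is willing to sell at
    return {
        "best_bid": best_bid,
        "best_ask": best_ask,
        "spread": best_ask - best_bid,
        "mid": (best_bid + best_ask) / 2,
    }


print(top_of_book(bids={100: 1, 99: 4}, asks={101: 2, 102: 3}))
# {'best_bid': 100, 'best_ask': 101, 'spread': 1, 'mid': 100.5}
```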

When people say “liquidity,” the first thing I translate it into is:

How much size is available at the prices I can actually reach?
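That question can be made literal: sum the resting size on the opposite side of the book up to the worst price you are willing to accept. A sketch under the same hypothetical dict-of-price-to-size assumption; `reachable_size` is an illustrative name, not an exchange API.

```python
from typing import Dict


def reachable_size(book_side: Dict[float, float], limit_price: float, side: str) -> float:
    """Size you could trade against without going past limit_price.

    side="buy"  -> book_side is the asks; count levels priced at or below limit_price
    side="sell" -> book_side is the bids; count levels priced at or above limit_price
    """
    if side == "buy":
        return sum(size for price, size in book_side.items() if price <= limit_price)
    if side == "sell":
        return sum(size for price, size in book_side.items() if price >= limit_price)
    raise ValueError("side must be 'buy' or 'sell'")


asks = {101: 2, 102: 3, 105: 10}
print(reachable_size(asks, 102, "buy"))  # 5
print(reachable_size(asks, 101, "buy"))  # 2
```

The same book can look "liquid" or "thin" depending entirely on how far from the touch you are willing to go.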


Two order types that matter more than any algorithm

1) Limit orders (maker behavior)

A limit order says: “I will buy/sell, but only at this price or better.”

  • it rests in the book (if it doesn’t match immediately)
  • if it rests, it can earn maker rebates (exchange-dependent)
  • it gives you price control, but not fill certainty

2) Market orders (taker behavior)

A market order says: “Buy/sell now.”

  • it consumes the book immediately
  • it pays taker fees (exchange-dependent)
  • it guarantees a trade (as long as the other side has liquidity), but not a price

If you only remember one thing:

limit orders buy you price control; market orders buy you certainty.
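The trade-off fits in one toy matching function. This sketch assumes a dict of asks and ignores fees, latency, and hidden orders: with a limit price it stops at that price and leaves the remainder unfilled (on a real venue it would rest in the book as a bid); without one it keeps walking the book. `match_buy` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple


def match_buy(asks: Dict[float, float], size: float,
              limit_price: Optional[float] = None) -> Tuple[List[Tuple[float, float]], float]:
    """Match a buy against asks; limit_price=None behaves like a market order.

    Returns (fills, unfilled) where fills is a list of (price, qty).
    Mutates `asks`: consumed liquidity is removed from the book.
    """
    fills = []
    remaining = size
    for price in sorted(asks):  # price priority: cheapest ask first
        if remaining <= 0:
            break
        if limit_price is not None and price > limit_price:
            break  # limit order: never pay worse than limit_price
        qty = min(remaining, asks[price])
        fills.append((price, qty))
        asks[price] -= qty
        if asks[price] == 0:
            del asks[price]
        remaining -= qty
    return fills, remaining


print(match_buy({101: 2, 102: 3}, 4, limit_price=101))  # ([(101, 2)], 2)
print(match_buy({101: 2, 102: 3}, 4))                   # ([(101, 2), (102, 2)], 0)
```

The limit order got the price it asked for but only half the size; the market order got all the size at a worse average price.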


Matching engines: the part nobody can “opt out” of

A matching engine is just a rulebook for: “who trades with whom, and at what price.”

Most exchanges implement some version of:

  • price priority: better price wins
  • time priority: at the same price, earlier order wins (queue)

That second part is where “HFT vibes” start to appear.

If you join a price level late, you’re at the back of the line.

Even if you’re “right” about the market, you can still get:

  • partial fills
  • no fills
  • fills only after the price moves against you

And that’s before we even talk about outages.
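Time priority within one price level is just a first-in, first-out queue. This sketch models a single level only (price priority across levels is left out), uses hypothetical names, and ignores hidden orders and any venue-specific rules.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple


@dataclass
class RestingOrder:
    order_id: str
    size: float


class PriceLevel:
    """One price level: resting orders are filled strictly in arrival order."""

    def __init__(self) -> None:
        self.queue: Deque[RestingOrder] = deque()

    def add(self, order_id: str, size: float) -> None:
        self.queue.append(RestingOrder(order_id, size))  # late arrivals join the back

    def match(self, incoming_size: float) -> List[Tuple[str, float]]:
        """Fill incoming aggressive size against the queue front-first."""
        fills = []
        while incoming_size > 0 and self.queue:
            head = self.queue[0]
            qty = min(head.size, incoming_size)
            fills.append((head.order_id, qty))
            head.size -= qty
            incoming_size -= qty
            if head.size == 0:
                self.queue.popleft()
        return fills


level = PriceLevel()
level.add("early_1", 3)
level.add("early_2", 2)
level.add("me", 1)  # same price, but joined last
print(level.match(4))  # [('early_1', 3), ('early_2', 1)]: "me" gets nothing
```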


A tiny example that makes it real

Imagine the book looks like this (asks on top, bids below):

Side   Price   Size
Ask    101     2
Ask    102     3
Bid    100     1
Bid    99      4

Now place a market buy of size 4:

  • it consumes 2 at 101
  • then consumes 2 of the 3 available at 102
  • you get filled, but your average entry is (2 × 101 + 2 × 102) / 4 = 101.5, worse than the best ask of 101

That’s slippage in its simplest form:

Market orders walk the book.

And the thinner the book (or the bigger your order), the more expensive “now” becomes.
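The walk above, plus a thin-versus-deep comparison, as a small sketch. It assumes asks arrive as (price, size) pairs, ignores fees and latency, and measures slippage against the best ask at the moment the order is sent.

```python
from typing import List, Tuple


def walk_the_book(asks: List[Tuple[float, float]], size: float) -> Tuple[float, float]:
    """Simulate a market buy that consumes asks from the best price upward.

    Returns (filled_size, average_price). May fill less than `size` if the book runs out.
    """
    filled = 0.0
    cost = 0.0
    for price, available in sorted(asks):  # cheapest ask first
        if filled >= size:
            break
        qty = min(size - filled, available)
        filled += qty
        cost += qty * price
    if filled == 0:
        raise ValueError("no ask liquidity to fill against")
    return filled, cost / filled


thin = [(101, 2), (102, 3)]   # the example book above
deep = [(101, 10), (102, 3)]  # same prices, more size at the best ask
for name, book in (("thin", thin), ("deep", deep)):
    filled, avg = walk_the_book(book, 4)
    slippage = avg - min(price for price, _ in book)
    print(name, filled, avg, slippage)
# thin 4.0 101.5 0.5
# deep 4.0 101.0 0.0
```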


Maker vs taker isn’t “a detail” — it’s behavior shaping

BitMEX (and many venues) distinguish between:

  • maker: add liquidity (resting orders)
  • taker: remove liquidity (marketable orders)

Even without quoting exact fee numbers, the important part is structural:

  • maker can be cheaper (sometimes a rebate)
  • taker is usually more expensive
  • this pushes strategies toward making when possible

That matters for learning systems because fees are not noise. Fees are a consistent reward component.

If you ignore them, you’ll train policies that “win” in a fake world.

Backtests that ignore fees + slippage are not “optimistic.”
They are often invalid.
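A sketch of fees as a reward component. The fee rates are illustrative placeholders, not quoted from any venue's fee schedule; they only show the structure (a maker rebate modeled as a negative fee, a positive taker fee charged on notional). PnL uses a simple linear contract, which is also an assumption: inverse contracts settle differently. Slippage would come on top of this.

```python
# Illustrative placeholder rates, NOT any venue's real schedule.
# A negative maker fee models a rebate; fees are charged on notional.
MAKER_FEE = -0.0002
TAKER_FEE = 0.0007


def round_trip_pnl(entry_price: float, exit_price: float, qty: float,
                   entry_is_maker: bool, exit_is_maker: bool) -> dict:
    """Net PnL of a long round trip on a simple linear contract.

    Assumption: gross PnL = (exit - entry) * qty. Slippage is not modeled here.
    """
    gross = (exit_price - entry_price) * qty
    fee_in = (MAKER_FEE if entry_is_maker else TAKER_FEE) * entry_price * qty
    fee_out = (MAKER_FEE if exit_is_maker else TAKER_FEE) * exit_price * qty
    fees = fee_in + fee_out
    return {"gross": gross, "fees": fees, "net": gross - fees}


# Same 10-point move, very different outcomes depending on how it was traded.
taker = round_trip_pnl(10000, 10010, 1, entry_is_maker=False, exit_is_maker=False)
maker = round_trip_pnl(10000, 10010, 1, entry_is_maker=True, exit_is_maker=True)
print(f"taker/taker net: {taker['net']:+.2f}")  # taker/taker net: -4.01
print(f"maker/maker net: {maker['net']:+.2f}")  # maker/maker net: +14.00
```

The price move is identical in both cases; only the interaction with the book changed, and the sign of the result flipped.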

Queue priority: why “I posted the same price” isn’t the same trade

At a single price level, orders form a queue.

If you place a limit order at the best bid:

  • you might get filled instantly…
  • or you might sit behind a wall of other orders
  • and only get filled once enough selling hits your price to clear the orders queued ahead of you
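Queue position can be tracked with a simple counter: the size ahead of you shrinks when trades print at your price or when orders ahead of you cancel, and you only start filling once it reaches zero. This is a hypothetical, deliberately naive model: a feed usually does not tell you whether a cancel was ahead of or behind you, so `on_cancel_ahead` encodes an assumption you would have to make explicit.

```python
from dataclasses import dataclass


@dataclass
class QueuePosition:
    """Hypothetical tracker for one resting limit order at a single price level."""
    ahead: float        # size queued in front of us when we joined
    my_size: float
    filled: float = 0.0

    def on_trade(self, qty: float) -> None:
        """Aggressive volume at our price consumes the queue front-first."""
        eats_ahead = min(qty, self.ahead)
        self.ahead -= eats_ahead
        leftover = qty - eats_ahead
        self.filled += min(leftover, self.my_size - self.filled)

    def on_cancel_ahead(self, qty: float) -> None:
        """Assumption: only called for cancels we believe were ahead of us."""
        self.ahead = max(0.0, self.ahead - qty)


pos = QueuePosition(ahead=50.0, my_size=5.0)
pos.on_trade(30)         # sellers hit the bid: 30 consumed ahead of us
pos.on_cancel_ahead(15)  # someone ahead of us gives up
pos.on_trade(8)          # 5 more ahead of us, then 3 of ours
print(pos.ahead, pos.filled)  # 0.0 3.0
```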

This is the first place where my 2018 RL brain kicks in:

Your action isn’t “buy.”
Your action is “choose how to buy” — and that changes your transition dynamics.

In Gym, environments are polite.

In markets, the environment is adversarial by default.


Microstructure signals: the “physics” the model can learn from

I’m going to keep these terms informal for now (we’ll formalize features next month), but these are the ideas:

  • spread: is the market tight or stressed?
  • depth: is there real size, or is it thin?
  • imbalance: is liquidity heavier on bids or asks?
  • flow: are trades mostly hitting bids (selling pressure) or lifting asks (buying pressure)?
  • cancellations: is liquidity real or just appearing then vanishing?
  • bursts: do things happen smoothly, or in violent jumps?

These are not “indicators.”

These are the mechanism expressing itself.
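Informal as they are, a few of these signals are easy to compute. A sketch, assuming dict-of-price-to-size snapshots and a trade list where each record carries the aggressor (taker) side; `levels=5` is an arbitrary choice. Cancellations and bursts need event streams rather than single snapshots, so they are left out.

```python
from typing import Dict, List, Tuple


def snapshot_signals(bids: Dict[float, float], asks: Dict[float, float],
                     levels: int = 5) -> dict:
    """Spread, top-N depth per side, and depth imbalance from one snapshot."""
    top_bids = sorted(bids, reverse=True)[:levels]  # best bids first
    top_asks = sorted(asks)[:levels]                # best asks first
    bid_depth = sum(bids[p] for p in top_bids)
    ask_depth = sum(asks[p] for p in top_asks)
    return {
        "spread": min(asks) - max(bids),
        "bid_depth": bid_depth,
        "ask_depth": ask_depth,
        # +1 = all visible size on the bid side, -1 = all on the ask side
        "imbalance": (bid_depth - ask_depth) / (bid_depth + ask_depth),
    }


def trade_flow(trades: List[Tuple[str, float]]) -> float:
    """Signed flow in [-1, 1]; trades are (aggressor_side, size), 'buy' lifts asks."""
    buys = sum(size for side, size in trades if side == "buy")
    sells = sum(size for side, size in trades if side == "sell")
    total = buys + sells
    return 0.0 if total == 0 else (buys - sells) / total


print(snapshot_signals({100: 1, 99: 4}, {101: 2, 102: 3}))
# {'spread': 1, 'bid_depth': 5, 'ask_depth': 5, 'imbalance': 0.0}
print(trade_flow([("buy", 3), ("sell", 1)]))  # 0.5
```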


The RL translation (because I can’t unsee it)

In 2018 I learned to obsess over interfaces:

  • observation space
  • action space
  • reward design
  • termination conditions
  • and the ways environments accidentally allow cheating

Markets force the same thinking.

Environment

The market + matching engine + fees + latency + outages.

Observations

Order book snapshots, trades, funding/index context, timing.

Actions

Not “buy/sell” — but “place/cancel/modify, choose price level, choose size, choose patience.”

Reward

Not “price went up.”
Reward is PnL after fees, under realistic fills, with risk constraints.

If I don’t define the trading “environment contract” carefully, RL won’t fail loudly.

It will succeed at the wrong game.
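One way to write down the action and reward parts of that contract, as a hypothetical sketch (every name is illustrative, not from any library). The point is structural: the action says how to trade, and the reward is computed after fees and a risk term, from fills produced by whatever fill model sits underneath.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ActionType(Enum):
    HOLD = 0
    PLACE = 1
    CANCEL = 2
    MODIFY = 3


@dataclass(frozen=True)
class Action:
    """Hypothetical action: not just 'buy', but how to buy."""
    kind: ActionType
    side: Optional[str] = None       # "buy" or "sell" when placing
    ticks_from_touch: int = 0        # 0 = join our side's best price; >0 more passive; <0 more aggressive
    size: float = 0.0
    patience_steps: int = 0          # cancel if still unfilled after this many steps (0 = no limit)
    order_id: Optional[str] = None   # which resting order to CANCEL or MODIFY


@dataclass(frozen=True)
class StepResult:
    realized_pnl: float   # from fills produced by a realistic fill model
    fees_paid: float      # positive = cost, negative = net rebate
    risk_penalty: float   # e.g. for breaching a position or drawdown limit


def reward(step: StepResult) -> float:
    """PnL after fees, minus a risk term. Not 'price went up'."""
    return step.realized_pnl - step.fees_paid - step.risk_penalty


a = Action(ActionType.PLACE, side="buy", ticks_from_touch=0, size=1.0, patience_steps=10)
print(a.kind.name, reward(StepResult(realized_pnl=12.0, fees_paid=3.0, risk_penalty=1.0)))
# PLACE 8.0
```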


The debugging mindset I’m carrying into 2019

This year isn’t going to be “run algorithms.”

It’s going to be “build a system that tells the truth.”

Here’s the instrumentation I already know I’ll need (even before models):

Data integrity

Sequence gaps, timestamp drift, duplicate events, and “silent missingness.”

Microstructure sanity

Spread distribution, depth distribution, event rates, and burst detection.

Fill realism

Partial fills, queue position assumptions, slippage, and fee accounting.

“Don’t lie” rules

No peeking at the future, no perfect fills, no ignoring outages, no pretending latency is zero.
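As an example of the data-integrity checks, here is a sketch that scans events for sequence gaps, duplicates, out-of-order arrivals, and timestamps that go backwards. It assumes each event carries a strictly increasing sequence number and a timestamp; whether a given feed provides that is something to verify against the venue's docs, not assume.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def integrity_report(events: Iterable[Tuple[int, float]]) -> Dict[str, list]:
    """Scan (sequence_number, timestamp) pairs, in arrival order, for problems."""
    gaps: List[Tuple[int, int]] = []   # inclusive ranges of missing sequence numbers
    duplicates: List[int] = []
    out_of_order: List[int] = []
    backwards: List[int] = []          # events timestamped before the previous one
    seen: Set[int] = set()
    prev_seq: Optional[int] = None
    prev_ts: Optional[float] = None
    for seq, ts in events:
        if seq in seen:
            duplicates.append(seq)
            continue
        seen.add(seq)
        if prev_seq is not None and seq < prev_seq:
            out_of_order.append(seq)  # late arrival; an earlier reported gap may now be filled
            continue
        if prev_seq is not None and seq > prev_seq + 1:
            gaps.append((prev_seq + 1, seq - 1))
        if prev_ts is not None and ts < prev_ts:
            backwards.append(seq)
        prev_seq, prev_ts = seq, ts
    return {"gaps": gaps, "duplicates": duplicates,
            "out_of_order": out_of_order, "time_went_backwards": backwards}


events = [(1, 0.0), (2, 0.1), (2, 0.1), (5, 0.3), (6, 0.25)]
print(integrity_report(events))
# {'gaps': [(3, 4)], 'duplicates': [2], 'out_of_order': [], 'time_went_backwards': [6]}
```

A report like this is cheap to run on every collected file, and it turns "silent missingness" into something you can count.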


Resources (what I’m using as my grounding)

BitMEX Guides (Microstructure + Contracts)

I’m using venue docs as “physics”: what the engine promises, what it doesn’t, and what data it emits.

My research repo

This series is tied to code. Collector → datasets → features → alpha models → Gym envs → live loop (“Chappie”).



What’s next

Now that the order book has shape, the next question is unavoidable:

What can the model actually see?

Next month I’ll take the messy reality above and turn it into:

  • a concrete observation design
  • a first feature inventory
  • and the rules that prevent “feature engineering” from becoming future leakage.