
If the order book is the battlefield, features are the sensors. This month I stop hand-waving and teach my pipeline to measure liquidity being added and removed - in a way I can deploy live.
Axel Domingues
In January I convinced myself that order books are the battlefield.
In February I started translating microstructure into the question that actually matters for ML:
What will the model see?
In March I built a collector that could survive websockets, reconnects, and clock drift.
In April I learned the uncomfortable truth: datasets are not neutral. The schema, missingness rules, and integrity checks decide what your model is allowed to learn.
Now it is May, and it is time to do the thing everyone hand-waves:
feature engineering - but with microstructure discipline.
Not "RSI" and "MACD". Not vibes.
I want signals that match the mechanics: liquidity being created and removed, because that is what every participant is doing, every second.
The core feature this month
Liquidity created/removed: net depth change at best bid/ask + near levels, aggregated over time windows.
The key mindset shift
Features are sensors, not indicators. If I cannot explain what market action produces a feature, I do not trust it.
The anti-leakage rule
Every feature must be computable from past + present messages only. No future bars, no "oops I used the close".
Repo focus
Feature computation lives in SnapshotManager (collector + live bot):
Collector/SnapshotManager.py
Chappie/SnapshotManager.py
At the order-book level, signal is not mystical.
It is mostly three things: limit orders being posted (liquidity added), orders being cancelled (liquidity pulled), and resting size being consumed by trades (liquidity taken).
A lot of ML trading features ignore this and operate on price series as if price is the primary object.
But on BitMEX, price is an output of the matching engine.
The book is the input.
So I want features that track how the book changes.
The simplest story: instead of asking "did price go up?", I can ask:
"Is the market currently adding liquidity near the top... or taking it away?"
That question is closer to the mechanics of the matching engine.
It also gives me a handle on regime.
And importantly, it is a feature I can compute purely from websocket order book updates, which means it can exist in both the offline dataset pipeline (collector) and the live bot.
That symmetry becomes a theme later.
I am not trying to perfectly classify each book update as cancel vs add vs fill.
I am doing something simpler and more robust:
If the net change is positive, liquidity was created (more size got posted than removed). If it is negative, liquidity was removed (more size got consumed/cancelled than added).
This matters because it is a feature that survives a messy world.
The feature logic lives in two parallel implementations:
BitmexPythonDataCollector/SnapshotManager.py (offline dataset pipeline)
BitmexPythonChappie/SnapshotManager.py (live bot feature stream)
They share the same idea: build a snapshot, then compute deltas.
In both SnapshotManagers, I intentionally limit the depth I care about.
In the current repo snapshot, the constant is:
RELEVANT_NUM_QUOTES = 5
Meaning: compute near-top-of-book signals using the best 5 levels per side.
That is an engineering tradeoff.
This is not a forever decision - it is a starting contract.
Inside SnapshotManager.__determine_liquidity_creation(...), I compare the new snapshot sizes to the previous snapshot sizes and compute:
Conceptually:
delta_best_bid = new_best_bid_size - prev_best_bid_size
delta_rest_bids = sum(new_bid_sizes[1:K]) - sum(prev_bid_sizes[1:K])
delta_best_ask = new_best_ask_size - prev_best_ask_size
delta_rest_asks = sum(new_ask_sizes[1:K]) - sum(prev_ask_sizes[1:K])
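To make that concrete, here is a minimal sketch of the delta step, assuming each snapshot exposes per-side size lists ordered best-first; the function name and arguments are illustrative, not the repo's exact API:

RELEVANT_NUM_QUOTES = 5  # best K levels per side, as above

def liquidity_deltas(prev_bid_sizes, new_bid_sizes,
                     prev_ask_sizes, new_ask_sizes,
                     k=RELEVANT_NUM_QUOTES):
    """Net depth change at the best level and at the remaining near levels, per side."""
    delta_best_bid = new_bid_sizes[0] - prev_bid_sizes[0]
    delta_rest_bids = sum(new_bid_sizes[1:k]) - sum(prev_bid_sizes[1:k])
    delta_best_ask = new_ask_sizes[0] - prev_ask_sizes[0]
    delta_rest_asks = sum(new_ask_sizes[1:k]) - sum(prev_ask_sizes[1:k])
    return delta_best_bid, delta_rest_bids, delta_best_ask, delta_rest_asks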
Then those deltas get appended to rolling arrays alongside their timestamps:
best_bid_liquidity_array, best_bid_liquidity_ts_array
bids_liquidity_array, bids_liquidity_ts_array
best_ask_liquidity_array, best_ask_liquidity_ts_array
asks_liquidity_array, asks_liquidity_ts_array
And the arrays are trimmed to a max horizon:
MAX_LIQUIDITY_TIME = 10 minutes
That trimming is not a nice-to-have. It prevents memory blowups during long runs.
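Roughly, the append-and-trim step looks like this; the class below is a sketch of mine, not the repo's structure, and it assumes the horizon is expressed in seconds:

from collections import deque

MAX_LIQUIDITY_TIME = 10 * 60  # 10 minutes, expressed in seconds (an assumption about units)

class RollingDeltas:
    """Holds (timestamp, delta) pairs and drops anything older than the horizon."""

    def __init__(self, horizon=MAX_LIQUIDITY_TIME):
        self.horizon = horizon
        self.deltas = deque()
        self.timestamps = deque()

    def append(self, ts, delta):
        self.deltas.append(delta)
        self.timestamps.append(ts)
        # trim from the left so long runs never grow memory without bound
        while self.timestamps and self.timestamps[0] < ts - self.horizon:
            self.timestamps.popleft()
            self.deltas.popleft()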
A single delta is noisy. A bursty market produces jitter.
So I do not feed the last delta to a model. I aggregate deltas over windows.
In the repo snapshot, the window list is explicit:
RELEVANT_CREATED_LIQUIDITY_WINDOW = [1.5, 3, 6, 12, 24, 48, 96, 240, 480, 960] seconds
That gives me multi-scale vision: short windows catch bursts, long windows catch slow drift.
In SnapshotManager.__create_current_snapshot(...), each window produces four features:
best_bid_created_liq_{window}
bids_created_liq_{window}
best_ask_created_liq_{window}
asks_created_liq_{window}
Each one is simply:
the sum of deltas since current_ts - window, which is a clean engineering contract.
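A minimal sketch of that window aggregation, assuming the rolling delta and timestamp arrays from above; the function name is illustrative (the repo builds these features inside __create_current_snapshot):

RELEVANT_CREATED_LIQUIDITY_WINDOW = [1.5, 3, 6, 12, 24, 48, 96, 240, 480, 960]  # seconds

def created_liquidity_features(deltas, timestamps, current_ts, prefix):
    """Sum of deltas inside each lookback window, keyed like best_bid_created_liq_{window}."""
    features = {}
    for window in RELEVANT_CREATED_LIQUIDITY_WINDOW:
        cutoff = current_ts - window
        # only past and present deltas: ts <= current_ts enforces the anti-leakage rule
        features[f"{prefix}_created_liq_{window}"] = sum(
            d for d, ts in zip(deltas, timestamps) if cutoff <= ts <= current_ts
        )
    return features

Calling it once per side bucket (best_bid, bids, best_ask, asks) reproduces the four feature families per window.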
Created liquidity
Positive net delta: more size got posted than removed in that window.
Removed liquidity
Negative net delta: more size got consumed/cancelled than posted in that window.
Liquidity deltas are about book depth.
Executed volume is a different object.
That is why SnapshotManager also tracks traded volume arrays and exposes features like:
buy_volume_{window}
sell_volume_{window}
(Those are derived from trade messages, not book deltas.)
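A rough sketch of that per-window volume aggregation, assuming each trade message carries a timestamp, a side, and a size; the field names and the function are illustrative, not the repo's exact code:

def traded_volume_features(trades, current_ts,
                           windows=(1.5, 3, 6, 12, 24, 48, 96, 240, 480, 960)):
    """Executed volume per side per window, from trade messages (not book deltas).
    Each trade is assumed to look like {"timestamp": ts, "side": "Buy", "size": 100}."""
    features = {}
    for window in windows:
        cutoff = current_ts - window
        recent = [t for t in trades if cutoff <= t["timestamp"] <= current_ts]
        features[f"buy_volume_{window}"] = sum(t["size"] for t in recent if t["side"] == "Buy")
        features[f"sell_volume_{window}"] = sum(t["size"] for t in recent if t["side"] == "Sell")
    return features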
If I mix these concepts, I will misread the market.
A market can churn its depth without printing a single trade, or trade heavily while the book refills behind every fill.
So I keep them separate on purpose.
April taught me that datasets lie unless you force them to tell the truth.
May's version is:
features lie unless you force them to stay causal, stable, and consistent.
Here is the checklist I use while building and validating these liquidity features.
If timestamp goes backward, the time-window sums become meaningless.
A reconnect can cause sudden jumps. I want those flagged, not silently fed to the model.
When price spikes up, I expect ask liquidity to be removed more often than created (negative ask deltas).
If the 1.5s window has the same distribution as the 960s window, something is wrong in the indexing logic.
I am not looking for predictive proof yet - just basic alignment with mechanics.
If the live bot computes a different feature set than the collector, I am training on fiction.
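A couple of these checks are cheap enough to sketch; the function names and the row format (dicts of feature name to value) are mine, not the repo's:

def check_timestamps_monotonic(ts_list):
    """Window sums assume time only moves forward; flag any backward step."""
    return all(later >= earlier for earlier, later in zip(ts_list, ts_list[1:]))

def check_feature_parity(collector_row, live_row, tol=1e-9):
    """Collector and live bot must emit the same feature names and values
    for the same message stream; anything else means training on fiction."""
    if set(collector_row) != set(live_row):
        return False
    return all(abs(collector_row[k] - live_row[k]) <= tol for k in collector_row)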
What is the difference between liquidity and volume? Liquidity is what's available to trade at various prices (depth in the book). Volume is what actually traded (executions). They interact, but they are not the same thing - and conflating them leads to nonsense conclusions.
Why so many windows? Because markets are multi-timescale. A burst that matters at 3 seconds might be noise at 15 minutes, and a slow drift matters even when the last 1.5 seconds look calm. Multiple windows let the model see both without me hardcoding a single horizon.
Does removed liquidity mean someone is selling? No. Net depth can drop because orders were cancelled (nothing traded) or because orders were filled (trades ate the book).
That is why I track both depth deltas and executed volume as separate feature families.
In classic ML, features felt like inputs.
In microstructure, features feel like instrumentation.
I am not extracting alpha yet - I am building a sensor suite that measures the actual forces inside the matching engine: liquidity being posted, liquidity being pulled, and volume actually trading.
It is the same mindset I learned in RL in 2018:
if you cannot instrument it, you cannot debug it - and if you cannot debug it, you cannot trust it.
My research repo
This series is tied to code. Collector → datasets → features → alpha models → Gym envs → live loop (“Chappie”).
Once I have a feature pipeline, a new problem shows up immediately:
Normalization is not a training detail
If I compute means/standard deviations on one dataset and trade on another regime, the feature distribution shifts and the model can behave like it lost its senses.
Next month is about:
Normalization Is a Deployment Problem - Mean/Sigma and Index Diff
In June 2019 I stop treating feature scaling as “preprocessing” and start treating it as part of the production contract - same transforms, same stats, same order — or the live system lies.