
In September 2020, I stopped trying to be fast and started trying to be executable. The surprising result: in maker-style trading, fees aren’t a footnote; they can be the whole edge.
Axel Domingues
By September 2020 I had two painful truths on my desk:
So I started asking a different question:
What does a policy look like when execution is the constraint?
That question is what pushed me into maker-style behavior.
On BitMEX, the market structure matters:
That sounds like a footnote until you run a high-churn strategy and notice that fees can dominate the entire PnL curve.
This is the exact kind of thing a pure price-only reward misses. If your reward is only "did price go my way", you train agents that:
But if maker rebates exist, the world changes:
Before I tried to teach this to an agent, the real execution loop (“Chappie”) was already telling me what was safe.
In BitmexPythonChappie/BitMEXBotClient.py, orders are placed with BitMEX’s post-only instruction:
execInst='ParticipateDoNotInitiate'

That single flag encodes an entire philosophy:
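To make that concrete, here is a rough sketch (not the repo's actual client code) of what a post-only order submission can look like against BitMEX's public REST API. The payload keys follow the documented POST /api/v1/order endpoint; the function name, base URL choice, and session handling are my own placeholders, and authentication is assumed to happen elsewhere.

```python
import requests

# Sketch only: payload keys follow BitMEX's public POST /api/v1/order endpoint.
# Authentication headers (api-key, api-signature, api-expires) are assumed to be
# attached to the session elsewhere; this is not the repo's BitMEXBotClient code.
BASE_URL = "https://testnet.bitmex.com/api/v1"  # testnet to stay safe while experimenting

def place_post_only_limit(session: requests.Session, symbol: str, qty: int, price: float) -> dict:
    """Submit a limit order that rests in the book instead of taking liquidity."""
    payload = {
        "symbol": symbol,      # e.g. "XBTUSD"
        "orderQty": qty,       # positive = buy, negative = sell
        "price": price,        # limit price; must rest in the book to fill
        "ordType": "Limit",
        # The whole philosophy in one flag: if this order would execute
        # immediately as a taker, the exchange cancels it instead.
        "execInst": "ParticipateDoNotInitiate",
    }
    resp = session.post(f"{BASE_URL}/order", json=payload)
    resp.raise_for_status()
    return resp.json()
```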
The core idea of this month was simple:
Make the reward reflect what the live system is allowed to do.
If the bot is constrained to maker-style execution, the agent’s reward should learn that same constraint.
Here’s the reward spec I converged toward (v0):
One reason this clicked fast is that parts of my environment code already treated maker/taker as first-class parameters.
For example, in the HFT environment prototype (bitmex-hft-gym/.../bitmex_hft_env.py) the fee terms are explicit:
maker_fee_pct = 0.025
taker_fee_pct = -0.075

Even though the HFT environment became a failure story, this part was a good artifact: it forced me to write down the fact that execution type changes reward.
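As a sketch of how those two constants feed the reward (my reconstruction, not the environment's exact step function): the sign convention treats the maker rebate as income and the taker fee as a cost, and either term only appears when a fill actually happens.

```python
# Fee-aware reward term, using the same sign convention as the environment
# snippet above: positive maker_fee_pct is a rebate earned, negative
# taker_fee_pct is a fee paid. Everything except the two fee constants is
# illustrative, not taken from the repo.
maker_fee_pct = 0.025    # rebate earned per maker fill, in % of notional
taker_fee_pct = -0.075   # fee paid per taker fill, in % of notional

def step_reward(price_pnl_pct: float, got_filled: bool, filled_as_maker: bool) -> float:
    """Reward = price move captured + whatever fee term the execution type implies."""
    if not got_filled:
        return 0.0                       # resting order never filled: no PnL, no fee
    fee_pct = maker_fee_pct if filled_as_maker else taker_fee_pct
    return price_pnl_pct + fee_pct       # both terms expressed in percent of notional
```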
If you tell an agent "maker rebates are good" without guardrails, it will try to spin in circles: churning positions open and closed purely to harvest the rebate.
So the learning goal became:
In practice, the "maker strategy" started to look like:
That last point matters because it connects directly to the BitMEX reality:
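To make the "spinning in circles" problem concrete, here is one illustrative anti-churn guardrail. This is my sketch of the idea, not necessarily what the repo implements; the threshold and penalty values are placeholders.

```python
# Illustrative anti-churn guardrail (assumption, not the repo's code): shave the
# rebate's appeal when a position is opened and closed almost immediately, so
# the agent cannot farm maker rebates with instant round trips.
MIN_HOLD_STEPS = 10        # hypothetical minimum holding time, in environment steps
CHURN_PENALTY_PCT = 0.06   # hypothetical penalty, slightly more than the 2 x 0.025% round-trip rebate

def shaped_reward(price_pnl_pct: float, fee_pct: float, hold_steps: int) -> float:
    reward = price_pnl_pct + fee_pct
    if hold_steps < MIN_HOLD_STEPS:
        reward -= CHURN_PENALTY_PCT   # make instant round trips net-negative
    return reward
```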
To avoid self-delusion, I added a fee-aware breakdown in the execution reporting.
The key line (conceptually) is:
num_trades * abs(maker_fee_pct) * 2

Why * 2? Because every round-trip trade touches the book twice, once to open the position and once to close it, so the rebate (or fee) applies to two fills per trade.
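A hedged reconstruction of that reporting breakdown (the function and field names here are illustrative; only the *2 term mirrors the line above):

```python
# Illustrative reconstruction of the fee-aware reporting breakdown; the names
# are mine, only the "two fills per round trip" logic mirrors the line above.
def fee_breakdown(num_trades: int, maker_fee_pct: float, price_profit_pct: float) -> dict:
    # Each round-trip trade has two fills (open + close), so the rebate
    # magnitude is counted twice per trade.
    fees_pct = num_trades * abs(maker_fee_pct) * 2
    return {
        "total_profit_pct": price_profit_pct,                    # price-only PnL
        "total_fees_pct": fees_pct,                              # rebates earned as a maker
        "total_fees_and_profit_pct": price_profit_pct + fees_pct # what actually lands in the account
    }
```

Plugged into the run below (2,777 trades at a 0.025% rebate per fill), this is exactly where the 138.85% fee line comes from.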
And then I started getting logs like this:
Short rolled over - prince (8695.500000) timestamp (2019-06-15 00:00:26.172225+00:00)
Long rolled over - prince (8674.000000) timestamp (2019-06-15 00:34:39.171643+00:00)
Long rolled over - prince (8666.000000) timestamp (2019-06-15 00:51:50.171991+00:00)
...
Short rolled over - prince (9047.500000) timestamp (2019-11-10 16:29:55.170885+00:00)
Short rolled over - prince (9012.000000) timestamp (2019-11-10 16:46:24.171960+00:00)
...
Total trades: 2777
Max open count: 10
Max drawdown: -18.157810
Total summed profit precentage: -80.688532
Total fees precentage: 138.850000
Total fees & profit precentage: 58.161468
This was the “oh” moment: the price-only PnL was roughly -80%, yet cumulative maker rebates worth about +139% flipped the combined result to roughly +58%.
And that wasn’t with perfect execution or magical alpha — it was simply what the microstructure made possible.
Maker behavior is not free money. It’s a different risk profile:
So reward shaping had to teach the agent to prefer:
…and to fear:
That’s also where my “bull personality” paranoia helped: maker rebates are a stabilizer, but they can also hide directional weakness if you don’t measure them separately.
The maker-strategy idea exposed the next missing piece:
At this point, a lot of my setups were still “entry-ish”:
Maker trading forced me to admit that the real problem is position management:
Which is exactly why the next article exists.
Repository — bitmex-deeprl-research
The full project: environments, training loops, and the live “Chappie” execution client.
BitMEXBotClient (post-only execution)
Where the live constraint shows up: execInst='ParticipateDoNotInitiate' (maker-only intent).
It can be, if you ignore fill realism. The point of this month wasn’t "rebates are free money" — it was: rebates are a real term in the reward, so the environment must model when you actually earn them.
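For intuition, here is a minimal fill-realism rule. This is an assumption on my part, not the environment's exact logic: a resting post-only buy only earns its rebate if the market actually trades through the limit price.

```python
# Minimal fill-realism sketch (assumption, not the repo's exact model): a resting
# maker buy at `limit_price` only counts as filled, and only then earns the
# rebate, if the bar's low traded through that price.
def maker_buy_filled(limit_price: float, bar_low: float) -> bool:
    # Strictly below is the conservative convention: merely touching the price
    # does not guarantee the resting order was reached in the queue.
    return bar_low < limit_price

def maker_fill_reward(filled: bool, maker_fee_pct: float = 0.025) -> float:
    # No fill, no rebate: the fee term only enters the reward when execution happens.
    return maker_fee_pct if filled else 0.0
```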
Because taker-style strategies often depend on precise timing. Maker-style policies can tolerate delay: you place, wait, and accept that sometimes the best action is to do nothing.
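One way to make "the best action is to do nothing" literal is to give the policy an explicit hold action. This action set is my illustration of the idea, not the repo's actual action space.

```python
from enum import IntEnum

# Illustrative action set for a maker-style policy (my sketch, not the repo's):
# resting orders and explicitly doing nothing are first-class actions.
class MakerAction(IntEnum):
    HOLD = 0        # leave resting orders (or the current position) untouched
    PLACE_BID = 1   # rest a post-only buy inside the spread
    PLACE_ASK = 2   # rest a post-only sell inside the spread
    CANCEL_ALL = 3  # pull all resting orders
```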
I stopped treating costs as bookkeeping and started treating them as behavioral incentives. Reward shaping isn’t just about getting learning to converge — it’s about teaching the agent what reality will actually pay for.
bitmex-management-gym: Position Sizing
Maker rebates made the strategy survivable.
Management is what makes it responsible.
bitmex-management-gym: Position Sizing and the First Risk-Aware Agent
After months of "all-in" agents with bull personalities, I rebuilt the environment to teach risk: stackable positions, time-awareness, and penalties that prevent reward-hacking.
Deep Silos in RL: Architecture as Stability (and the First LSTM Variant)
August 2020 - After the first live pain and the bull-personality problem, I stopped tuning "algorithms" and started tuning the network contract. Deep Silos beat flat MLPs, and the LSTM variant overfit fast.