Sep 27, 2020 - 18 MIN READ
Maker Trades as a Strategy: When Fees Become a Reward Signal

In September 2020, I stop trying to be fast and start trying to be executable. The surprising result: in maker-style trading, fees aren’t a footnote — they can be the whole edge.

Axel Domingues

By September 2020 I had two painful truths on my desk:

  1. My agents were getting a "bull personality" because both training and validation were biased toward a bull regime. The backtests were impressive and also… not honest.
  2. Outages were real. If BitMEX gives you a 503 in the exact minute you "need" to trade, your strategy is not a strategy — it’s a fragile assumption.

So I started asking a different question:

What does a policy look like when execution is the constraint?

That question is what pushed me into maker-style behavior.

Context: The RL exploration marathon (algorithms, stability, exploration) wrapped up in 2018, and 2019 turned into "trading is a system". 2020 is where I finally admit the truth: a prediction is not a decision, and a decision that can’t be executed is just a story.

Maker vs taker: the hidden reward signal

On BitMEX, the market structure matters:

  • Taker trades pay fees.
  • Maker trades can receive rebates.

That sounds like a footnote until you run a high-churn strategy and notice that fees can dominate the entire PnL curve.

This is the exact kind of thing a pure price-only reward misses. If your reward is only "did price go my way", you train agents that:

  • chase momentum,
  • flip constantly,
  • and fall apart when the exchange is stressed.

But if maker rebates exist, the world changes:

  • placing good liquidity becomes a skill,
  • the spread becomes a micro-reward,
  • and "do nothing" becomes a valid action when the tape is toxic.

The bot already knew the rule: post-only or die

Before I tried to teach this to an agent, the real execution loop (“Chappie”) was already telling me what was safe.

In BitmexPythonChappie/BitMEXBotClient.py, orders are placed with BitMEX’s post-only instruction:

  • execInst='ParticipateDoNotInitiate'

That single flag encodes an entire philosophy:

  • If the order would cross the spread and take liquidity, don’t place it.
  • If you can’t be maker, be patient.

This was my first big mindset flip: the "smart" action is often refusing to trade when the only available fill is a taker fill.
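
To make the post-only rule concrete, here is a minimal sketch of a local guard that refuses to build an order that would cross the spread. The helper names (would_cross, build_post_only_order) and the XBTUSD default are hypothetical and not taken from BitMEXBotClient.py; only the execInst value comes from the client quoted above, and the dictionary keys follow the BitMEX REST order fields as I understand them.

```python
from typing import Optional

# execInst value quoted from the live client; everything else here is illustrative.
POST_ONLY = "ParticipateDoNotInitiate"


def would_cross(side: str, price: float, best_bid: float, best_ask: float) -> bool:
    """Return True if a limit order at `price` would match immediately (a taker fill)."""
    if side == "Buy":
        return price >= best_ask
    if side == "Sell":
        return price <= best_bid
    raise ValueError(f"unknown side: {side!r}")


def build_post_only_order(
    side: str,
    qty: int,
    price: float,
    best_bid: float,
    best_ask: float,
    symbol: str = "XBTUSD",  # assumed instrument, for illustration only
) -> Optional[dict]:
    """Build post-only order parameters, or return None when the only fill would be taker."""
    if would_cross(side, price, best_bid, best_ask):
        return None  # be patient instead of taking liquidity
    return {
        "symbol": symbol,
        "side": side,
        "orderQty": qty,
        "price": price,
        "ordType": "Limit",
        "execInst": POST_ONLY,
    }
```

The exchange-side flag is the real protection, since the book can move between the local check and the order arriving. The local check only saves a round trip and makes "refuse to trade" an explicit outcome the caller has to handle.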

Turning fees into learning: the maker-shaped reward

The core idea of this month was simple:

Make the reward reflect what the live system is allowed to do.

If the bot is constrained to maker-style execution, the agent’s reward should learn that same constraint.

Here’s the reward spec I converged toward (v0):

Reward components (v0)

  • PnL term (directional)
    • Reward for favorable price movement while holding.
    • Penalize adverse movement (but not so hard that the agent panic-flips).
  • Fee / rebate term (execution-aware)
    • Maker rebate is a positive add-on when a trade is executed as maker.
    • Taker fee is a negative add-on when execution slips into taker.

A maker rebate is a real signal, but it’s also an easy thing to "reward hack" in a simulator if you pretend fills are guaranteed. The reward only stays honest if the environment models fill risk (or at least refuses to give free fills).
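
A minimal sketch of how the two reward components and a "no free fills" rule could fit together. This is not the environment's actual code: StepInfo, step_reward, maker_filled and the adverse_scale knob are hypothetical names, and the fee constants simply mirror the maker/taker parameters from the HFT environment prototype mentioned in this post.

```python
from dataclasses import dataclass

# Percent values from the agent's side: positive = rebate received, negative = fee paid.
MAKER_FEE_PCT = 0.025
TAKER_FEE_PCT = -0.075


@dataclass
class StepInfo:
    position: int            # -1 short, 0 flat, +1 long (assumed fixed sizing)
    price_change_pct: float  # percent price change over this step
    maker_fills: int         # fills executed as maker during this step
    taker_fills: int         # fills that slipped into taker during this step


def step_reward(info: StepInfo, adverse_scale: float = 1.0) -> float:
    """Directional PnL term plus an execution-aware fee/rebate term."""
    pnl = info.position * info.price_change_pct
    if pnl < 0:
        # Assumed knob: scale the penalty for adverse moves, kept mild to avoid panic-flipping.
        pnl *= adverse_scale
    fees = info.maker_fills * MAKER_FEE_PCT + info.taker_fills * TAKER_FEE_PCT
    return pnl + fees


def maker_filled(side: str, limit_price: float, next_low: float, next_high: float) -> bool:
    """Conservative fill rule: count a fill only if price trades through the limit.

    A mere touch does not count, so the simulator never hands out a rebate for free.
    """
    if side == "Buy":
        return next_low < limit_price
    if side == "Sell":
        return next_high > limit_price
    raise ValueError(f"unknown side: {side!r}")
```

The trade-through rule is deliberately pessimistic: it undercounts real fills at the touch, but the agent cannot farm rebates from orders that would never have been reached in the queue.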

The environment hint: fees were already first-class

One reason this clicked fast is that parts of my environment code already treated maker/taker as first-class parameters.

For example, in the HFT environment prototype (bitmex-hft-gym/.../bitmex_hft_env.py) the fee terms are explicit:

  • maker_fee_pct = 0.025
  • taker_fee_pct = -0.075

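Even if the HFT environment became a failure story, this part was a good artifact: it forced me to write down the fact that execution type changes reward. Note the sign convention: both values are percentages seen from the agent’s side, so the positive maker value is a rebate received and the negative taker value is a fee paid.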


Maker behavior is not "more trades" — it’s better trades

If you tell an agent "maker rebates are good" without guardrails, it will try to spin in circles.

So the learning goal became:

  • low churn
  • high-quality placements
  • survive outages and bad fills

In practice, the "maker strategy" started to look like:

  • place a limit near the bid/ask,
  • wait,
  • roll only when necessary,
  • avoid urgent market-taking entirely.

That last point matters because it connects directly to the BitMEX reality:

  • a taker strategy needs timing,
  • timing fails under 503s,
  • maker behavior tolerates delay because it’s designed around patience.

Fee-aware reporting: the moment I believed it

To avoid self-delusion, I added a fee-aware breakdown in the execution reporting.

The key line (conceptually) is:

  • total fees contribution = num_trades * abs(maker_fee_pct) * 2

Why * 2?

  • one maker rebate on entry,
  • one maker rebate on exit.

And then I started getting logs like this:

Short rolled over - prince (8695.500000) timestamp (2019-06-15 00:00:26.172225+00:00)
Long rolled over - prince (8674.000000) timestamp (2019-06-15 00:34:39.171643+00:00)
Long rolled over - prince (8666.000000) timestamp (2019-06-15 00:51:50.171991+00:00)
...
Short rolled over - prince (9047.500000) timestamp (2019-11-10 16:29:55.170885+00:00)
Short rolled over - prince (9012.000000) timestamp (2019-11-10 16:46:24.171960+00:00)
...
Total trades: 2777
Max open count: 10
Max drawdown: -18.157810
Total summed profit precentage: -80.688532
Total fees precentage: 138.850000
Total fees & profit precentage: 58.161468

This was the “oh” moment:

  • purely on trade PnL, the strategy lost ~80% over the slow bear drift,
  • but on maker rebates it generated ~138%,
  • netting ~58% total.

And that wasn’t with perfect execution or magical alpha — it was simply what the microstructure made possible.

This is why I started treating fees as reward shaping instead of "accounting". If the agent can’t see the fee signal, it can’t learn the behavior.
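
The arithmetic behind the log can be reproduced in a few lines. This is a sketch of the reporting formula as described above, not the repo's reporting code: fee_report is a hypothetical name, and the inputs are the trade count and summed profit percentage printed in the log.

```python
def fee_report(num_trades: int, summed_profit_pct: float, maker_fee_pct: float = 0.025) -> dict:
    """Split the result into directional PnL and maker rebates (all values in percent)."""
    # Two maker rebates per trade: one on entry, one on exit.
    fees_pct = num_trades * abs(maker_fee_pct) * 2
    return {
        "profit_pct": summed_profit_pct,
        "fees_pct": fees_pct,
        "total_pct": summed_profit_pct + fees_pct,
    }


report = fee_report(num_trades=2777, summed_profit_pct=-80.688532)
print(f"Total summed profit percentage: {report['profit_pct']:.6f}")
print(f"Total fees percentage: {report['fees_pct']:.6f}")
print(f"Total fees & profit percentage: {report['total_pct']:.6f}")
```

With 2777 trades at 0.025 per side, the rebate term comes to 138.85, which is where the log's fee line comes from. Keeping the two terms in separate fields makes it harder for rebates to mask a losing directional result.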

Spread capture vs adverse selection: the real maker tradeoff

Maker behavior is not free money. It’s a different risk profile:

  • Spread capture: get paid for providing liquidity.
  • Adverse selection: get filled right before price moves against you.

So reward shaping had to teach the agent to prefer:

  • calm moments,
  • good placement,
  • low frequency,

…and to fear:

  • placing into spikes,
  • constantly rolling orders,
  • getting trapped holding inventory in a one-way tape.

That’s also where my “bull personality” paranoia helped: maker rebates are a stabilizer, but they can also hide directional weakness if you don’t measure them separately.
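
One way to put a number on the adverse-selection side of this tradeoff is a post-fill "markout": how far price moved for or against a fill some time after it happened. This measure is my illustration and is not described in the original reporting; markout_pct and net_edge_pct are hypothetical names, and the 0.025 rebate is the maker value used throughout this post.

```python
def markout_pct(side: str, fill_price: float, later_price: float) -> float:
    """Signed percent move after a fill; positive means price moved in the fill's favor."""
    if side == "Buy":
        direction = 1.0
    elif side == "Sell":
        direction = -1.0
    else:
        raise ValueError(f"unknown side: {side!r}")
    return direction * (later_price - fill_price) / fill_price * 100.0


def net_edge_pct(side: str, fill_price: float, later_price: float,
                 maker_fee_pct: float = 0.025) -> float:
    """Rebate earned on the fill plus the markout, both in percent."""
    return maker_fee_pct + markout_pct(side, fill_price, later_price)


# A buy filled at 9000 that sees price at 8991 shortly after:
# the rebate is earned, but the adverse move is larger than the rebate.
print(net_edge_pct("Buy", 9000.0, 8991.0))
```

Tracking the rebate and the markout as separate numbers keeps the stabilizing effect of rebates visible without letting it hide a fill pattern that is consistently picked off.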


Why this mattered for the next step: management-style environments

The maker-strategy idea exposed the next missing piece:

  • Inventory matters.

At this point, a lot of my setups were still “entry-ish”:

  • relatively coarse decisions,
  • fixed sizing assumptions,
  • and a focus on “should I be long or short”.

Maker trading forced me to admit that the real problem is position management:

  • How long do I keep inventory?
  • When do I reduce risk?
  • How do I avoid churn while still adapting?

Which is exactly why the next article exists.


Repo pointers

Repository — bitmex-deeprl-research

The full project: environments, training loops, and the live “Chappie” execution client.

BitMEXBotClient (post-only execution)

Where the live constraint shows up: execInst='ParticipateDoNotInitiate' (maker-only intent).

BitMEXWebsocketClient (state + reconciliation)

The live feed, position state, and the part that makes you care about reality instead of backtests.

bitmex_hft_env (fee terms as code)

A learning artifact: explicit maker/taker fee parameters that later shaped reward design.



What’s next

bitmex-management-gym: Position Sizing

Maker rebates made the strategy survivable.

Management is what makes it responsible.

Axel Domingues - 2026