
In September 2020, I stopped trying to be fast and started trying to be executable. The surprising result: in maker-style trading, fees aren’t a footnote; they can be the whole edge.
Axel Domingues
By September 2020 I had two painful truths on my desk:
So I started asking a different question:
What does a policy look like when execution is the constraint?
That question is what pushed me into maker-style behavior.
On BitMEX, the market structure matters:
That sounds like a footnote until you run a high-churn strategy and notice that fees can dominate the entire PnL curve.
This is the exact kind of thing a pure price-only reward misses. If your reward is only "did price go my way", you train agents that:
But if maker rebates exist, the world changes:
Before I tried to teach this to an agent, the real execution loop (“Chappie”) was already telling me what was safe.
In BitmexPythonChappie/BitMEXBotClient.py, orders are placed with BitMEX’s post-only instruction:
execInst='ParticipateDoNotInitiate'

That single flag encodes an entire philosophy:
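To make that concrete, here is a rough sketch (not the repo's actual client code) of what a post-only order submission can look like against BitMEX's public REST API. The payload keys follow the documented POST /api/v1/order endpoint; the function name, base URL choice, and session handling are my own placeholders, and authentication is assumed to happen elsewhere.

```python
import requests

# Sketch only: payload keys follow BitMEX's public POST /api/v1/order endpoint.
# Authentication headers (api-key, api-signature, api-expires) are assumed to be
# attached to the session elsewhere; this is not the repo's BitMEXBotClient code.
BASE_URL = "https://testnet.bitmex.com/api/v1"  # testnet to stay safe while experimenting

def place_post_only_limit(session: requests.Session, symbol: str, qty: int, price: float) -> dict:
    """Submit a limit order that rests in the book instead of taking liquidity."""
    payload = {
        "symbol": symbol,      # e.g. "XBTUSD"
        "orderQty": qty,       # positive = buy, negative = sell
        "price": price,        # limit price; must rest in the book to fill
        "ordType": "Limit",
        # The whole philosophy in one flag: if this order would execute
        # immediately as a taker, the exchange cancels it instead.
        "execInst": "ParticipateDoNotInitiate",
    }
    resp = session.post(f"{BASE_URL}/order", json=payload)
    resp.raise_for_status()
    return resp.json()
```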
The core idea of this month was simple:
Make the reward reflect what the live system is allowed to do.
If the bot is constrained to maker-style execution, the agent’s reward should learn that same constraint.
Here’s the reward spec I converged toward (v0):
One reason this clicked fast is that parts of my environment code already treated maker/taker as first-class parameters.
For example, in the HFT environment prototype (bitmex-hft-gym/.../bitmex_hft_env.py) the fee terms are explicit:
maker_fee_pct = 0.025
taker_fee_pct = -0.075

Even though the HFT environment became a failure story, this part was a good artifact: it forced me to write down the fact that execution type changes reward.
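As a sketch of how those two constants feed the reward (my reconstruction, not the environment's exact step function): the sign convention treats the maker rebate as income and the taker fee as a cost, and either term only appears when a fill actually happens.

```python
# Fee-aware reward term, using the same sign convention as the environment
# snippet above: positive maker_fee_pct is a rebate earned, negative
# taker_fee_pct is a fee paid. Everything except the two fee constants is
# illustrative, not taken from the repo.
maker_fee_pct = 0.025    # rebate earned per maker fill, in % of notional
taker_fee_pct = -0.075   # fee paid per taker fill, in % of notional

def step_reward(price_pnl_pct: float, got_filled: bool, filled_as_maker: bool) -> float:
    """Reward = price move captured + whatever fee term the execution type implies."""
    if not got_filled:
        return 0.0                       # resting order never filled: no PnL, no fee
    fee_pct = maker_fee_pct if filled_as_maker else taker_fee_pct
    return price_pnl_pct + fee_pct       # both terms expressed in percent of notional
```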
If you tell an agent "maker rebates are good" without guardrails, it will try to spin in circles: churning positions open and closed purely to harvest the rebate.
So the learning goal became:
In practice, the "maker strategy" started to look like:
That last point matters because it connects directly to the BitMEX reality:
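To make the "spinning in circles" problem concrete, here is one illustrative anti-churn guardrail. This is my sketch of the idea, not necessarily what the repo implements; the threshold and penalty values are placeholders.

```python
# Illustrative anti-churn guardrail (assumption, not the repo's code): shave the
# rebate's appeal when a position is opened and closed almost immediately, so
# the agent cannot farm maker rebates with instant round trips.
MIN_HOLD_STEPS = 10        # hypothetical minimum holding time, in environment steps
CHURN_PENALTY_PCT = 0.06   # hypothetical penalty, slightly more than the 2 x 0.025% round-trip rebate

def shaped_reward(price_pnl_pct: float, fee_pct: float, hold_steps: int) -> float:
    reward = price_pnl_pct + fee_pct
    if hold_steps < MIN_HOLD_STEPS:
        reward -= CHURN_PENALTY_PCT   # make instant round trips net-negative
    return reward
```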
To avoid self-delusion, I added a fee-aware breakdown in the execution reporting.
The key line (conceptually) is:
num_trades * abs(maker_fee_pct) * 2

Why * 2? Because every round-trip trade touches the book twice, once to open the position and once to close it, so the rebate (or fee) applies to two fills per trade.
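A hedged reconstruction of that reporting breakdown (the function and field names here are illustrative; only the *2 term mirrors the line above):

```python
# Illustrative reconstruction of the fee-aware reporting breakdown; the names
# are mine, only the "two fills per round trip" logic mirrors the line above.
def fee_breakdown(num_trades: int, maker_fee_pct: float, price_profit_pct: float) -> dict:
    # Each round-trip trade has two fills (open + close), so the rebate
    # magnitude is counted twice per trade.
    fees_pct = num_trades * abs(maker_fee_pct) * 2
    return {
        "total_profit_pct": price_profit_pct,                    # price-only PnL
        "total_fees_pct": fees_pct,                              # rebates earned as a maker
        "total_fees_and_profit_pct": price_profit_pct + fees_pct # what actually lands in the account
    }
```

Plugged into the run below (2,777 trades at a 0.025% rebate per fill), this is exactly where the 138.85% fee line comes from.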
And then I started getting logs like this:
Short rolled over - prince (8695.500000) timestamp (2019-06-15 00:00:26.172225+00:00)
Long rolled over - prince (8674.000000) timestamp (2019-06-15 00:34:39.171643+00:00)
Long rolled over - prince (8666.000000) timestamp (2019-06-15 00:51:50.171991+00:00)
...
Short rolled over - prince (9047.500000) timestamp (2019-11-10 16:29:55.170885+00:00)
Short rolled over - prince (9012.000000) timestamp (2019-11-10 16:46:24.171960+00:00)
...
Total trades: 2777
Max open count: 10
Max drawdown: -18.157810
Total summed profit precentage: -80.688532
Total fees precentage: 138.850000
Total fees & profit precentage: 58.161468
This was the “oh” moment: the price-only PnL was roughly -80%, yet cumulative maker rebates worth about +139% flipped the combined result to roughly +58%.
And that wasn’t with perfect execution or magical alpha — it was simply what the microstructure made possible.
Maker behavior is not free money. It’s a different risk profile:
So reward shaping had to teach the agent to prefer:
…and to fear:
That’s also where my “bull personality” paranoia helped: maker rebates are a stabilizer, but they can also hide directional weakness if you don’t measure them separately.
The maker-strategy idea exposed the next missing piece:
At this point, a lot of my setups were still “entry-ish”:
Maker trading forced me to admit that the real problem is position management:
Which is exactly why the next article exists.
Repository — bitmex-deeprl-research
The full project: environments, training loops, and the live “Chappie” execution client.
BitMEXBotClient (post-only execution)
Where the live constraint shows up: execInst='ParticipateDoNotInitiate' (maker-only intent).
It can be, if you ignore fill realism. The point of this month wasn’t "rebates are free money" — it was: rebates are a real term in the reward, so the environment must model when you actually earn them.
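For intuition, here is a minimal fill-realism rule. This is an assumption on my part, not the environment's exact logic: a resting post-only buy only earns its rebate if the market actually trades through the limit price.

```python
# Minimal fill-realism sketch (assumption, not the repo's exact model): a resting
# maker buy at `limit_price` only counts as filled, and only then earns the
# rebate, if the bar's low traded through that price.
def maker_buy_filled(limit_price: float, bar_low: float) -> bool:
    # Strictly below is the conservative convention: merely touching the price
    # does not guarantee the resting order was reached in the queue.
    return bar_low < limit_price

def maker_fill_reward(filled: bool, maker_fee_pct: float = 0.025) -> float:
    # No fill, no rebate: the fee term only enters the reward when execution happens.
    return maker_fee_pct if filled else 0.0
```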
Because taker-style strategies often depend on precise timing. Maker-style policies can tolerate delay: you place, wait, and accept that sometimes the best action is to do nothing.
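One way to make "the best action is to do nothing" literal is to give the policy an explicit hold action. This action set is my illustration of the idea, not the repo's actual action space.

```python
from enum import IntEnum

# Illustrative action set for a maker-style policy (my sketch, not the repo's):
# resting orders and explicitly doing nothing are first-class actions.
class MakerAction(IntEnum):
    HOLD = 0        # leave resting orders (or the current position) untouched
    PLACE_BID = 1   # rest a post-only buy inside the spread
    PLACE_ASK = 2   # rest a post-only sell inside the spread
    CANCEL_ALL = 3  # pull all resting orders
```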
I stopped treating costs as bookkeeping and started treating them as behavioral incentives. Reward shaping isn’t just about getting learning to converge — it’s about teaching the agent what reality will actually pay for.
bitmex-management-gym: Position Sizing
Maker rebates made the strategy survivable.
Management is what makes it responsible.
bitmex-management-gym: Position Sizing and the First Risk-Aware Agent
After months of "all-in" agents with bull personalities, I rebuilt the environment to teach risk: stackable positions, time-awareness, and penalties that prevent reward-hacking.
Deep Silos in RL: Architecture as Stability (and the First LSTM Variant)
August 2020 - After the first live pain and the bull-personality problem, I stopped tuning "algorithms" and started tuning the network contract. Deep Silos beat flat MLPs, and the LSTM variant overfit fast.