
Backtests looked amazing. Live PnL didn't. In June 2020 I ran the first real BitMEX live loop at tiny size and learned the most important lesson in trading ML: regime is the boss.
Axel Domingues
By April I had a running process. By May it could fail safely.
So in June I did the obvious next step:
I let it trade.
Not with bravado. Not with leverage fantasies.
With tiny size, kill switches armed, and the kind of humility you only earn after a few “amazing backtests”.
And BitMEX did the thing BitMEX always does:
It taught me something I could not learn in notebooks.
My rule for going live was boring on purpose: the bot had to be able to fail safely before it was allowed to try to make money.
This was the point of the work in April/May.
A live trading bot isn't "a model".
It's a process that must keep a few promises, every minute: stay connected, know its true position, and be able to stop.
In this repo, the live loop is split into two key runtime pieces:
BitmexPythonChappie/BitMEXWebsocketClient.py — the clock and market feed
BitmexPythonChappie/BitMEXBotClient.py — the executor and position truth

My websocket client connects, authenticates (when needed), and subscribes to a small set of channels.
In practice, the bot mostly cares about quotes (best bid/ask) as the tick that drives the loop.
# BitMEXWebsocketClient.py
self.ws.send(json.dumps({'op': 'subscribe', 'args': self.channels}))
# channels include (depending on config):
# orderBookL2_25, order, quote, trade, margin
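The "authenticates (when needed)" part deserves one note: private channels like order and margin require a signed login op before subscribing. A minimal sketch of BitMEX's documented websocket auth; the key names here are placeholders, not the repo's:

# Hedged sketch: BitMEX websocket auth, per the public API docs.
# API_KEY / API_SECRET are placeholders to fill in.
import hashlib
import hmac
import json
import time

API_KEY = 'replace-me'
API_SECRET = 'replace-me'

expires = int(time.time()) + 10  # unix deadline the signature is valid until
signature = hmac.new(
    API_SECRET.encode('utf-8'),
    ('GET/realtime' + str(expires)).encode('utf-8'),
    hashlib.sha256,
).hexdigest()

# Sent before subscribing to private channels like order/margin:
auth_msg = json.dumps({'op': 'authKeyExpires', 'args': [API_KEY, expires, signature]})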
When a quote arrives, the client updates the latest market state and triggers the bot's handler:
# BitMEXWebsocketClient.py
if action == 'quote':
    self.orderbook.update_quote(item[0])
    self.bitmex_bot.on_quote_received(item[0])
That last line is important.
It means the market decides when you're allowed to think.
Not your training loop.
Not your backtest.
The exchange.
BitMEXBotClient.py is where I started treating execution as a real system.
Instead of "send order and hope", the bot keeps explicit states:
# BitMEXBotClient.py
# position lifecycle states
OPENING_STATE = 0
OPEN_STATE = 1
CLOSING_STATE = 2
CLOSED_STATE = 3

# order sides
SIDE_BUY = 0
SIDE_SELL = 1
Every quote tick, it does a small, deterministic set of checks before it acts.
The most important behavior here is reconciliation: don't let the bot believe a whole story just because it placed one order.
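A sketch of the shape of that tick; the helper names here (fetch_position, decide, place_order, should_exit, close_position) are illustrative stand-ins, not the repo's exact API:

# Hedged sketch of the quote-driven tick: reconcile first, then act.
def on_quote_received(self, quote):
    # 1. Reconcile: ask the exchange what is true, don't trust memory.
    position = self.fetch_position()
    if self.state == OPENING_STATE and position.is_open:
        self.state = OPEN_STATE        # the entry order actually filled
    elif self.state == CLOSING_STATE and not position.is_open:
        self.state = CLOSED_STATE      # the exit order actually filled

    # 2. Only then is the strategy allowed to think.
    action = self.decide(quote)

    # 3. Act, deterministically, from the reconciled state.
    if self.state == CLOSED_STATE and action in (SIDE_BUY, SIDE_SELL):
        self.place_order(side=action)
        self.state = OPENING_STATE
    elif self.state == OPEN_STATE and self.should_exit(action):
        self.close_position()
        self.state = CLOSING_STATE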
The strategy is the part you want to be working on.
But the thing that makes it real is the part that keeps asking:
"What is true right now?"
This is the uncomfortable part.
My backtesting curves looked like I had discovered fire.
Live results looked like I had discovered gravity.
And the reason wasn't exotic.
It was the oldest failure mode in ML:
I trained (and validated) inside a regime, then expected it to generalize outside it.
My training data was dominated by a bull run.
Then reality did what reality does: the regime changed.
In the notebook, the agent looked smart.
In the market, the agent looked like a permanent bull, buying dips that had stopped bouncing.
That wasn't a moral failure.
It was a dataset boundary.
If you only teach a model that "dips get bought" for long enough, it starts to treat that as physics.
The best decision I made in June was to treat live as an observability project.
Before I cared about PnL, I cared about truth.
The metric that mattered immediately was the action distribution.
If your agent is 80-90% BUY in a mixed regime, you don't have "alpha".
You have personality drift.
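Catching that drift doesn't need anything fancy; a rolling histogram over the last few hundred actions is enough. A minimal sketch (the class and names are mine, not the repo's):

# Hedged sketch: detect "personality drift" from the live action stream.
from collections import Counter, deque

SIDE_BUY = 0  # matches the bot's constant above

class ActionDistribution:
    """Rolling histogram of the agent's recent actions."""
    def __init__(self, window=500):
        self.recent = deque(maxlen=window)  # keep only the last N actions

    def record(self, action):
        self.recent.append(action)

    def buy_ratio(self):
        if not self.recent:
            return 0.0
        return Counter(self.recent)[SIDE_BUY] / len(self.recent)

# Usage: alert when the agent has collapsed into one action.
# if dist.buy_ratio() > 0.85: log.warning('agent is ~all-BUY: regime suspicion')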
Because this baseline system opens the full portfolio when it enters, there is no "position size" to tune.
But there is a lot of position state to observe.
In the Gym environment this shows up explicitly in the observation: feature columns + state vars (long open, short open, estimated reward, open/close time spans).
Live, it shows up as logs and state transitions.
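As a sketch of that observation layout, under the assumption that the features arrive as a numeric vector (names and ordering here are illustrative):

# Hedged sketch of the observation described above: features + state vars.
import numpy as np

def build_observation(features, long_open, short_open,
                      estimated_reward, open_span, close_span):
    # features: the engineered market feature columns for this step
    state_vars = np.array([
        float(long_open),      # is a long position open?
        float(short_open),     # is a short position open?
        estimated_reward,      # the env's estimated reward for the open trade
        open_span,             # how long the current position has been open
        close_span,            # how long since the last close
    ], dtype=np.float32)
    return np.concatenate([np.asarray(features, dtype=np.float32), state_vars])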
This is where BitMEX outages (and partial outages) show up as behavioral constraints.
And it's why the next phases of the project slowly drift toward maker-style behavior.
Once per minute (or per N quotes), I logged a snapshot of what the bot believed and what the market was doing.
Because the model doesn't stop acting when the distribution shifts.
It keeps acting.
Just on garbage.
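A sketch of what such a snapshot can look like, reusing the tracker above (field names are mine; bidPrice/askPrice are the documented BitMEX quote fields):

# Hedged sketch: the once-per-minute truth snapshot.
import json
import logging
import time

log = logging.getLogger('chappie.telemetry')

def log_snapshot(state, dist, quote):
    # state: the bot's lifecycle state (OPENING/OPEN/CLOSING/CLOSED)
    # dist:  the ActionDistribution tracker from the sketch above
    # quote: the latest BitMEX quote dict
    log.info(json.dumps({
        'ts': time.time(),
        'state': state,
        'buy_ratio': round(dist.buy_ratio(), 3),
        'bid': quote['bidPrice'],
        'ask': quote['askPrice'],
        'spread': quote['askPrice'] - quote['bidPrice'],
    }))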
Latency matters, but it wasn't the point.
The point was that the market distribution changed.
The model was not wrong.
It was trained on a different world.
This month is where I stopped trusting random validation splits.
If the market can change character month-to-month, then evaluation must be time-aware.
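Concretely: train on earlier months, validate strictly on later ones, and never shuffle across the boundary. A minimal sketch, assuming a pandas DataFrame with a DatetimeIndex (not the repo's exact pipeline):

# Hedged sketch: time-aware evaluation instead of random splits.
def time_split(df, train_end, valid_end):
    # df: time-indexed features/labels; boundaries are timestamps
    train = df[df.index <= train_end]                            # e.g. the bull-run months
    valid = df[(df.index > train_end) & (df.index <= valid_end)] # the months after
    return train, valid

# A random split leaks the regime into validation and flatters the model;
# a chronological split at least asks it to survive a world it hasn't seen.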
The next post (July) is where this becomes explicit through constraints.
And the months after that are where it becomes a proper batch discipline.
The live loop taught me something RL people learn the hard way:
If you don't constrain behavior, the agent will find the easiest consistent thing it can do.
In this case, it learned to buy, almost always, because that's what the bull-run data rewarded.
The fix isn't a new network.
It's teaching the agent what bad behavior costs.
That's why the March and July 2020 posts are all about penalties, timeouts, risk caps, and reward shaping that doesn't lie.
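In reward terms, that means charging explicit costs instead of hoping the agent infers them. A sketch of the shape of it (the coefficients are placeholders, not tuned values):

# Hedged sketch: reward shaping that makes bad behavior expensive.
def shaped_reward(pnl, fees, holding_seconds, risk_cap_breached):
    reward = pnl - fees                   # never let the reward ignore costs
    reward -= 0.0001 * holding_seconds    # timeout pressure: don't camp in trades
    if risk_cap_breached:
        reward -= 1.0                     # a hard cost, not a gentle hint
    return reward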
I didn't "solve" regime shift in June.
I did the engineer thing: I measured it, logged it, and started designing constraints around it.
And most importantly:
I stopped reading good backtests as proof.
They became hypotheses.
Live became the test.
bitmex-deeprl-research (GitHub)
The full repo behind this series — data pipeline, feature engineering, Gym env, and the live Chappie wiring.
BitMEXWebsocketClient.py
The live market feed: channel subscriptions, message routing, and the quote-driven tick that powers the runtime loop.
June was "first live".
July is where I start doing the thing live forced me to do:
design constraints that teach.
Next post:
Constraints That Teach: Risk Caps, Timeouts, and Surviving Bad Regimes
After my first disappointing live runs, I stopped asking my agent to be clever and started forcing it to be safe: risk caps, timeouts, and “market-health” gates that kept the loop alive when the regime wasn’t.
Previous post:
Safety Engineering: Kill Switches, Reconciliation, and Failure Recovery
In May 2020, I stopped hoping the bot was "fine" and started giving it explicit failure states — stale websockets, missing fills, rate-limits, and the kill switches that keep a live loop honest.