
Backtests looked amazing. Live PnL didn't. In June 2020 I ran the first real BitMEX live loop at tiny size and learned the most important lesson in trading ML: regime is the boss.
Axel Domingues
By April I had a running process. By May it could fail safely.
So in June I did the obvious next step:
I let it trade.
Not with bravado. Not with leverage fantasies.
With tiny size, kill switches armed, and the kind of humility you only earn after a few “amazing backtests”.
And BitMEX did the thing BitMEX always does:
It taught me something I could not learn in notebooks.
My rule for going live was boring on purpose: the bot had to be able to fail safely before it was allowed to try to make money.
This was the point of the work in April/May.
A live trading bot isn't "a model".
It's a process that must keep a few promises, every minute: stay connected, know its true position, and be able to stop.
In this repo, the live loop is split into two key runtime pieces:
BitmexPythonChappie/BitMEXWebsocketClient.py — the clock and market feed
BitmexPythonChappie/BitMEXBotClient.py — the executor and position truth

My websocket client connects, authenticates (when needed), and subscribes to a small set of channels.
In practice, the bot mostly cares about quotes (best bid/ask) as the tick that drives the loop.
# BitMEXWebsocketClient.py
self.ws.send(json.dumps({'op': 'subscribe', 'args': self.channels}))
# channels include (depending on config):
# orderBookL2_25, order, quote, trade, margin
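The "authenticates (when needed)" part deserves one note: private channels like order and margin require a signed login op before subscribing. A minimal sketch of BitMEX's documented websocket auth; the key names here are placeholders, not the repo's:

# Hedged sketch: BitMEX websocket auth, per the public API docs.
# API_KEY / API_SECRET are placeholders to fill in.
import hashlib
import hmac
import json
import time

API_KEY = 'replace-me'
API_SECRET = 'replace-me'

expires = int(time.time()) + 10  # unix deadline the signature is valid until
signature = hmac.new(
    API_SECRET.encode('utf-8'),
    ('GET/realtime' + str(expires)).encode('utf-8'),
    hashlib.sha256,
).hexdigest()

# Sent before subscribing to private channels like order/margin:
auth_msg = json.dumps({'op': 'authKeyExpires', 'args': [API_KEY, expires, signature]})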
When a quote arrives, the client updates the latest market state and triggers the bot's handler:
# BitMEXWebsocketClient.py
if action == 'quote':
    self.orderbook.update_quote(item[0])
    self.bitmex_bot.on_quote_received(item[0])
That last line is important.
It means the market decides when you're allowed to think.
Not your training loop.
Not your backtest.
The exchange.
BitMEXBotClient.py is where I started treating execution as a real system.
Instead of "send order and hope", the bot keeps explicit states:
# BitMEXBotClient.py
# position lifecycle states
OPENING_STATE = 0
OPEN_STATE = 1
CLOSING_STATE = 2
CLOSED_STATE = 3

# order sides
SIDE_BUY = 0
SIDE_SELL = 1
Every quote tick, it does a small, deterministic set of checks before it acts.
The most important behavior here is reconciliation: don't let the bot believe a whole story just because it placed one order.
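A sketch of the shape of that tick; the helper names here (fetch_position, decide, place_order, should_exit, close_position) are illustrative stand-ins, not the repo's exact API:

# Hedged sketch of the quote-driven tick: reconcile first, then act.
def on_quote_received(self, quote):
    # 1. Reconcile: ask the exchange what is true, don't trust memory.
    position = self.fetch_position()
    if self.state == OPENING_STATE and position.is_open:
        self.state = OPEN_STATE        # the entry order actually filled
    elif self.state == CLOSING_STATE and not position.is_open:
        self.state = CLOSED_STATE      # the exit order actually filled

    # 2. Only then is the strategy allowed to think.
    action = self.decide(quote)

    # 3. Act, deterministically, from the reconciled state.
    if self.state == CLOSED_STATE and action in (SIDE_BUY, SIDE_SELL):
        self.place_order(side=action)
        self.state = OPENING_STATE
    elif self.state == OPEN_STATE and self.should_exit(action):
        self.close_position()
        self.state = CLOSING_STATE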
The strategy is the part you want to be working on.
But the thing that makes it real is the part that keeps asking:
"What is true right now?"
This is the uncomfortable part.
My backtesting curves looked like I had discovered fire.
Live results looked like I had discovered gravity.
And the reason wasn't exotic.
It was the oldest failure mode in ML:
I trained (and validated) inside a regime, then expected it to generalize outside it.
My training data was dominated by a bull run.
Then reality did what reality does: the regime changed.
In the notebook, the agent looked smart.
In the market, the agent looked like a permanent bull, buying dips that had stopped bouncing.
That wasn't a moral failure.
It was a dataset boundary.
If you only teach a model that "dips get bought" for long enough, it starts to treat that as physics.
The best decision I made in June was to treat live as an observability project.
Before I cared about PnL, I cared about truth.
The metric that mattered immediately was the action distribution.
If your agent is 80-90% BUY in a mixed regime, you don't have "alpha".
You have personality drift.
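Catching that drift doesn't need anything fancy; a rolling histogram over the last few hundred actions is enough. A minimal sketch (the class and names are mine, not the repo's):

# Hedged sketch: detect "personality drift" from the live action stream.
from collections import Counter, deque

SIDE_BUY = 0  # matches the bot's constant above

class ActionDistribution:
    """Rolling histogram of the agent's recent actions."""
    def __init__(self, window=500):
        self.recent = deque(maxlen=window)  # keep only the last N actions

    def record(self, action):
        self.recent.append(action)

    def buy_ratio(self):
        if not self.recent:
            return 0.0
        return Counter(self.recent)[SIDE_BUY] / len(self.recent)

# Usage: alert when the agent has collapsed into one action.
# if dist.buy_ratio() > 0.85: log.warning('agent is ~all-BUY: regime suspicion')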
Because this baseline system opens the full portfolio when it enters, there is no "position size" to tune.
But there is a lot of position state to observe.
In the Gym environment this shows up explicitly in the observation: feature columns + state vars (long open, short open, estimated reward, open/close time spans).
Live, it shows up as logs and state transitions.
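As a sketch of that observation layout, under the assumption that the features arrive as a numeric vector (names and ordering here are illustrative):

# Hedged sketch of the observation described above: features + state vars.
import numpy as np

def build_observation(features, long_open, short_open,
                      estimated_reward, open_span, close_span):
    # features: the engineered market feature columns for this step
    state_vars = np.array([
        float(long_open),      # is a long position open?
        float(short_open),     # is a short position open?
        estimated_reward,      # the env's estimated reward for the open trade
        open_span,             # how long the current position has been open
        close_span,            # how long since the last close
    ], dtype=np.float32)
    return np.concatenate([np.asarray(features, dtype=np.float32), state_vars])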
This is where BitMEX outages (and partial outages) show up as behavioral constraints.
And it's why the next phases of the project slowly drift toward maker-style behavior.
Once per minute (or per N quotes), I logged a snapshot of what the bot believed and what the market was doing.
Because the model doesn't stop acting when the distribution shifts.
It keeps acting.
Just on garbage.
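A sketch of what such a snapshot can look like, reusing the tracker above (field names are mine; bidPrice/askPrice are the documented BitMEX quote fields):

# Hedged sketch: the once-per-minute truth snapshot.
import json
import logging
import time

log = logging.getLogger('chappie.telemetry')

def log_snapshot(state, dist, quote):
    # state: the bot's lifecycle state (OPENING/OPEN/CLOSING/CLOSED)
    # dist:  the ActionDistribution tracker from the sketch above
    # quote: the latest BitMEX quote dict
    log.info(json.dumps({
        'ts': time.time(),
        'state': state,
        'buy_ratio': round(dist.buy_ratio(), 3),
        'bid': quote['bidPrice'],
        'ask': quote['askPrice'],
        'spread': quote['askPrice'] - quote['bidPrice'],
    }))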
Latency matters, but it wasn't the point.
The point was that the market distribution changed.
The model was not wrong.
It was trained on a different world.
This month is where I stopped trusting random validation splits.
If the market can change character month-to-month, then evaluation must be time-aware.
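Concretely: train on earlier months, validate strictly on later ones, and never shuffle across the boundary. A minimal sketch, assuming a pandas DataFrame with a DatetimeIndex (not the repo's exact pipeline):

# Hedged sketch: time-aware evaluation instead of random splits.
def time_split(df, train_end, valid_end):
    # df: time-indexed features/labels; boundaries are timestamps
    train = df[df.index <= train_end]                            # e.g. the bull-run months
    valid = df[(df.index > train_end) & (df.index <= valid_end)] # the months after
    return train, valid

# A random split leaks the regime into validation and flatters the model;
# a chronological split at least asks it to survive a world it hasn't seen.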
The next post (July) is where this becomes explicit through constraints.
And the months after that are where it becomes a proper batch discipline.
The live loop taught me something RL people learn the hard way:
If you don't constrain behavior, the agent will find the easiest consistent thing it can do.
In this case, it learned to buy, almost always, because that's what the bull-run data rewarded.
The fix isn't a new network.
It's teaching the agent what bad behavior costs.
That's why the March and July 2020 posts are all about penalties, timeouts, risk caps, and reward shaping that doesn't lie.
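In reward terms, that means charging explicit costs instead of hoping the agent infers them. A sketch of the shape of it (the coefficients are placeholders, not tuned values):

# Hedged sketch: reward shaping that makes bad behavior expensive.
def shaped_reward(pnl, fees, holding_seconds, risk_cap_breached):
    reward = pnl - fees                   # never let the reward ignore costs
    reward -= 0.0001 * holding_seconds    # timeout pressure: don't camp in trades
    if risk_cap_breached:
        reward -= 1.0                     # a hard cost, not a gentle hint
    return reward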
I didn't "solve" regime shift in June.
I did the engineer thing: I measured it, logged it, and started designing constraints around it.
And most importantly:
I stopped reading good backtests as proof.
They became hypotheses.
Live became the test.
bitmex-deeprl-research (GitHub)
The full repo behind this series — data pipeline, feature engineering, Gym env, and the live Chappie wiring.
BitMEXWebsocketClient.py
The live market feed: channel subscriptions, message routing, and the quote-driven tick that powers the runtime loop.
June was "first live".
July is where I start doing the thing live forced me to do:
design constraints that teach.
Next post:
Constraints That Teach: Risk Caps, Timeouts, and Surviving Bad Regimes
After my first disappointing live runs, I stopped asking my agent to be clever and started forcing it to be safe: risk caps, timeouts, and “market-health” gates that kept the loop alive when the regime wasn’t.
Previous post:
Safety Engineering: Kill Switches, Reconciliation, and Failure Recovery
In May 2020, I stopped hoping the bot was "fine" and started giving it explicit failure states — stale websockets, missing fills, rate-limits, and the kill switches that keep a live loop honest.