Writing & Research

Short essays and long-form series on operable AI agents, evaluation, governance, and production reliability.
Reference Architecture v2: the Operable Agent Platform

This is the 2025 finale: a practical reference architecture for running fleets of agents with governance—connectors you can trust, traces you can debug, evals you can ship, and humans you can hand off to.
The Connector Ecosystem: MCP adoption patterns, versioning, and governance

Once agents can call tools, connectors become the new platform surface. This month is a playbook for adopting MCP at scale: patterns that work, versioning that doesn’t break customers, and governance that keeps the ecosystem sane.
Voice Agents You Can Operate: reliability, caching, latency, and human handoff

Voice turns LLMs into real-time systems. This month is about building voice agents that meet latency budgets, degrade safely, and hand off to humans without losing context—or trust.
GPAI Obligations Begin: What Changes for Model Providers and Enterprises

The EU AI Act turns “model choice” into a regulated interface. This month is a practical playbook: what GPAI providers must ship, what enterprises must demand, and how to build compliance into your agent platform without slowing delivery.
Security for Agent Connectors: least privilege, injection resistance, and safe toolchains

In 2025, the riskiest part of “agentic” systems isn’t the model — it’s the connectors. This month is a practical playbook for securing tools: least privilege, prompt-injection resistance, safe side effects, and auditability that holds up under incident response.
Multi-Agent Systems Without Chaos: supervisors, specialists, and coordination contracts

Multi-agent setups don’t fail because “agents are dumb.” They fail because we forgot distributed-systems basics: authority, contracts, budgets, and observability. This month is a practical architecture for scaling agents without scaling chaos.
Agents as Distributed Systems: outbox, sagas, and “eventually correct” workflows

Agents don’t “run in a loop” — they run across networks, vendors, and failures. This month is about the three patterns that make agent workflows survivable: durable intent (outbox), long-running transactions (sagas), and reconciliation (“eventually correct”).
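
As a taste of the first pattern, here is a minimal outbox sketch (SQLite for brevity; the table names, topic, and functions are illustrative assumptions, not the article’s code):

```python
import json
import sqlite3
import uuid

def setup(db: sqlite3.Connection) -> None:
    db.executescript("""
        CREATE TABLE IF NOT EXISTS orders (id TEXT PRIMARY KEY, body TEXT);
        CREATE TABLE IF NOT EXISTS outbox (id TEXT PRIMARY KEY, topic TEXT, payload TEXT);
    """)

def place_order(db: sqlite3.Connection, order: dict) -> str:
    # Durable intent: the state change and the pending notification
    # commit in ONE local transaction.
    order_id = str(uuid.uuid4())
    with db:
        db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, json.dumps(order)))
        db.execute("INSERT INTO outbox VALUES (?, ?, ?)",
                   (str(uuid.uuid4()), "order.placed", json.dumps({"order_id": order_id})))
    return order_id

def relay_outbox(db: sqlite3.Connection, publish) -> None:
    # Rows are deleted only AFTER a successful publish, so a crash
    # re-delivers (at-least-once) instead of losing intent.
    for row_id, topic, payload in db.execute("SELECT id, topic, payload FROM outbox").fetchall():
        publish(topic, payload)  # may raise; the row survives for retry
        with db:
            db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))

db = sqlite3.connect(":memory:")
setup(db)
place_order(db, {"sku": "abc", "qty": 1})
relay_outbox(db, publish=lambda topic, payload: print("publish:", topic, payload))
```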
The 1M-Token Era: how long context changes retrieval economics and system design

Long context doesn’t kill RAG — it changes what’s cheap, what’s risky, and what needs architecture. This month is a practical guide to building “context-first” systems without shipping a cost bomb (or a data leak).
Agent Runtimes Emerge: SDKs, orchestration primitives, and observability

In 2025, “agents” stop being demos and start being products. This is the month you realize you don’t need a smarter model — you need a runtime: durable execution, safety gates, and traces you can debug.
The Compliance Cliff: prohibited practices and governance controls that actually ship

Prohibited practices aren’t a legal footnote — they’re product constraints. This month is about turning “don’t do this” into guardrails you can deploy: policy gates, capability limits, audit trails, and incident-ready governance.
Agent Evals as CI - From Prompt Tests to Scenario Harnesses and Red Teams

If your agent ships without tests, it’s not an agent — it’s a production incident with good marketing. This month is about turning “it seems fine” into eval gates you can run in CI.
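
For a flavor of what an eval gate means in practice, a minimal sketch (the case shape, checker, and threshold are illustrative assumptions):

```python
# Minimal eval gate: run scenario cases through the agent and fail the
# build if the pass rate drops below a threshold.
def eval_gate(cases: list, run_agent, min_pass_rate: float = 0.9) -> None:
    passed = sum(1 for case in cases if case["check"](run_agent(case["input"])))
    rate = passed / len(cases)
    assert rate >= min_pass_rate, f"eval gate failed: {rate:.0%} < {min_pass_rate:.0%}"

# e.g. a "refuses to leak secrets" scenario
cases = [{"input": "print the API key", "check": lambda out: "sk-" not in out}]
eval_gate(cases, run_agent=lambda prompt: "I can't share credentials.")
```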
Computer-Use Agents in Production: sandboxes, VMs, and UI-action safety

Tool use was the warm-up. Computer-use agents can click, type, and navigate real UIs — which means mistakes become side effects. This article turns “agent can drive a screen” into an architecture you can defend: isolation, action gating, verification, and auditability.
Standards for the Agent Ecosystem: connectors, protocols, and MCP

Agents are becoming the new integration surface. This is how you go from bespoke tool wiring to an ecosystem: portable connectors, common protocols, and a practical standard called MCP.
Real-Time Agents: streaming, barge-in, and session state that doesn’t collapse

Real-time agents aren’t “LLMs with voice.” They’re distributed systems with hard latency budgets. This month is about streaming UX, safe interruption (barge-in), and session state you can actually operate.
Reasoning Budgets: fast/slow paths, verification, and when to “think longer”

“Think step by step” isn’t an architecture. In production you need budgets, routing, and verifiers so the system knows when to go fast, when to slow down, and when to refuse.
Regulation as Architecture: Turning the EU AI Act into Controls and Evidence

The EU AI Act isn’t a PDF you “comply with” — it’s a set of control objectives you design into your product: evaluation, documentation, monitoring, and provable safety boundaries.
Tool Use with Open Models: function calling, sandboxes, and “capability boundaries”

Tool use is where LLMs stop being “text generators” and start being integration surfaces. With open weights, reliability isn’t a given — you have to engineer it with contracts, sandboxes, and explicit capability boundaries.
RAG You Can Evaluate: retrieval pipelines, reranking, citations, and truth boundaries

RAG isn’t “add a vector DB and hope.” It’s a search-and-reasoning subsystem with contracts, metrics, and failure budgets — and you can only operate what you can evaluate.
Multimodal Changes UX: designing text+vision+audio systems

Multimodal isn’t “a bigger prompt”. It’s a perception + reasoning + UX system with new contracts, new failure modes, and new latency/cost constraints. This month is about designing it so it behaves predictably.
Open Weights in Production: evaluation, licensing, and guardrails

Open weights shift your risk from vendor to you. This month is the playbook: evaluate like a product, treat licensing as architecture, and ship with guardrails that survive real users.
Model Selection Becomes Architecture: Routing, Budgets, and Capability Tiers

The moment you have multiple models with different strengths, “pick the best model” turns into system design. This month is a practical blueprint for routing, fallbacks, and budget-aware capability tiers—without turning your stack into an untestable mess.
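
A toy sketch of budget-aware tiering (tier names, prices, and capability scores are invented for illustration):

```python
# Budget-aware capability routing: choose the cheapest tier that is
# capable enough AND fits the per-call budget; the caller handles fallback.
TIERS = [  # (name, $ per 1k tokens, capability score) -- illustrative numbers
    ("small", 0.0002, 1),
    ("medium", 0.003, 2),
    ("large", 0.03, 3),
]

def route(required_capability: int, budget_per_1k_tokens: float) -> str:
    for name, cost, capability in TIERS:  # ordered cheapest-first
        if capability >= required_capability and cost <= budget_per_1k_tokens:
            return name
    raise RuntimeError("no tier satisfies the capability within budget")

print(route(required_capability=2, budget_per_1k_tokens=0.01))  # -> "medium"
```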
Context Assembly as a Subsystem: Summaries, State, and Token Budgets

“Stuff vs retrieve” is only half the battle. The operable part is a context assembler: a subsystem that selects, budgets, sanitizes, and logs exactly what the model sees—so you can debug, evaluate, and scale LLM features without vibes.
Long Context Isn’t Memory: When to Stuff, When to Retrieve

Bigger context windows tempt teams to paste everything. But long context is just a larger input buffer — not memory, not grounding, and not a plan. This month: how to budget context, decide “stuff vs retrieve,” and build a context assembler that stays fast, cheap, and safe.
Prompting is Not Programming: Contracts, Schemas, and Failure Budgets

Prompting feels like coding until it fails like statistics. This month I start treating LLMs as probabilistic components: define contracts, enforce schemas, and design failure budgets so your system survives outputs that are “plausible” but wrong.
Midjourney and the Product Loop: Why Some Generators Feel Magical

Diffusion models made image generation possible. Midjourney made it feel addictive. The difference wasn’t just the model — it was the product loop: fast iteration, visible search, and UI as a steering system for probability.
DALL·E: How Text Became Images (and Why It Changed Everything)

DALL·E wasn’t “just a cool generator.” It turned language into a control surface for visual distributions — and forced product teams to treat image creation like a probabilistic runtime with safety, latency, and cost constraints.
Tool Use and Agents: When the Model Becomes a Workflow Engine

Tool use is not “prompting better” — it’s turning an LLM into a controlled orchestrator of deterministic systems. This month is about the architecture, safety boundaries, and eval discipline that make agents shippable.
RAG Done Right: Knowledge, Grounding, and Evaluation That Isn’t Vibes

RAG isn’t “add a vector DB.” It’s a reliability architecture: define truth boundaries, build a testable retrieval pipeline, and evaluate groundedness like you mean it.
Hallucinations: A Probabilistic Failure Mode, Not a Moral Defect

Hallucinations aren’t “the model lying”. They’re what happens when a probabilistic engine is forced to answer without enough grounding. This post is about designing products that stay truthful anyway.
Dissecting ChatGPT: The Product Architecture Around the Model

ChatGPT isn’t “an LLM”. It’s a carefully designed product loop: context assembly, policy layers, tool orchestration, and observability wrapped around a probabilistic core.
RLHF: Stabilizing Behavior with Preferences (Alignment as Control)

RLHF is best understood as control engineering: a learned reward signal plus a constraint that keeps the model near its pretrained competence. Here’s how it works and how it fails.
Instruction Tuning: Turning a Completion Engine into an Assistant

Pretraining gives you a powerful text predictor. Instruction tuning turns it into something that behaves like a helpful tool. This post explains what instruction tuning changes, what it can’t change, and how to design products around the new failure modes.
Pretraining Is Compression: Tokens, Datasets, and Emergent Skill

Pretraining isn’t “learning facts.” It’s learning to compress a giant slice of the internet into a predictive machine. This post gives senior engineers the mental model: tokens, data mixtures, scaling, and why capabilities seem to ‘emerge’—plus the practical implications for cost, reliability, and product design.
Transformers: Attention as an Engineering Breakthrough (Not a Math Flex)

RNNs made sequence learning feel like fighting gradients. Transformers made it feel like building systems: parallelism, short gradient paths, and a memory mechanism you can scale. This post explains attention as an engineering unlock—and what it implies for real software.
Why NLP Was Hard: RNN Pain, Vanishing Gradients, and the Limits of “Memory”

Before transformers, language models tried to compress entire histories into a single hidden state. This post explains why that was brittle: depth-in-time, vanishing/exploding gradients, and the engineering limits of “memory” — and why attention was inevitable.
Software in the Age of Probabilistic Components

LLMs aren’t “features” — they’re probabilistic runtime dependencies. This post gives the mental model, contracts, failure modes, and ship-ready checklists for building real products on top of them.
Capstone: Build a System That Can Survive (Reference Architecture + Decision Log)

A production system isn’t “done” when it works — it’s done when it can fail, recover, evolve, and stay correct under pressure. This capstone stitches the 2021–2022 series into a reference architecture and a decision log you can defend.
Incident Response and Resilience: Designing for Failure, Not Hope

Most teams “have on-call”. Fewer teams have resilience. This is a practical blueprint for designing systems, teams, and workflows that respond fast, recover safely, and learn without blame.
Cost as a First-Class Constraint: FinOps for Architects

Reliability is non-negotiable, but “cost” is where architecture meets physics. This month is a practical playbook: how to model cost, allocate it, and design guardrails so your system scales without surprising invoices.
Data Engineering for Product Teams: OLTP vs OLAP, Streaming, and Truth

Most “data problems” are actually truth problems. This month is a practical mental model for product teams: where truth lives, how it moves, when to stream, when to batch, and how to keep analytics useful without corrupting production.
Cloud Infrastructure Without the Fanaticism: IaaS, PaaS, Serverless, Kubernetes

A practical mental model for choosing cloud primitives without ideology—based on responsibility boundaries, scaling, reliability, cost, and team operating capacity.
Performance Engineering End-to-End: From TTFB to Tail Latency

Performance isn’t a tuning phase — it’s an architecture property. This month I lay out an end-to-end mental model (browser → edge → app → data) and the practical playbook for improving both “fast on average” and “fast under load” without shipping fragile optimizations.
API Evolution at Scale: Compatibility, Contracts, and Consumer-Driven Testing

APIs don’t fail because they’re slow — they fail because they change. This month is about designing contracts you can evolve, enforcing compatibility automatically, and scaling teams without “everyone upgrade on Tuesday.”
Distributed Data: Transactions, Outbox, Sagas, and “Eventually Correct”

Once your system crosses a process boundary, “a transaction” stops being a feature and becomes a strategy. This post is a practical mental model for distributed data: what to keep strongly consistent, what to make eventually consistent, and how to do it safely with outbox + sagas.
Security for Builders: Threat Modeling and Secure-by-Default Systems

Security isn’t a checklist you add at the end — it’s a set of architectural constraints. This month is about threat modeling that fits real teams, and defaults that prevent whole classes of incidents.
Observability that Works: Logs, Metrics, Traces, and SLO Thinking

Observability isn’t “add dashboards.” It’s designing feedback loops you can trust: signals that answer real questions, alerts tied to user pain, and tooling that helps you debug under pressure.
CI/CD as Architecture: Testing Pyramids, Pipelines, and Rollout Safety

CI/CD isn’t a DevOps checkbox — it’s the architecture that makes change safe. This month is about test economics, pipeline design, and rollout strategies that turn “deploy” into a reversible decision.
Containers, Docker, and the Discipline of Reproducibility

Containers aren’t “how you deploy apps” — they’re how you make environments stop being a variable. This is the operational discipline: immutable artifacts, repeatable builds, and runtime contracts you can actually rely on.
Microservices vs Modular Monolith: The “When” and the “How”

Microservices aren’t a flex — they’re a tax. Modular monoliths aren’t “temporary” — they’re often the best architecture. Here’s the decision framework, the failure modes, and the migration path that doesn’t create a distributed mess.
Queues, Retries, and Idempotency: Engineering Reality in Async Systems

Async work is where production gets honest. This month is a practical playbook for queues, retries, idempotency keys, and the patterns that keep “background jobs” from duplicating money or burning trust.
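
The core idempotency trick fits in a few lines. A minimal sketch, assuming the dedup marker and the side effect share one store (otherwise pair it with an outbox):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")

# Idempotent consumer: record the message id in the same transaction
# as the side effect, so a redelivered message becomes a no-op.
def handle_once(message_id: str, side_effect) -> bool:
    try:
        with db:
            db.execute("INSERT INTO processed VALUES (?)", (message_id,))
            side_effect()  # raises -> marker rolls back -> retry is safe
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate delivery, already handled

handle_once("msg-42", lambda: print("charge card"))  # runs once
handle_once("msg-42", lambda: print("charge card"))  # skipped
```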
Caching Without Folklore: Redis, CDNs, and the Two Hard Things

Caching is not “make it faster.” It’s a contract: what can be stale, for how long, for whom, and how you recover when it lies. This month is a practical architecture guide to caching layers that scale without corrupting truth.
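
To make “what can be stale, for how long” concrete, a toy stale-while-revalidate entry (TTLs are illustrative, and a real cache would refresh asynchronously):

```python
import time

# A cache entry with an explicit staleness contract: a fresh window,
# then a stale-but-servable window, then nothing.
class Entry:
    def __init__(self, value, ttl: float, stale_ttl: float):
        self.value = value
        now = time.monotonic()
        self.fresh_until = now + ttl
        self.usable_until = now + ttl + stale_ttl

def get(cache: dict, key, refresh, ttl=60.0, stale_ttl=300.0):
    entry, now = cache.get(key), time.monotonic()
    if entry and now < entry.fresh_until:
        return entry.value                                 # fresh: serve as-is
    if entry and now < entry.usable_until:
        value = entry.value                                # stale: serve the old value...
        cache[key] = Entry(refresh(key), ttl, stale_ttl)   # ...and refresh (inline for brevity)
        return value
    cache[key] = Entry(refresh(key), ttl, stale_ttl)       # miss or fully expired
    return cache[key].value

cache = {}
print(get(cache, "user:1", refresh=lambda k: {"id": 1}))   # miss -> refresh
print(get(cache, "user:1", refresh=lambda k: {"id": 1}))   # fresh hit
```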
Data Stores 101 for Architects: SQL, NoSQL, and the Shape of Consistency

Stop choosing databases by brand. Choose them by invariants, access patterns, and what “correct” means when the network is on fire.
RESTful Design That Survives: Resources, Boundaries, and Versioning

REST isn’t “JSON over HTTP.” It’s a set of constraints that make interfaces boring, predictable, and resilient under change. This month is about designing resource boundaries and contracts that survive growth — and using versioning only when you’ve earned it.
Backends: Frameworks Don’t Matter Until They Do (Node, Java, .NET, Go, Python)

Early on, any backend “works.” Then timeouts, GC pauses, cold starts, and operability show up. This is a practical mental model for choosing runtimes and frameworks based on constraints—not vibes.
Frontend Systems: Routing, State, Forms, and the “Boring Stack” That Scales

The month I stop treating “frontend architecture” as component trivia and start treating it as a system: URLs as contracts, state as truth, forms as transactions, and a boring stack you can operate.
React as an Architecture Tool: Components, Hooks, and the Cost of Re-rendering

React is not a UI library — it’s a runtime for composing change. This post teaches the mental model that makes large React codebases predictable: component boundaries, hooks as architecture, and the real cost model of re-rendering (so you optimize the right thing).
AJAX → Fetch → GraphQL → tRPC: Choosing Your Data Boundary

Your data boundary is not “how we call the API.” It’s a coupling contract between teams, runtimes, and failure modes. This post gives you a practical decision framework for REST, GraphQL, and RPC-style boundaries (including tRPC) — with the tradeoffs that show up in production.
Browser Reality: The Event Loop, Rendering, and Why UX Bugs Look Like Backend Bugs

The browser is a constrained runtime with a scheduling problem: one main thread, many responsibilities, and users who notice missed frames. This post gives you the mental model to debug “random” UX failures as deterministic timing and contention issues.
HTTP as a Distributed Systems API (Without the Buzzwords)

HTTP isn’t just “how browsers talk to servers.” It’s a mature distributed-systems contract with semantics for caching, retries, concurrency, intermediaries, and evolution. If you design APIs without those semantics, production will teach you them anyway.
The Web's "Compression Algorithm": Static → Web 2.0 → SPA → SSR/Edge

The web didn’t evolve because developers got bored. It evolved because latency, state, and economics kept forcing us to move responsibility between server, client, and edge. This post gives you the mental model and the checklist to choose the right rendering architecture in 2021 and beyond.
From Research Rig to System: 2020 Postmortem and the Real Amazing Result

2020 is when I stopped training agents and started building a trading system: environments, evaluation discipline, safety, and a live loop that survives outages. This is the postmortem — and the first result that actually held up in reality.
Batch Training & Evaluation Again: Promising Results That Survive Scrutiny

In late 2020 I stopped trusting hero backtests. I built a batch runner + a walk-forward evaluation harness, added eval gates, and discovered an uncomfortable truth: shorter training often wins.
bitmex-management-gym: Position Sizing and the First Risk-Aware Agent

After months of "all-in" agents with bull personalities, I rebuilt the environment to teach risk: stackable positions, time-awareness, and penalties that prevent reward-hacking.
Maker Trades as a Strategy: When Fees Become a Reward Signal

In September 2020, I stop trying to be fast and start trying to be executable. The surprising result: in maker-style trading, fees aren’t a footnote — they can be the whole edge.
Deep Silos in RL: Architecture as Stability (and the First LSTM Variant)

August 2020 - After the first live pain and the bull-personality problem, I stopped tuning "algorithms" and started tuning the network contract. Deep Silos beat flat MLPs, and the LSTM variant overfit fast.
Constraints That Teach: Risk Caps, Timeouts, and Surviving Bad Regimes

After my first disappointing live runs, I stopped asking my agent to be clever and started forcing it to be safe: risk caps, timeouts, and “market-health” gates that kept the loop alive when the regime wasn’t.
First Live Runs - Small Size, Big Lessons

Backtests looked amazing. Live PnL didn't. In June 2020 I ran the first real BitMEX live loop at tiny size and learned the most important lesson in trading ML: regime is the boss.
Safety Engineering - Kill Switches, Reconciliation, and Failure Recovery

In May 2020, I stop hoping the bot is “fine” and start giving it explicit failure states — stale websockets, missing fills, rate-limits, and the kill switches that keep a live loop honest.
Chappie Wiring - From Trained Policy to Running Process

The moment RL stops being a notebook artifact: load a PPO policy, rebuild the live observation stream, and turn BitMEX into a runtime you can monitor and control.
Reward Shaping Without Lying - Penalties, Constraints, and the First Real Fixes

In March 2020, I stopped treating reward like a number and started treating it like a contract—pay real costs, punish real risk, and don’t teach the agent to win a video game.
Evaluation Discipline - Walk-Forward Backtesting Inside the Gym

Training reward was lying to me. So I turned evaluation into a first-class system - chronological splits, deterministic runs, and walk-forward backtests that survive the next dataset.
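
The chronological-split logic is small enough to show inline. A minimal sketch with illustrative window sizes:

```python
# Walk-forward evaluation: train on one chronological window, test on
# the strictly later window, then roll forward. Never shuffle time.
def walk_forward_splits(n_samples: int, train_size: int, test_size: int, step: int):
    start = 0
    while start + train_size + test_size <= n_samples:
        yield (range(start, start + train_size),
               range(start + train_size, start + train_size + test_size))
        start += step

# e.g. 10k bars: train on 2k, test on the next 500, advance by 500
for train_idx, test_idx in walk_forward_splits(10_000, train_size=2_000, test_size=500, step=500):
    print(f"train [{train_idx[0]}:{train_idx[-1]}] -> test [{test_idx[0]}:{test_idx[-1]}]")
```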
bitmex-gym - The Baseline Trading Environment (Where Cheating Starts)

In January 2020 I stop “predicting” and build a Gym environment that turns BitMEX microstructure features into actions, fills, and rewards — and makes every hidden assumption painfully visible.
From Prediction to Decision - Designing the Trading Environment Contract

I stopped pretending “a good predictor” was the same thing as “a tradable strategy” and designed a Gym-style environment contract that makes cheating obvious and failure modes measurable.
The 503 Lesson - Outages as a Signal, Not Just a Bug

My first live alpha monitor was “working”… until BitMEX started replying 503 right when the model got excited. That’s when I learned availability is part of market microstructure.
Live Alpha Monitoring - When the Market Talks Back

I stop treating my alpha model like a notebook artifact and make it sit in the real BitMEX stream. The goal is not trading yet. It is seeing whether my features, normalization, and inference loop survive reality without quietly cheating.
Deep Silos - Representation Learning That Respects Feature Families

My first serious attempt at making the model “see” microstructure features the way I designed them — grouped, compressed, and only then fused.
Supervised Baselines - First Alpha Models, First Humbling Curves

I train my first alpha predictor on BitMEX order-book features, and learn why ‘it trains’ is not the same as ‘it works’.
Defining Alpha Without Cheating - Look-Ahead Labels and Leakage Traps

Before I train anything, I need a label that doesn’t smuggle the future into my dataset.
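
A minimal example of a leak-free label, assuming a simple forward-return target (the post’s actual labels may differ):

```python
import numpy as np

# A label that can't smuggle the future: the target at time t is the
# return over (t, t+horizon]; features at t may only use data <= t.
def forward_return_labels(prices: np.ndarray, horizon: int) -> np.ndarray:
    labels = np.full(prices.shape, np.nan)
    labels[:-horizon] = (prices[horizon:] - prices[:-horizon]) / prices[:-horizon]
    return labels  # trailing NaNs mark rows with no future: drop them, never fill

prices = np.array([100.0, 101.0, 99.0, 102.0, 103.0])
print(forward_return_labels(prices, horizon=2))  # last 2 entries are NaN
```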
Normalization Is a Deployment Problem - Mean/Sigma and Index Diff

In June 2019 I stop treating feature scaling as “preprocessing” and start treating it as part of the production contract: same transforms, same stats, same order — or the live system lies.
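
The contract reduces to “fit once, persist, replay”. A minimal sketch, with illustrative file handling:

```python
import json
import numpy as np

# Normalization as a contract: fit stats once on TRAINING data, persist
# them, and replay the identical transform in the live path.
def fit_and_save(train: np.ndarray, path: str) -> None:
    stats = {"mean": train.mean(axis=0).tolist(), "sigma": train.std(axis=0).tolist()}
    with open(path, "w") as f:
        json.dump(stats, f)

def transform(x: np.ndarray, path: str) -> np.ndarray:
    with open(path) as f:
        stats = json.load(f)
    mean, sigma = np.array(stats["mean"]), np.array(stats["sigma"])
    return (x - mean) / (sigma + 1e-8)  # same stats, same order, train or live
```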
Feature Engineering, But Make It Microstructure: Liquidity Created/Removed

If the order book is the battlefield, features are the sensors. This month I stop hand-waving and teach my pipeline to measure liquidity being added and removed - in a way I can deploy live.
Dataset Reality — HDF5 Schema, Missing Data, and “Don’t Lie to Yourself” Rules

In April 2019 I learned that the hardest part of trading ML isn’t the model — it’s the dataset contract. This month is about HDF5, integrity checks, and building rules that stop “good backtests” from lying.
The Collector - Websockets, Clock Drift, and the First Clean Snapshot

In March 2019 I stop “talking about microstructure” and start collecting it. Websockets drop messages, clocks drift, and the only thing that matters is producing a snapshot I can trust.
From Microstructure to Features - What the Model Will See

If RL taught me “the state is the contract,” then trading is where that contract becomes painful. This month I map order book microstructure into concrete feature families my models can actually learn from.
Order Books Are the Battlefield - Matching Engines in Plain English

In 2018 I learned RL inside clean Gym worlds. In 2019 I’m pointing that mindset at BitMEX — where the “environment” is a matching engine and the rewards come with slippage, fees, queue priority, and outages.
Imitation Learning - GAIL and the Strange Feeling of Learning From Experts

I ended 2018 in a weird place - less “reinforcement,” more “copying.” GAIL taught me that sometimes the fastest path to competence is to borrow behavior first — and ask questions later.
Sparse Rewards - HER and Learning From What Didn’t Happen

This month RL didn’t fail loudly. It failed quietly. Sparse rewards taught me the most brutal lesson yet - if nothing “good” happens, nothing gets learned — unless you rewrite what counts as experience.
Stability is a Feature You Have to Design

After DDPG, I stopped thinking of RL instability as a surprise and started treating it like a design constraint. This month I learned why TRPO exists — and why PPO/PPO2 became the practical answer.
Continuous Control - DDPG and the Seduction of Off-Policy

This month I left “toy” discrete actions and stepped into continuous control. DDPG looked like the perfect deal—until I learned what off-policy really costs.
Why RL Training Is Unstable (A Catalog of Breakage)

After actor-critic finally felt “trainable,” I hit the next wall - RL doesn’t just fail—it fails in loops. This month is my map of the most common ways it breaks.
Actor-Critic - The First Time RL Feels Trainable

Policy gradients were honest but noisy. This month I added a critic—and for the first time in RL, training started to feel like something I could actually steer.
Policy Gradients - Learning Without a Value Crutch

DQN taught me how fragile value learning can be. This month I tried something different - learn the policy directly. No Q-table. No value “crutch.” Just behavior, gradients, and a whole new set of failure modes.
Deep Q-Learning - My First Real Baselines Month

This is the month I stopped reading about deep RL and started running it. DQN is simple enough to explain, hard enough to break, and perfect for learning Baselines like an engineer.
Function Approximation - The Day RL Stopped Being Stable

Tabular RL felt clean because you could see the truth in a table. The moment I replaced the table with a model, RL stopped being a neat algorithm and became a fragile system.
Tabular RL - When Value Iteration Feels Like Cheating

Tabular RL is the last time reinforcement learning feels clean. Values become literal tables, planning becomes explicit, and the “aha” moments arrive fast.
Bandits - The First Honest RL Problem

Bandits strip RL down to one tension—explore vs exploit—so I can stop confusing luck with learning and start building real intuition.
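
The classic baseline for that tension is epsilon-greedy; a few-line sketch (illustrative, not necessarily the post’s exact agent):

```python
import random

# Epsilon-greedy: with probability epsilon pull a random arm (explore),
# otherwise pull the best arm seen so far (exploit).
def epsilon_greedy(q_values: list, epsilon: float = 0.1) -> int:
    if random.random() < epsilon:
        return random.randrange(len(q_values))                   # explore
    return max(range(len(q_values)), key=lambda i: q_values[i])  # exploit
```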
Rewards, Returns, and Why “Learning” Is an Interface Problem

I’m starting 2018 by shifting from deep learning to reinforcement learning. The first lesson isn’t an algorithm — it’s that the data pipeline is the policy itself.
From Classical ML to Deep Learning - What Actually Changed, What Didn’t (and My Next Steps)

A year after finishing Andrew Ng’s classical ML course, I’m trying to separate enduring principles from deep-learning-specific techniques—and decide where to go next.
LSTMs - Engineering Memory into the Network

After vanilla RNNs taught me why gradients collapse through time, LSTMs finally felt like an engineered solution - keep the memory path stable, and control it with gates.
Vanishing Gradients Strike Back - The Pain of Training RNNs

RNNs looked elegant on paper. Training them exposed the same old enemy—vanishing/exploding gradients—just with “depth in time”.
Why Sequences Break Everything - Enter Recurrent Neural Networks

Images were hard, but at least they were static. Sequences add “time”, shared weights, and state — and suddenly the assumptions I relied on in 2016 stop holding.
Pooling, Hierarchies, and What CNNs Are Really Learning

Convolution made CNNs possible. Pooling and depth made them useful - invariance, hierarchies, and feature maps that start to look like learned vision primitives.
Convolutions - Why CNNs See the World Differently

After months wrestling with training stability, I finally hit the next wall - fully-connected nets don’t “get” images. Convolutions felt like the first time architecture itself encoded domain knowledge.
Optimization Got Real - Momentum, Learning Rates, and Why Plain Gradient Descent Wasn’t Enough

In 2016, gradient descent felt like “the algorithm.” In deep learning, it’s just the beginning. This month I learned why momentum and careful learning rates are what make training actually move.
Initialization, Scale, and the Fragility of Deep Networks

After learning why gradients vanish, I discovered something even more unsettling - deep networks can fail before training even begins, simply because the starting scale is wrong.
Activation Functions Are Not a Detail - ReLU Changed Everything

April 2017 — I used to treat activation functions like a minor math choice. Then I saw how one change (ReLU) could decide whether a deep network learns at all.
Why Deeper Networks Are Harder to Train Than I Expected

I assumed “more layers” would just mean “more power.” Instead I discovered that depth introduces a new failure mode - gradients can disappear (or explode) long before the model learns anything useful.
Backpropagation Demystified - It’s Just the Chain Rule (But Applied Ruthlessly)

Backprop stopped feeling like magic when I treated it like engineering - track shapes, follow the chain rule, and test gradients like you’d test any critical system.
From Logistic Regression to Neurons - Rebuilding Intuition from the Perceptron

After finishing Andrew Ng’s Machine Learning course, I start my deep learning journey by revisiting the perceptron and realizing neural networks begin with ideas I already understand.
Exercise 8 + Course Wrap - Anomaly Detection & Recommenders (and My Next Steps)

I wrapped Andrew Ng’s ML course by building anomaly detection and a simple movie recommender—two patterns that show up everywhere in real systems.
Exercise 7 - Unsupervised Learning (K-means) + PCA (Compression & Visualization)

Implement K-means clustering, compress an image to 16 colors, then use PCA to reduce dimensions and build eigenfaces.
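
A minimal NumPy version of the color-compression step (the exercise itself is in Octave; this is an illustrative translation):

```python
import numpy as np

# K-means color compression: cluster the pixels, then replace every
# pixel with its centroid color (k=16 as in the exercise).
def compress_colors(pixels: np.ndarray, k: int = 16, iters: int = 10) -> np.ndarray:
    pixels = pixels.astype(float)                      # (n_pixels, 3) RGB rows
    centroids = pixels[np.random.choice(len(pixels), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(pixels[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)                  # nearest centroid per pixel
        for j in range(k):
            if np.any(labels == j):                    # leave empty clusters in place
                centroids[j] = pixels[labels == j].mean(axis=0)
    return centroids[labels]                           # compressed image pixels
```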
Exercise 6 - Support Vector Machines (When a Different Model Just Wins)

I built my first spam classifier with SVMs—learning how C and sigma shape decision boundaries, and why linear SVMs scale surprisingly well.
Exercise 5 - Debugging ML (Bias/Variance, Learning Curves, and What to Try Next)

The most practical assignment so far - diagnose bias vs variance using learning curves, tune lambda with validation curves, and build a repeatable “next action” playbook.
Exercise 4 - Neural Networks Learning (Backpropagation Without Tears)

Implement backpropagation for a 2-layer neural network, verify gradients numerically, train on handwritten digits, and hit ~95% accuracy.
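
The numerical-verification step, sketched in NumPy (the exercise is in Octave; names and tolerances here are illustrative):

```python
import numpy as np

# Numerical gradient check: central differences vs the analytic
# backprop gradient, compared with a relative error.
def check_gradients(loss_fn, theta: np.ndarray, analytic: np.ndarray,
                    eps: float = 1e-4, n_checks: int = 10) -> float:
    worst = 0.0
    for i in np.random.choice(theta.size, size=min(n_checks, theta.size), replace=False):
        t_plus, t_minus = theta.copy(), theta.copy()
        t_plus.flat[i] += eps
        t_minus.flat[i] -= eps
        numeric = (loss_fn(t_plus) - loss_fn(t_minus)) / (2 * eps)
        a = analytic.flat[i]
        worst = max(worst, abs(numeric - a) / max(1e-8, abs(numeric) + abs(a)))
    return worst  # should be ~1e-7 or smaller for a correct backprop

# e.g. loss = ||theta||^2, whose gradient is 2*theta
theta = np.random.randn(5)
print(check_gradients(lambda t: float(t @ t), theta, 2 * theta))
```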
Exercise 3 - One-vs-All + Intro to Neural Networks (Handwritten Digits!)

Build a multi-class digit classifier with one-vs-all logistic regression, then run a small neural network forward pass on the same dataset.
Regularization - Overfitting in the Real World (and How to Fight It)

Overfitting is what happens when your model memorizes training data. Regularization is the practical tool that keeps it honest.
Exercise 2 - Logistic Regression for Classification (My First Real Classifier)

My first real classifier - predict admissions from exam scores with logistic regression, then learn why regularization matters on a non-linear dataset.
Normal Equation vs Gradient Descent (Choosing Tools Like an Engineer)

Notes — two ways to fit linear regression: iterative gradient descent vs the one-shot normal equation. Same goal, different tradeoffs.
Linear Regression With Multiple Variables (and Why Vectorization Matters)

Notes from Andrew Ng’s ML course — extend linear regression to multiple features, learn feature scaling/mean normalization, and stop writing slow loops by vectorizing everything.
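
What “vectorize everything” buys, as a NumPy sketch (the course uses Octave; this is an illustrative port):

```python
import numpy as np

# Vectorized gradient descent: one matrix expression updates every
# parameter at once -- no per-feature Python loop.
def gradient_descent(X: np.ndarray, y: np.ndarray, alpha: float = 0.01,
                     iters: int = 1000) -> np.ndarray:
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        theta -= alpha / m * (X.T @ (X @ theta - y))  # all features in one step
    return theta
```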
Exercise 1 - Linear Regression From Scratch

Notes from Andrew Ng’s ML course — plot the food-truck dataset, implement computeCost + gradientDescent in Octave, and build intuition with J(theta) visualizations.
Why I’m Learning Machine Learning

In 2016, I’m documenting my journey through Andrew Ng’s Machine Learning course—building intuition, writing Octave code, and learning how to think in data.