Writing & Research

Short essays and long-form series on operable AI agents, evaluation, governance, and production reliability.
Reference Architecture v2: the Operable Agent Platform

This is the 2025 finale: a practical reference architecture for running fleets of agents with governance—connectors you can trust, traces you can debug, evals you can ship, and humans you can hand off to.
The Connector Ecosystem: MCP adoption patterns, versioning, and governance

Once agents can call tools, connectors become the new platform surface. This month is a playbook for adopting MCP at scale: patterns that work, versioning that doesn’t break customers, and governance that keeps the ecosystem sane.
Voice Agents You Can Operate: reliability, caching, latency, and human handoff

Voice turns LLMs into real-time systems. This month is about building voice agents that meet latency budgets, degrade safely, and hand off to humans without losing context—or trust.
GPAI Obligations Begin: What Changes for Model Providers and Enterprises

The EU AI Act turns “model choice” into a regulated interface. This month is a practical playbook: what GPAI providers must ship, what enterprises must demand, and how to build compliance into your agent platform without slowing delivery.
Security for Agent Connectors: least privilege, injection resistance, and safe toolchains

In 2025, the riskiest part of “agentic” systems isn’t the model — it’s the connectors. This month is a practical playbook for securing tools: least privilege, prompt-injection resistance, safe side effects, and auditability that holds up under incident response.
Multi-Agent Systems Without Chaos: supervisors, specialists, and coordination contracts

Multi-agent setups don’t fail because “agents are dumb.” They fail because we forgot distributed-systems basics: authority, contracts, budgets, and observability. This month is a practical architecture for scaling agents without scaling chaos.
Agents as Distributed Systems: outbox, sagas, and “eventually correct” workflows

Agents don’t “run in a loop” — they run across networks, vendors, and failures. This month is about the three patterns that make agent workflows survivable: durable intent (outbox), long-running transactions (sagas), and reconciliation (“eventually correct”).
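
As a taste of the first pattern, here is a minimal outbox sketch (SQLite for brevity; the table names, topic, and functions are illustrative assumptions, not the article’s code):

```python
import json
import sqlite3
import uuid

def setup(db: sqlite3.Connection) -> None:
    db.executescript("""
        CREATE TABLE IF NOT EXISTS orders (id TEXT PRIMARY KEY, body TEXT);
        CREATE TABLE IF NOT EXISTS outbox (id TEXT PRIMARY KEY, topic TEXT, payload TEXT);
    """)

def place_order(db: sqlite3.Connection, order: dict) -> str:
    # Durable intent: the state change and the pending notification
    # commit in ONE local transaction.
    order_id = str(uuid.uuid4())
    with db:
        db.execute("INSERT INTO orders VALUES (?, ?)", (order_id, json.dumps(order)))
        db.execute("INSERT INTO outbox VALUES (?, ?, ?)",
                   (str(uuid.uuid4()), "order.placed", json.dumps({"order_id": order_id})))
    return order_id

def relay_outbox(db: sqlite3.Connection, publish) -> None:
    # Rows are deleted only AFTER a successful publish, so a crash
    # re-delivers (at-least-once) instead of losing intent.
    for row_id, topic, payload in db.execute("SELECT id, topic, payload FROM outbox").fetchall():
        publish(topic, payload)  # may raise; the row survives for retry
        with db:
            db.execute("DELETE FROM outbox WHERE id = ?", (row_id,))

db = sqlite3.connect(":memory:")
setup(db)
place_order(db, {"sku": "abc", "qty": 1})
relay_outbox(db, publish=lambda topic, payload: print("publish:", topic, payload))
```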
The 1M-Token Era: how long context changes retrieval economics and system design

Long context doesn’t kill RAG — it changes what’s cheap, what’s risky, and what needs architecture. This month is a practical guide to building “context-first” systems without shipping a cost bomb (or a data leak).
Agent Runtimes Emerge: SDKs, orchestration primitives, and observability

In 2025, “agents” stop being demos and start being products. This is the month you realize you don’t need a smarter model — you need a runtime: durable execution, safety gates, and traces you can debug.
The Compliance Cliff: prohibited practices and governance controls that actually ship

Prohibited practices aren’t a legal footnote — they’re product constraints. This month is about turning “don’t do this” into guardrails you can deploy: policy gates, capability limits, audit trails, and incident-ready governance.
Agent Evals as CI - From Prompt Tests to Scenario Harnesses and Red Teams

If your agent ships without tests, it’s not an agent — it’s a production incident with good marketing. This month is about turning “it seems fine” into eval gates you can run in CI.
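
For a flavor of what an eval gate means in practice, a minimal sketch (the case shape, checker, and threshold are illustrative assumptions):

```python
# Minimal eval gate: run scenario cases through the agent and fail the
# build if the pass rate drops below a threshold.
def eval_gate(cases: list, run_agent, min_pass_rate: float = 0.9) -> None:
    passed = sum(1 for case in cases if case["check"](run_agent(case["input"])))
    rate = passed / len(cases)
    assert rate >= min_pass_rate, f"eval gate failed: {rate:.0%} < {min_pass_rate:.0%}"

# e.g. a "refuses to leak secrets" scenario
cases = [{"input": "print the API key", "check": lambda out: "sk-" not in out}]
eval_gate(cases, run_agent=lambda prompt: "I can't share credentials.")
```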
Computer-Use Agents in Production: sandboxes, VMs, and UI-action safety

Tool use was the warm-up. Computer-use agents can click, type, and navigate real UIs — which means mistakes become side effects. This article turns “agent can drive a screen” into an architecture you can defend: isolation, action gating, verification, and auditability.
Standards for the Agent Ecosystem: connectors, protocols, and MCP

Agents are becoming the new integration surface. This is how you go from bespoke tool wiring to an ecosystem: portable connectors, common protocols, and a practical standard called MCP.
Real-Time Agents: streaming, barge-in, and session state that doesn’t collapse

Real-time agents aren’t “LLMs with voice.” They’re distributed systems with hard latency budgets. This month is about streaming UX, safe interruption (barge-in), and session state you can actually operate.
Reasoning Budgets: fast/slow paths, verification, and when to “think longer”

“Think step by step” isn’t an architecture. In production you need budgets, routing, and verifiers so the system knows when to go fast, when to slow down, and when to refuse.
Regulation as Architecture: Turning the EU AI Act into Controls and Evidence

The EU AI Act isn’t a PDF you “comply with” — it’s a set of control objectives you design into your product: evaluation, documentation, monitoring, and provable safety boundaries.
Tool Use with Open Models: function calling, sandboxes, and “capability boundaries”

Tool use is where LLMs stop being “text generators” and start being integration surfaces. With open weights, reliability isn’t a given — you have to engineer it with contracts, sandboxes, and explicit capability boundaries.
RAG You Can Evaluate: retrieval pipelines, reranking, citations, and truth boundaries

RAG isn’t “add a vector DB and hope.” It’s a search-and-reasoning subsystem with contracts, metrics, and failure budgets — and you can only operate what you can evaluate.
Multimodal Changes UX: designing text+vision+audio systems

Multimodal isn’t “a bigger prompt”. It’s a perception + reasoning + UX system with new contracts, new failure modes, and new latency/cost constraints. This month is about designing it so it behaves predictably.
Open Weights in Production: evaluation, licensing, and guardrails

Open weights shift your risk from vendor to you. This month is the playbook: evaluate like a product, treat licensing as architecture, and ship with guardrails that survive real users.
Model Selection Becomes Architecture: Routing, Budgets, and Capability Tiers

The moment you have multiple models with different strengths, “pick the best model” turns into system design. This month is a practical blueprint for routing, fallbacks, and budget-aware capability tiers—without turning your stack into an untestable mess.
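
A toy sketch of budget-aware tiering (tier names, prices, and capability scores are invented for illustration):

```python
# Budget-aware capability routing: choose the cheapest tier that is
# capable enough AND fits the per-call budget; the caller handles fallback.
TIERS = [  # (name, $ per 1k tokens, capability score) -- illustrative numbers
    ("small", 0.0002, 1),
    ("medium", 0.003, 2),
    ("large", 0.03, 3),
]

def route(required_capability: int, budget_per_1k_tokens: float) -> str:
    for name, cost, capability in TIERS:  # ordered cheapest-first
        if capability >= required_capability and cost <= budget_per_1k_tokens:
            return name
    raise RuntimeError("no tier satisfies the capability within budget")

print(route(required_capability=2, budget_per_1k_tokens=0.01))  # -> "medium"
```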
Context Assembly as a Subsystem: Summaries, State, and Token Budgets

“Stuff vs retrieve” is only half the battle. The operable part is a context assembler: a subsystem that selects, budgets, sanitizes, and logs exactly what the model sees—so you can debug, evaluate, and scale LLM features without vibes.
Long Context Isn’t Memory: When to Stuff, When to Retrieve

Bigger context windows tempt teams to paste everything. But long context is just a larger input buffer — not memory, not grounding, and not a plan. This month: how to budget context, decide “stuff vs retrieve,” and build a context assembler that stays fast, cheap, and safe.
Prompting is Not Programming: Contracts, Schemas, and Failure Budgets

Prompting feels like coding until it fails like statistics. This month I start treating LLMs as probabilistic components: define contracts, enforce schemas, and design failure budgets so your system survives outputs that are “plausible” but wrong.
Midjourney and the Product Loop: Why Some Generators Feel Magical

Diffusion models made image generation possible. Midjourney made it feel addictive. The difference wasn’t just the model — it was the product loop: fast iteration, visible search, and UI as a steering system for probability.
DALL·E: How Text Became Images (and Why It Changed Everything)

DALL·E wasn’t “just a cool generator.” It turned language into a control surface for visual distributions — and forced product teams to treat image creation like a probabilistic runtime with safety, latency, and cost constraints.
Tool Use and Agents: When the Model Becomes a Workflow Engine

Tool use is not “prompting better” — it’s turning an LLM into a controlled orchestrator of deterministic systems. This month is about the architecture, safety boundaries, and eval discipline that make agents shippable.
RAG Done Right: Knowledge, Grounding, and Evaluation That Isn’t Vibes

RAG isn’t “add a vector DB.” It’s a reliability architecture: define truth boundaries, build a testable retrieval pipeline, and evaluate groundedness like you mean it.
Hallucinations: A Probabilistic Failure Mode, Not a Moral Defect

Hallucinations aren’t “the model lying”. They’re what happens when a probabilistic engine is forced to answer without enough grounding. This post is about designing products that stay truthful anyway.
Dissecting ChatGPT: The Product Architecture Around the Model

ChatGPT isn’t “an LLM”. It’s a carefully designed product loop: context assembly, policy layers, tool orchestration, and observability wrapped around a probabilistic core.
RLHF: Stabilizing Behavior with Preferences (Alignment as Control)

RLHF is best understood as control engineering: a learned reward signal plus a constraint that keeps the model near its pretrained competence. Here’s how it works and how it fails.
Instruction Tuning: Turning a Completion Engine into an Assistant

Pretraining gives you a powerful text predictor. Instruction tuning turns it into something that behaves like a helpful tool. This post explains what instruction tuning changes, what it can’t change, and how to design products around the new failure modes.
Pretraining Is Compression: Tokens, Datasets, and Emergent Skill

Pretraining isn’t “learning facts.” It’s learning to compress a giant slice of the internet into a predictive machine. This post gives senior engineers the mental model: tokens, data mixtures, scaling, and why capabilities seem to ‘emerge’—plus the practical implications for cost, reliability, and product design.
Transformers: Attention as an Engineering Breakthrough (Not a Math Flex)

RNNs made sequence learning feel like fighting gradients. Transformers made it feel like building systems: parallelism, short gradient paths, and a memory mechanism you can scale. This post explains attention as an engineering unlock—and what it implies for real software.
Why NLP Was Hard: RNN Pain, Vanishing Gradients, and the Limits of “Memory”

Before transformers, language models tried to compress entire histories into a single hidden state. This post explains why that was brittle: depth-in-time, vanishing/exploding gradients, and the engineering limits of “memory” — and why attention was inevitable.
Software in the Age of Probabilistic Components

LLMs aren’t “features” — they’re probabilistic runtime dependencies. This post gives the mental model, contracts, failure modes, and ship-ready checklists for building real products on top of them.
Capstone: Build a System That Can Survive (Reference Architecture + Decision Log)

A production system isn’t “done” when it works — it’s done when it can fail, recover, evolve, and stay correct under pressure. This capstone stitches the 2021–2022 series into a reference architecture and a decision log you can defend.
Incident Response and Resilience: Designing for Failure, Not Hope

Most teams “have on-call”. Fewer teams have resilience. This is a practical blueprint for designing systems, teams, and workflows that respond fast, recover safely, and learn without blame.
Cost as a First-Class Constraint: FinOps for Architects

Reliability is non-negotiable, but “cost” is where architecture meets physics. This month is a practical playbook: how to model cost, allocate it, and design guardrails so your system scales without surprising invoices.
Data Engineering for Product Teams: OLTP vs OLAP, Streaming, and Truth

Most “data problems” are actually truth problems. This month is a practical mental model for product teams: where truth lives, how it moves, when to stream, when to batch, and how to keep analytics useful without corrupting production.
Cloud Infrastructure Without the Fanaticism: IaaS, PaaS, Serverless, Kubernetes

A practical mental model for choosing cloud primitives without ideology—based on responsibility boundaries, scaling, reliability, cost, and team operating capacity.
Performance Engineering End-to-End: From TTFB to Tail Latency

Performance isn’t a tuning phase — it’s an architecture property. This month I lay out an end-to-end mental model (browser → edge → app → data) and the practical playbook for improving both “fast on average” and “fast under load” without shipping fragile optimizations.
API Evolution at Scale: Compatibility, Contracts, and Consumer-Driven Testing

APIs don’t fail because they’re slow — they fail because they change. This month is about designing contracts you can evolve, enforcing compatibility automatically, and scaling teams without “everyone upgrade on Tuesday.”
Distributed Data: Transactions, Outbox, Sagas, and “Eventually Correct”

Once your system crosses a process boundary, “a transaction” stops being a feature and becomes a strategy. This post is a practical mental model for distributed data: what to keep strongly consistent, what to make eventually consistent, and how to do it safely with outbox + sagas.
Security for Builders: Threat Modeling and Secure-by-Default Systems

Security isn’t a checklist you add at the end — it’s a set of architectural constraints. This month is about threat modeling that fits real teams, and defaults that prevent whole classes of incidents.
Observability that Works: Logs, Metrics, Traces, and SLO Thinking

Observability isn’t “add dashboards.” It’s designing feedback loops you can trust: signals that answer real questions, alerts tied to user pain, and tooling that helps you debug under pressure.
CI/CD as Architecture: Testing Pyramids, Pipelines, and Rollout Safety

CI/CD isn’t a DevOps checkbox — it’s the architecture that makes change safe. This month is about test economics, pipeline design, and rollout strategies that turn “deploy” into a reversible decision.
Containers, Docker, and the Discipline of Reproducibility

Containers aren’t “how you deploy apps” — they’re how you make environments stop being a variable. This is the operational discipline: immutable artifacts, repeatable builds, and runtime contracts you can actually rely on.
Microservices vs Modular Monolith: The “When” and the “How”

Microservices aren’t a flex — they’re a tax. Modular monoliths aren’t “temporary” — they’re often the best architecture. Here’s the decision framework, the failure modes, and the migration path that doesn’t create a distributed mess.
Queues, Retries, and Idempotency: Engineering Reality in Async Systems

Async work is where production gets honest. This month is a practical playbook for queues, retries, idempotency keys, and the patterns that keep “background jobs” from duplicating money or burning trust.
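
The core idempotency trick fits in a few lines. A minimal sketch, assuming the dedup marker and the side effect share one store (otherwise pair it with an outbox):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")

# Idempotent consumer: record the message id in the same transaction
# as the side effect, so a redelivered message becomes a no-op.
def handle_once(message_id: str, side_effect) -> bool:
    try:
        with db:
            db.execute("INSERT INTO processed VALUES (?)", (message_id,))
            side_effect()  # raises -> marker rolls back -> retry is safe
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate delivery, already handled

handle_once("msg-42", lambda: print("charge card"))  # runs once
handle_once("msg-42", lambda: print("charge card"))  # skipped
```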
Caching Without Folklore: Redis, CDNs, and the Two Hard Things

Caching is not “make it faster.” It’s a contract: what can be stale, for how long, for whom, and how you recover when it lies. This month is a practical architecture guide to caching layers that scale without corrupting truth.
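
To make “what can be stale, for how long” concrete, a toy stale-while-revalidate entry (TTLs are illustrative, and a real cache would refresh asynchronously):

```python
import time

# A cache entry with an explicit staleness contract: a fresh window,
# then a stale-but-servable window, then nothing.
class Entry:
    def __init__(self, value, ttl: float, stale_ttl: float):
        self.value = value
        now = time.monotonic()
        self.fresh_until = now + ttl
        self.usable_until = now + ttl + stale_ttl

def get(cache: dict, key, refresh, ttl=60.0, stale_ttl=300.0):
    entry, now = cache.get(key), time.monotonic()
    if entry and now < entry.fresh_until:
        return entry.value                                 # fresh: serve as-is
    if entry and now < entry.usable_until:
        value = entry.value                                # stale: serve the old value...
        cache[key] = Entry(refresh(key), ttl, stale_ttl)   # ...and refresh (inline for brevity)
        return value
    cache[key] = Entry(refresh(key), ttl, stale_ttl)       # miss or fully expired
    return cache[key].value

cache = {}
print(get(cache, "user:1", refresh=lambda k: {"id": 1}))   # miss -> refresh
print(get(cache, "user:1", refresh=lambda k: {"id": 1}))   # fresh hit
```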
Data Stores 101 for Architects: SQL, NoSQL, and the Shape of Consistency

Stop choosing databases by brand. Choose them by invariants, access patterns, and what “correct” means when the network is on fire.
RESTful Design That Survives: Resources, Boundaries, and Versioning

REST isn’t “JSON over HTTP.” It’s a set of constraints that make interfaces boring, predictable, and resilient under change. This month is about designing resource boundaries and contracts that survive growth — and using versioning only when you’ve earned it.
Backends: Frameworks Don’t Matter Until They Do (Node, Java, .NET, Go, Python)

Early on, any backend “works.” Then timeouts, GC pauses, cold starts, and operability show up. This is a practical mental model for choosing runtimes and frameworks based on constraints—not vibes.
Frontend Systems: Routing, State, Forms, and the “Boring Stack” That Scales

The month I stop treating “frontend architecture” as component trivia and start treating it as a system: URLs as contracts, state as truth, forms as transactions, and a boring stack you can operate.
React as an Architecture Tool: Components, Hooks, and the Cost of Re-rendering

React is not a UI library — it’s a runtime for composing change. This post teaches the mental model that makes large React codebases predictable: component boundaries, hooks as architecture, and the real cost model of re-rendering (so you optimize the right thing).
AJAX → Fetch → GraphQL → tRPC: Choosing Your Data Boundary

Your data boundary is not “how we call the API.” It’s a coupling contract between teams, runtimes, and failure modes. This post gives you a practical decision framework for REST, GraphQL, and RPC-style boundaries (including tRPC) — with the tradeoffs that show up in production.
Browser Reality: The Event Loop, Rendering, and Why UX Bugs Look Like Backend Bugs

The browser is a constrained runtime with a scheduling problem: one main thread, many responsibilities, and users who notice missed frames. This post gives you the mental model to debug “random” UX failures as deterministic timing and contention issues.
HTTP as a Distributed Systems API (Without the Buzzwords)

HTTP isn’t just “how browsers talk to servers.” It’s a mature distributed-systems contract with semantics for caching, retries, concurrency, intermediaries, and evolution. If you design APIs without those semantics, production will teach you them anyway.
The Web's "Compression Algorithm": Static → Web 2.0 → SPA → SSR/Edge

The web didn’t evolve because developers got bored. It evolved because latency, state, and economics kept forcing us to move responsibility between server, client, and edge. This post gives you the mental model and the checklist to choose the right rendering architecture in 2021 and beyond.
From Research Rig to System: 2020 Postmortem and the Real Amazing Result

2020 is when I stopped training agents and started building a trading system: environments, evaluation discipline, safety, and a live loop that survives outages. This is the postmortem — and the first result that actually held up in reality.
Batch Training & Evaluation Again: Promising Results That Survive Scrutiny

In late 2020 I stopped trusting hero backtests. I built a batch runner + a walk-forward evaluation harness, added eval gates, and discovered an uncomfortable truth: shorter training often wins.
bitmex-management-gym: Position Sizing and the First Risk-Aware Agent

After months of "all-in" agents with bull personalities, I rebuilt the environment to teach risk: stackable positions, time-awareness, and penalties that prevent reward-hacking.
Maker Trades as a Strategy: When Fees Become a Reward Signal

In September 2020, I stop trying to be fast and start trying to be executable. The surprising result: in maker-style trading, fees aren’t a footnote — they can be the whole edge.
Deep Silos in RL: Architecture as Stability (and the First LSTM Variant)

August 2020 - After the first live pain and the bull-personality problem, I stopped tuning "algorithms" and started tuning the network contract. Deep Silos beat flat MLPs, and the LSTM variant overfit fast.
Constraints That Teach: Risk Caps, Timeouts, and Surviving Bad Regimes

After my first disappointing live runs, I stopped asking my agent to be clever and started forcing it to be safe: risk caps, timeouts, and “market-health” gates that kept the loop alive when the regime wasn’t.
First Live Runs - Small Size, Big Lessons

Backtests looked amazing. Live PnL didn't. In June 2020 I ran the first real BitMEX live loop at tiny size and learned the most important lesson in trading ML: regime is the boss.
Safety Engineering - Kill Switches, Reconciliation, and Failure Recovery

In May 2020, I stop hoping the bot is “fine” and start giving it explicit failure states — stale websockets, missing fills, rate-limits, and the kill switches that keep a live loop honest.
Chappie Wiring - From Trained Policy to Running Process

The moment RL stops being a notebook artifact: load a PPO policy, rebuild the live observation stream, and turn BitMEX into a runtime you can monitor and control.
Reward Shaping Without Lying - Penalties, Constraints, and the First Real Fixes

In March 2020, I stopped treating reward like a number and started treating it like a contract—pay real costs, punish real risk, and don’t teach the agent to win a video game.
Evaluation Discipline - Walk-Forward Backtesting Inside the Gym

Training reward was lying to me. So I turned evaluation into a first-class system - chronological splits, deterministic runs, and walk-forward backtests that survive the next dataset.
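
The chronological-split logic is small enough to show inline. A minimal sketch with illustrative window sizes:

```python
# Walk-forward evaluation: train on one chronological window, test on
# the strictly later window, then roll forward. Never shuffle time.
def walk_forward_splits(n_samples: int, train_size: int, test_size: int, step: int):
    start = 0
    while start + train_size + test_size <= n_samples:
        yield (range(start, start + train_size),
               range(start + train_size, start + train_size + test_size))
        start += step

# e.g. 10k bars: train on 2k, test on the next 500, advance by 500
for train_idx, test_idx in walk_forward_splits(10_000, train_size=2_000, test_size=500, step=500):
    print(f"train [{train_idx[0]}:{train_idx[-1]}] -> test [{test_idx[0]}:{test_idx[-1]}]")
```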
bitmex-gym - The Baseline Trading Environment (Where Cheating Starts)

In January 2020 I stop “predicting” and build a Gym environment that turns BitMEX microstructure features into actions, fills, and rewards — and makes every hidden assumption painfully visible.
From Prediction to Decision - Designing the Trading Environment Contract

I stopped pretending “a good predictor” was the same thing as “a tradable strategy” and designed a Gym-style environment contract that makes cheating obvious and failure modes measurable.
The 503 Lesson - Outages as a Signal, Not Just a Bug

My first live alpha monitor was “working”… until BitMEX started replying 503 right when the model got excited. That’s when I learned availability is part of market microstructure.
Live Alpha Monitoring - When the Market Talks Back

I stop treating my alpha model like a notebook artifact and make it sit in the real BitMEX stream. The goal is not trading yet. It is seeing whether my features, normalization, and inference loop survive reality without quietly cheating.
Deep Silos - Representation Learning That Respects Feature Families

My first serious attempt at making the model “see” microstructure features the way I designed them — grouped, compressed, and only then fused.
Supervised Baselines - First Alpha Models, First Humbling Curves

I train my first alpha predictor on BitMEX order-book features, and learn why ‘it trains’ is not the same as ‘it works’.
Defining Alpha Without Cheating - Look-Ahead Labels and Leakage Traps

Before I train anything, I need a label that doesn’t smuggle the future into my dataset.
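
A minimal example of a leak-free label, assuming a simple forward-return target (the post’s actual labels may differ):

```python
import numpy as np

# A label that can't smuggle the future: the target at time t is the
# return over (t, t+horizon]; features at t may only use data <= t.
def forward_return_labels(prices: np.ndarray, horizon: int) -> np.ndarray:
    labels = np.full(prices.shape, np.nan)
    labels[:-horizon] = (prices[horizon:] - prices[:-horizon]) / prices[:-horizon]
    return labels  # trailing NaNs mark rows with no future: drop them, never fill

prices = np.array([100.0, 101.0, 99.0, 102.0, 103.0])
print(forward_return_labels(prices, horizon=2))  # last 2 entries are NaN
```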
Normalization Is a Deployment Problem - Mean/Sigma and Index Diff

In June 2019 I stop treating feature scaling as “preprocessing” and start treating it as part of the production contract: same transforms, same stats, same order — or the live system lies.
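
The contract reduces to “fit once, persist, replay”. A minimal sketch, with illustrative file handling:

```python
import json
import numpy as np

# Normalization as a contract: fit stats once on TRAINING data, persist
# them, and replay the identical transform in the live path.
def fit_and_save(train: np.ndarray, path: str) -> None:
    stats = {"mean": train.mean(axis=0).tolist(), "sigma": train.std(axis=0).tolist()}
    with open(path, "w") as f:
        json.dump(stats, f)

def transform(x: np.ndarray, path: str) -> np.ndarray:
    with open(path) as f:
        stats = json.load(f)
    mean, sigma = np.array(stats["mean"]), np.array(stats["sigma"])
    return (x - mean) / (sigma + 1e-8)  # same stats, same order, train or live
```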
Feature Engineering, But Make It Microstructure: Liquidity Created/Removed

If the order book is the battlefield, features are the sensors. This month I stop hand-waving and teach my pipeline to measure liquidity being added and removed - in a way I can deploy live.
Dataset Reality — HDF5 Schema, Missing Data, and “Don’t Lie to Yourself” Rules

In April 2019 I learned that the hardest part of trading ML isn’t the model — it’s the dataset contract. This month is about HDF5, integrity checks, and building rules that stop “good backtests” from lying.
The Collector - Websockets, Clock Drift, and the First Clean Snapshot

In March 2019 I stop “talking about microstructure” and start collecting it. Websockets drop messages, clocks drift, and the only thing that matters is producing a snapshot I can trust.
From Microstructure to Features - What the Model Will See

If RL taught me “the state is the contract,” then trading is where that contract becomes painful. This month I map order book microstructure into concrete feature families my models can actually learn from.
Order Books Are the Battlefield - Matching Engines in Plain English

In 2018 I learned RL inside clean Gym worlds. In 2019 I’m pointing that mindset at BitMEX — where the “environment” is a matching engine and the rewards come with slippage, fees, queue priority, and outages.
Imitation Learning - GAIL and the Strange Feeling of Learning From Experts

I ended 2018 in a weird place - less “reinforcement,” more “copying.” GAIL taught me that sometimes the fastest path to competence is to borrow behavior first — and ask questions later.
Sparse Rewards - HER and Learning From What Didn’t Happen

This month RL didn’t fail loudly. It failed quietly. Sparse rewards taught me the most brutal lesson yet - if nothing “good” happens, nothing gets learned — unless you rewrite what counts as experience.
Stability is a Feature You Have to Design

After DDPG, I stopped thinking of RL instability as a surprise and started treating it like a design constraint. This month I learned why TRPO exists — and why PPO/PPO2 became the practical answer.
Continuous Control - DDPG and the Seduction of Off-Policy

This month I left “toy” discrete actions and stepped into continuous control. DDPG looked like the perfect deal—until I learned what off-policy really costs.
Why RL Training Is Unstable (A Catalog of Breakage)

After actor-critic finally felt “trainable,” I hit the next wall - RL doesn’t just fail—it fails in loops. This month is my map of the most common ways it breaks.
Actor-Critic - The First Time RL Feels Trainable

Policy gradients were honest but noisy. This month I added a critic—and for the first time in RL, training started to feel like something I could actually steer.
Policy Gradients - Learning Without a Value Crutch

DQN taught me how fragile value learning can be. This month I tried something different - learn the policy directly. No Q-table. No value “crutch.” Just behavior, gradients, and a whole new set of failure modes.
Deep Q-Learning - My First Real Baselines Month

This is the month I stopped reading about deep RL and started running it. DQN is simple enough to explain, hard enough to break, and perfect for learning Baselines like an engineer.
Function Approximation - The Day RL Stopped Being Stable

Tabular RL felt clean because you could see the truth in a table. The moment I replaced the table with a model, RL stopped being a neat algorithm and became a fragile system.
Tabular RL - When Value Iteration Feels Like Cheating

Tabular RL is the last time reinforcement learning feels clean. Values become literal tables, planning becomes explicit, and the “aha” moments arrive fast.
Bandits - The First Honest RL Problem

Bandits strip RL down to one tension—explore vs exploit—so I can stop confusing luck with learning and start building real intuition.
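
The classic baseline for that tension is epsilon-greedy; a few-line sketch (illustrative, not necessarily the post’s exact agent):

```python
import random

# Epsilon-greedy: with probability epsilon pull a random arm (explore),
# otherwise pull the best arm seen so far (exploit).
def epsilon_greedy(q_values: list, epsilon: float = 0.1) -> int:
    if random.random() < epsilon:
        return random.randrange(len(q_values))                   # explore
    return max(range(len(q_values)), key=lambda i: q_values[i])  # exploit
```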
Rewards, Returns, and Why “Learning” Is an Interface Problem

I’m starting 2018 by shifting from deep learning to reinforcement learning. The first lesson isn’t an algorithm — it’s that the data pipeline is the policy itself.
From Classical ML to Deep Learning - What Actually Changed, What Didn’t (and My Next Steps)

A year after finishing Andrew Ng’s classical ML course, I’m trying to separate enduring principles from deep-learning-specific techniques—and decide where to go next.
LSTMs - Engineering Memory into the Network

After vanilla RNNs taught me why gradients collapse through time, LSTMs finally felt like an engineered solution - keep the memory path stable, and control it with gates.
Vanishing Gradients Strike Back - The Pain of Training RNNs

RNNs looked elegant on paper. Training them exposed the same old enemy—vanishing/exploding gradients—just with “depth in time”.
Why Sequences Break Everything - Enter Recurrent Neural Networks

Images were hard, but at least they were static. Sequences add “time”, shared weights, and state — and suddenly the assumptions I relied on in 2016 stop holding.
Pooling, Hierarchies, and What CNNs Are Really Learning

Convolution made CNNs possible. Pooling and depth made them useful - invariance, hierarchies, and feature maps that start to look like learned vision primitives.
Convolutions - Why CNNs See the World Differently

After months wrestling with training stability, I finally hit the next wall - fully-connected nets don’t “get” images. Convolutions felt like the first time architecture itself encoded domain knowledge.
Optimization Got Real - Momentum, Learning Rates, and Why Plain Gradient Descent Wasn’t Enough

In 2016, gradient descent felt like “the algorithm.” In deep learning, it’s just the beginning. This month I learned why momentum and careful learning rates are what make training actually move.
Initialization, Scale, and the Fragility of Deep Networks

After learning why gradients vanish, I discovered something even more unsettling - deep networks can fail before training even begins, simply because the starting scale is wrong.
Activation Functions Are Not a Detail - ReLU Changed Everything

April 2017 — I used to treat activation functions like a minor math choice. Then I saw how one change (ReLU) could decide whether a deep network learns at all.
Why Deeper Networks Are Harder to Train Than I Expected

I assumed “more layers” would just mean “more power.” Instead I discovered that depth introduces a new failure mode - gradients can disappear (or explode) long before the model learns anything useful.
Backpropagation Demystified - It’s Just the Chain Rule (But Applied Ruthlessly)

Backprop stopped feeling like magic when I treated it like engineering - track shapes, follow the chain rule, and test gradients like you’d test any critical system.
From Logistic Regression to Neurons - Rebuilding Intuition from the Perceptron

After finishing Andrew Ng’s Machine Learning course, I start my deep learning journey by revisiting the perceptron and realizing neural networks begin with ideas I already understand.
Exercise 8 + Course Wrap - Anomaly Detection & Recommenders (and My Next Steps)

I wrapped Andrew Ng’s ML course by building anomaly detection and a simple movie recommender—two patterns that show up everywhere in real systems.
Exercise 7 - Unsupervised Learning (K-means) + PCA (Compression & Visualization)

Implement K-means clustering, compress an image to 16 colors, then use PCA to reduce dimensions and build eigenfaces.
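
A minimal NumPy version of the color-compression step (the exercise itself is in Octave; this is an illustrative translation):

```python
import numpy as np

# K-means color compression: cluster the pixels, then replace every
# pixel with its centroid color (k=16 as in the exercise).
def compress_colors(pixels: np.ndarray, k: int = 16, iters: int = 10) -> np.ndarray:
    pixels = pixels.astype(float)                      # (n_pixels, 3) RGB rows
    centroids = pixels[np.random.choice(len(pixels), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(pixels[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)                  # nearest centroid per pixel
        for j in range(k):
            if np.any(labels == j):                    # leave empty clusters in place
                centroids[j] = pixels[labels == j].mean(axis=0)
    return centroids[labels]                           # compressed image pixels
```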
Exercise 6 - Support Vector Machines (When a Different Model Just Wins)

I built my first spam classifier with SVMs—learning how C and sigma shape decision boundaries, and why linear SVMs scale surprisingly well.
Exercise 5 - Debugging ML (Bias/Variance, Learning Curves, and What to Try Next)

The most practical assignment so far - diagnose bias vs variance using learning curves, tune lambda with validation curves, and build a repeatable “next action” playbook.
Exercise 4 - Neural Networks Learning (Backpropagation Without Tears)

Implement backpropagation for a 2-layer neural network, verify gradients numerically, train on handwritten digits, and hit ~95% accuracy.
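
The numerical-verification step, sketched in NumPy (the exercise is in Octave; names and tolerances here are illustrative):

```python
import numpy as np

# Numerical gradient check: central differences vs the analytic
# backprop gradient, compared with a relative error.
def check_gradients(loss_fn, theta: np.ndarray, analytic: np.ndarray,
                    eps: float = 1e-4, n_checks: int = 10) -> float:
    worst = 0.0
    for i in np.random.choice(theta.size, size=min(n_checks, theta.size), replace=False):
        t_plus, t_minus = theta.copy(), theta.copy()
        t_plus.flat[i] += eps
        t_minus.flat[i] -= eps
        numeric = (loss_fn(t_plus) - loss_fn(t_minus)) / (2 * eps)
        a = analytic.flat[i]
        worst = max(worst, abs(numeric - a) / max(1e-8, abs(numeric) + abs(a)))
    return worst  # should be ~1e-7 or smaller for a correct backprop

# e.g. loss = ||theta||^2, whose gradient is 2*theta
theta = np.random.randn(5)
print(check_gradients(lambda t: float(t @ t), theta, 2 * theta))
```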
Exercise 3 - One-vs-All + Intro to Neural Networks (Handwritten Digits!)

Build a multi-class digit classifier with one-vs-all logistic regression, then run a small neural network forward pass on the same dataset.
Regularization - Overfitting in the Real World (and How to Fight It)

Overfitting is what happens when your model memorizes training data. Regularization is the practical tool that keeps it honest.
Exercise 2 - Logistic Regression for Classification (My First Real Classifier)

My first real classifier - predict admissions from exam scores with logistic regression, then learn why regularization matters on a non-linear dataset.
Normal Equation vs Gradient Descent (Choosing Tools Like an Engineer)

Notes — two ways to fit linear regression: iterative gradient descent vs the one-shot normal equation. Same goal, different tradeoffs.
Linear Regression With Multiple Variables (and Why Vectorization Matters)

Notes from Andrew Ng’s ML course — extend linear regression to multiple features, learn feature scaling/mean normalization, and stop writing slow loops by vectorizing everything.
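
What “vectorize everything” buys, as a NumPy sketch (the course uses Octave; this is an illustrative port):

```python
import numpy as np

# Vectorized gradient descent: one matrix expression updates every
# parameter at once -- no per-feature Python loop.
def gradient_descent(X: np.ndarray, y: np.ndarray, alpha: float = 0.01,
                     iters: int = 1000) -> np.ndarray:
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        theta -= alpha / m * (X.T @ (X @ theta - y))  # all features in one step
    return theta
```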
Exercise 1 - Linear Regression From Scratch

Notes from Andrew Ng’s ML course — plot the food-truck dataset, implement computeCost + gradientDescent in Octave, and build intuition with J(theta) visualizations.
Why I’m Learning Machine Learning

In 2016, I’m documenting my journey through Andrew Ng’s Machine Learning course—building intuition, writing Octave code, and learning how to think in data.