[{"data":1,"prerenderedAt":2058},["ShallowReactive",2],{"navigation":3,"/blog/the-model-router-era-routing-eval-gates-and-budgets":506,"/blog/the-model-router-era-routing-eval-gates-and-budgets-surround":2054},[4],{"title":5,"path":6,"stem":7,"children":8,"page":505},"Blog","/blog","blog",[9,13,17,21,25,29,33,37,41,45,49,53,57,61,65,69,73,77,81,85,89,93,97,101,105,109,113,117,121,125,129,133,137,141,145,149,153,157,161,165,169,173,177,181,185,189,193,197,201,205,209,213,217,221,225,229,233,237,241,245,249,253,257,261,265,269,273,277,281,285,289,293,297,301,305,309,313,317,321,325,329,333,337,341,345,349,353,357,361,365,369,373,377,381,385,389,393,397,401,405,409,413,417,421,425,429,433,437,441,445,449,453,457,461,465,469,473,477,481,485,489,493,497,501],{"title":10,"path":11,"stem":12},"Activation Functions Are Not a Detail - ReLU Changed Everything","/blog/activation-functions-are-not-a-detail","blog/activation-functions-are-not-a-detail",{"title":14,"path":15,"stem":16},"Actor-Critic - The First Time RL Feels Trainable","/blog/actor-critic-the-first-time-rl-feels-trainable","blog/actor-critic-the-first-time-rl-feels-trainable",{"title":18,"path":19,"stem":20},"Agent Evals as CI - From Prompt Tests to Scenario Harnesses and Red Teams","/blog/agent-evals-as-ci-from-prompt-tests-to-scenario-harnesses-and-red-teams","blog/agent-evals-as-ci-from-prompt-tests-to-scenario-harnesses-and-red-teams",{"title":22,"path":23,"stem":24},"Agent Runtimes Emerge: SDKs, orchestration primitives, and observability","/blog/agent-runtimes-emerge-sdks-orchestration-primitives-and-observability","blog/agent-runtimes-emerge-sdks-orchestration-primitives-and-observability",{"title":26,"path":27,"stem":28},"Agentic AI Is Becoming a Cybersecurity Problem","/blog/agentic-ai-is-becoming-a-cybersecurity-problem","blog/agentic-ai-is-becoming-a-cybersecurity-problem",{"title":30,"path":31,"stem":32},"Agents as Distributed Systems: outbox, sagas, and “eventually correct” workflows","/blog/agents-as-distributed-systems-outbox-sagas-eventually-correct-workflows","blog/agents-as-distributed-systems-outbox-sagas-eventually-correct-workflows",{"title":34,"path":35,"stem":36},"AJAX → Fetch → GraphQL → tRPC: Choosing Your Data Boundary","/blog/ajax-fetch-graphql-trpc-choosing-your-data-boundary","blog/ajax-fetch-graphql-trpc-choosing-your-data-boundary",{"title":38,"path":39,"stem":40},"Exercise 8 + Course Wrap - Anomaly Detection & Recommenders (and My Next Steps)","/blog/anomaly-detection-and-recommenders","blog/anomaly-detection-and-recommenders",{"title":42,"path":43,"stem":44},"API Evolution at Scale: Compatibility, Contracts, and Consumer-Driven Testing","/blog/api-evolution-at-scale-compatibility-contracts-consumer-driven-testing","blog/api-evolution-at-scale-compatibility-contracts-consumer-driven-testing",{"title":46,"path":47,"stem":48},"Backends: Frameworks Don’t Matter Until They Do (Node, Java, .NET, Go, Python)","/blog/backends-frameworks-dont-matter-until-they-do","blog/backends-frameworks-dont-matter-until-they-do",{"title":50,"path":51,"stem":52},"Backpropagation Demystified - It’s Just the Chain Rule (But Applied Ruthlessly)","/blog/backpropagation-demystified","blog/backpropagation-demystified",{"title":54,"path":55,"stem":56},"Bandits - The First Honest RL Problem","/blog/bandits-the-first-honest-rl-problem","blog/bandits-the-first-honest-rl-problem",{"title":58,"path":59,"stem":60},"Batch Training & Evaluation Again: Promising Results That Survive Scrutiny","/blog/batch-training-evaluation-again-promising-results-that-survive-scrutiny","blog/batch-training-evaluation-again-promising-results-that-survive-scrutiny",{"title":62,"path":63,"stem":64},"bitmex-gym - The Baseline Trading Environment (Where Cheating Starts)","/blog/bitmex-gym-baseline-trading-environment-where-cheating-starts","blog/bitmex-gym-baseline-trading-environment-where-cheating-starts",{"title":66,"path":67,"stem":68},"bitmex-management-gym: Position Sizing and the First Risk-Aware Agent","/blog/bitmex-management-gym-position-sizing-first-risk-aware-agent","blog/bitmex-management-gym-position-sizing-first-risk-aware-agent",{"title":70,"path":71,"stem":72},"Browser Reality: The Event Loop, Rendering, and Why UX Bugs Look Like Backend Bugs","/blog/browser-reality-event-loop-rendering-ux-bugs-backend-bugs","blog/browser-reality-event-loop-rendering-ux-bugs-backend-bugs",{"title":74,"path":75,"stem":76},"Caching Without Folklore: Redis, CDNs, and the Two Hard Things","/blog/caching-without-folklore-redis-cdns-and-the-two-hard-things","blog/caching-without-folklore-redis-cdns-and-the-two-hard-things",{"title":78,"path":79,"stem":80},"Capstone: Build a System That Can Survive (Reference Architecture + Decision Log)","/blog/capstone-build-a-system-that-can-survive","blog/capstone-build-a-system-that-can-survive",{"title":82,"path":83,"stem":84},"Chappie Wiring From Trained Policy to Running Process","/blog/chappie-wiring-from-trained-policy-to-running-process","blog/chappie-wiring-from-trained-policy-to-running-process",{"title":86,"path":87,"stem":88},"CI/CD as Architecture: Testing Pyramids, Pipelines, and Rollout Safety","/blog/ci-cd-as-architecture-testing-pyramids-pipelines-rollout-safety","blog/ci-cd-as-architecture-testing-pyramids-pipelines-rollout-safety",{"title":90,"path":91,"stem":92},"Cloud Infrastructure Without the Fanaticism: IaaS, PaaS, Serverless, Kubernetes","/blog/cloud-infrastructure-without-the-religion","blog/cloud-infrastructure-without-the-religion",{"title":94,"path":95,"stem":96},"Computer-Use Agents in Production: sandboxes, VMs, and UI-action safety","/blog/computer-use-agents-in-production-sandboxes-vms-ui-action-safety","blog/computer-use-agents-in-production-sandboxes-vms-ui-action-safety",{"title":98,"path":99,"stem":100},"Constraints That Teach: Risk Caps, Timeouts, and Surviving Bad Regimes","/blog/constraints-that-teach-risk-caps-timeouts-surviving-bad-regimes","blog/constraints-that-teach-risk-caps-timeouts-surviving-bad-regimes",{"title":102,"path":103,"stem":104},"Containers, Docker, and the Discipline of Reproducibility","/blog/containers-docker-and-the-discipline-of-reproducibility","blog/containers-docker-and-the-discipline-of-reproducibility",{"title":106,"path":107,"stem":108},"Context Assembly as a Subsystem: Summaries, State, and Token Budgets","/blog/context-assembly-as-a-subsystem-summaries-state-and-token-budgets","blog/context-assembly-as-a-subsystem-summaries-state-and-token-budgets",{"title":110,"path":111,"stem":112},"Continuous Control - DDPG and the Seduction of Off-Policy","/blog/continuous-control-ddpg-and-the-seduction-of-off-policy","blog/continuous-control-ddpg-and-the-seduction-of-off-policy",{"title":114,"path":115,"stem":116},"Convolutions - Why CNNs See the World Differently","/blog/convolutions-why-cnns-see-the-world-differently","blog/convolutions-why-cnns-see-the-world-differently",{"title":118,"path":119,"stem":120},"Cost as a First-Class Constraint: FinOps for Architects","/blog/cost-as-a-first-class-constraint-finops-for-architects","blog/cost-as-a-first-class-constraint-finops-for-architects",{"title":122,"path":123,"stem":124},"DALL·E: How Text Became Images (and Why It Changed Everything)","/blog/dalle-how-text-became-images-and-why-it-changed-everything","blog/dalle-how-text-became-images-and-why-it-changed-everything",{"title":126,"path":127,"stem":128},"Data Engineering for Product Teams: OLTP vs OLAP, Streaming, and Truth","/blog/data-engineering-for-product-teams-oltp-vs-olap-streaming-and-truth","blog/data-engineering-for-product-teams-oltp-vs-olap-streaming-and-truth",{"title":130,"path":131,"stem":132},"Data Stores 101 for Architects: SQL, NoSQL, and the Shape of Consistency","/blog/data-stores-101-for-architects-sql-nosql-consistency","blog/data-stores-101-for-architects-sql-nosql-consistency",{"title":134,"path":135,"stem":136},"Dataset Reality — HDF5 Schema, Missing Data, and “Don’t Lie to Yourself” Rules","/blog/dataset-reality-hdf5-schema-missing-data","blog/dataset-reality-hdf5-schema-missing-data",{"title":138,"path":139,"stem":140},"Exercise 5 - Debugging ML (Bias/Variance, Learning Curves, and What to Try Next)","/blog/debugging-ml-bias-variance","blog/debugging-ml-bias-variance",{"title":142,"path":143,"stem":144},"Deep Q-Learning - My First Real Baselines Month","/blog/deep-q-learning-my-first-real-baselines-month","blog/deep-q-learning-my-first-real-baselines-month",{"title":146,"path":147,"stem":148},"Deep Silos in RL: Architecture as Stability (and the First LSTM Variant)","/blog/deep-silos-in-rl-architecture-as-stability","blog/deep-silos-in-rl-architecture-as-stability",{"title":150,"path":151,"stem":152},"Deep Silos - Representation Learning That Respects Feature Families","/blog/deep-silos-representation-learning-feature-families","blog/deep-silos-representation-learning-feature-families",{"title":154,"path":155,"stem":156},"Defining Alpha Without Cheating - Look-Ahead Labels and Leakage Traps","/blog/defining-alpha-without-cheating","blog/defining-alpha-without-cheating",{"title":158,"path":159,"stem":160},"Dissecting ChatGPT: The Product Architecture Around the Model","/blog/dissecting-chatgpt-the-product-architecture-around-the-model","blog/dissecting-chatgpt-the-product-architecture-around-the-model",{"title":162,"path":163,"stem":164},"Distributed Data: Transactions, Outbox, Sagas, and “Eventually Correct”","/blog/distributed-data-transactions-outbox-sagas-eventually-correct","blog/distributed-data-transactions-outbox-sagas-eventually-correct",{"title":166,"path":167,"stem":168},"Why Sequences Break Everything - Enter Recurrent Neural Networks","/blog/enter-recurrent-neural-networks","blog/enter-recurrent-neural-networks",{"title":170,"path":171,"stem":172},"Evaluation Discipline - Walk-Forward Backtesting Inside the Gym","/blog/evaluation-discipline-walk-forward-backtesting-inside-gym","blog/evaluation-discipline-walk-forward-backtesting-inside-gym",{"title":174,"path":175,"stem":176},"Feature Engineering, But Make It Microstructure: Liquidity Created/Removed","/blog/feature-engineering-microstructure-liquidity-created-removed","blog/feature-engineering-microstructure-liquidity-created-removed",{"title":178,"path":179,"stem":180},"First Live Runs - Small Size, Big Lessons","/blog/first-live-runs-small-size-big-lessons","blog/first-live-runs-small-size-big-lessons",{"title":182,"path":183,"stem":184},"From Logistic Regression to Neurons - Rebuilding Intuition from the Perceptron","/blog/from-logistic-regression-to-neurons","blog/from-logistic-regression-to-neurons",{"title":186,"path":187,"stem":188},"From Microstructure to Features - What the Model Will See","/blog/from-microstructure-to-features-what-the-model-will-see","blog/from-microstructure-to-features-what-the-model-will-see",{"title":190,"path":191,"stem":192},"From Classical ML to Deep Learning - What Actually Changed (and What Didn’t) (and My Next Steps)","/blog/from-ml-to-deep-learning-retrospective","blog/from-ml-to-deep-learning-retrospective",{"title":194,"path":195,"stem":196},"From Prediction to Decision - Designing the Trading Environment Contract","/blog/from-prediction-to-decision-trading-environment-contract","blog/from-prediction-to-decision-trading-environment-contract",{"title":198,"path":199,"stem":200},"From Research Rig to System: 2020 Postmortem and the Real Amazing Result","/blog/from-research-rig-to-system-2020-postmortem","blog/from-research-rig-to-system-2020-postmortem",{"title":202,"path":203,"stem":204},"Frontend Systems: Routing, State, Forms, and the “Boring Stack” That Scales","/blog/frontend-systems-routing-state-forms-boring-stack","blog/frontend-systems-routing-state-forms-boring-stack",{"title":206,"path":207,"stem":208},"Function Approximation - The Day RL Stopped Being Stable","/blog/function-approximation-the-day-rl-stopped-being-stable","blog/function-approximation-the-day-rl-stopped-being-stable",{"title":210,"path":211,"stem":212},"GPAI Obligations Begin: What Changes for Model Providers and Enterprises","/blog/gpai-obligations-begin-what-changes-for-model-providers-and-enterprises","blog/gpai-obligations-begin-what-changes-for-model-providers-and-enterprises",{"title":214,"path":215,"stem":216},"Hallucinations: A Probabilistic Failure Mode, Not a Moral Defect","/blog/hallucinations-a-probabilistic-failure-mode-not-a-moral-defect","blog/hallucinations-a-probabilistic-failure-mode-not-a-moral-defect",{"title":218,"path":219,"stem":220},"HTTP as a Distributed Systems API (Without the Buzzwords)","/blog/http-as-a-distributed-systems-api-without-the-buzzwords","blog/http-as-a-distributed-systems-api-without-the-buzzwords",{"title":222,"path":223,"stem":224},"Imitation Learning - GAIL and the Strange Feeling of Learning From Experts","/blog/imitation-learning-gail-and-learning-from-experts","blog/imitation-learning-gail-and-learning-from-experts",{"title":226,"path":227,"stem":228},"Incident Response and Resilience: Designing for Failure, Not Hope","/blog/incident-response-and-resilience-designing-for-failure-not-hope","blog/incident-response-and-resilience-designing-for-failure-not-hope",{"title":230,"path":231,"stem":232},"Initialization, Scale, and the Fragility of Deep Networks","/blog/initialization-scale-fragility-of-deep-networks","blog/initialization-scale-fragility-of-deep-networks",{"title":234,"path":235,"stem":236},"Instruction Tuning: Turning a Completion Engine into an Assistant","/blog/instruction-tuning-turning-a-completion-engine-into-an-assistant","blog/instruction-tuning-turning-a-completion-engine-into-an-assistant",{"title":238,"path":239,"stem":240},"Exercise 3 - One-vs-All + Intro to Neural Networks (Handwritten Digits!)","/blog/intro-to-neural-networks","blog/intro-to-neural-networks",{"title":242,"path":243,"stem":244},"Exercise 1 - Linear Regression From Scratch","/blog/linear-regression-from-scratch","blog/linear-regression-from-scratch",{"title":246,"path":247,"stem":248},"Linear Regression With Multiple Variables (and Why Vectorization Matters)","/blog/linear-regression-with-multiple-vars","blog/linear-regression-with-multiple-vars",{"title":250,"path":251,"stem":252},"Live Alpha Monitoring - When the Market Talks Back","/blog/live-alpha-monitoring-when-market-talks-back","blog/live-alpha-monitoring-when-market-talks-back",{"title":254,"path":255,"stem":256},"Exercise 2 - Logistic Regression for Classification (My First Real Classifier)","/blog/logistic-regression-for-classification","blog/logistic-regression-for-classification",{"title":258,"path":259,"stem":260},"Long Context Isn’t Memory: When to Stuff, When to Retrieve","/blog/long-context-isnt-memory-when-to-stuff-when-to-retrieve","blog/long-context-isnt-memory-when-to-stuff-when-to-retrieve",{"title":262,"path":263,"stem":264},"LSTMs - Engineering Memory into the Network","/blog/lstms-engineering-memory-into-the-network","blog/lstms-engineering-memory-into-the-network",{"title":266,"path":267,"stem":268},"Maker Trades as a Strategy: When Fees Become a Reward Signal","/blog/maker-trades-fees-reward-signal","blog/maker-trades-fees-reward-signal",{"title":270,"path":271,"stem":272},"Microservices vs Modular Monolith: The “When” and the “How”","/blog/microservices-vs-modular-monolith-when-and-how","blog/microservices-vs-modular-monolith-when-and-how",{"title":274,"path":275,"stem":276},"Midjourney and the Product Loop: Why Some Generators Feel Magical","/blog/midjourney-and-the-product-loop-why-some-generators-feel-magical","blog/midjourney-and-the-product-loop-why-some-generators-feel-magical",{"title":278,"path":279,"stem":280},"Model Selection Becomes Architecture: Routing, Budgets, and Capability Tiers","/blog/model-selection-becomes-architecture-routing-budgets-and-capability-tiers","blog/model-selection-becomes-architecture-routing-budgets-and-capability-tiers",{"title":282,"path":283,"stem":284},"Multi-Agent Systems Without Chaos: supervisors, specialists, and coordination contracts","/blog/multi-agent-systems-without-chaos-supervisors-specialists-and-coordination-contracts","blog/multi-agent-systems-without-chaos-supervisors-specialists-and-coordination-contracts",{"title":286,"path":287,"stem":288},"Multimodal Changes UX: designing text+vision+audio systems","/blog/multimodal-changes-ux-designing-text-vision-audio-systems","blog/multimodal-changes-ux-designing-text-vision-audio-systems",{"title":290,"path":291,"stem":292},"Exercise 4 - Neural Networks Learning (Backpropagation Without Tears)","/blog/neural-networks-learning-backpropagation","blog/neural-networks-learning-backpropagation",{"title":294,"path":295,"stem":296},"Normal Equation vs Gradient Descent (Choosing Tools Like an Engineer)","/blog/normal-equation-vs-gradient-descent","blog/normal-equation-vs-gradient-descent",{"title":298,"path":299,"stem":300},"Normalization Is a Deployment Problem - Mean/Sigma and Index Diff","/blog/normalization-is-a-deployment-problem","blog/normalization-is-a-deployment-problem",{"title":302,"path":303,"stem":304},"Observability that Works: Logs, Metrics, Traces, and SLO Thinking","/blog/observability-that-works-logs-metrics-traces-and-slo-thinking","blog/observability-that-works-logs-metrics-traces-and-slo-thinking",{"title":306,"path":307,"stem":308},"Open Weights in Production: evaluation, licensing, and guardrails","/blog/open-weights-in-production-evaluation-licensing-and-guardrails","blog/open-weights-in-production-evaluation-licensing-and-guardrails",{"title":310,"path":311,"stem":312},"OpenClaw: A Viral Agent, a Skills Ecosystem, and the Supply-Chain Reality Check","/blog/openclaw-a-viral-agent-and-the-supply-chain-reality-check","blog/openclaw-a-viral-agent-and-the-supply-chain-reality-check",{"title":314,"path":315,"stem":316},"Optimization Got Real - Momentum, Learning Rates, and Why Plain Gradient Descent Wasn’t Enough","/blog/optimization-got-real-momentumand-learning-rates","blog/optimization-got-real-momentumand-learning-rates",{"title":318,"path":319,"stem":320},"Order Books Are the Battlefield - Matching Engines in Plain English","/blog/order-books-are-the-battlefield","blog/order-books-are-the-battlefield",{"title":322,"path":323,"stem":324},"Performance Engineering End-to-End: From TTFB to Tail Latency","/blog/performance-engineering-end-to-end-from-ttfb-to-tail-latency","blog/performance-engineering-end-to-end-from-ttfb-to-tail-latency",{"title":326,"path":327,"stem":328},"Policy Gradients - Learning Without a Value Crutch","/blog/policy-gradients-learning-without-a-value-crutch","blog/policy-gradients-learning-without-a-value-crutch",{"title":330,"path":331,"stem":332},"Pooling, Hierarchies, and What CNNs Are Really Learning","/blog/pooling-hierarchies-and-cnns","blog/pooling-hierarchies-and-cnns",{"title":334,"path":335,"stem":336},"Pretraining Is Compression: Tokens, Datasets, and Emergent Skill","/blog/pretraining-is-compression-tokens-datasets-and-emergent-skill","blog/pretraining-is-compression-tokens-datasets-and-emergent-skill",{"title":338,"path":339,"stem":340},"Prism and the Architecture of Artifact-Native AI for Science","/blog/prism-and-the-architecture-of-artifact-native-ai-for-science","blog/prism-and-the-architecture-of-artifact-native-ai-for-science",{"title":342,"path":343,"stem":344},"Prompting is Not Programming: Contracts, Schemas, and Failure Budgets","/blog/prompting-is-not-programming-contracts-schemas-failure-budgets","blog/prompting-is-not-programming-contracts-schemas-failure-budgets",{"title":346,"path":347,"stem":348},"Queues, Retries, and Idempotency: Engineering Reality in Async Systems","/blog/queues-retries-and-idempotency-engineering-reality-in-async-systems","blog/queues-retries-and-idempotency-engineering-reality-in-async-systems",{"title":350,"path":351,"stem":352},"RAG Done Right: Knowledge, Grounding, and Evaluation That Isn’t Vibes","/blog/rag-done-right-knowledge-grounding-and-evaluation-that-isnt-vibes","blog/rag-done-right-knowledge-grounding-and-evaluation-that-isnt-vibes",{"title":354,"path":355,"stem":356},"RAG You Can Evaluate: retrieval pipelines, reranking, citations, and truth boundaries","/blog/rag-you-can-evaluate-retrieval-pipelines-reranking-citations-truth-boundaries","blog/rag-you-can-evaluate-retrieval-pipelines-reranking-citations-truth-boundaries",{"title":358,"path":359,"stem":360},"React as an Architecture Tool: Components, Hooks, and the Cost of Re-rendering","/blog/react-as-an-architecture-tool-components-hooks-cost-of-rerendering","blog/react-as-an-architecture-tool-components-hooks-cost-of-rerendering",{"title":362,"path":363,"stem":364},"Real-Time Agents: streaming, barge-in, and session state that doesn’t collapse","/blog/real-time-agents-streaming-barge-in-session-state-that-doesnt-collapse","blog/real-time-agents-streaming-barge-in-session-state-that-doesnt-collapse",{"title":366,"path":367,"stem":368},"Reasoning Budgets: fast/slow paths, verification, and when to “think longer”","/blog/reasoning-budgets-fast-slow-paths-verification-think-longer","blog/reasoning-budgets-fast-slow-paths-verification-think-longer",{"title":370,"path":371,"stem":372},"Reference Architecture v2: the Operable Agent Platform","/blog/reference-architecture-v2-the-operable-agent-platform","blog/reference-architecture-v2-the-operable-agent-platform",{"title":374,"path":375,"stem":376},"Regularization - Overfitting in the Real World (and How to Fight It)","/blog/regularization-overfitting-in-the-real-world","blog/regularization-overfitting-in-the-real-world",{"title":378,"path":379,"stem":380},"Regulation as Architecture: Turning the EU AI Act into Controls and Evidence","/blog/regulation-as-architecture-eu-ai-act-controls-evidence","blog/regulation-as-architecture-eu-ai-act-controls-evidence",{"title":382,"path":383,"stem":384},"RESTful Design That Survives: Resources, Boundaries, and Versioning","/blog/restful-design-that-survives-resources-boundaries-and-versioning","blog/restful-design-that-survives-resources-boundaries-and-versioning",{"title":386,"path":387,"stem":388},"Reward Shaping Without Lying - Penalties, Constraints, and the First Real Fixes","/blog/reward-shaping-without-lying","blog/reward-shaping-without-lying",{"title":390,"path":391,"stem":392},"Rewards, Returns, and Why “Learning” Is an Interface Problem","/blog/rewards-returns-learning-is-an-interface-problem","blog/rewards-returns-learning-is-an-interface-problem",{"title":394,"path":395,"stem":396},"RLHF: Stabilizing Behavior with Preferences (Alignment as Control)","/blog/rlhf-stabilizing-behavior-with-preferences-alignment-as-control","blog/rlhf-stabilizing-behavior-with-preferences-alignment-as-control",{"title":398,"path":399,"stem":400},"Safety Engineering - Kill Switches, Reconciliation, and Failure Recovery","/blog/safety-engineering-kill-switches-reconciliation-failure-recovery","blog/safety-engineering-kill-switches-reconciliation-failure-recovery",{"title":402,"path":403,"stem":404},"Security for Agent Connectors: least privilege, injection resistance, and safe toolchains","/blog/security-for-agent-connectors-least-privilege-injection-resistance-and-safe-toolchains","blog/security-for-agent-connectors-least-privilege-injection-resistance-and-safe-toolchains",{"title":406,"path":407,"stem":408},"Security for Builders: Threat Modeling and Secure-by-Default Systems","/blog/security-for-builders-threat-modeling-and-secure-by-default-systems","blog/security-for-builders-threat-modeling-and-secure-by-default-systems",{"title":410,"path":411,"stem":412},"Software in the Age of Probabilistic Components","/blog/software-in-the-age-of-probabilistic-components","blog/software-in-the-age-of-probabilistic-components",{"title":414,"path":415,"stem":416},"Sparse Rewards - HER and Learning From What Didn’t Happen","/blog/sparse-rewards-her-and-learning-from-what-didnt-happen","blog/sparse-rewards-her-and-learning-from-what-didnt-happen",{"title":418,"path":419,"stem":420},"Stability is a Feature You Have to Design","/blog/stability-is-a-feature-you-have-to-design","blog/stability-is-a-feature-you-have-to-design",{"title":422,"path":423,"stem":424},"Standards for the Agent Ecosystem: connectors, protocols, and MCP","/blog/standards-for-the-agent-ecosystem-connectors-protocols-and-mcp","blog/standards-for-the-agent-ecosystem-connectors-protocols-and-mcp",{"title":426,"path":427,"stem":428},"Supervised Baselines - First Alpha Models, First Humbling Curves","/blog/supervised-baselines-first-alpha-models","blog/supervised-baselines-first-alpha-models",{"title":430,"path":431,"stem":432},"Exercise 6 - Support Vector Machines (When a Different Model Just Wins)","/blog/support-vector-machines","blog/support-vector-machines",{"title":434,"path":435,"stem":436},"Tabular RL - When Value Iteration Feels Like Cheating","/blog/tabular-rl-when-value-iteration-feels-like-cheating","blog/tabular-rl-when-value-iteration-feels-like-cheating",{"title":438,"path":439,"stem":440},"The 1M-Token Era: how long context changes retrieval economics and system design","/blog/the-1m-token-era-long-context-retrieval-economics-and-system-design","blog/the-1m-token-era-long-context-retrieval-economics-and-system-design",{"title":442,"path":443,"stem":444},"The 503 Lesson - Outages as a Signal, Not Just a Bug","/blog/the-503-lesson-outages-as-signal-not-just-a-bug","blog/the-503-lesson-outages-as-signal-not-just-a-bug",{"title":446,"path":447,"stem":448},"The Collector - Websockets, Clock Drift, and the First Clean Snapshot","/blog/the-collector-websockets-clock-drift-and-the-first-clean-snapshot","blog/the-collector-websockets-clock-drift-and-the-first-clean-snapshot",{"title":450,"path":451,"stem":452},"The Compliance Cliff: prohibited practices and governance controls that actually ship","/blog/the-compliance-cliff-prohibited-practices-and-governance-controls-that-actually-ship","blog/the-compliance-cliff-prohibited-practices-and-governance-controls-that-actually-ship",{"title":454,"path":455,"stem":456},"The Connector Ecosystem: MCP adoption patterns, versioning, and governance","/blog/the-connector-ecosystem-mcp-adoption-patterns-versioning-and-governance","blog/the-connector-ecosystem-mcp-adoption-patterns-versioning-and-governance",{"title":458,"path":459,"stem":460},"The Model Router Era","/blog/the-model-router-era-routing-eval-gates-and-budgets","blog/the-model-router-era-routing-eval-gates-and-budgets",{"title":462,"path":463,"stem":464},"Vanishing Gradients Strike Back - The Pain of Training RNNs","/blog/the-pain-of-training-rnns","blog/the-pain-of-training-rnns",{"title":466,"path":467,"stem":468},"Tool Use and Agents: When the Model Becomes a Workflow Engine","/blog/tool-use-and-agents-when-the-model-becomes-a-workflow-engine","blog/tool-use-and-agents-when-the-model-becomes-a-workflow-engine",{"title":470,"path":471,"stem":472},"Tool Use with Open Models: function calling, sandboxes, and “capability boundaries”","/blog/tool-use-with-open-models-function-calling-sandboxes-capability-boundaries","blog/tool-use-with-open-models-function-calling-sandboxes-capability-boundaries",{"title":474,"path":475,"stem":476},"Transformers: Attention as an Engineering Breakthrough (Not a Math Flex)","/blog/transformers-attention-as-an-engineering-breakthrough","blog/transformers-attention-as-an-engineering-breakthrough",{"title":478,"path":479,"stem":480},"Exercise 7 - Unsupervised Learning (K-means) + PCA (Compression & Visualization)","/blog/unsupervised-learning-and-compression","blog/unsupervised-learning-and-compression",{"title":482,"path":483,"stem":484},"Voice Agents You Can Operate: reliability, caching, latency, and human handoff","/blog/voice-agents-you-can-operate-reliability-caching-latency-human-handoff","blog/voice-agents-you-can-operate-reliability-caching-latency-human-handoff",{"title":486,"path":487,"stem":488},"The Web's \"Compression Algorithm\": Static → Web 2.0 → SPA → SSR/Edge","/blog/webs-compression-algorithm-static-web2-spa-ssr-edge","blog/webs-compression-algorithm-static-web2-spa-ssr-edge",{"title":490,"path":491,"stem":492},"Why Deeper Networks Are Harder to Train Than I Expected","/blog/why-deeper-networks-are-harder-to-train","blog/why-deeper-networks-are-harder-to-train",{"title":494,"path":495,"stem":496},"Why I’m Learning Machine Learning","/blog/why-im-learning-machine-learning","blog/why-im-learning-machine-learning",{"title":498,"path":499,"stem":500},"Why NLP Was Hard: RNN Pain, Vanishing Gradients, and the Limits of “Memory”","/blog/why-nlp-was-hard-rnn-pain-vanishing-gradients-limits-of-memory","blog/why-nlp-was-hard-rnn-pain-vanishing-gradients-limits-of-memory",{"title":502,"path":503,"stem":504},"Why RL Training Is Unstable (A Catalog of Breakage)","/blog/why-rl-training-is-unstable-a-catalog-of-breakage","blog/why-rl-training-is-unstable-a-catalog-of-breakage",false,{"id":507,"title":458,"author":508,"body":512,"date":2042,"description":2043,"extension":2044,"image":2045,"meta":2046,"minRead":1090,"navigation":2048,"path":459,"seo":2049,"sitemap":2050,"stem":460,"__hash__":2053},"blog/blog/the-model-router-era-routing-eval-gates-and-budgets.md",{"name":509,"avatar":510},"Axel Domingues",{"src":511,"alt":509},"/images/axel-domingues.avif",{"type":513,"value":514,"toc":2017},"minimark",[515,519,527,530,536,539,542,558,564,570,573,586,608,648,651,656,659,719,722,727,729,733,739,742,745,765,788,793,1267,1270,1272,1276,1279,1282,1392,1394,1398,1401,1404,1410,1424,1438,1442,1445,1462,1465,1482,1505,1507,1511,1514,1517,1543,1547,1598,1601,1603,1607,1610,1613,1627,1630,1646,1650,1676,1685,1687,1691,1694,1699,1704,1766,1769,1771,1775,1778,1781,1845,1847,1851,1854,1857,1883,1885,1889,1937,1939,1943,2013],[516,517,518],"p",{},"April’s hot topic wasn’t “a new model is better.”",[516,520,521,522,526],{},"It was the operational consequence of ",[523,524,525],"strong",{},"too many good models arriving too fast",".",[516,528,529],{},"When releases land weekly (and pricing, latency, tool use, and safety behavior shift each time), the old decision:",[531,532,533],"blockquote",{},[516,534,535],{},"“Which model does our product use?”",[516,537,538],{},"…stops making sense.",[516,540,541],{},"The new decision is runtime:",[531,543,544],{},[516,545,546,547,551,552,554,555,557],{},"“Which model should handle ",[548,549,550],"em",{},"this"," request, under ",[548,553,550],{}," budget, with ",[548,556,550],{}," security posture?”",[516,559,560,561,526],{},"That’s the ",[523,562,563],{},"model router era",[516,565,566,567,526],{},"You’re not buying a model.\nYou’re operating a ",[523,568,569],{},"policy-controlled inference fabric",[516,571,572],{},"This month’s teaching goal is simple:",[574,575,576,580,583],"ul",{},[577,578,579],"li",{},"what a router really is (and isn’t),",[577,581,582],{},"how to gate model upgrades with evals,",[577,584,585],{},"and how to turn cost + security into first-class routing constraints.",[587,588,589,592,597],"note",{},[516,590,591],{},"A router is not “A/B testing, but for models.”",[531,593,594],{},[516,595,596],{},"A router is a control plane:",[574,598,599,602,605],{},[577,600,601],{},"it selects a provider/model/version",[577,603,604],{},"under budgets and policies",[577,606,607],{},"and it records decisions so you can debug outcomes.",[609,610,611,619,630,641],"card-group",{},[612,613,616],"card",{"icon":614,"title":615},"i-lucide-timer","Why now",[516,617,618],{},"Frontier releases are frequent enough that “one model everywhere” becomes fragile, expensive, and hard to operate.",[612,620,623],{"icon":621,"title":622},"i-lucide-waypoints","The real product",[516,624,625,626,629],{},"The router: ",[523,627,628],{},"policy + budgets + eval gates + fallbacks"," — the thing you can defend in an incident review.",[612,631,634],{"icon":632,"title":633},"i-lucide-wrench","The engineering job",[516,635,636,637,640],{},"Build a routing pipeline that is ",[523,638,639],{},"measurable, reversible, and safe"," under failures and drift.",[612,642,645],{"icon":643,"title":644},"i-lucide-triangle-alert","The trap",[516,646,647],{},"Routing without governance becomes “randomness at scale”: inconsistent behavior, surprise bills, and untraceable failures.",[649,650],"hr",{},[652,653,655],"h2",{"id":654},"the-world-that-created-routers","The world that created routers",[516,657,658],{},"Three forces converged:",[660,661,662,681,700],"ol",{},[577,663,664,667],{},[523,665,666],{},"Capability is now multi-dimensional",[574,668,669,672,675,678],{},[577,670,671],{},"one model is best at code",[577,673,674],{},"another is best at reasoning-style tasks",[577,676,677],{},"another is best at low-latency chat",[577,679,680],{},"another is best at long context",[577,682,683,686],{},[523,684,685],{},"Cost is no longer a rounding error",[574,687,688,691,694,697],{},[577,689,690],{},"bigger contexts",[577,692,693],{},"longer outputs",[577,695,696],{},"tool calls",[577,698,699],{},"retries",[577,701,702,705],{},[523,703,704],{},"Security posture varies by request",[574,706,707,710,713,716],{},[577,708,709],{},"what data is in the prompt?",[577,711,712],{},"what tools will be invoked?",[577,714,715],{},"what environment is allowed?",[577,717,718],{},"what audit requirements apply?",[516,720,721],{},"So the router becomes the only stable abstraction:",[531,723,724],{},[516,725,726],{},"turn “model choice” into a deterministic policy decision you can observe and roll back.",[649,728],{},[652,730,732],{"id":731},"what-a-model-router-is-and-what-it-isnt","What a model router is (and what it isn’t)",[516,734,735,736,526],{},"A router is a component that returns a ",[523,737,738],{},"routing decision",[516,740,741],{},"Not “the answer.”\nNot “a prompt.”\nA decision.",[516,743,744],{},"At minimum, it chooses:",[574,746,747,750,753,756,759,762],{},[577,748,749],{},"provider",[577,751,752],{},"model",[577,754,755],{},"version (or alias)",[577,757,758],{},"tool permissions",[577,760,761],{},"budgets (token cap, time cap, tool cap)",[577,763,764],{},"safety mode (and escalation rules)",[609,766,767,781],{},[612,768,771],{"icon":769,"title":770},"i-lucide-file-code","Router output",[516,772,773,774,777,778,526],{},"A small structured object describing ",[548,775,776],{},"where"," to run and ",[548,779,780],{},"under what constraints",[612,782,785],{"icon":783,"title":784},"i-lucide-inbox","Router input",[516,786,787],{},"Request metadata: user tier, risk class, latency SLO, context size, tool needs, compliance constraints.",[789,790,792],"h3",{"id":791},"a-concrete-decision-object-conceptual","A concrete decision object (conceptual)",[794,795,800],"pre",{"className":796,"code":797,"language":798,"meta":799,"style":799},"language-json shiki shiki-themes material-theme-lighter material-theme material-theme-palenight","{\n  \"route\": {\n    \"provider\": \"openai\",\n    \"model\": \"gpt-5.5\",\n    \"alias\": \"prod-stable\",\n    \"region\": \"eu\"\n  },\n  \"budgets\": {\n    \"max_input_tokens\": 12000,\n    \"max_output_tokens\": 900,\n    \"max_tool_calls\": 5,\n    \"deadline_ms\": 9000\n  },\n  \"policy\": {\n    \"risk_tier\": \"medium\",\n    \"tools_allowed\": [\"search\", \"kb_retrieve\", \"github_read\"],\n    \"requires_approval_for\": [\"email_send\", \"deploy\"]\n  },\n  \"fallbacks\": [\n    {\"provider\": \"openai\", \"model\": \"gpt-5.4\", \"when\": \"timeout_or_5xx\"},\n    {\"provider\": \"anthropic\", \"model\": \"claude-sonnet\", \"when\": \"rate_limit\"}\n  ]\n}\n","json","",[801,802,803,812,831,856,876,897,917,923,937,955,972,989,1004,1009,1023,1044,1088,1121,1126,1141,1199,1256,1262],"code",{"__ignoreMap":799},[804,805,808],"span",{"class":806,"line":807},"line",1,[804,809,811],{"class":810},"sMK4o","{\n",[804,813,815,818,822,825,828],{"class":806,"line":814},2,[804,816,817],{"class":810},"  \"",[804,819,821],{"class":820},"spNyl","route",[804,823,824],{"class":810},"\"",[804,826,827],{"class":810},":",[804,829,830],{"class":810}," {\n",[804,832,834,837,840,842,844,847,851,853],{"class":806,"line":833},3,[804,835,836],{"class":810},"    \"",[804,838,749],{"class":839},"sBMFI",[804,841,824],{"class":810},[804,843,827],{"class":810},[804,845,846],{"class":810}," \"",[804,848,850],{"class":849},"sfazB","openai",[804,852,824],{"class":810},[804,854,855],{"class":810},",\n",[804,857,859,861,863,865,867,869,872,874],{"class":806,"line":858},4,[804,860,836],{"class":810},[804,862,752],{"class":839},[804,864,824],{"class":810},[804,866,827],{"class":810},[804,868,846],{"class":810},[804,870,871],{"class":849},"gpt-5.5",[804,873,824],{"class":810},[804,875,855],{"class":810},[804,877,879,881,884,886,888,890,893,895],{"class":806,"line":878},5,[804,880,836],{"class":810},[804,882,883],{"class":839},"alias",[804,885,824],{"class":810},[804,887,827],{"class":810},[804,889,846],{"class":810},[804,891,892],{"class":849},"prod-stable",[804,894,824],{"class":810},[804,896,855],{"class":810},[804,898,900,902,905,907,909,911,914],{"class":806,"line":899},6,[804,901,836],{"class":810},[804,903,904],{"class":839},"region",[804,906,824],{"class":810},[804,908,827],{"class":810},[804,910,846],{"class":810},[804,912,913],{"class":849},"eu",[804,915,916],{"class":810},"\"\n",[804,918,920],{"class":806,"line":919},7,[804,921,922],{"class":810},"  },\n",[804,924,926,928,931,933,935],{"class":806,"line":925},8,[804,927,817],{"class":810},[804,929,930],{"class":820},"budgets",[804,932,824],{"class":810},[804,934,827],{"class":810},[804,936,830],{"class":810},[804,938,940,942,945,947,949,953],{"class":806,"line":939},9,[804,941,836],{"class":810},[804,943,944],{"class":839},"max_input_tokens",[804,946,824],{"class":810},[804,948,827],{"class":810},[804,950,952],{"class":951},"sbssI"," 12000",[804,954,855],{"class":810},[804,956,958,960,963,965,967,970],{"class":806,"line":957},10,[804,959,836],{"class":810},[804,961,962],{"class":839},"max_output_tokens",[804,964,824],{"class":810},[804,966,827],{"class":810},[804,968,969],{"class":951}," 900",[804,971,855],{"class":810},[804,973,975,977,980,982,984,987],{"class":806,"line":974},11,[804,976,836],{"class":810},[804,978,979],{"class":839},"max_tool_calls",[804,981,824],{"class":810},[804,983,827],{"class":810},[804,985,986],{"class":951}," 5",[804,988,855],{"class":810},[804,990,992,994,997,999,1001],{"class":806,"line":991},12,[804,993,836],{"class":810},[804,995,996],{"class":839},"deadline_ms",[804,998,824],{"class":810},[804,1000,827],{"class":810},[804,1002,1003],{"class":951}," 9000\n",[804,1005,1007],{"class":806,"line":1006},13,[804,1008,922],{"class":810},[804,1010,1012,1014,1017,1019,1021],{"class":806,"line":1011},14,[804,1013,817],{"class":810},[804,1015,1016],{"class":820},"policy",[804,1018,824],{"class":810},[804,1020,827],{"class":810},[804,1022,830],{"class":810},[804,1024,1026,1028,1031,1033,1035,1037,1040,1042],{"class":806,"line":1025},15,[804,1027,836],{"class":810},[804,1029,1030],{"class":839},"risk_tier",[804,1032,824],{"class":810},[804,1034,827],{"class":810},[804,1036,846],{"class":810},[804,1038,1039],{"class":849},"medium",[804,1041,824],{"class":810},[804,1043,855],{"class":810},[804,1045,1047,1049,1052,1054,1056,1059,1061,1064,1066,1069,1071,1074,1076,1078,1080,1083,1085],{"class":806,"line":1046},16,[804,1048,836],{"class":810},[804,1050,1051],{"class":839},"tools_allowed",[804,1053,824],{"class":810},[804,1055,827],{"class":810},[804,1057,1058],{"class":810}," [",[804,1060,824],{"class":810},[804,1062,1063],{"class":849},"search",[804,1065,824],{"class":810},[804,1067,1068],{"class":810},",",[804,1070,846],{"class":810},[804,1072,1073],{"class":849},"kb_retrieve",[804,1075,824],{"class":810},[804,1077,1068],{"class":810},[804,1079,846],{"class":810},[804,1081,1082],{"class":849},"github_read",[804,1084,824],{"class":810},[804,1086,1087],{"class":810},"],\n",[804,1089,1091,1093,1096,1098,1100,1102,1104,1107,1109,1111,1113,1116,1118],{"class":806,"line":1090},17,[804,1092,836],{"class":810},[804,1094,1095],{"class":839},"requires_approval_for",[804,1097,824],{"class":810},[804,1099,827],{"class":810},[804,1101,1058],{"class":810},[804,1103,824],{"class":810},[804,1105,1106],{"class":849},"email_send",[804,1108,824],{"class":810},[804,1110,1068],{"class":810},[804,1112,846],{"class":810},[804,1114,1115],{"class":849},"deploy",[804,1117,824],{"class":810},[804,1119,1120],{"class":810},"]\n",[804,1122,1124],{"class":806,"line":1123},18,[804,1125,922],{"class":810},[804,1127,1129,1131,1134,1136,1138],{"class":806,"line":1128},19,[804,1130,817],{"class":810},[804,1132,1133],{"class":820},"fallbacks",[804,1135,824],{"class":810},[804,1137,827],{"class":810},[804,1139,1140],{"class":810}," [\n",[804,1142,1144,1147,1149,1151,1153,1155,1157,1159,1161,1163,1165,1167,1169,1171,1173,1176,1178,1180,1182,1185,1187,1189,1191,1194,1196],{"class":806,"line":1143},20,[804,1145,1146],{"class":810},"    {",[804,1148,824],{"class":810},[804,1150,749],{"class":839},[804,1152,824],{"class":810},[804,1154,827],{"class":810},[804,1156,846],{"class":810},[804,1158,850],{"class":849},[804,1160,824],{"class":810},[804,1162,1068],{"class":810},[804,1164,846],{"class":810},[804,1166,752],{"class":839},[804,1168,824],{"class":810},[804,1170,827],{"class":810},[804,1172,846],{"class":810},[804,1174,1175],{"class":849},"gpt-5.4",[804,1177,824],{"class":810},[804,1179,1068],{"class":810},[804,1181,846],{"class":810},[804,1183,1184],{"class":839},"when",[804,1186,824],{"class":810},[804,1188,827],{"class":810},[804,1190,846],{"class":810},[804,1192,1193],{"class":849},"timeout_or_5xx",[804,1195,824],{"class":810},[804,1197,1198],{"class":810},"},\n",[804,1200,1202,1204,1206,1208,1210,1212,1214,1217,1219,1221,1223,1225,1227,1229,1231,1234,1236,1238,1240,1242,1244,1246,1248,1251,1253],{"class":806,"line":1201},21,[804,1203,1146],{"class":810},[804,1205,824],{"class":810},[804,1207,749],{"class":839},[804,1209,824],{"class":810},[804,1211,827],{"class":810},[804,1213,846],{"class":810},[804,1215,1216],{"class":849},"anthropic",[804,1218,824],{"class":810},[804,1220,1068],{"class":810},[804,1222,846],{"class":810},[804,1224,752],{"class":839},[804,1226,824],{"class":810},[804,1228,827],{"class":810},[804,1230,846],{"class":810},[804,1232,1233],{"class":849},"claude-sonnet",[804,1235,824],{"class":810},[804,1237,1068],{"class":810},[804,1239,846],{"class":810},[804,1241,1184],{"class":839},[804,1243,824],{"class":810},[804,1245,827],{"class":810},[804,1247,846],{"class":810},[804,1249,1250],{"class":849},"rate_limit",[804,1252,824],{"class":810},[804,1254,1255],{"class":810},"}\n",[804,1257,1259],{"class":806,"line":1258},22,[804,1260,1261],{"class":810},"  ]\n",[804,1263,1265],{"class":806,"line":1264},23,[804,1266,1255],{"class":810},[516,1268,1269],{},"That’s what you operate.",[649,1271],{},[652,1273,1275],{"id":1274},"router-patterns-that-actually-work","Router patterns that actually work",[516,1277,1278],{},"Most teams try “learned routing” immediately.",[516,1280,1281],{},"In practice, the first 80% of value comes from boring patterns.",[1283,1284,1285,1313,1340,1364],"accordion",{},[1286,1287,1290,1293,1310],"accordion-item",{"icon":1288,"label":1289},"i-lucide-list-checks","Pattern 1: Policy-first rules (fastest to ship)",[516,1291,1292],{},"Use explicit rules for:",[574,1294,1295,1298,1301,1304,1307],{},[577,1296,1297],{},"compliance (data residency, vendor allowlists)",[577,1299,1300],{},"user tier (free vs paid)",[577,1302,1303],{},"latency class (interactive vs batch)",[577,1305,1306],{},"tool family (code, search, internal ops)",[577,1308,1309],{},"prompt size bands",[516,1311,1312],{},"You can add a learned router later, but rules are debuggable and audit-friendly.",[1286,1314,1317,1320,1323,1337],{"icon":1315,"label":1316},"i-lucide-layers","Pattern 2: Cascades (cheap first, expensive last)",[516,1318,1319],{},"Start with a cheaper model and escalate only when needed.",[516,1321,1322],{},"Example triggers:",[574,1324,1325,1328,1331,1334],{},[577,1326,1327],{},"low confidence / low score",[577,1329,1330],{},"schema validation fails",[577,1332,1333],{},"tool plan is incomplete",[577,1335,1336],{},"user asks for “more depth”",[516,1338,1339],{},"Cascades reduce cost without sacrificing peak quality.",[1286,1341,1344,1347,1361],{"icon":1342,"label":1343},"i-lucide-route","Pattern 3: Specialist lanes (by artifact)",[516,1345,1346],{},"Route by artifact type:",[574,1348,1349,1352,1355,1358],{},[577,1350,1351],{},"“code patch” lane (tool use + repo read-only)",[577,1353,1354],{},"“policy answer” lane (retrieval + strict citations)",[577,1356,1357],{},"“creative” lane (lower factual constraints)",[577,1359,1360],{},"“support triage” lane (short outputs, strict structure)",[516,1362,1363],{},"This reduces cross-contamination of behaviors and makes evals clearer.",[1286,1365,1368,1375,1389],{"icon":1366,"label":1367},"i-lucide-brain","Pattern 4: Learned routing (after you have telemetry)",[516,1369,1370,1371,1374],{},"A learned router is valuable ",[548,1372,1373],{},"once"," you have:",[574,1376,1377,1380,1383,1386],{},[577,1378,1379],{},"outcome labels (success/fail)",[577,1381,1382],{},"cost labels (tokens, tool calls)",[577,1384,1385],{},"latency labels (p50/p95)",[577,1387,1388],{},"safety labels (refusal, policy violations)",[516,1390,1391],{},"Without telemetry, learned routing becomes learned guessing.",[649,1393],{},[652,1395,1397],{"id":1396},"the-missing-piece-eval-gates-routing-without-evals-is-gambling","The missing piece: eval gates (routing without evals is gambling)",[516,1399,1400],{},"Routing sounds like flexibility.",[516,1402,1403],{},"But flexibility without gates is how you ship regressions quickly.",[516,1405,1406,1407,827],{},"A model router needs ",[523,1408,1409],{},"two evaluation loops",[660,1411,1412,1418],{},[577,1413,1414,1417],{},[523,1415,1416],{},"Offline evals"," (before rollout)",[577,1419,1420,1423],{},[523,1421,1422],{},"Online guardrails"," (during rollout)",[609,1425,1426,1432],{},[612,1427,1429],{"icon":1428,"title":1416},"i-lucide-flask-conical",[516,1430,1431],{},"A curated suite that matches your tasks and failure modes (not generic benchmarks).",[612,1433,1435],{"icon":1434,"title":1422},"i-lucide-activity",[516,1436,1437],{},"Canaries, shadow traffic, and rollback triggers based on real metrics.",[789,1439,1441],{"id":1440},"what-to-measure-practical-not-academic","What to measure (practical, not academic)",[516,1443,1444],{},"Offline:",[574,1446,1447,1450,1453,1456,1459],{},[577,1448,1449],{},"task success rate (by task family)",[577,1451,1452],{},"schema validity rate (structured outputs)",[577,1454,1455],{},"citation validity rate (if you cite)",[577,1457,1458],{},"tool-use correctness (did it call the right tools, safely?)",[577,1460,1461],{},"refusal/over-refusal rate (policy behavior)",[516,1463,1464],{},"Online:",[574,1466,1467,1470,1473,1476,1479],{},[577,1468,1469],{},"p95 latency per route",[577,1471,1472],{},"cost per request (tokens + tool calls + retries)",[577,1474,1475],{},"fallback rate (how often did we fail over?)",[577,1477,1478],{},"incident signals (timeouts, 5xx, rate limits)",[577,1480,1481],{},"“human override” rate (edits, re-asks, escalations)",[1483,1484,1485,1488,1502],"warning",{},[516,1486,1487],{},"If you upgrade models without eval gates, you’ll be surprised by:",[574,1489,1490,1493,1496,1499],{},[577,1491,1492],{},"new failure modes",[577,1494,1495],{},"new refusal behaviors",[577,1497,1498],{},"new tool-call patterns",[577,1500,1501],{},"new cost profiles",[516,1503,1504],{},"Model upgrades are dependency upgrades — with more personality.",[649,1506],{},[652,1508,1510],{"id":1509},"budgets-are-the-routers-real-language","Budgets are the router’s real language",[516,1512,1513],{},"People talk about routing like it’s “quality optimization.”",[516,1515,1516],{},"In production, routing is budget management:",[574,1518,1519,1525,1531,1537],{},[577,1520,1521,1524],{},[523,1522,1523],{},"latency budget"," (interactive vs batch)",[577,1526,1527,1530],{},[523,1528,1529],{},"token budget"," (input + output caps)",[577,1532,1533,1536],{},[523,1534,1535],{},"tool budget"," (number and type of side effects)",[577,1538,1539,1542],{},[523,1540,1541],{},"risk budget"," (what actions are allowed without humans)",[789,1544,1546],{"id":1545},"the-budget-stack-the-thing-you-enforce","The budget stack (the thing you enforce)",[1548,1549,1551,1555,1566,1570,1573,1577,1580,1584,1587],"steps",{"level":1550},"3",[789,1552,1554],{"id":1553},"step-1-define-latency-classes","Step 1 — Define latency classes",[574,1556,1557,1560,1563],{},[577,1558,1559],{},"interactive: tight deadline, limited tool calls",[577,1561,1562],{},"background: slower, cheaper, more retries allowed",[577,1564,1565],{},"batch: large payloads, discounted tiers, strict caps",[789,1567,1569],{"id":1568},"step-2-cap-output-like-you-mean-it","Step 2 — Cap output like you mean it",[516,1571,1572],{},"Unbounded output is an invoice generator.\nCap output per route and use “continue” only when necessary.",[789,1574,1576],{"id":1575},"step-3-put-a-ceiling-on-tool-calls","Step 3 — Put a ceiling on tool calls",[516,1578,1579],{},"Tool use is side effects + cost.\nLimit calls and require explicit escalation for risky tools.",[789,1581,1583],{"id":1582},"step-4-make-budgets-visible-in-logs","Step 4 — Make budgets visible in logs",[516,1585,1586],{},"Every response should be traceable to:",[574,1588,1589,1592,1595],{},[577,1590,1591],{},"route chosen",[577,1593,1594],{},"budgets enforced",[577,1596,1597],{},"fallbacks taken",[516,1599,1600],{},"Budgets don’t reduce quality.\nThey make quality reproducible.",[649,1602],{},[652,1604,1606],{"id":1605},"security-routing-treat-the-prompt-as-sensitive-input","Security routing: treat the prompt as sensitive input",[516,1608,1609],{},"By April 2026, “model routing” is also “security routing.”",[516,1611,1612],{},"Because not all requests are equal:",[574,1614,1615,1618,1621,1624],{},[577,1616,1617],{},"some contain internal IP",[577,1619,1620],{},"some contain regulated data",[577,1622,1623],{},"some trigger privileged tools",[577,1625,1626],{},"some come from untrusted sources (web/email/doc uploads)",[516,1628,1629],{},"So you need policy-driven routing based on data class and action class.",[609,1631,1632,1639],{},[612,1633,1636],{"icon":1634,"title":1635},"i-lucide-tag","Data classification",[516,1637,1638],{},"Route based on what’s in the prompt: public, internal, confidential, regulated.",[612,1640,1643],{"icon":1641,"title":1642},"i-lucide-lock","Tool scopes",[516,1644,1645],{},"Route based on actions: read-only vs write, human approval required, sandbox-only.",[789,1647,1649],{"id":1648},"the-simplest-safe-policy","The simplest safe policy",[574,1651,1652,1658,1664,1670],{},[577,1653,1654,1657],{},[523,1655,1656],{},"Confidential data"," → restricted vendor allowlist + strict logging + no external tools",[577,1659,1660,1663],{},[523,1661,1662],{},"Untrusted docs"," → retrieval sandbox + tool gateway + injection-resistant prompt layout",[577,1665,1666,1669],{},[523,1667,1668],{},"Write actions"," → require explicit approval state or step-up confirmation",[577,1671,1672,1675],{},[523,1673,1674],{},"Payments / deploy / delete"," → human gate, always",[1677,1678,1679,1682],"tip",{},[516,1680,1681],{},"A router is where governance becomes enforceable.",[516,1683,1684],{},"If your policy doesn’t exist in the router, it doesn’t exist in production.",[649,1686],{},[652,1688,1690],{"id":1689},"the-implementation-blueprint-router-as-control-plane","The implementation blueprint (router as control plane)",[516,1692,1693],{},"Here’s the architecture spine that scales without turning into chaos.",[1695,1696],"img",{"alt":1697,"src":1698},"Model router control plane: request classifier, policy engine, router, provider adapters, caching, telemetry, eval gates, rollback","blog/2026/illustrations/model-router-control-plane.avif",[516,1700,1701],{},[523,1702,1703],{},"Key components:",[574,1705,1706,1716,1726,1736,1746,1756],{},[577,1707,1708,1711],{},[523,1709,1710],{},"Request classifier",[574,1712,1713],{},[577,1714,1715],{},"extracts features: prompt size, tool needs, risk tier, user tier, latency class",[577,1717,1718,1721],{},[523,1719,1720],{},"Policy engine",[574,1722,1723],{},[577,1724,1725],{},"allowlists, residency constraints, tool scopes, budgets",[577,1727,1728,1731],{},[523,1729,1730],{},"Router",[574,1732,1733],{},[577,1734,1735],{},"selects route + fallbacks, emits decision object",[577,1737,1738,1741],{},[523,1739,1740],{},"Provider adapters",[574,1742,1743],{},[577,1744,1745],{},"normalize APIs across vendors, handle retries/timeouts, enforce caps",[577,1747,1748,1751],{},[523,1749,1750],{},"Telemetry",[574,1752,1753],{},[577,1754,1755],{},"logs route decisions, costs, latencies, fallbacks, validation outcomes",[577,1757,1758,1761],{},[523,1759,1760],{},"Rollback manager",[574,1762,1763],{},[577,1764,1765],{},"flips aliases (“prod-stable”) back to prior versions instantly",[516,1767,1768],{},"This is how you make “many models” feel like one operable system.",[649,1770],{},[652,1772,1774],{"id":1773},"failure-modes-what-breaks-first","Failure modes (what breaks first)",[516,1776,1777],{},"Routing introduces new classes of bugs.",[516,1779,1780],{},"Here are the ones I’d expect to see in the first month.",[1283,1782,1783,1797,1809,1821,1833],{},[1286,1784,1787],{"icon":1785,"label":1786},"i-lucide-shuffle","Failure: inconsistent behavior across routes",[516,1788,1789,1792,1793,1796],{},[523,1790,1791],{},"Symptom:"," identical requests get different answers depending on route chosen.\n",[523,1794,1795],{},"Fix:"," define task lanes, reduce randomness, pin routes for certain workflows, and add consistency evals.",[1286,1798,1801],{"icon":1799,"label":1800},"i-lucide-wallet","Failure: hidden cost explosions",[516,1802,1803,1805,1806,1808],{},[523,1804,1791],{}," token usage creeps up as models change verbosity or tool behavior.\n",[523,1807,1795],{}," enforce output caps, measure token burn by route, and add cost regression tests.",[1286,1810,1813],{"icon":1811,"label":1812},"i-lucide-plug","Failure: tool-use regressions",[516,1814,1815,1817,1818,1820],{},[523,1816,1791],{}," a new model calls tools more often or in unsafe sequences.\n",[523,1819,1795],{}," tool gateway + tool-use eval suite + per-route tool budgets.",[1286,1822,1825],{"icon":1823,"label":1824},"i-lucide-cloud-rain","Failure: fallback storms",[516,1826,1827,1829,1830,1832],{},[523,1828,1791],{}," provider degradation triggers widespread fallbacks and multiplies load.\n",[523,1831,1795],{}," budgets + circuit breakers + controlled degrade modes (cheaper routes, fewer tools).",[1286,1834,1837],{"icon":1835,"label":1836},"i-lucide-shield-alert","Failure: policy drift",[516,1838,1839,1841,1842,1844],{},[523,1840,1791],{}," the model starts refusing too much or complying too much after an upgrade.\n",[523,1843,1795],{}," policy behavior evals and route pinning for high-risk workflows.",[649,1846],{},[652,1848,1850],{"id":1849},"april-takeaway","April takeaway",[516,1852,1853],{},"The frontier is moving too fast for static decisions.",[516,1855,1856],{},"So the stable posture for 2026 is:",[612,1858,1860,1863,1869],{"icon":1859,"title":1850},"i-lucide-anchor",[516,1861,1862],{},"Stop selecting “a model.”",[516,1864,1865,1866,827],{},"Build and operate a ",[523,1867,1868],{},"model router",[574,1870,1871,1874,1877,1880],{},[577,1872,1873],{},"routes are policy decisions",[577,1875,1876],{},"evals are release gates",[577,1878,1879],{},"budgets are the contract",[577,1881,1882],{},"rollbacks are mandatory",[649,1884],{},[652,1886,1888],{"id":1887},"resources","Resources",[609,1890,1891,1900,1907,1915,1922,1930],{},[612,1892,1897],{"icon":1893,"title":1894,"target":1895,"to":1896},"i-lucide-newspaper","GPT-5.5 release coverage (The Verge)","_blank","https://www.theverge.com/ai-artificial-intelligence/917612/openai-gpt-5-5-chatgpt",[516,1898,1899],{},"A clear summary of why newer frontier releases increasingly emphasize tool use, efficiency, and “real work” positioning.",[612,1901,1904],{"icon":1893,"title":1902,"target":1895,"to":1903},"DeepSeek V4 on Huawei Ascend (Reuters)","https://www.reuters.com/world/china/deepseek-v4-chinese-ai-model-adapted-huawei-chips-2026-04-24/",[516,1905,1906],{},"A reminder that model availability, hardware optimization, and geopolitical supply chains shape what “routing” can even mean.",[612,1908,1912],{"icon":1909,"title":1910,"target":1895,"to":1911},"i-simple-icons-github","Copilot auto model selection (GitHub Changelog)","https://github.blog/changelog/2026-04-17-github-copilot-cli-now-supports-copilot-auto-model-selection/",[516,1913,1914],{},"A mainstream signal: “auto routing” is becoming a user-facing product feature, not just an infra trick.",[612,1916,1919],{"icon":1909,"title":1917,"target":1895,"to":1918},"GPT-5.5 in Copilot model picker (GitHub Changelog)","https://github.blog/changelog/2026-04-24-gpt-5-5-is-generally-available-for-github-copilot/",[516,1920,1921],{},"Routing is now explicit: products are shipping model selection as an operational control surface.",[612,1923,1927],{"icon":1924,"title":1925,"target":1895,"to":1926},"i-lucide-file-text","R2-Router (arXiv)","https://arxiv.org/html/2602.02823v1",[516,1928,1929],{},"A research example of budget-aware routing and constrained decoding as a cost/quality lever.",[612,1931,1934],{"icon":1924,"title":1932,"target":1895,"to":1933},"Universal model routing (OpenReview)","https://openreview.net/pdf?id=ka82fvJ5f1",[516,1935,1936],{},"A useful framing: routers must handle a changing pool of models, not a fixed set.",[649,1938],{},[652,1940,1942],{"id":1941},"faq","FAQ",[1283,1944,1945,1972,1995,2004],{},[1286,1946,1949,1952,1955,1969],{"icon":1947,"label":1948},"i-lucide-circle-help","Do I really need routing if my team uses one vendor?",[516,1950,1951],{},"Often, yes.",[516,1953,1954],{},"Even within one vendor you still have:",[574,1956,1957,1960,1963,1966],{},[577,1958,1959],{},"multiple model tiers",[577,1961,1962],{},"different latency/cost profiles",[577,1964,1965],{},"tool-use differences",[577,1967,1968],{},"version rollouts",[516,1970,1971],{},"Routing is how you make those differences explicit and operable.",[1286,1973,1975,1978,1992],{"icon":1947,"label":1974},"What’s the smallest viable router?",[516,1976,1977],{},"A rules-based classifier + policy engine that selects between:",[574,1979,1980,1983,1986,1989],{},[577,1981,1982],{},"a cheap default model",[577,1984,1985],{},"a “hard problems” model",[577,1987,1988],{},"a long-context model",[577,1990,1991],{},"and a safe fallback",[516,1993,1994],{},"With logs, budgets, and a rollback alias.",[1286,1996,1998,2001],{"icon":1947,"label":1997},"What’s the biggest mistake teams make?",[516,1999,2000],{},"Routing without eval gates.",[516,2002,2003],{},"They treat routing as “flexibility” and forget it’s also “risk surface.”\nIf you can’t measure per-route outcomes, you can’t trust the router.",[1286,2005,2007,2010],{"icon":1947,"label":2006},"Is learned routing worth it?",[516,2008,2009],{},"Yes — after telemetry exists.",[516,2011,2012],{},"Learned routers can optimize cost/quality well, but only if you have labels and stable measurements.\nStart with rules, then learn.",[2014,2015,2016],"style",{},"html pre.shiki code .sMK4o, html code.shiki .sMK4o{--shiki-light:#39ADB5;--shiki-default:#89DDFF;--shiki-dark:#89DDFF}html pre.shiki code .spNyl, html code.shiki .spNyl{--shiki-light:#9C3EDA;--shiki-default:#C792EA;--shiki-dark:#C792EA}html pre.shiki code .sBMFI, html code.shiki .sBMFI{--shiki-light:#E2931D;--shiki-default:#FFCB6B;--shiki-dark:#FFCB6B}html pre.shiki code .sfazB, html code.shiki .sfazB{--shiki-light:#91B859;--shiki-default:#C3E88D;--shiki-dark:#C3E88D}html pre.shiki code .sbssI, html code.shiki .sbssI{--shiki-light:#F76D47;--shiki-default:#F78C6C;--shiki-dark:#F78C6C}html .light .shiki span {color: var(--shiki-light);background: var(--shiki-light-bg);font-style: var(--shiki-light-font-style);font-weight: var(--shiki-light-font-weight);text-decoration: var(--shiki-light-text-decoration);}html.light .shiki span {color: var(--shiki-light);background: var(--shiki-light-bg);font-style: var(--shiki-light-font-style);font-weight: var(--shiki-light-font-weight);text-decoration: var(--shiki-light-text-decoration);}html .default .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .shiki span {color: var(--shiki-default);background: var(--shiki-default-bg);font-style: var(--shiki-default-font-style);font-weight: var(--shiki-default-font-weight);text-decoration: var(--shiki-default-text-decoration);}html .dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}html.dark .shiki span {color: var(--shiki-dark);background: var(--shiki-dark-bg);font-style: var(--shiki-dark-font-style);font-weight: var(--shiki-dark-font-weight);text-decoration: var(--shiki-dark-text-decoration);}",{"title":799,"searchDepth":814,"depth":814,"links":2018},[2019,2020,2023,2024,2027,2034,2037,2038,2039,2040,2041],{"id":654,"depth":814,"text":655},{"id":731,"depth":814,"text":732,"children":2021},[2022],{"id":791,"depth":833,"text":792},{"id":1274,"depth":814,"text":1275},{"id":1396,"depth":814,"text":1397,"children":2025},[2026],{"id":1440,"depth":833,"text":1441},{"id":1509,"depth":814,"text":1510,"children":2028},[2029,2030,2031,2032,2033],{"id":1545,"depth":833,"text":1546},{"id":1553,"depth":833,"text":1554},{"id":1568,"depth":833,"text":1569},{"id":1575,"depth":833,"text":1576},{"id":1582,"depth":833,"text":1583},{"id":1605,"depth":814,"text":1606,"children":2035},[2036],{"id":1648,"depth":833,"text":1649},{"id":1689,"depth":814,"text":1690},{"id":1773,"depth":814,"text":1774},{"id":1849,"depth":814,"text":1850},{"id":1887,"depth":814,"text":1888},{"id":1941,"depth":814,"text":1942},"2026-04-26T00:00:00.000Z","Weekly frontier releases turned “pick a model” into an operational anti-pattern. In April 2026, routing became the real product: choose models per request, enforce eval gates, and budget cost/latency/security like an SRE system.","md","blog/2026/the-model-router-era-routing-eval-gates-and-budgets.avif",{"slug":2047},"the-model-router-era-routing-eval-gates-and-budgets",true,{"title":458,"description":2043},{"loc":459,"images":2051},[2052],{"loc":1698},"_ReWJ8rNPdZgdytoRgu-VO6rhbiru23wLJXVeSbr8oo",[2055,2056],null,{"title":26,"path":27,"stem":28,"description":2057,"children":-1},"Once agents can take actions, they become non-human identities with privileges — and that turns “AI productivity” into “security surface.” This post explains the new threat model, why “kill switches” are really identity and execution controls, and the architecture patterns that make agentic systems operable.",1777404338619]