Model Router
Plain Explanation
A model router decides which AI model should handle a request. A simple classification task might go to a small model, while a hard reasoning or high-risk request goes to a stronger model. The goal is to reduce cost and latency without losing too much quality.
Examples & Analogies
It is like a support center that sends easy questions to automation and complex cases to a human specialist. A short document-classification request can go to a small language model (SLM), a complex debugging task to a frontier LLM, and an image-plus-text request to a multimodal model.
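The examples above can be sketched as a small rule-based router. This is a minimal illustration, not a production design: the `Request` fields, the tier names (`slm`, `frontier-llm`, `multimodal`), and the conservative default for unknown task types are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Request:
    task: str                # e.g. "classification" or "debugging" (hypothetical labels)
    has_image: bool = False  # True for image-plus-text requests


# Hypothetical mapping from task type to model tier.
TASK_ROUTES = {
    "classification": "slm",
    "debugging": "frontier-llm",
}

# Assumption: unknown task types go to the stronger model to protect quality.
DEFAULT_TIER = "frontier-llm"


def route(req: Request) -> str:
    """Return the name of the model tier that should handle the request."""
    if req.has_image:
        # Modality is checked first: only a multimodal model can read the image.
        return "multimodal"
    return TASK_ROUTES.get(req.task, DEFAULT_TIER)
```

For instance, `route(Request("classification"))` selects the small model, while the same task with `has_image=True` selects the multimodal model.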
At a Glance
| Dimension | Single model | Model router |
|---|---|---|
| Method | Same model for all requests | Select model per request |
| Strength | Simple operation | Cost and latency optimization |
| Risk | Easy requests become expensive | Wrong routing can hurt quality |
| Required pieces | One model and API | Classifier, policy, fallback, logs |
Where and Why It Matters
When an AI product uses multiple models, model choice becomes a product and infrastructure decision. A router can send easy traffic to cheaper models and escalate difficult traffic to stronger models. In agentic systems, the router may also need to account for whether a candidate model can handle tool use.
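One common way to escalate difficult traffic is a cascade: try the cheap model first and move up only when its answer looks unreliable. The sketch below assumes each model returns a confidence score alongside its answer; that interface, the stub models, and the `0.8` threshold are hypothetical and stand in for whatever quality signal a real system has.

```python
from typing import Callable, List, Tuple

# Assumed interface: a model takes a prompt and returns (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


def cascade(prompt: str, models: List[Tuple[str, Model]],
            threshold: float = 0.8) -> Tuple[str, str]:
    """Try models from cheapest to strongest; return (model_name, answer)."""
    if not models:
        raise ValueError("at least one model is required")
    # Every model except the last can hand off to a stronger one.
    for name, model in models[:-1]:
        answer, confidence = model(prompt)
        if confidence >= threshold:
            return name, answer
    # The strongest model is the final fallback; its answer is always accepted.
    name, model = models[-1]
    answer, _ = model(prompt)
    return name, answer


# Stub models for illustration only; real ones would call an inference API.
def small_model(prompt: str) -> Tuple[str, float]:
    if prompt.startswith("classify:"):
        return "label", 0.95
    return "unsure", 0.30


def frontier_model(prompt: str) -> Tuple[str, float]:
    return "detailed answer", 0.99


MODELS = [("small", small_model), ("frontier", frontier_model)]
```

The trade-off is that an escalated request pays for both calls, so a cascade only saves money when most traffic stops at the cheap model.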
Common Misconceptions
- Myth: A router just picks the cheapest model.
- Reality: It should pick the cheapest model that still satisfies quality and safety requirements.
- Myth: Prompt length is enough to estimate difficulty.
- Reality: Task type, tool needs, risk, modality, and expected output matter.
- Myth: Routing mistakes are harmless.
- Reality: Bad routing can reduce quality, increase cost, or create safety failures.
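The first two realities above can be combined into one policy: filter out models that fail any requirement, then take the cheapest survivor. The sketch below is illustrative only; the `ModelSpec` and `Requirements` fields, the catalog entries, and every cost and quality figure are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelSpec:
    name: str
    cost_per_request: float  # made-up figures
    quality: float           # hypothetical offline evaluation score in [0, 1]
    safety_reviewed: bool    # assumed flag: cleared for high-risk requests
    supports_tools: bool
    supports_images: bool


@dataclass(frozen=True)
class Requirements:
    min_quality: float
    high_risk: bool = False
    needs_tools: bool = False
    needs_images: bool = False


def pick_model(models: List[ModelSpec], req: Requirements) -> Optional[ModelSpec]:
    """Return the cheapest model meeting every requirement, or None if none does."""
    eligible = [
        m for m in models
        if m.quality >= req.min_quality
        and (m.safety_reviewed or not req.high_risk)
        and (m.supports_tools or not req.needs_tools)
        and (m.supports_images or not req.needs_images)
    ]
    return min(eligible, key=lambda m: m.cost_per_request, default=None)


CATALOG = [
    ModelSpec("small", 0.001, 0.70, False, False, False),
    ModelSpec("mid", 0.010, 0.85, True, True, False),
    ModelSpec("frontier", 0.050, 0.95, True, True, True),
]
```

With this catalog, a low-risk request with `min_quality=0.6` goes to `small`, the same request marked `high_risk=True` goes to `mid`, and one that needs images goes to `frontier`. Returning `None` forces the caller to handle the case where no model qualifies instead of silently picking a weak one.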
How It Sounds in Conversation
- "Send routine requests to the small model, and fall back to the frontier model when it fails."
- "Router accuracy is less important than end-to-end correctness and cost per request."
- "Tool-calling requests need a different routing policy from normal chat."
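The difference between router accuracy and end-to-end metrics can be made concrete with a small log summary. This sketch assumes each logged request carries a hypothetical `ideal_model` label from offline evaluation; the record fields and the sample numbers are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LogRecord:
    routed_to: str    # model the router actually chose
    ideal_model: str  # hypothetical label from an offline evaluation
    correct: bool     # whether the final answer was judged correct
    cost: float       # cost of serving this request (made-up units)


def summarize(logs: List[LogRecord]) -> Dict[str, float]:
    """Compute router accuracy, end-to-end correctness, and mean cost per request."""
    n = len(logs)
    if n == 0:
        raise ValueError("no log records to summarize")
    return {
        "router_accuracy": sum(r.routed_to == r.ideal_model for r in logs) / n,
        "end_to_end_correctness": sum(r.correct for r in logs) / n,
        "cost_per_request": sum(r.cost for r in logs) / n,
    }


# Half of these requests were routed "wrong", yet every answer was correct.
LOGS = [
    LogRecord("small", "small", True, 0.001),
    LogRecord("small", "frontier", True, 0.001),
    LogRecord("frontier", "frontier", True, 0.05),
    LogRecord("frontier", "small", True, 0.05),
]
STATS = summarize(LOGS)
```

Here the router matches the ideal label only half the time while end-to-end correctness stays at 100%, which is why the outcome metrics and cost per request are the ones worth tracking.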
References
- Switchcraft: AI Model Router for Agentic Tool Calling
A model-routing paper for agentic tool calling; covers how model choice affects cost and correctness on tool-use tasks.
- Dynamic Model Routing and Cascading for Efficient LLM Inference: A Survey
Surveys dynamic routing and cascading across independently trained LLMs.