# Model Routing
Model routing lets you automatically select the best LLM for each request based on complexity, token count, and keywords. Route simple queries to fast, cheap models and complex reasoning tasks to more capable ones.
## How It Works

When your agent sends a request to `POST /api/v1/route-model`, 2Signal analyzes the input and returns a model recommendation based on your routing rules. The complexity analyzer scores the input on a 0–1 scale using four factors:
| Factor | Weight | What It Measures |
|---|---|---|
| Length | 25% | Estimated token count normalized to 2,000 tokens |
| Vocabulary richness | 20% | Ratio of unique words to total words |
| Question complexity | 35% | Reasoning indicators, multi-part questions, comparative analysis |
| Structural complexity | 20% | Code blocks, JSON/XML, tables, nested lists, URLs |
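The weighted score above can be sketched as follows. This is an illustrative reconstruction, not 2Signal's actual implementation: the keyword list, regex patterns, and per-factor normalization constants are assumptions; only the weights and the 2,000-token length normalization come from the table.

```python
import re

def complexity_score(text: str) -> float:
    """Illustrative 0-1 complexity score using the four weighted factors."""
    words = text.split()
    token_estimate = max(1, len(text) // 4)  # rough ~4 chars/token heuristic (assumption)

    # Length (25%): estimated token count normalized to 2,000 tokens, capped at 1.0
    length = min(token_estimate / 2000, 1.0)

    # Vocabulary richness (20%): ratio of unique words to total words
    vocabulary = len({w.lower() for w in words}) / max(len(words), 1)

    # Question complexity (35%): count reasoning indicators (assumed keyword list)
    indicators = ["compare", "why", "how", "analyze", "pros and cons", "trade-off"]
    hits = sum(1 for kw in indicators if kw in text.lower())
    question = min(hits / 3, 1.0)

    # Structural complexity (20%): code fences, JSON/XML-ish syntax, tables, URLs
    patterns = [r"```", r"[{}<>]", r"\|.+\|", r"https?://"]
    structure = min(sum(bool(re.search(p, text)) for p in patterns) / 4, 1.0)

    return 0.25 * length + 0.20 * vocabulary + 0.35 * question + 0.20 * structure
```

A comparative, multi-part prompt scores noticeably higher than a one-word input, mostly through the question-complexity factor.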
## Creating a Routing Config
Set up routing in the dashboard under Project → Model Routing → Create Config, or via the tRPC API.
A routing config consists of:
- Name — unique identifier for this config within the project
- Default model — fallback when no rule matches
- Rules — ordered list of conditions and target models
### Example Config

```json
{
  "name": "production-router",
  "defaultModel": "gpt-4o-mini",
  "rules": [
    {
      "name": "complex-reasoning",
      "condition": {
        "type": "complexity",
        "threshold": 0.7
      },
      "model": "gpt-4o",
      "priority": 1
    },
    {
      "name": "long-context",
      "condition": {
        "type": "token_count",
        "maxTokens": 4000
      },
      "model": "gpt-4o",
      "priority": 2
    },
    {
      "name": "code-tasks",
      "condition": {
        "type": "keyword",
        "keywords": ["code", "debug", "refactor", "implement"]
      },
      "model": "claude-sonnet-4-6",
      "priority": 3
    },
    {
      "name": "fallback",
      "condition": { "type": "always" },
      "model": "gpt-4o-mini",
      "priority": 10
    }
  ]
}
```

## Rule Conditions
| Type | Fields | Description |
|---|---|---|
| `complexity` | `threshold` (0–1) | Matches when the input complexity score exceeds the threshold |
| `token_count` | `maxTokens` (int) | Matches when the estimated token count exceeds the limit |
| `keyword` | `keywords` (string[]) | Matches when any keyword appears in the input (case-insensitive) |
| `always` | — | Always matches. Use as a catch-all at the lowest priority |
Rules are evaluated in priority order (lowest number first). The first matching rule determines the model.
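This first-match evaluation can be sketched as a small function. A sketch under assumptions: rules are plain dicts shaped like the example config, and the complexity score and token estimate are computed beforehand.

```python
def route(rules: list[dict], default_model: str,
          text: str, complexity: float, token_estimate: int) -> str:
    """Return the target model of the first matching rule (lowest priority number)."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        cond = rule["condition"]
        if cond["type"] == "complexity" and complexity > cond["threshold"]:
            return rule["model"]
        if cond["type"] == "token_count" and token_estimate > cond["maxTokens"]:
            return rule["model"]
        if cond["type"] == "keyword" and any(
            kw.lower() in text.lower() for kw in cond["keywords"]
        ):
            return rule["model"]
        if cond["type"] == "always":
            return rule["model"]
    return default_model  # no rule matched and no catch-all was configured
```

With the example config, an input scoring 0.78 matches `complex-reasoning` (priority 1) before any later rule is considered.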
## Using the Routing API

```bash
curl -X POST https://api.2signal.dev/api/v1/route-model \
  -H "Authorization: Bearer 2s_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{"input": "Compare the pros and cons of microservices vs monoliths for a team of 5", "configName": "production-router"}'
```

### Response
```json
{
  "data": {
    "model": "gpt-4o",
    "configName": "production-router",
    "complexity": {
      "score": 0.78,
      "tokenEstimate": 156,
      "factors": {
        "length": 0.08,
        "vocabularyRichness": 0.82,
        "questionComplexity": 0.95,
        "structuralComplexity": 0.1
      }
    }
  },
  "error": null
}
```

## Integrating with Your Agent
```python
import httpx
from openai import OpenAI

TWOSIGNAL_API = "https://api.2signal.dev"
API_KEY = "2s_live_your_key"

def route_and_call(user_input: str) -> str:
    # Step 1: Ask 2Signal which model to use
    resp = httpx.post(
        f"{TWOSIGNAL_API}/api/v1/route-model",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"input": user_input, "configName": "production-router"},
        timeout=10.0,
    )
    resp.raise_for_status()  # fail loudly on auth or config errors
    model = resp.json()["data"]["model"]

    # Step 2: Call the recommended model
    client = OpenAI()
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": user_input}],
    )
    return completion.choices[0].message.content
```

## Best Practices
- Start with two tiers — a cheap model for simple queries and a capable model for complex ones. Add more tiers as you see patterns in your traffic.
- Use the Cost evaluator alongside routing — track whether routing actually reduces your spend.
- Set an `always` fallback — ensures every request gets routed, even if no condition matches.
- Monitor complexity distributions — if most traffic scores above your threshold, your threshold may be too low.
- Test with datasets — run your dataset through the routing API to see how items would be classified before deploying.
## Complexity Scoring Details
Understanding what drives complexity scores helps you set better thresholds:
- Low complexity (0.0–0.3) — Short, simple questions. "What is the capital of France?"
- Medium complexity (0.3–0.6) — Multi-sentence questions, some domain terms. "Explain how TCP handles retransmission."
- High complexity (0.6–1.0) — Comparative analysis, code with context, multi-part reasoning. "Compare React and Vue for a large enterprise app, considering bundle size, hiring, and migration cost."