# Model Routing
Model routing lets you automatically select the best LLM for each request based on complexity, token count, and keywords. Route simple queries to fast, cheap models and complex reasoning tasks to more capable ones.
## How It Works

When your agent sends a request to `POST /api/v1/route-model`, 2Signal analyzes the input and returns a model recommendation based on your routing rules. The complexity analyzer scores the input on a 0–1 scale using four factors:
| Factor | Weight | What It Measures |
|---|---|---|
| Length | 25% | Estimated token count normalized to 2,000 tokens |
| Vocabulary richness | 20% | Ratio of unique words to total words |
| Question complexity | 35% | Reasoning indicators, multi-part questions, comparative analysis |
| Structural complexity | 20% | Code blocks, JSON/XML, tables, nested lists, URLs |
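The weighted score above can be sketched as follows. This is an illustrative reconstruction, not 2Signal's actual implementation: the keyword list, regex patterns, and per-factor normalization constants are assumptions; only the weights and the 2,000-token length normalization come from the table.

```python
import re

def complexity_score(text: str) -> float:
    """Illustrative 0-1 complexity score using the four weighted factors."""
    words = text.split()
    token_estimate = max(1, len(text) // 4)  # rough ~4 chars/token heuristic (assumption)

    # Length (25%): estimated token count normalized to 2,000 tokens, capped at 1.0
    length = min(token_estimate / 2000, 1.0)

    # Vocabulary richness (20%): ratio of unique words to total words
    vocabulary = len({w.lower() for w in words}) / max(len(words), 1)

    # Question complexity (35%): count reasoning indicators (assumed keyword list)
    indicators = ["compare", "why", "how", "analyze", "pros and cons", "trade-off"]
    hits = sum(1 for kw in indicators if kw in text.lower())
    question = min(hits / 3, 1.0)

    # Structural complexity (20%): code fences, JSON/XML-ish syntax, tables, URLs
    patterns = [r"```", r"[{}<>]", r"\|.+\|", r"https?://"]
    structure = min(sum(bool(re.search(p, text)) for p in patterns) / 4, 1.0)

    return 0.25 * length + 0.20 * vocabulary + 0.35 * question + 0.20 * structure
```

A comparative, multi-part prompt scores noticeably higher than a one-word input, mostly through the question-complexity factor.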
## Creating a Routing Config
Set up routing in the dashboard under Project → Model Routing → Create Config, or via the tRPC API.
A routing config consists of:
- Name — unique identifier for this config within the project
- Default model — fallback when no rule matches
- Rules — ordered list of conditions and target models
### Example Config

```json
{
  "name": "production-router",
  "defaultModel": "gpt-4o-mini",
  "rules": [
    {
      "name": "complex-reasoning",
      "condition": {
        "type": "complexity",
        "threshold": 0.7
      },
      "model": "gpt-4o",
      "priority": 1
    },
    {
      "name": "long-context",
      "condition": {
        "type": "token_count",
        "maxTokens": 4000
      },
      "model": "gpt-4o",
      "priority": 2
    },
    {
      "name": "code-tasks",
      "condition": {
        "type": "keyword",
        "keywords": ["code", "debug", "refactor", "implement"]
      },
      "model": "claude-sonnet-4-6",
      "priority": 3
    },
    {
      "name": "fallback",
      "condition": { "type": "always" },
      "model": "gpt-4o-mini",
      "priority": 10
    }
  ]
}
```

## Rule Conditions
| Type | Fields | Description |
|---|---|---|
| `complexity` | `threshold` (0–1) | Matches when the input complexity score exceeds the threshold |
| `token_count` | `maxTokens` (int) | Matches when the estimated token count exceeds the limit |
| `keyword` | `keywords` (string[]) | Matches when any keyword appears in the input (case-insensitive) |
| `always` | — | Always matches. Use as a catch-all at the lowest priority |
Rules are evaluated in priority order (lowest number first). The first matching rule determines the model.
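This first-match evaluation can be sketched as a small function. A sketch under assumptions: rules are plain dicts shaped like the example config, and the complexity score and token estimate are computed beforehand.

```python
def route(rules: list[dict], default_model: str,
          text: str, complexity: float, token_estimate: int) -> str:
    """Return the target model of the first matching rule (lowest priority number)."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        cond = rule["condition"]
        if cond["type"] == "complexity" and complexity > cond["threshold"]:
            return rule["model"]
        if cond["type"] == "token_count" and token_estimate > cond["maxTokens"]:
            return rule["model"]
        if cond["type"] == "keyword" and any(
            kw.lower() in text.lower() for kw in cond["keywords"]
        ):
            return rule["model"]
        if cond["type"] == "always":
            return rule["model"]
    return default_model  # no rule matched and no catch-all was configured
```

With the example config, an input scoring 0.78 matches `complex-reasoning` (priority 1) before any later rule is considered.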
## Using the Routing API

```bash
curl -X POST https://api.2signal.dev/api/v1/route-model \
  -H "Authorization: Bearer 2s_live_your_key" \
  -H "Content-Type: application/json" \
  -d '{"input": "Compare the pros and cons of microservices vs monoliths for a team of 5", "configName": "production-router"}'
```

### Response
```json
{
  "data": {
    "model": "gpt-4o",
    "configName": "production-router",
    "complexity": {
      "score": 0.78,
      "tokenEstimate": 156,
      "factors": {
        "length": 0.08,
        "vocabularyRichness": 0.82,
        "questionComplexity": 0.95,
        "structuralComplexity": 0.1
      }
    }
  },
  "error": null
}
```

## Integrating with Your Agent
```python
import httpx
from openai import OpenAI

TWOSIGNAL_API = "https://api.2signal.dev"
API_KEY = "2s_live_your_key"

def route_and_call(user_input: str) -> str:
    # Step 1: Ask 2Signal which model to use
    resp = httpx.post(
        f"{TWOSIGNAL_API}/api/v1/route-model",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"input": user_input, "configName": "production-router"},
        timeout=10.0,
    )
    resp.raise_for_status()  # fail loudly on auth or config errors
    model = resp.json()["data"]["model"]

    # Step 2: Call the recommended model
    client = OpenAI()
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": user_input}],
    )
    return completion.choices[0].message.content
```

## Best Practices
- Start with two tiers — a cheap model for simple queries and a capable model for complex ones. Add more tiers as you see patterns in your traffic.
- Use the Cost evaluator alongside routing — track whether routing actually reduces your spend.
- Set an `always` fallback — ensures every request gets routed, even if no condition matches.
- Monitor complexity distributions — if most traffic scores above your threshold, your threshold may be too low.
- Test with datasets — run your dataset through the routing API to see how items would be classified before deploying.
## Complexity Scoring Details
Understanding what drives complexity scores helps you set better thresholds:
- Low complexity (0.0–0.3) — Short, simple questions. "What is the capital of France?"
- Medium complexity (0.3–0.6) — Multi-sentence questions, some domain terms. "Explain how TCP handles retransmission."
- High complexity (0.6–1.0) — Comparative analysis, code with context, multi-part reasoning. "Compare React and Vue for a large enterprise app, considering bundle size, hiring, and migration cost."