Everything you need to ship reliable agents.
From tracing to evaluation to monitoring — one platform, no stitching tools together.
One-line tracing
Add @observe to any function. Every LLM call, tool use, retrieval, and decision inside it is captured as a span tree — no manual instrumentation.
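To make the idea concrete, here is a minimal sketch of what a decorator like @observe does under the hood — capture a span per call with timing. The span shape and the in-memory list are illustrative; the real SDK exports spans in the background.

```python
import functools
import time

SPANS = []  # stand-in for the SDK's background exporter

def observe(fn):
    """Minimal @observe-style decorator: records one span per call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        SPANS.append({
            "name": fn.__name__,
            "duration_ms": (time.perf_counter() - start) * 1000,
        })
        return result
    return wrapper

@observe
def answer(question):
    return f"echo: {question}"

answer("hi")
```

Because the decorator wraps the function itself, nested decorated calls naturally produce the parent/child span structure described below.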
OpenAI & Anthropic wrappers
wrap_openai() and wrap_anthropic() trace every completion automatically. Drop them in, change nothing else.
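The wrapper pattern itself is simple: intercept the client's completion method and record each call before delegating. A hedged sketch follows — FakeClient and wrap_client are illustrative stand-ins, not the SDK's actual wrappers.

```python
calls = []  # stand-in for the trace store

def wrap_client(client, method_name="complete"):
    """Replace a client method with a traced version that delegates."""
    original = getattr(client, method_name)
    def traced(*args, **kwargs):
        calls.append({"method": method_name, "args": args, "kwargs": kwargs})
        return original(*args, **kwargs)
    setattr(client, method_name, traced)
    return client

class FakeClient:
    """Placeholder for an OpenAI/Anthropic client."""
    def complete(self, prompt):
        return f"response to {prompt}"

client = wrap_client(FakeClient())
```

Calling code is unchanged — `client.complete(...)` works exactly as before, which is why "change nothing else" holds.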
Nested span trees
Agents call tools that call other agents. 2signal captures the full call hierarchy with timing, tokens, and cost at every level.
LLM-as-Judge
Define criteria in plain English. An LLM scores every trace on a pass/fail or 1–5 scale with written reasoning.
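A plain-English criterion becomes a scoring prompt sent to the judge model. The template below is an illustrative sketch of that step, not 2signal's actual prompt wording; the real judge forwards it to an LLM and parses the score.

```python
def build_judge_prompt(criterion, output, scale="pass/fail"):
    """Turn a plain-English criterion into a judge prompt (illustrative)."""
    return (
        "You are evaluating an agent's output.\n"
        f"Criterion: {criterion}\n"
        f"Scale: {scale}\n"
        f"Output to evaluate:\n{output}\n"
        "Respond with a score and one sentence of reasoning."
    )

prompt = build_judge_prompt("The answer cites a source", "Paris is the capital.")
```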
Deterministic checks
Contains, Regex Match, and JSON Schema evaluators for hard rules — output must include X, match pattern Y, or conform to schema Z.
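These three checks need no LLM at all. Here is a minimal stdlib sketch of each; the function names are illustrative (the JSON check is simplified to required keys rather than full schema validation).

```python
import json
import re

def check_contains(output, needle):
    """Contains: output must include the literal string."""
    return needle in output

def check_regex(output, pattern):
    """Regex Match: output must match the pattern somewhere."""
    return re.search(pattern, output) is not None

def check_json_keys(output, required_keys):
    """Simplified JSON Schema-style check: valid JSON with required keys."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return all(k in data for k in required_keys)
```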
Performance evaluators
Latency and Cost evaluators enforce budgets automatically. Flag any trace over 2 seconds or $0.10.
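A budget evaluator is just a threshold check per trace. A minimal sketch, assuming a trace exposes latency and cost fields (the field names here are illustrative):

```python
def check_budget(trace, max_latency_s=2.0, max_cost_usd=0.10):
    """Return which budgets a trace exceeded: latency, cost, or neither."""
    flags = []
    if trace["latency_s"] > max_latency_s:
        flags.append("latency")
    if trace["cost_usd"] > max_cost_usd:
        flags.append("cost")
    return flags
```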
Similarity scoring
TF-IDF cosine similarity compares outputs against expected answers. No LLM calls — runs locally, instantly.
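The whole computation runs in plain Python. The sketch below uses smoothed IDF (sklearn-style, so terms shared by every document still carry weight); the SDK's exact weighting may differ.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF vectors for a small corpus, with smoothed IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # document frequency per term
    def vec(tokens):
        tf = Counter(tokens)
        return {t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                for t, c in tf.items()}
    return [vec(t) for t in tokenized]

def cosine(a, b):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

Identical texts score 1.0, texts with no shared terms score 0.0, and everything else lands in between — no network call involved.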
Trace explorer
Filter by status, evaluator score, latency, cost, or tags. Click any trace to see the full span tree with inputs, outputs, and timing.
Regression detection
Compare agent versions side-by-side. See which evaluators degraded, which spans got slower, and where costs increased.
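At its core, regression detection compares per-evaluator scores across two versions and surfaces the ones that dropped. A minimal sketch, assuming scores arrive as name-to-value dicts (an illustrative shape, not the API's):

```python
def find_regressions(baseline, candidate):
    """Return evaluators whose score degraded, as {name: (before, after)}."""
    return {
        name: (baseline[name], candidate[name])
        for name in baseline
        if name in candidate and candidate[name] < baseline[name]
    }
```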
Usage tracking
Per-project trace counts, token usage, and cost rollups. Warnings at 80% and 100% of plan limits.
Model routing
Route queries to the right model by complexity. Simple questions get cheap models. Save 30–50% on LLM costs without losing quality.
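One way such routing can work is a cheap heuristic gate in front of the model call. The thresholds, marker words, and model names below are illustrative assumptions, not 2signal's routing logic:

```python
def route(query, cheap="small-model", strong="large-model"):
    """Heuristic router: long or analytical queries go to the strong model."""
    words = query.split()
    complex_markers = {"why", "compare", "analyze", "explain", "design"}
    if len(words) > 30 or any(w.lower().strip("?,.") in complex_markers
                              for w in words):
        return strong
    return cheap
```

The savings come from the fact that most production traffic is simple: every query the heuristic keeps on the cheap model avoids a large-model invocation entirely.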
Python SDK
pip install twosignal. Supports Python 3.9+. Background export with daemon threads — zero impact on agent latency.
REST API
Language-agnostic trace ingestion, scoring, and querying. Build custom integrations in any language.
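Ingestion over REST is an authenticated JSON POST. The sketch below builds such a request with the stdlib; the endpoint URL, payload fields, and bearer-token auth are assumptions for illustration, not 2signal's documented API.

```python
import json
import urllib.request

def build_trace_request(api_key, trace):
    """Assemble a JSON POST for trace ingestion (hypothetical endpoint)."""
    body = json.dumps({"trace": trace}).encode()
    return urllib.request.Request(
        "https://api.example.com/v1/traces",  # placeholder URL
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_trace_request("sk-test", {"name": "agent_run", "status": "ok"})
# urllib.request.urlopen(req) would send it; omitted here.
```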
Async by default
Traces persist to S3 first, then flow through Redis queues for processing. Evaluations run in background workers — your agent never waits.
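The persist-then-process pattern looks like this in miniature: the caller enqueues and returns immediately, while a daemon worker drains the queue in the background. Here an in-memory queue and list stand in for Redis and the evaluation workers.

```python
import queue
import threading

work = queue.Queue()   # stand-in for the Redis queue
processed = []         # stand-in for evaluation results

def worker():
    """Daemon worker: drains the queue and processes each trace."""
    while True:
        trace = work.get()
        processed.append(trace)  # stand-in for running evaluations
        work.task_done()

threading.Thread(target=worker, daemon=True).start()

def export(trace):
    work.put(trace)  # returns immediately; the agent never blocks

export({"name": "run-1"})
work.join()  # only so this demo can observe the result; export itself never waits
```

Because the worker thread is a daemon, it adds no shutdown hooks and no latency to the agent's own request path.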