Why 2signal

AI agents are hard to test. They're non-deterministic, they call external APIs, they chain decisions together, and they fail in ways that unit tests can't catch. 2signal gives you the tools to test, evaluate, and monitor them in production.

Built for agents, not just LLMs

Most observability tools trace individual LLM calls. 2signal traces the full agent execution — every LLM call, tool invocation, retrieval step, and decision point in a single trace with nested spans. You see the complete picture, not just the API call.

Evaluation is built in, not bolted on

2signal includes 7 evaluators out of the box — from fast deterministic checks (Contains, Regex, JSON Schema) to semantic scoring (LLM Judge, Similarity) to operational thresholds (Latency, Cost). Configure them once and every trace is automatically scored.
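To make the deterministic checks concrete, here is a minimal sketch of what Contains-, Regex-, and JSON Schema-style evaluators compute. This is plain Python, not the 2signal API — the function names and signatures are illustrative:

```python
import json
import re

# Illustrative stand-ins for deterministic evaluators.
# Names and signatures are hypothetical, not the real SDK API.

def contains_check(output: str, expected: str) -> bool:
    """Contains: pass if the expected substring appears in the output."""
    return expected in output

def regex_check(output: str, pattern: str) -> bool:
    """Regex: pass if the pattern matches anywhere in the output."""
    return re.search(pattern, output) is not None

def json_schema_check(output: str, required_keys: list[str]) -> bool:
    """JSON Schema (simplified): output parses as a JSON object with the required keys."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and all(k in data for k in required_keys)

answer = '{"city": "Paris", "confidence": 0.9}'
print(contains_check(answer, "Paris"))                    # True
print(regex_check(answer, r'"confidence":\s*0\.\d+'))     # True
print(json_schema_check(answer, ["city", "confidence"]))  # True
```

Because checks like these are pure string and JSON operations, they can score every trace in microseconds — which is why they run on every call while semantic scorers like LLM Judge are reserved for sampled or higher-stakes traces.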

No separate evaluation framework to set up. No scripts to maintain. No pipeline to build. Evaluators run async in the background — they never slow down your agent.

One line to instrument

Add @observe to your function. That's it. Every call is traced with inputs, outputs, timing, and errors. Wrap your LLM client for automatic token and cost tracking.

from openai import OpenAI

from twosignal import TwoSignal, observe
from twosignal.wrappers import wrap_openai

ts = TwoSignal()
client = wrap_openai(OpenAI())  # wrapped client reports tokens and cost per call

@observe
def my_agent(query):
    return client.chat.completions.create(...)

Testing and monitoring in one platform

Most teams cobble together separate tools for development testing, CI evaluation, and production monitoring. 2signal handles all three. Create datasets for regression testing, run evaluators in CI/CD, and monitor production traces — all using the same evaluators, the same scores, and the same dashboard.
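For a sense of what the regression-testing half of this looks like, here is a hand-rolled sketch of the concept 2signal automates: run the agent over a fixed dataset and fail CI if any case regresses. The dataset entries, agent stub, and scoring rule are all illustrative, not the 2signal dataset API:

```python
# Hand-rolled dataset regression check -- the concept, not the 2signal API.

def my_agent(query: str) -> str:
    # Stand-in agent; in practice this would be your @observe-decorated function.
    canned = {"capital of France?": "The capital of France is Paris."}
    return canned.get(query, "")

# A dataset pairs inputs with expectations (here, a simple Contains check).
dataset = [
    {"input": "capital of France?", "must_contain": "Paris"},
]

def run_regression(dataset) -> bool:
    """Run the agent over every entry; any failed check means a regression."""
    failures = [case for case in dataset
                if case["must_contain"] not in my_agent(case["input"])]
    return len(failures) == 0

print(run_regression(dataset))  # True
```

The point of sharing evaluators across development, CI, and production is that this same Contains check scores your dataset in CI and your live traces in the dashboard, so a score of 0.8 means the same thing in both places.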

Smart model routing

Not every query needs your most expensive model. 2signal's model routing analyzes complexity and routes simple requests to cheaper models automatically. Teams typically see 50–75% cost reduction without quality loss.
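The routing idea can be sketched with a simple complexity heuristic. Everything here — the heuristic, the threshold, and the model names — is an assumption for illustration, not 2signal's actual routing logic:

```python
# Illustrative complexity-based router. The heuristic, threshold, and
# model names are assumptions -- not 2signal's routing algorithm.

CHEAP_MODEL = "gpt-4o-mini"
STRONG_MODEL = "gpt-4o"

def estimate_complexity(query: str) -> float:
    """Toy score in [0, 1]: longer, multi-step queries score higher."""
    score = min(len(query) / 500, 1.0)
    if any(kw in query.lower() for kw in ("step by step", "analyze", "compare")):
        score += 0.5
    return min(score, 1.0)

def route(query: str, threshold: float = 0.5) -> str:
    """Send simple queries to the cheap model, complex ones to the strong one."""
    return STRONG_MODEL if estimate_complexity(query) >= threshold else CHEAP_MODEL

print(route("What's 2+2?"))                                  # gpt-4o-mini
print(route("Analyze these quarterly reports step by step")) # gpt-4o
```

The savings come from volume: if most traffic is simple and a cheaper model costs a fraction as much per token, routing only the hard tail to the expensive model cuts the bill without touching quality on that tail.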

Three interfaces

Use the web dashboard for visual exploration, the CLI for scripting and CI/CD, or the TUI for real-time terminal monitoring. They all share the same data and auth.

10 integrations, more coming

First-class support for OpenAI, Anthropic, Google, Mistral, Cohere, Groq, LangChain, LlamaIndex, CrewAI, and AutoGen. Each integration is a single function call — no config files, no adapters, no middleware.