Production Monitoring
Testing doesn't stop at deployment. 2Signal evaluates production traces in real time so you can catch quality regressions, cost spikes, and performance issues as they happen.
Always-on Evaluators
Any evaluator you enable in your project runs automatically on every new trace. No sampling config is needed for structural evaluators — they're free and add zero latency to your agent's response.
| Evaluator | Type | Cost | Why |
|---|---|---|---|
| Format check | JSON_SCHEMA or REGEX | Free | Catch broken output formats instantly |
| Required content | CONTAINS | Free | Ensure key info is always present |
| Response time | LATENCY | Free | Detect latency regressions |
| Cost per trace | COST | Free | Catch cost spikes before the bill |
| Quality score | LLM_JUDGE | ~$0.001/eval | Measure semantic quality |
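How evaluators run server-side isn't shown here, but a JSON_SCHEMA-style format check boils down to logic like the following sketch. The required keys are illustrative, not part of 2Signal:

```python
import json

# Illustrative contract: the agent must return a JSON object with these keys.
REQUIRED_KEYS = {"answer", "sources"}

def format_check(output: str) -> bool:
    """Pass only if the trace output parses as JSON and has the required keys."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and REQUIRED_KEYS <= data.keys()

print(format_check('{"answer": "42", "sources": []}'))  # True
print(format_check('not json'))                         # False
```

Because checks like this are pure string inspection, they can run on every trace with no model call, which is why structural evaluators are free.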
Setting Up Alerts
Configure alert rules in the dashboard to get notified when metrics breach your thresholds:
- Go to your project settings → Alerts
- Create a new alert rule
- Configure the metric, threshold, time window, cooldown, and delivery channel
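The window/cooldown semantics can be pictured with a small sketch. This is illustrative only, not 2Signal's server-side implementation, and the comparison direction depends on the metric (a cost rule would use `>` instead of `<`):

```python
import time

def should_fire(values, threshold, last_fired_at, cooldown_s, now=None):
    """Fire when the window average breaches the threshold, unless the
    rule is still cooling down from its last notification."""
    now = now or time.time()
    if last_fired_at is not None and now - last_fired_at < cooldown_s:
        return False  # still in cooldown: suppress repeat alerts
    if not values:
        return False  # no traces in the window: nothing to judge
    return sum(values) / len(values) < threshold  # e.g. EVAL_SCORE_AVG < 0.75

# Quality-drop example: average helpfulness over the window fell below 0.75.
print(should_fire([0.6, 0.7, 0.8], threshold=0.75,
                  last_fired_at=None, cooldown_s=4 * 3600))  # True
```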
Example: Quality Drop
Alert: Quality Drop
Metric: EVAL_SCORE_AVG
Evaluator: helpfulness
Threshold: < 0.75
Window: 1 hour
Cooldown: 4 hours
Channel: SLACK
Example: Cost Spike
Alert: Cost Spike
Metric: TRACE_COST_AVG
Threshold: > 0.15
Window: 30 minutes
Cooldown: 2 hours
Channel: EMAIL
Example: High Error Rate
Alert: High Error Rate
Metric: ERROR_RATE
Threshold: > 0.05
Window: 15 minutes
Cooldown: 1 hour
Channel: WEBHOOK
Dashboard Monitoring
The overview page gives you a real-time view of your agent's health:
- Trace volume over time
- Error rate trend
- Eval pass rate trend
- Cost over time
- Latency percentiles (p50, p95, p99)
- Eval score trends per evaluator
- Cost breakdown by model
- Error rate by span type
Sampling Strategy for LLM Judge
Structural evaluators are free, but LLM_JUDGE calls an LLM for each evaluation. Use sampling to control costs at higher volumes:
| Trace Volume | Recommended Sampling | Estimated Cost |
|---|---|---|
| < 1K/day | 100% | < $1/day |
| 1K-10K/day | 20-50% | $2-5/day |
| 10K-100K/day | 5-10% | $5-10/day |
| > 100K/day | 1-5% | $10-50/day |
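Sampling is conceptually a per-trace coin flip at the configured rate. A minimal sketch, with the cost constant mirroring the ~$0.001/eval figure from the evaluator table:

```python
import random

JUDGE_COST_PER_EVAL = 0.001  # ~$0.001 per LLM_JUDGE call, per the table above

def should_judge(rate: float, rng=random) -> bool:
    """Run the LLM judge on roughly `rate` of traces (0.0 to 1.0)."""
    return rng.random() < rate

def estimated_daily_cost(traces_per_day: int, rate: float) -> float:
    """Expected daily LLM_JUDGE spend at a given volume and sampling rate."""
    return traces_per_day * rate * JUDGE_COST_PER_EVAL

# 10K traces/day at 20% sampling: about $2/day, matching the 1K-10K row.
print(estimated_daily_cost(10_000, 0.20))  # 2.0
```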
Incident Response Workflow
- Alert fires (Slack/email/webhook)
- Check the dashboard overview for the affected metric
- Filter traces by time window to find the regression
- Inspect individual failing traces to understand the root cause
- Identify the trigger — a deploy, model change, or traffic pattern shift
- Fix and verify with a dataset evaluation before redeploying
Tips
- Set different alert thresholds for different severity levels
- Use cooldown periods to prevent alert storms
- Use webhook alerts for PagerDuty/OpsGenie integration
- Review the overview dashboard daily even without alerts — gradual drift won't trigger threshold alerts
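For the webhook channel, your endpoint just needs to accept a POST and forward it to your incident tooling. A minimal sketch using Python's standard library; note the payload field names (`alert`, `metric`, `value`) are assumptions for illustration, not 2Signal's documented schema:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def summarize_alert(body: bytes) -> str:
    """Extract the fields we care about from the webhook payload.
    The field names here are assumptions, not a documented schema."""
    event = json.loads(body or b"{}")
    return f"{event.get('alert')}: {event.get('metric')}={event.get('value')}"

class AlertWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        print(summarize_alert(body))  # forward to PagerDuty/OpsGenie here
        self.send_response(204)       # acknowledge so the alert isn't retried
        self.end_headers()

# To run: HTTPServer(("", 8080), AlertWebhook).serve_forever()
```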
See the Alerts & Usage guide for full alert configuration reference.