Production Monitoring
Testing doesn't stop at deployment. 2Signal evaluates production traces in real time so you can catch quality regressions, cost spikes, and performance issues as they happen.
Always-on Evaluators
Any evaluator you enable in your project runs automatically on every new trace. No sampling config is needed for structural evaluators — they're free and add zero latency to your agent's response.
| Evaluator | Type | Cost | Why |
|---|---|---|---|
| Format check | JSON_SCHEMA or REGEX | Free | Catch broken output formats instantly |
| Required content | CONTAINS | Free | Ensure key info is always present |
| Response time | LATENCY | Free | Detect latency regressions |
| Cost per trace | COST | Free | Catch cost spikes before the bill |
| Quality score | LLM_JUDGE | ~$0.001/eval | Measure semantic quality |
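How evaluators run server-side isn't shown here, but a JSON_SCHEMA-style format check boils down to logic like the following sketch. The required keys are illustrative, not part of 2Signal:

```python
import json

# Illustrative contract: the agent must return a JSON object with these keys.
REQUIRED_KEYS = {"answer", "sources"}

def format_check(output: str) -> bool:
    """Pass only if the trace output parses as JSON and has the required keys."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and REQUIRED_KEYS <= data.keys()

print(format_check('{"answer": "42", "sources": []}'))  # True
print(format_check('not json'))                         # False
```

Because checks like this are pure string inspection, they can run on every trace with no model call, which is why structural evaluators are free.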
Setting Up Alerts
Configure alert rules in the dashboard to get notified when metrics breach your thresholds:
- Go to your project settings → Alerts
- Create a new alert rule
- Configure the metric, threshold, time window, cooldown, and delivery channel
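The window/cooldown semantics can be pictured with a small sketch. This is illustrative only, not 2Signal's server-side implementation, and the comparison direction depends on the metric (a cost rule would use `>` instead of `<`):

```python
import time

def should_fire(values, threshold, last_fired_at, cooldown_s, now=None):
    """Fire when the window average breaches the threshold, unless the
    rule is still cooling down from its last notification."""
    now = now or time.time()
    if last_fired_at is not None and now - last_fired_at < cooldown_s:
        return False  # still in cooldown: suppress repeat alerts
    if not values:
        return False  # no traces in the window: nothing to judge
    return sum(values) / len(values) < threshold  # e.g. EVAL_SCORE_AVG < 0.75

# Quality-drop example: average helpfulness over the window fell below 0.75.
print(should_fire([0.6, 0.7, 0.8], threshold=0.75,
                  last_fired_at=None, cooldown_s=4 * 3600))  # True
```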
Example: Quality Drop
Alert: Quality Drop
Metric: EVAL_SCORE_AVG
Evaluator: helpfulness
Threshold: < 0.75
Window: 1 hour
Cooldown: 4 hours
Channel: SLACK
Example: Cost Spike
Alert: Cost Spike
Metric: TRACE_COST_AVG
Threshold: > 0.15
Window: 30 minutes
Cooldown: 2 hours
Channel: EMAIL
Example: High Error Rate
Alert: High Error Rate
Metric: ERROR_RATE
Threshold: > 0.05
Window: 15 minutes
Cooldown: 1 hour
Channel: WEBHOOK
Dashboard Monitoring
The overview page gives you a real-time view of your agent's health:
- Trace volume over time
- Error rate trend
- Eval pass rate trend
- Cost over time
- Latency percentiles (p50, p95, p99)
- Eval score trends per evaluator
- Cost breakdown by model
- Error rate by span type
Sampling Strategy for LLM Judge
Structural evaluators are free, but LLM_JUDGE calls an LLM for each evaluation. Use sampling to control costs at higher volumes:
| Trace Volume | Recommended Sampling | Estimated Cost |
|---|---|---|
| < 1K/day | 100% | < $1/day |
| 1K-10K/day | 20-50% | $2-5/day |
| 10K-100K/day | 5-10% | $5-10/day |
| > 100K/day | 1-5% | $10-50/day |
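Sampling is conceptually a per-trace coin flip at the configured rate. A minimal sketch, with the cost constant mirroring the ~$0.001/eval figure from the evaluator table:

```python
import random

JUDGE_COST_PER_EVAL = 0.001  # ~$0.001 per LLM_JUDGE call, per the table above

def should_judge(rate: float, rng=random) -> bool:
    """Run the LLM judge on roughly `rate` of traces (0.0 to 1.0)."""
    return rng.random() < rate

def estimated_daily_cost(traces_per_day: int, rate: float) -> float:
    """Expected daily LLM_JUDGE spend at a given volume and sampling rate."""
    return traces_per_day * rate * JUDGE_COST_PER_EVAL

# 10K traces/day at 20% sampling: about $2/day, matching the 1K-10K row.
print(estimated_daily_cost(10_000, 0.20))  # 2.0
```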
Incident Response Workflow
- Alert fires (Slack/email/webhook)
- Check the dashboard overview for the affected metric
- Filter traces by time window to find the regression
- Inspect individual failing traces to understand the root cause
- Identify the trigger — a deploy, model change, or traffic pattern shift
- Fix and verify with a dataset evaluation before redeploying
Tips
- Set different alert thresholds for different severity levels
- Use cooldown periods to prevent alert storms
- Use webhook alerts for PagerDuty/OpsGenie integration
- Review the overview dashboard daily even without alerts — gradual drift won't trigger threshold alerts
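For the webhook channel, your endpoint just needs to accept a POST and forward it to your incident tooling. A minimal sketch using Python's standard library; note the payload field names (`alert`, `metric`, `value`) are assumptions for illustration, not 2Signal's documented schema:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def summarize_alert(body: bytes) -> str:
    """Extract the fields we care about from the webhook payload.
    The field names here are assumptions, not a documented schema."""
    event = json.loads(body or b"{}")
    return f"{event.get('alert')}: {event.get('metric')}={event.get('value')}"

class AlertWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        print(summarize_alert(body))  # forward to PagerDuty/OpsGenie here
        self.send_response(204)       # acknowledge so the alert isn't retried
        self.end_headers()

# To run: HTTPServer(("", 8080), AlertWebhook).serve_forever()
```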
See the Alerts & Usage guide for full alert configuration reference.