Production Monitoring

Testing doesn't stop at deployment. 2Signal evaluates production traces in real time, so you can catch quality regressions, cost spikes, and performance issues as they happen.

Always-on Evaluators

Any evaluator you enable in your project runs automatically on every new trace. No sampling config is needed for structural evaluators — they're free and add zero latency to your agent's response.

| Evaluator | Type | Cost | Why |
| --- | --- | --- | --- |
| Format check | JSON_SCHEMA or REGEX | Free | Catch broken output formats instantly |
| Required content | CONTAINS | Free | Ensure key info is always present |
| Response time | LATENCY | Free | Detect latency regressions |
| Cost per trace | COST | Free | Catch cost spikes before the bill |
| Quality score | LLM_JUDGE | ~$0.001/eval | Measure semantic quality |
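As an illustration of why structural evaluators are free and add no latency: they are deterministic string and JSON inspections that run without any model call. The functions below are a sketch of that kind of logic, not the 2Signal API; the names are hypothetical.

```python
import json
import re

def check_json_schema(output: str, required_keys: list[str]) -> bool:
    """JSON_SCHEMA-style check (simplified): output parses as a JSON
    object and contains every required key."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and all(k in data for k in required_keys)

def check_regex(output: str, pattern: str) -> bool:
    """REGEX-style check: output matches the pattern somewhere."""
    return re.search(pattern, output) is not None

def check_contains(output: str, substring: str) -> bool:
    """CONTAINS-style check: required content is present verbatim."""
    return substring in output
```

Because each check is pure computation over text your agent already produced, it can run on every trace with no sampling budget.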

Setting Up Alerts

Configure alert rules in the dashboard to get notified when metrics breach your thresholds:

  1. Go to your project settings → Alerts
  2. Create a new alert rule
  3. Configure the metric, threshold, time window, cooldown, and delivery channel
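To make the metric/threshold/window/cooldown semantics concrete, here is a sketch of how such a rule behaves: average the metric over the look-back window, compare against the threshold, and suppress repeat notifications during the cooldown. This illustrates the behavior only; it is not 2Signal's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class AlertRule:
    threshold: float
    below: bool            # True -> fire when the average drops below threshold
    window_s: float        # look-back window in seconds
    cooldown_s: float      # minimum gap between notifications
    last_fired: float = field(default=-float("inf"))

def should_fire(rule: AlertRule, samples: list[tuple[float, float]], now: float) -> bool:
    """samples are (timestamp, value) pairs; average the values inside the window."""
    in_window = [v for t, v in samples if now - t <= rule.window_s]
    if not in_window:
        return False
    avg = sum(in_window) / len(in_window)
    breached = avg < rule.threshold if rule.below else avg > rule.threshold
    if breached and now - rule.last_fired >= rule.cooldown_s:
        rule.last_fired = now  # start the cooldown
        return True
    return False
```

A "Quality Drop" rule like the example below maps to `AlertRule(threshold=0.75, below=True, window_s=3600, cooldown_s=14400)`.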

Example: Quality Drop

Alert: Quality Drop
  Metric: EVAL_SCORE_AVG
  Evaluator: helpfulness
  Threshold: < 0.75
  Window: 1 hour
  Cooldown: 4 hours
  Channel: SLACK

Example: Cost Spike

Alert: Cost Spike
  Metric: TRACE_COST_AVG
  Threshold: > 0.15
  Window: 30 minutes
  Cooldown: 2 hours
  Channel: EMAIL

Example: High Error Rate

Alert: High Error Rate
  Metric: ERROR_RATE
  Threshold: > 0.05
  Window: 15 minutes
  Cooldown: 1 hour
  Channel: WEBHOOK
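With the WEBHOOK channel, your endpoint receives the alert as an HTTP POST and can route it onward (e.g. page on-call for error-rate breaches, post everything else to chat). The payload fields below are an assumption for illustration; consult the Alerts & Usage guide for the actual schema.

```python
import json

def handle_alert_webhook(body: bytes) -> str:
    """Route an incoming alert payload. The keys 'alert', 'metric', and
    'value' are hypothetical -- verify against the real webhook schema."""
    event = json.loads(body)
    name = event.get("alert", "unknown")
    metric = event.get("metric", "")
    value = event.get("value")
    if metric == "ERROR_RATE":
        # error-rate alerts are urgent: page the on-call rotation
        return f"page: {name} (value={value})"
    # everything else goes to a chat channel
    return f"notify: {name} (value={value})"
```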

Dashboard Monitoring

The overview page gives you a real-time view of your agent's health:

  • Trace volume over time
  • Error rate trend
  • Eval pass rate trend
  • Cost over time
  • Latency percentiles (p50, p95, p99)
  • Eval score trends per evaluator
  • Cost breakdown by model
  • Error rate by span type
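If you export raw trace latencies, the p50/p95/p99 figures on the overview page can be reproduced locally. A nearest-rank percentile sketch (one of several percentile conventions; the dashboard may use a different interpolation):

```python
import math

def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: return the value at rank ceil(p/100 * n)."""
    ranked = sorted(values)
    k = math.ceil(p / 100 * len(ranked))
    return ranked[max(k - 1, 0)]

latencies_ms = [95, 100, 105, 110, 115, 120, 125, 130, 400, 900]
p50 = percentile(latencies_ms, 50)   # typical request
p95 = percentile(latencies_ms, 95)   # tail latency
```

Note how a few slow outliers dominate p95/p99 while leaving p50 untouched, which is why the dashboard shows all three.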

Sampling Strategy for LLM Judge

Structural evaluators are free, but LLM_JUDGE calls an LLM for each evaluation. Use sampling to control costs at higher volumes:

| Trace Volume | Recommended Sampling | Estimated Cost |
| --- | --- | --- |
| < 1K/day | 100% | < $1/day |
| 1K-10K/day | 20-50% | $2-5/day |
| 10K-100K/day | 5-10% | $5-10/day |
| > 100K/day | 1-5% | $10-50/day |
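A common way to apply a sampling rate is to hash the trace ID into [0, 1) rather than roll a fresh random number, so the in/out decision for a given trace is deterministic and stable across retries. A sketch, with a hypothetical function name:

```python
import hashlib

def should_judge(trace_id: str, rate: float) -> bool:
    """Deterministic sampling: map the trace ID to a stable bucket in
    [0, 1) and keep it if the bucket falls below the sampling rate."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate
```

At `rate=0.1`, roughly 10% of trace IDs land below the cutoff, and the same trace ID always gets the same answer.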

Incident Response Workflow

  1. Alert fires (Slack/email/webhook)
  2. Check the dashboard overview for the affected metric
  3. Filter traces by time window to find the regression
  4. Inspect individual failing traces to understand the root cause
  5. Identify the trigger — a deploy, model change, or traffic pattern shift
  6. Fix and verify with a dataset evaluation before redeploying
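Steps 3 and 4 above amount to a time-window filter followed by an eval-result filter, and if you export traces you can run the same triage locally. The field names (`ts`, `evals`) are assumptions for illustration, not a documented export format.

```python
from datetime import datetime, timedelta

def traces_in_window(traces: list[dict], end: datetime, minutes: int) -> list[dict]:
    """Step 3: keep traces whose timestamp falls inside the look-back window."""
    start = end - timedelta(minutes=minutes)
    return [t for t in traces if start <= t["ts"] <= end]

def failing(traces: list[dict], evaluator: str) -> list[dict]:
    """Step 4: surface traces where the given evaluator failed."""
    return [t for t in traces if not t["evals"].get(evaluator, True)]
```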

Tips

  • Set different alert thresholds for different severity levels
  • Use cooldown periods to prevent alert storms
  • Keep webhook alerts for PagerDuty/OpsGenie integration
  • Review the overview dashboard daily even without alerts — gradual drift won't trigger threshold alerts

See the Alerts & Usage guide for full alert configuration reference.
