Monitor Cost & Latency

This recipe shows you how to track LLM spend and response times per trace, set up alerts when budgets or SLAs are breached, and use the dashboard to spot trends before they become problems.

Cost Evaluator

The COST evaluator scores each trace based on its total LLM spend. Create one via the dashboard or the API:

{
  "type": "COST",
  "name": "cost-budget",
  "config": {
    "max_cost_usd": 0.10,
    "target_cost_usd": 0.03
  }
}

How scoring works: If the trace cost is at or below target_cost_usd, the score is 1.0. Between target_cost_usd and max_cost_usd, the score decreases linearly toward 0.0. If the cost reaches or exceeds max_cost_usd, the score is 0.0. The signal is easy to read: 1.0 means the trace is on budget, a value between 0.0 and 1.0 means it overshot the target but stayed under the ceiling, and 0.0 means it hit the ceiling.
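The scoring rule above can be sketched in a few lines of Python. This is an illustration of the described behavior, not the evaluator's actual implementation: the function name cost_score is hypothetical, and the interpolation formula assumes the score reaches exactly 0.0 at max_cost_usd.

```python
def cost_score(cost_usd: float,
               target_cost_usd: float = 0.03,
               max_cost_usd: float = 0.10) -> float:
    """Map a trace's cost to a score in [0.0, 1.0] (hypothetical helper)."""
    if cost_usd <= target_cost_usd:
        return 1.0  # at or under target: on budget
    if cost_usd >= max_cost_usd:
        return 0.0  # at or over the ceiling
    # Assumed linear interpolation between target (1.0) and max (0.0).
    return (max_cost_usd - cost_usd) / (max_cost_usd - target_cost_usd)


if __name__ == "__main__":
    for cost in (0.02, 0.065, 0.12):
        print(f"${cost:.3f} -> {cost_score(cost):.2f}")
```

With the example config, a trace costing $0.065 sits halfway between target and ceiling, so it would score roughly 0.5 under this assumption.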

Latency Evaluator

The LATENCY evaluator enforces response-time SLAs using the same linear scoring model:

{
  "type": "LATENCY",
  "name": "latency-sla",
  "config": {
    "max_ms": 10000,
    "target_ms": 3000
  }
}

How scoring works: A trace that completes within target_ms scores 1.0. Between target_ms and max_ms, the score drops linearly to 0.0. A trace that takes max_ms or longer scores 0.0. Both values are in milliseconds.
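To see what the example config means for real traces, the sketch below scores a handful of durations. The helper latency_score is hypothetical, and it assumes the linear segment runs from 1.0 at target_ms down to 0.0 at max_ms.

```python
def latency_score(duration_ms: float,
                  target_ms: float = 3000,
                  max_ms: float = 10000) -> float:
    """Map a trace duration to a score in [0.0, 1.0] (hypothetical helper)."""
    if duration_ms <= target_ms:
        return 1.0
    if duration_ms >= max_ms:
        return 0.0
    # Assumed linear drop between target_ms and max_ms.
    return (max_ms - duration_ms) / (max_ms - target_ms)


if __name__ == "__main__":
    # Sample durations in milliseconds, chosen for illustration only.
    for ms in (2500, 4750, 6500, 12000):
        print(f"{ms} ms -> {latency_score(ms):.2f}")
```

Under these assumptions a 6.5 second trace scores 0.5, which is a useful reference point when choosing target_ms and max_ms for your own SLA.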

Set Up Alerts

Evaluator scores are computed asynchronously after each trace is ingested. Attach alert rules to get notified when a metric, measured over a rolling time window, crosses a threshold.

Alert When Avg Cost Exceeds Budget

This rule fires when the average cost per trace over the last hour exceeds $0.08:

{
  "metric": "TRACE_COST_AVG",
  "operator": "GREATER_THAN",
  "threshold": 0.08,
  "window_minutes": 60,
  "cooldown_minutes": 120,
  "channel": "SLACK",
  "channel_config": {
    "webhook_url": "https://hooks.slack.com/services/T00/B00/xxx"
  }
}
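Conceptually, this rule averages the cost of every trace inside the window and compares the result to the threshold. The Python sketch below illustrates that check. The function name, the (timestamp, cost) tuple shape, and the choice to stay silent when the window is empty are all assumptions, not documented platform behavior.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def avg_cost_breached(traces: List[Tuple[datetime, float]],
                      now: datetime,
                      window_minutes: int = 60,
                      threshold: float = 0.08) -> bool:
    """Return True if average cost in the rolling window exceeds threshold."""
    cutoff = now - timedelta(minutes=window_minutes)
    # Keep only traces that fall inside the rolling window.
    costs = [cost for ts, cost in traces if cutoff <= ts <= now]
    if not costs:
        return False  # assumption: no traffic means no alert
    return sum(costs) / len(costs) > threshold  # GREATER_THAN


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    sample = [
        (now - timedelta(minutes=90), 0.50),  # outside the window, ignored
        (now - timedelta(minutes=30), 0.09),
        (now - timedelta(minutes=5), 0.10),
    ]
    print(avg_cost_breached(sample, now))
```

Note that a single expensive trace outside the window does not affect the result, which is why a longer window_minutes gives a smoother but slower-reacting signal.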

Alert When P95 Latency Exceeds SLA

This rule fires when the 95th-percentile latency over the last 30 minutes exceeds 8 seconds:

{
  "metric": "P95_LATENCY",
  "operator": "GREATER_THAN",
  "threshold": 8000,
  "window_minutes": 30,
  "cooldown_minutes": 60,
  "channel": "EMAIL",
  "channel_config": {
    "to": "oncall@yourcompany.com"
  }
}
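For intuition about what a P95 threshold measures, here is a small Python sketch that computes a 95th percentile with the nearest-rank method and compares it to the 8000 ms threshold. The platform's exact percentile method is not specified in this recipe, so nearest-rank is an assumption, and p95_nearest_rank is a hypothetical helper.

```python
from typing import Sequence


def p95_nearest_rank(durations_ms: Sequence[float]) -> float:
    """95th percentile via nearest-rank (assumed method, for illustration)."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    ordered = sorted(durations_ms)
    # ceil(0.95 * n) using integer arithmetic to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


if __name__ == "__main__":
    # Twenty sample traces: 500 ms, 1000 ms, ..., 10000 ms.
    durations = [i * 500 for i in range(1, 21)]
    p95 = p95_nearest_rank(durations)
    print(p95, p95 > 8000)
```

In this sample only three of twenty traces are slower than 8 seconds, yet the rule would still fire, because P95 tracks the slow tail rather than the typical request.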

Available Alert Metrics

  • EVAL_PASS_RATE — Percentage of traces that pass a given evaluator
  • EVAL_SCORE_AVG — Average evaluator score across traces
  • ERROR_RATE — Percentage of traces with an ERROR status
  • P95_LATENCY — 95th-percentile trace duration in milliseconds
  • TRACE_COST_AVG — Average LLM cost per trace in USD

Delivery Channels

  • EMAIL — Sends a notification via Resend to the configured address
  • SLACK — Posts to a Slack channel via an incoming webhook URL
  • WEBHOOK — Sends a POST request to any custom URL with the alert payload

The cooldown_minutes field prevents alert storms by suppressing duplicate notifications for the configured period after an alert fires.
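The cooldown behavior can be pictured as a simple time check before each notification. The Python sketch below is one plausible reading: should_notify is a hypothetical helper, and treating the boundary as inclusive (a notification is allowed exactly cooldown_minutes after the last one) is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_notify(now: datetime,
                  last_fired_at: Optional[datetime],
                  cooldown_minutes: int) -> bool:
    """Return True if a breached rule may send a notification now."""
    if last_fired_at is None:
        return True  # the rule has never fired
    # Suppress notifications until the cooldown period has elapsed.
    return now - last_fired_at >= timedelta(minutes=cooldown_minutes)


if __name__ == "__main__":
    fired = datetime(2024, 1, 1, 12, 0)
    print(should_notify(fired + timedelta(minutes=45), fired, 120))
    print(should_notify(fired + timedelta(minutes=130), fired, 120))
```

With cooldown_minutes set to 120, a rule that is still breached 45 minutes after firing stays quiet, and a notification becomes possible again once two hours have passed.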

Dashboard Charts

Once cost and latency evaluators are enabled, the project Overview page automatically displays Cost Over Time and Latency Percentiles charts. These update in real time as new traces are ingested and scored, so you can spot regressions at a glance without configuring anything extra.

Cost Optimization Tip

The most effective way to reduce cost is to avoid sending every request to your most expensive model. Use model routing to automatically direct simple queries to cheaper models like gpt-4.1-nano while reserving gpt-4o for complex ones. Pair routing with the cost evaluator to verify that your routing rules are actually keeping spend in check.

What's Next

  • Evaluate Outputs — Add quality and correctness evaluators alongside cost and latency.
  • Model Routing — Route requests to different models based on complexity to optimize cost.
