Monitor Cost & Latency
This recipe shows you how to track LLM spend and response times per trace, set up alerts when budgets or SLAs are breached, and use the dashboard to spot trends before they become problems.
Cost Evaluator
The COST evaluator scores each trace based on its total LLM spend. Create one via the dashboard or the API:
```json
{
  "type": "COST",
  "name": "cost-budget",
  "config": {
    "max_cost_usd": 0.10,
    "target_cost_usd": 0.03
  }
}
```

How scoring works: if the trace cost is at or below `target_cost_usd`, the score is 1.0. Between the target and `max_cost_usd`, the score decreases linearly toward 0.0. If the cost exceeds `max_cost_usd`, the score is 0.0. This gives you a clear signal: 1.0 means on budget; anything below 1.0 means you are trending toward your ceiling.
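The linear falloff described above can be sketched in a few lines of Python. Note that `linear_score` is an illustrative helper for understanding the math, not part of the product API:

```python
def linear_score(value: float, target: float, maximum: float) -> float:
    """Linear falloff: 1.0 at or below target, 0.0 at or above maximum."""
    if value <= target:
        return 1.0
    if value >= maximum:
        return 0.0
    return (maximum - value) / (maximum - target)

# A $0.065 trace with a $0.03 target and $0.10 max lands exactly halfway:
print(round(linear_score(0.065, target=0.03, maximum=0.10), 3))  # → 0.5
```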
Latency Evaluator
The LATENCY evaluator enforces response-time SLAs using the same linear scoring model:
```json
{
  "type": "LATENCY",
  "name": "latency-sla",
  "config": {
    "max_ms": 10000,
    "target_ms": 3000
  }
}
```

How scoring works: a trace that completes within `target_ms` scores 1.0. Between the target and `max_ms`, the score drops linearly to 0.0. Anything slower than `max_ms` scores 0.0.
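Applying the same linear model to latency, as a self-contained sketch (again, the helper is illustrative, not a product API):

```python
def latency_score(duration_ms: float, target_ms: float = 3000, max_ms: float = 10000) -> float:
    """1.0 within target_ms, 0.0 beyond max_ms, linear in between."""
    if duration_ms <= target_ms:
        return 1.0
    if duration_ms >= max_ms:
        return 0.0
    return (max_ms - duration_ms) / (max_ms - target_ms)

# 6.5 s is halfway between the 3 s target and the 10 s ceiling:
print(latency_score(6500))  # → 0.5
```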
Set Up Alerts
Evaluator scores are computed asynchronously after each trace is ingested. You can attach alert rules so you are notified the moment a metric crosses a threshold over a rolling time window.
Alert When Avg Cost Exceeds Budget
This rule fires when the average cost per trace over the last hour exceeds $0.08:
```json
{
  "metric": "TRACE_COST_AVG",
  "operator": "GREATER_THAN",
  "threshold": 0.08,
  "window_minutes": 60,
  "cooldown_minutes": 120,
  "channel": "SLACK",
  "channel_config": {
    "webhook_url": "https://hooks.slack.com/services/T00/B00/xxx"
  }
}
```

Alert When P95 Latency Exceeds SLA
This rule fires when the 95th-percentile latency over the last 30 minutes exceeds 8 seconds:
```json
{
  "metric": "P95_LATENCY",
  "operator": "GREATER_THAN",
  "threshold": 8000,
  "window_minutes": 30,
  "cooldown_minutes": 60,
  "channel": "EMAIL",
  "channel_config": {
    "to": "oncall@yourcompany.com"
  }
}
```

Available Alert Metrics
- `EVAL_PASS_RATE`: Percentage of traces that pass a given evaluator
- `EVAL_SCORE_AVG`: Average evaluator score across traces
- `ERROR_RATE`: Percentage of traces with an ERROR status
- `P95_LATENCY`: 95th-percentile trace duration in milliseconds
- `TRACE_COST_AVG`: Average LLM cost per trace in USD
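`P95_LATENCY`, for instance, is the duration that 95% of traces complete within. A nearest-rank sketch of that percentile, for intuition only (the platform's exact percentile method is not specified here):

```python
import math

def p95(durations_ms: list[float]) -> float:
    """Nearest-rank 95th percentile: the smallest duration that at
    least 95% of the samples fall at or below."""
    ordered = sorted(durations_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

print(p95(list(range(1, 101))))  # → 95
```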
Delivery Channels
- `EMAIL`: Sends a notification via Resend to the configured address
- `SLACK`: Posts to a Slack channel via an incoming webhook URL
- `WEBHOOK`: Sends a POST request to any custom URL with the alert payload
The `cooldown_minutes` field prevents alert storms by suppressing duplicate notifications for the configured period after an alert fires.
Dashboard Charts
Once cost and latency evaluators are enabled, the project Overview page automatically displays Cost Over Time and Latency Percentiles charts. These update in real time as new traces are ingested and scored, so you can spot regressions at a glance without configuring anything extra.
Cost Optimization Tip
The most effective way to reduce cost is to avoid sending every request to your most expensive model. Use model routing to automatically direct simple queries to cheaper models like `gpt-4.1-nano` while reserving `gpt-4o` for complex ones. Pair routing with the cost evaluator to verify that your routing rules are actually keeping spend in check.
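As a rough illustration of the idea, here is a minimal routing heuristic that uses prompt length plus a keyword check as a crude complexity proxy. The function, threshold, and markers are illustrative assumptions, not a product feature:

```python
def pick_model(prompt: str, length_threshold: int = 400) -> str:
    """Route short, simple prompts to a cheap model and long or
    complex-sounding ones to a stronger model. Length plus a keyword
    check is a crude proxy; swap in a real classifier for production."""
    complex_markers = ("analyze", "refactor", "step by step")
    if len(prompt) > length_threshold or any(m in prompt.lower() for m in complex_markers):
        return "gpt-4o"
    return "gpt-4.1-nano"

print(pick_model("What's the capital of France?"))                      # → gpt-4.1-nano
print(pick_model("Analyze this codebase and refactor the auth layer"))  # → gpt-4o
```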
What's Next
- Evaluate Outputs — Add quality and correctness evaluators alongside cost and latency.
- Model Routing — Route requests to different models based on complexity to optimize cost.