Monitor Cost & Latency
This recipe shows you how to track LLM spend and response times per trace, set up alerts when budgets or SLAs are breached, and use the dashboard to spot trends before they become problems.
Cost Evaluator
The COST evaluator scores each trace based on its total LLM spend. Create one via the dashboard or the API:
```json
{
  "type": "COST",
  "name": "cost-budget",
  "config": {
    "max_cost_usd": 0.10,
    "target_cost_usd": 0.03
  }
}
```

How scoring works: if the trace cost is at or below `target_cost_usd`, the score is 1.0. Between the target and `max_cost_usd`, the score decreases linearly toward 0.0. If the cost exceeds `max_cost_usd`, the score is 0.0. This gives you a clear signal: 1.0 means on budget; anything below 1.0 means you are trending toward your ceiling.
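The linear falloff described above can be sketched in a few lines of Python. Note that `linear_score` is an illustrative helper for understanding the math, not part of the product API:

```python
def linear_score(value: float, target: float, maximum: float) -> float:
    """Linear falloff: 1.0 at or below target, 0.0 at or above maximum."""
    if value <= target:
        return 1.0
    if value >= maximum:
        return 0.0
    return (maximum - value) / (maximum - target)

# A $0.065 trace with a $0.03 target and $0.10 max lands exactly halfway:
print(round(linear_score(0.065, target=0.03, maximum=0.10), 3))  # → 0.5
```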
Latency Evaluator
The LATENCY evaluator enforces response-time SLAs using the same linear scoring model:
```json
{
  "type": "LATENCY",
  "name": "latency-sla",
  "config": {
    "max_ms": 10000,
    "target_ms": 3000
  }
}
```

How scoring works: a trace that completes within `target_ms` scores 1.0. Between the target and `max_ms`, the score drops linearly to 0.0. Anything slower than `max_ms` scores 0.0.
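Applying the same linear model to latency, as a self-contained sketch (again, the helper is illustrative, not a product API):

```python
def latency_score(duration_ms: float, target_ms: float = 3000, max_ms: float = 10000) -> float:
    """1.0 within target_ms, 0.0 beyond max_ms, linear in between."""
    if duration_ms <= target_ms:
        return 1.0
    if duration_ms >= max_ms:
        return 0.0
    return (max_ms - duration_ms) / (max_ms - target_ms)

# 6.5 s is halfway between the 3 s target and the 10 s ceiling:
print(latency_score(6500))  # → 0.5
```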
Set Up Alerts
Evaluator scores are computed asynchronously after each trace is ingested. You can attach alert rules so you are notified the moment a metric crosses a threshold over a rolling time window.
Alert When Avg Cost Exceeds Budget
This rule fires when the average cost per trace over the last hour exceeds $0.08:
```json
{
  "metric": "TRACE_COST_AVG",
  "operator": "GREATER_THAN",
  "threshold": 0.08,
  "window_minutes": 60,
  "cooldown_minutes": 120,
  "channel": "SLACK",
  "channel_config": {
    "webhook_url": "https://hooks.slack.com/services/T00/B00/xxx"
  }
}
```

Alert When P95 Latency Exceeds SLA
This rule fires when the 95th-percentile latency over the last 30 minutes exceeds 8 seconds:
```json
{
  "metric": "P95_LATENCY",
  "operator": "GREATER_THAN",
  "threshold": 8000,
  "window_minutes": 30,
  "cooldown_minutes": 60,
  "channel": "EMAIL",
  "channel_config": {
    "to": "oncall@yourcompany.com"
  }
}
```

Available Alert Metrics
- `EVAL_PASS_RATE`: Percentage of traces that pass a given evaluator
- `EVAL_SCORE_AVG`: Average evaluator score across traces
- `ERROR_RATE`: Percentage of traces with an ERROR status
- `P95_LATENCY`: 95th-percentile trace duration in milliseconds
- `TRACE_COST_AVG`: Average LLM cost per trace in USD
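`P95_LATENCY`, for instance, is the duration that 95% of traces complete within. A nearest-rank sketch of that percentile, for intuition only (the platform's exact percentile method is not specified here):

```python
import math

def p95(durations_ms: list[float]) -> float:
    """Nearest-rank 95th percentile: the smallest duration that at
    least 95% of the samples fall at or below."""
    ordered = sorted(durations_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

print(p95(list(range(1, 101))))  # → 95
```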
Delivery Channels
- `EMAIL`: Sends a notification via Resend to the configured address
- `SLACK`: Posts to a Slack channel via an incoming webhook URL
- `WEBHOOK`: Sends a POST request to any custom URL with the alert payload
The `cooldown_minutes` field prevents alert storms by suppressing duplicate notifications for the configured period after an alert fires.
Dashboard Charts
Once cost and latency evaluators are enabled, the project Overview page automatically displays Cost Over Time and Latency Percentiles charts. These update in real time as new traces are ingested and scored, so you can spot regressions at a glance without configuring anything extra.
Cost Optimization Tip
The most effective way to reduce cost is to avoid sending every request to your most expensive model. Use model routing to automatically direct simple queries to cheaper models like `gpt-4.1-nano` while reserving `gpt-4o` for complex ones. Pair routing with the cost evaluator to verify that your routing rules are actually keeping spend in check.
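As a rough illustration of the idea, here is a minimal routing heuristic that uses prompt length plus a keyword check as a crude complexity proxy. The function, threshold, and markers are illustrative assumptions, not a product feature:

```python
def pick_model(prompt: str, length_threshold: int = 400) -> str:
    """Route short, simple prompts to a cheap model and long or
    complex-sounding ones to a stronger model. Length plus a keyword
    check is a crude proxy; swap in a real classifier for production."""
    complex_markers = ("analyze", "refactor", "step by step")
    if len(prompt) > length_threshold or any(m in prompt.lower() for m in complex_markers):
        return "gpt-4o"
    return "gpt-4.1-nano"

print(pick_model("What's the capital of France?"))                      # → gpt-4.1-nano
print(pick_model("Analyze this codebase and refactor the auth layer"))  # → gpt-4o
```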
What's Next
- Evaluate Outputs — Add quality and correctness evaluators alongside cost and latency.
- Model Routing — Route requests to different models based on complexity to optimize cost.