Webhook Evaluator
The Webhook evaluator sends trace data to your own HTTP endpoint for scoring. Use it when you need custom ML models, domain-specific logic, or integration with external systems that the built-in evaluators don't cover.
Prerequisites
- A 2Signal project with traces flowing in
- An HTTPS endpoint that accepts POST requests and returns scores
Step 1: Build Your Endpoint
Your endpoint receives a JSON payload with the trace data and must return a score:
Request (from 2Signal)
POST https://your-service.com/evaluate
Content-Type: application/json
{
  "traceId": "abc-123",
  "input": "What is the return policy?",
  "output": "You can return items within 30 days...",
  "config": {
    "custom_field": "any config you set"
  }
}

Expected Response
{
  "score": 0.85,
  "label": "pass",
  "reasoning": "Response correctly addresses the question with specific policy details."
}

The response is validated with a Zod schema. score (0–1) is required; label and reasoning are optional.
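Since 2Signal validates the response server-side, it can help to mirror that check in your endpoint's own unit tests. A minimal sketch, assuming only the documented rules (score required and between 0 and 1; label and reasoning optional strings) — the function name and messages are ours, not part of the product:

```python
def validate_eval_response(resp: dict) -> list[str]:
    """Return a list of problems; an empty list means the response is valid."""
    problems = []
    score = resp.get("score")
    # score is required and must be a number (bool is excluded deliberately)
    if not isinstance(score, (int, float)) or isinstance(score, bool):
        problems.append("score is required and must be a number")
    elif not 0 <= score <= 1:
        problems.append("score must be between 0 and 1")
    # label and reasoning are optional, but must be strings when present
    for optional in ("label", "reasoning"):
        if optional in resp and not isinstance(resp[optional], str):
            problems.append(f"{optional} must be a string when present")
    return problems
```

Running this against your endpoint's output in CI catches schema drift before 2Signal rejects a score in production.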
Example Endpoint (Python/FastAPI)
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class EvalRequest(BaseModel):
    traceId: str
    input: str | None = None
    output: str | None = None
    config: dict | None = None

class EvalResponse(BaseModel):
    score: float
    label: str | None = None
    reasoning: str | None = None

@app.post("/evaluate")
async def evaluate(req: EvalRequest) -> EvalResponse:
    # Your custom evaluation logic here
    score = run_your_model(req.input, req.output)
    return EvalResponse(
        score=score,
        label="pass" if score > 0.7 else "fail",
        reasoning=f"Custom model scored {score:.2f}",
    )

Step 2: Configure the Evaluator
In the dashboard, create a new evaluator:
{
  "name": "custom-domain-check",
  "type": "WEBHOOK",
  "config": {
    "url": "https://your-service.com/evaluate",
    "headers": {
      "Authorization": "Bearer your-secret-token"
    },
    "timeout_ms": 10000
  }
}

Step 3: Enable and Test
Enable the evaluator. The next trace ingested will trigger a POST to your endpoint. Check the trace detail page to see the score.
Production Considerations
- HTTPS required in production — Webhook evaluators enforce HTTPS to protect trace data in transit.
- Concurrency limit — Up to 10 concurrent requests per evaluator to prevent overwhelming your endpoint.
- Automatic retries — Failed requests are retried with exponential backoff.
- Timeout — Requests that exceed the configured timeout (default 10s) are scored as failures.
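To reason about how long a failing endpoint will keep receiving retries, it helps to see the shape of an exponential backoff schedule. 2Signal does not document its exact retry parameters, so the base delay, cap, and attempt count below are illustrative assumptions only:

```python
def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Delay in seconds before each retry: base * 2**n, capped at `cap`.

    Illustrative only — 2Signal's actual base delay, cap, and retry
    count are not documented.
    """
    return [min(base * 2 ** n, cap) for n in range(attempts)]
```

With these assumed parameters, four retries wait 1, 2, 4, then 8 seconds; later attempts flatten out at the cap rather than doubling forever.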
Debugging
If your webhook evaluator is not scoring traces, check:
- Your endpoint returns a valid JSON response with a score field
- The score is a number between 0 and 1
- Your endpoint responds within the timeout window
- The URL is HTTPS (required in production)
- Any authentication headers are correct