Webhook

Calls a user-provided HTTP endpoint with span data and expects a score back. This enables you to implement evaluation logic in any language or framework — Python, Go, Java, or any service that can handle an HTTP POST and return JSON.

Config

FieldTypeRequiredDefaultDescription
urlstringYesHTTPS endpoint to POST evaluation payload (must be HTTPS in production)
headersobjectNo{}Custom headers (e.g. Authorization)
timeoutMsnumberNo10000Request timeout in ms (min 1000, max 30000)
retriesnumberNo1Retries on 5xx or network errors (min 0, max 5)

Use Cases

  • Custom evaluation logic — Implement domain-specific scoring that doesn't fit built-in evaluators, such as checking database state, calling internal APIs, or running proprietary models.
  • Language flexibility — Write evaluators in Python, Ruby, Go, or any language your team already uses, without being limited to TypeScript.
  • Third-party integrations — Route evaluations through external services like custom ML models, compliance systems, or human review queues.
  • Gradual migration — Start with webhook evaluators and migrate to built-in types as your evaluation needs solidify.

Examples

Basic webhook

{
  "url": "https://eval.example.com/score",
  "timeoutMs": 5000
}

With authentication and retries

{
  "url": "https://eval.example.com/score",
  "headers": {
    "Authorization": "Bearer sk-my-secret-key"
  },
  "timeoutMs": 15000,
  "retries": 3
}

Expected response format

// Your endpoint must return JSON matching this schema:
{
  "score": 0.85,        // number, 0.0 to 1.0 (required)
  "pass": true,         // boolean (required)
  "reason": "Looks good", // string (optional)
  "metadata": {}        // object (optional)
}

Request payload sent to your endpoint

// 2Signal POSTs this JSON to your URL:
{
  "traceId": "abc-123",
  "spanId": "span-456",
  "input": "user message",
  "output": "agent response",
  "metadata": { /* evaluator config */ },
  "expectedOutput": null
}

Scoring

Returns the score (0.0–1.0) and pass boolean from your endpoint's response. If the webhook returns an HTTP error, times out, or returns invalid JSON, the evaluator scores 0.0 (fail) with the error details in the reasoning field.

Performance

Latency depends entirely on your endpoint. The default timeout is 10 seconds (configurable up to 30s). Retries use exponential backoff (1s, 2s, 4s, ...) on 5xx errors and network failures. A per-URL concurrency limit of 10 prevents overloading your endpoint. Non-5xx HTTP errors (e.g. 400, 401) are not retried.

Have questions? Join our community!

Connect with other developers and the 2Signal team.

Join Discord