Levenshtein
Measures the edit distance between the agent output and the expected output, normalized to a 0–1 similarity score. Useful when you need fuzzy matching rather than strict equality.
Config
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| threshold | number | No | 0.7 | Minimum similarity score (0–1) required to pass |
| case_sensitive | boolean | No | false | Enable case-sensitive comparison |
Use Cases
- Fuzzy answer matching — Accept near-correct answers where minor typos, rephrasing, or formatting differences shouldn't cause a hard fail.
- Regression detection — Track output drift over time by monitoring how similar current outputs are to a known-good baseline.
- Data extraction quality — Evaluate extracted entities (names, addresses) that may have small variations from the ground truth.
- Translation and paraphrase evaluation — Measure surface-level similarity between generated and reference text when exact match is too strict.
Examples
Default threshold (0.7)
// Pass if output is at least 70% similar to expected
{
"threshold": 0.7
}
// expectedOutput: "The quick brown fox"
// Output: "The quick brown fox" → score: 1.0, pass
// Output: "The quick brown dog" → score: 0.89, pass
// Output: "Something entirely different" → score: 0.18, fail
Strict similarity
// Require 90% similarity
{
"threshold": 0.9,
"case_sensitive": true
}
// expectedOutput: "Hello World"
// Output: "Hello World" → score: 1.0, pass
// Output: "Hello World!" → score: 0.92, pass
// Output: "hello world" → score: 0.82, fail (case mismatch counts as edits)
Lenient fuzzy match
// Accept loosely similar outputs
{
"threshold": 0.5
}
// expectedOutput: "customer support"
// Output: "customer service" → pass (above 0.5)
// Output: "billing department" → fail (below 0.5)
Scoring
Returns a continuous score between 0.0 and 1.0, calculated as 1 - (edit_distance / max_length) and rounded to two decimal places, where max_length is the length of the longer of the two strings. The label is "pass" if the score meets or exceeds the threshold, and "fail" otherwise. Returns 0.0 if no expectedOutput is provided. Strings are capped at 10,000 characters for safety.
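The scoring rule above can be sketched in a few lines of Python. This is an illustration only, not the evaluator's actual code: the names `edit_distance`, `levenshtein_score`, and `MAX_CHARS` are hypothetical, and three behaviors are assumptions rather than documented facts: the 10,000-character cap is modeled as truncation, case-insensitive comparison is modeled as lowercasing both strings, and the threshold is compared against the rounded score.

```python
MAX_CHARS = 10_000  # cap stated in the text; applying it by truncation is an assumption


def edit_distance(a: str, b: str) -> int:
    """Full-table Levenshtein distance: insert, delete, substitute each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        table[i][0] = i  # delete all i characters of a
    for j in range(cols):
        table[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j] + 1,         # deletion
                table[i][j - 1] + 1,         # insertion
                table[i - 1][j - 1] + cost,  # match or substitution
            )
    return table[-1][-1]


def levenshtein_score(output, expected, threshold=0.7, case_sensitive=False):
    """Return (score, label) following the formula in the Scoring section."""
    if not expected:
        score = 0.0  # no expectedOutput provided
    else:
        a, b = output[:MAX_CHARS], expected[:MAX_CHARS]
        if not case_sensitive:
            a, b = a.lower(), b.lower()  # assumed meaning of case_sensitive=false
        max_length = max(len(a), len(b))
        score = round(1 - edit_distance(a, b) / max_length, 2)
    return score, "pass" if score >= threshold else "fail"
```

With a threshold of 0.9 and case-sensitive comparison, `levenshtein_score("Hello World!", "Hello World", 0.9, True)` gives `(0.92, "pass")`, since one edit over a maximum length of 12 yields 1 - 1/12. The lowercase variant `"hello world"` needs two substitutions over a length of 11, giving `(0.82, "fail")`.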
Performance
Levenshtein uses an optimized two-row dynamic programming algorithm with O(min(m,n)) space complexity, where m and n are the lengths of the two strings; time complexity remains O(m·n). No external API calls are made. For typical agent outputs (under a few thousand characters), execution time is well under 100 ms. Strings are capped at 10,000 characters to prevent excessive computation.
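For readers unfamiliar with the two-row technique, here is a minimal Python sketch of the general idea. It is not the evaluator's source code, and the function name `levenshtein_two_row` is hypothetical. Only the previous and current rows of the dynamic programming table are kept, and the shorter string is placed along the row so each row holds min(m, n) + 1 cells.

```python
def levenshtein_two_row(a: str, b: str) -> int:
    """Levenshtein distance that keeps only two rows of the DP table."""
    # Put the shorter string along the row so rows have min(m, n) + 1 cells.
    if len(a) < len(b):
        a, b = b, a

    # Row 0: distance from the empty prefix of a to each prefix of b.
    previous = list(range(len(b) + 1))

    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current  # the older row is no longer needed

    return previous[-1]
```

Because each cell depends only on its left neighbor and the two cells above it, discarding older rows does not change the result. The trade-off is that the edit script itself (which characters were changed) cannot be recovered, which is acceptable when only the distance is needed for a score.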