Levenshtein

Measures the edit distance between the agent output and the expected output, normalized to a 0–1 similarity score. Useful when you need fuzzy matching rather than strict equality.

Config

FieldTypeRequiredDefaultDescription
thresholdnumberNo0.7Minimum similarity score (0–1) required to pass
case_sensitivebooleanNofalseEnable case-sensitive comparison

Use Cases

  • Fuzzy answer matching — Accept near-correct answers where minor typos, rephrasing, or formatting differences shouldn't cause a hard fail.
  • Regression detection — Track output drift over time by monitoring how similar current outputs are to a known-good baseline.
  • Data extraction quality — Evaluate extracted entities (names, addresses) that may have small variations from the ground truth.
  • Translation and paraphrase evaluation — Measure surface-level similarity between generated and reference text when exact match is too strict.

Examples

Default threshold (0.7)

// Pass if output is at least 70% similar to expected
{
  "threshold": 0.7
}
// expectedOutput: "The quick brown fox"
// Output: "The quick brown fox" → score: 1.0, pass
// Output: "The quick brown dog" → score: 0.84, pass
// Output: "Something entirely different" → score: 0.18, fail

Strict similarity

// Require 90% similarity
{
  "threshold": 0.9,
  "case_sensitive": true
}
// expectedOutput: "Hello World"
// Output: "Hello World" → score: 1.0, pass
// Output: "Hello World!" → score: 0.92, pass
// Output: "hello world" → score: 0.82, fail (case mismatch counts as edits)

Lenient fuzzy match

// Accept loosely similar outputs
{
  "threshold": 0.5
}
// expectedOutput: "customer support"
// Output: "customer service" → pass (above 0.5)
// Output: "billing department" → fail (below 0.5)

Scoring

Returns a continuous score between 0.0 and 1.0, calculated as1 - (edit_distance / max_length), rounded to two decimal places. The label is "pass" if the score meets or exceeds the threshold, "fail" otherwise. Returns 0.0 if no expectedOutput is provided. Strings are capped at 10,000 characters for safety.

Performance

Levenshtein uses an optimized two-row dynamic programming algorithm with O(min(m,n)) space complexity. No external API calls are made. For typical agent outputs (under a few thousand characters), execution time is well under 100ms. Strings are capped at 10,000 characters to prevent excessive computation.

Have questions? Join our community!

Connect with other developers and the 2Signal team.

Join Discord