Bias Detection

A hybrid evaluator that combines deterministic phrase matching with LLM analysis to detect demographic bias in agent outputs. Deterministic mode scans for known biased phrases across five categories: gender, race, age, disability, and religion. LLM mode sends the text to an OpenAI model to detect subtler forms of bias, including stereotyping, microaggressions, and exclusionary language.

Config

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| mode | string | No | deterministic | deterministic, llm, or both |
| model | string | No | gpt-4o-mini | OpenAI model for LLM mode |
| api_key | string | No | env OPENAI_API_KEY | OpenAI API key for LLM mode |
| categories | string[] | No | all categories | Bias categories to check: gender, race, age, disability, religion |
| custom_terms | string[] | No | [] | Additional biased terms to flag |
| check | string | No | output | input, output, or both |
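
The defaults and allowed values in the table can be mirrored in a small validation helper. The following is only a sketch: `BiasConfig` and its method names are hypothetical and are not part of the evaluator's actual API; only the field names, defaults, and allowed values come from the table above.

```python
import os
from dataclasses import dataclass, field
from typing import List, Optional

# Allowed values as listed in the config table.
ALL_CATEGORIES = ("gender", "race", "age", "disability", "religion")
VALID_MODES = ("deterministic", "llm", "both")
VALID_CHECKS = ("input", "output", "both")


@dataclass
class BiasConfig:
    """Hypothetical mirror of the config table (illustration only)."""
    mode: str = "deterministic"
    model: str = "gpt-4o-mini"
    api_key: Optional[str] = None
    categories: List[str] = field(default_factory=lambda: list(ALL_CATEGORIES))
    custom_terms: List[str] = field(default_factory=list)
    check: str = "output"

    def __post_init__(self) -> None:
        if self.mode not in VALID_MODES:
            raise ValueError(f"mode must be one of {VALID_MODES}")
        if self.check not in VALID_CHECKS:
            raise ValueError(f"check must be one of {VALID_CHECKS}")
        unknown = [c for c in self.categories if c not in ALL_CATEGORIES]
        if unknown:
            raise ValueError(f"unknown categories: {unknown}")

    def resolved_api_key(self) -> Optional[str]:
        # Falls back to the OPENAI_API_KEY environment variable,
        # matching the documented default for api_key.
        return self.api_key or os.environ.get("OPENAI_API_KEY")


# Omitted fields take the documented defaults.
cfg = BiasConfig(categories=["gender", "age"], custom_terms=["manpower"])
```

Validating up front like this surfaces a misspelled mode or category before any text is evaluated.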

Use Cases

  • HR and recruiting agents — Ensure hiring assistants don't produce biased language about candidates based on gender, age, or disability.
  • Customer-facing chatbots — Monitor for demographic stereotyping or exclusionary language in agent responses to diverse user populations.
  • Content generation — Validate that generated marketing copy, job descriptions, or educational content is free from biased language.
  • Compliance reporting — Track bias detection rates over time for DEI compliance and continuous improvement of your agent's outputs.

Examples

Deterministic scan across all categories

{
  "mode": "deterministic",
  "check": "output"
}
// Scans for ~30 known biased phrases across gender, race, age, disability, religion

Focused categories with custom terms

{
  "mode": "deterministic",
  "categories": ["gender", "age"],
  "custom_terms": ["manpower", "chairman", "elderly"]
}

LLM-based detection for subtle bias

{
  "mode": "llm",
  "model": "gpt-4o",
  "categories": ["gender", "race", "age"],
  "check": "output"
}

Combined mode for maximum coverage

{
  "mode": "both",
  "model": "gpt-4o-mini",
  "check": "both"
}
// Final score is the minimum of deterministic and LLM scores

Scoring

In deterministic mode, the score starts at 1.0 and each matched biased term subtracts 0.15, with a floor of 0.0 (reached at seven or more matches). In LLM mode, the model rates bias on a 1–5 scale, which is normalized to 0.0–1.0. In both mode, the final score is the minimum of the two scores. A score of 1.0 means no bias was detected (pass); anything below 1.0 is a fail, so a single matched term is enough to fail the evaluation. The reasoning field lists matched terms by category, the LLM's analysis with the specific biased phrases it identified, or both, depending on the mode.
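
The arithmetic above can be sketched in a few lines. This is an illustration, not the evaluator's source: the function names are hypothetical, and the LLM score is taken as an already-normalized value because the exact 1–5 mapping is not specified here.

```python
PENALTY_PER_TERM = 0.15  # per-match deduction stated in the Scoring section


def deterministic_score(match_count: int) -> float:
    """Start at 1.0, subtract 0.15 per matched term, floor at 0.0."""
    raw = 1.0 - PENALTY_PER_TERM * match_count
    # round() only tidies floating-point noise such as 0.5499999999999999.
    return max(0.0, round(raw, 2))


def combined_score(det_score: float, llm_score: float) -> float:
    """In both mode, the stricter (lower) of the two scores wins."""
    return min(det_score, llm_score)


def passed(score: float) -> bool:
    """Only a perfect 1.0 counts as a pass."""
    return score >= 1.0


two_matches = deterministic_score(2)                # two matched terms
both_mode = combined_score(two_matches, 0.5)        # 0.5 is a made-up LLM score
```

Because the pass threshold is exactly 1.0, `passed` returns `False` for any text with at least one match, regardless of how many further matches follow.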

Performance

Deterministic mode runs in under 1 ms with no external calls, since it is simple case-insensitive substring matching. LLM mode adds an OpenAI API call (1–3 seconds). In both mode, the deterministic and LLM checks both run and the stricter (lower) score wins. Text is truncated to 4,000 characters before LLM analysis. Empty text inputs short-circuit to a pass.
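
A minimal sketch of the behaviors described above: case-insensitive substring matching, the 4,000-character truncation, and the empty-input short-circuit. The helper names are hypothetical. Two details are assumptions rather than documented behavior: each term is counted at most once per text, and whitespace-only text is treated as empty.

```python
from typing import List

MAX_LLM_CHARS = 4000  # truncation limit stated in the Performance section


def find_terms(text: str, terms: List[str]) -> List[str]:
    """Return the terms that occur in text, ignoring case.

    Assumption: a term is reported once even if it appears several times.
    """
    lowered = text.lower()
    return [term for term in terms if term.lower() in lowered]


def truncate_for_llm(text: str) -> str:
    """Keep only the first 4,000 characters for LLM analysis."""
    return text[:MAX_LLM_CHARS]


def is_empty(text: str) -> bool:
    """Assumption: whitespace-only input counts as empty (immediate pass)."""
    return not text.strip()


terms = ["manpower", "chairman", "elderly"]
matches = find_terms("The Chairman asked for more MANPOWER.", terms)
# matches == ["manpower", "chairman"]
```

Plain substring matching also matches inside longer words, so `"chairman"` would be found in `"chairmanship"`. That is worth keeping in mind when choosing `custom_terms`.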
