Bias Detection
Hybrid deterministic and LLM evaluator for detecting demographic bias in agent outputs. The deterministic mode scans for known biased phrases across five categories: gender, race, age, disability, and religion. LLM mode uses OpenAI to detect subtler forms of bias including stereotyping, microaggressions, and exclusionary language.
Config
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
mode | string | No | deterministic | deterministic, llm, or both |
model | string | No | gpt-4o-mini | OpenAI model for LLM mode |
api_key | string | No | env OPENAI_API_KEY | OpenAI API key for LLM mode |
categories | string[] | No | all categories | Bias categories to check: gender, race, age, disability, religion |
custom_terms | string[] | No | [] | Additional biased terms to flag |
check | string | No | output | input, output, or both |
Use Cases
- HR and recruiting agents — Ensure hiring assistants don't produce biased language about candidates based on gender, age, or disability.
- Customer-facing chatbots — Monitor for demographic stereotyping or exclusionary language in agent responses to diverse user populations.
- Content generation — Validate that generated marketing copy, job descriptions, or educational content is free from biased language.
- Compliance reporting — Track bias detection rates over time for DEI compliance and continuous improvement of your agent's outputs.
Examples
Deterministic scan across all categories
{
"mode": "deterministic",
"check": "output"
}
// Scans for ~30 known biased phrases across gender, race, age, disability, religionFocused categories with custom terms
{
"mode": "deterministic",
"categories": ["gender", "age"],
"custom_terms": ["manpower", "chairman", "elderly"]
}LLM-based detection for subtle bias
{
"mode": "llm",
"model": "gpt-4o",
"categories": ["gender", "race", "age"],
"check": "output"
}Combined mode for maximum coverage
{
"mode": "both",
"model": "gpt-4o-mini",
"check": "both"
}
// Final score is the minimum of deterministic and LLM scoresScoring
In deterministic mode, each matched biased term reduces the score by 0.15 from 1.0 (minimum 0.0). In LLM mode, the model rates bias on a 1–5 scale which is normalized to 0.0–1.0. In both mode, the final score is the minimum of the two scores. A score of 1.0 means no bias detected (pass); anything below 1.0 is a fail. The reasoning field lists matched terms by category and/or the LLM's analysis with specific biased phrases identified.
Performance
Deterministic mode runs in under 1ms with no external calls — simple case-insensitive substring matching. LLM mode adds an OpenAI API call (1–3 seconds). In both mode, both checks run and the stricter score wins. Text is truncated to 4,000 characters before LLM analysis. Empty text inputs short-circuit to a pass.