Bias Detection

A hybrid deterministic and LLM evaluator that detects demographic bias in agent text (outputs by default; see `check`). Deterministic mode scans for known biased phrases across five categories: gender, race, age, disability, and religion. LLM mode uses an OpenAI model to detect subtler forms of bias, including stereotyping, microaggressions, and exclusionary language.

Config

Field	Type	Required	Default	Description
`mode`	string	No	`deterministic`	`deterministic`, `llm`, or `both`
`model`	string	No	`gpt-4o-mini`	OpenAI model for LLM mode
`api_key`	string	No	env `OPENAI_API_KEY`	OpenAI API key for LLM mode
`categories`	string[]	No	all categories	Bias categories to check: `gender`, `race`, `age`, `disability`, `religion`
`custom_terms`	string[]	No	`[]`	Additional biased terms to flag
`check`	string	No	`output`	`input`, `output`, or `both`
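
The defaults in the table can be read as a simple merge of user-supplied fields over a base configuration. The sketch below illustrates that reading only; `resolve_config` and `ALL_CATEGORIES` are hypothetical names, and the validation rules are an assumption derived from the allowed values listed above, not the evaluator's actual implementation.

```python
import os

# Category names and allowed values are taken from the Config table.
ALL_CATEGORIES = ["gender", "race", "age", "disability", "religion"]
ALLOWED_MODES = {"deterministic", "llm", "both"}
ALLOWED_CHECKS = {"input", "output", "both"}


def resolve_config(user_config: dict) -> dict:
    """Hypothetical helper: fill in the documented defaults, then validate."""
    config = {
        "mode": "deterministic",
        "model": "gpt-4o-mini",
        "api_key": os.environ.get("OPENAI_API_KEY"),  # documented env fallback
        "categories": list(ALL_CATEGORIES),
        "custom_terms": [],
        "check": "output",
    }
    config.update(user_config)

    if config["mode"] not in ALLOWED_MODES:
        raise ValueError(f"unknown mode: {config['mode']!r}")
    if config["check"] not in ALLOWED_CHECKS:
        raise ValueError(f"unknown check: {config['check']!r}")
    unknown = set(config["categories"]) - set(ALL_CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return config
```

For example, `resolve_config({"mode": "both"})` would keep `check` at `output` and `model` at `gpt-4o-mini`, since neither was overridden.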

Use Cases

HR and recruiting agents — Ensure hiring assistants don't produce biased language about candidates based on gender, age, or disability.
Customer-facing chatbots — Monitor for demographic stereotyping or exclusionary language in agent responses to diverse user populations.
Content generation — Validate that generated marketing copy, job descriptions, or educational content is free from biased language.
Compliance reporting — Track bias detection rates over time for DEI compliance and continuous improvement of your agent's outputs.

Examples

Deterministic scan across all categories

{
  "mode": "deterministic",
  "check": "output"
}
// Scans for ~30 known biased phrases across gender, race, age, disability, religion

Focused categories with custom terms

{
  "mode": "deterministic",
  "categories": ["gender", "age"],
  "custom_terms": ["manpower", "chairman", "elderly"]
}

LLM-based detection for subtle bias

{
  "mode": "llm",
  "model": "gpt-4o",
  "categories": ["gender", "race", "age"],
  "check": "output"
}

Combined mode for maximum coverage

{
  "mode": "both",
  "model": "gpt-4o-mini",
  "check": "both"
}
// Final score is the minimum of deterministic and LLM scores

Scoring

In `deterministic` mode, the score starts at 1.0 and each matched biased term subtracts 0.15, down to a minimum of 0.0. In `llm` mode, the model rates bias on a 1–5 scale, which is normalized to 0.0–1.0. In `both` mode, the final score is the minimum of the two scores. A score of 1.0 means no bias was detected (pass); anything below 1.0 is a fail, so a single matched term is enough to fail. The `reasoning` field lists the matched terms by category, the LLM's analysis with the specific biased phrases it identified, or both, depending on the mode.
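
The arithmetic above can be sketched in a few lines. This is an illustration under stated assumptions rather than the evaluator's code: the function names are hypothetical, and the LLM score is taken as an already-normalized value in 0.0–1.0 because the direction of the 1–5 scale is not specified here.

```python
from typing import Optional

PENALTY_PER_MATCH = 0.15  # per-term penalty stated in the Scoring section


def deterministic_score(match_count: int) -> float:
    """Start at 1.0, subtract 0.15 per matched term, floor at 0.0."""
    # Rounding only hides floating-point noise such as 0.7000000000000001.
    return max(0.0, round(1.0 - PENALTY_PER_MATCH * match_count, 2))


def final_score(mode: str,
                det_score: Optional[float] = None,
                llm_score: Optional[float] = None) -> float:
    """Pick the score for the configured mode; `both` takes the minimum."""
    if mode == "deterministic":
        return det_score
    if mode == "llm":
        return llm_score  # assumed to be normalized to 0.0-1.0 already
    if mode == "both":
        return min(det_score, llm_score)
    raise ValueError(f"unknown mode: {mode!r}")


def passed(score: float) -> bool:
    """Only a perfect 1.0 passes; anything lower is a fail."""
    return score >= 1.0
```

Under this sketch, two matched terms give a score of 0.7, and seven or more matches bottom out at 0.0.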

Performance

Deterministic mode runs in under 1 ms with no external calls, using simple case-insensitive substring matching. LLM mode adds an OpenAI API call (1–3 seconds). In `both` mode, both checks run and the stricter (lower) score wins. Text is truncated to 4,000 characters before LLM analysis. Empty text inputs short-circuit to a pass.
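
A minimal sketch of the text handling described above, for illustration only. The helper names are hypothetical, the term list reuses the `custom_terms` example rather than the built-in phrase list, and two details are assumptions: each term is counted once however often it appears, and whitespace-only text is treated as empty.

```python
MAX_LLM_CHARS = 4000  # truncation limit stated in the Performance section


def find_matches(text: str, terms: list[str]) -> list[str]:
    """Case-insensitive substring matching; returns each matching term once."""
    lowered = text.lower()
    return [term for term in terms if term.lower() in lowered]


def truncate_for_llm(text: str) -> str:
    """Keep only the first 4,000 characters before sending text to the LLM."""
    return text[:MAX_LLM_CHARS]


def is_empty(text: str) -> bool:
    """Assumption: whitespace-only text counts as empty and passes outright."""
    return not text.strip()


sample = "We need more Manpower; ask the chairman."
matches = find_matches(sample, ["manpower", "chairman", "elderly"])
```

Because this is plain substring matching, a term such as `chairman` also matches inside longer words like "chairmanship", which is worth keeping in mind when choosing `custom_terms`.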
