Sentiment
Keyword-based sentiment detection that classifies agent output as positive, negative, or neutral. Negation is handled: for example, "not good" counts as negative.
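The snippet below is a minimal Python sketch of keyword counting with negation handling, not the evaluator's actual implementation. The word lists, the set of negators, and the rule that a negator flips only the immediately following sentiment word are assumptions made for illustration. The function name count_sentiment_words is hypothetical.

```python
import re

# Hypothetical, tiny word lists; the real evaluator ships its own built-in
# lists (~50 positive and ~50 negative words), which are not reproduced here.
POSITIVE = {"good", "great", "excellent", "happy", "success", "successfully"}
NEGATIVE = {"bad", "terrible", "frustrating", "error", "cancelled", "poor"}
NEGATORS = {"not", "no", "never"}


def count_sentiment_words(text: str) -> tuple[int, int]:
    """Return (positive, negative) keyword counts for text.

    Assumed negation rule: a sentiment word directly preceded by a negator
    has its polarity flipped, so "not good" counts as negative.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = neg = 0
    for i, tok in enumerate(tokens):
        if tok in POSITIVE:
            polarity = 1
        elif tok in NEGATIVE:
            polarity = -1
        else:
            continue  # not a sentiment word
        # Assumed negation window: only the single preceding token.
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        if polarity > 0:
            pos += 1
        else:
            neg += 1
    return pos, neg


print(count_sentiment_words("This is not good."))  # (0, 1)
```

A one-token negation window keeps the lookup cheap but misses phrasings such as "not very good"; a real implementation may use a wider window.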
Config
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| target | string | Yes | (none) | Expected sentiment: positive, negative, or neutral |
| threshold | number | No | 0.6 | Minimum confidence score (0–1) required to pass |
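As a hypothetical illustration of the fields above, this Python sketch validates a raw config dict and applies the documented default threshold of 0.6. SentimentConfig and parse_config are invented names; the evaluator's real validation may differ.

```python
from dataclasses import dataclass
from typing import Any, Mapping

VALID_TARGETS = ("positive", "negative", "neutral")


@dataclass(frozen=True)
class SentimentConfig:
    """Hypothetical typed view of the Sentiment evaluator config."""
    target: str
    threshold: float = 0.6  # documented default


def parse_config(raw: Mapping[str, Any]) -> SentimentConfig:
    """Check a raw config dict against the documented fields and defaults."""
    if "target" not in raw:
        raise ValueError("'target' is required")
    target = raw["target"]
    if target not in VALID_TARGETS:
        raise ValueError(f"'target' must be one of {VALID_TARGETS}, got {target!r}")
    # threshold is optional; fall back to the documented default.
    threshold = float(raw.get("threshold", 0.6))
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"'threshold' must be between 0 and 1, got {threshold}")
    return SentimentConfig(target=target, threshold=threshold)


print(parse_config({"target": "negative", "threshold": 0.7}))
# SentimentConfig(target='negative', threshold=0.7)
```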
Use Cases
- Customer service tone — Ensure agent responses maintain a positive tone when interacting with customers, especially in support and sales contexts.
- Content moderation — Flag responses that carry unexpected negative sentiment that could harm user experience or brand perception.
- Neutral reporting — Verify that agents providing factual information (e.g. news summaries, data reports) maintain a neutral tone without injecting opinion.
- Empathy detection — Confirm that agents responding to complaints or negative user input don't mirror the negative sentiment back.
Examples
Require positive sentiment
// Pass if output has positive sentiment
{
"target": "positive"
}
// Output: "Great news! Your order has been shipped successfully." → pass
// Output: "Your order has been cancelled due to an error." → fail
Detect negative sentiment
// Pass if output is negative (useful for testing complaint detection)
{
"target": "negative",
"threshold": 0.7
}
// Output: "This is terrible and frustrating." → pass
// Output: "The service is okay." → fail
Neutral tone enforcement
// Require neutral sentiment for factual responses
{
"target": "neutral"
}
// Output: "The meeting is scheduled for 3pm." → pass (no sentiment words detected)
// Output: "The amazing meeting is scheduled for 3pm!" → fail (positive words detected)
Scoring
Returns a binary score: 1.0 (pass) or 0.0 (fail). The evaluator detects sentiment by counting positive and negative keyword matches, taking negation into account. If more than 60% of the matched sentiment words are positive, the detected sentiment is "positive"; if more than 60% are negative, it is "negative"; otherwise, including when no sentiment words are found, it is "neutral". The result passes only when the detected sentiment matches target and the confidence meets threshold. The reasoning reports the detected sentiment, the confidence as a percentage, and the positive and negative word counts.
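The decision rule above can be sketched in Python as follows, starting from already-computed keyword counts. The 60% cut-off and the pass condition come from this page; how confidence is computed is not documented, so the sketch assumes it is the share of the dominant polarity for positive or negative results, 1.0 when no sentiment words are found, and a balance measure (1 minus the gap between the two shares) for mixed neutral text. SentimentResult, classify, and evaluate are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentimentResult:
    detected: str      # "positive", "negative", or "neutral"
    confidence: float  # 0 to 1 (computation assumed, see classify)
    passed: bool
    score: float       # 1.0 on pass, 0.0 on fail
    reasoning: str


def classify(pos: int, neg: int) -> tuple[str, float]:
    """Apply the documented 60% rule to positive/negative keyword counts.

    The confidence values returned here are an assumption, not documented behavior.
    """
    total = pos + neg
    if total == 0:
        # No sentiment words: neutral (assumed confidence 1.0).
        return "neutral", 1.0
    pos_share, neg_share = pos / total, neg / total
    if pos_share > 0.6:
        return "positive", pos_share
    if neg_share > 0.6:
        return "negative", neg_share
    # Mixed counts: assumed confidence grows as the two shares become balanced.
    return "neutral", 1.0 - abs(pos_share - neg_share)


def evaluate(pos: int, neg: int, target: str, threshold: float = 0.6) -> SentimentResult:
    """Pass only if the detected sentiment matches target and confidence >= threshold."""
    detected, confidence = classify(pos, neg)
    passed = detected == target and confidence >= threshold
    reasoning = (
        f"Detected {detected} sentiment ({confidence:.0%} confidence); "
        f"{pos} positive, {neg} negative words"
    )
    return SentimentResult(detected, confidence, passed, 1.0 if passed else 0.0, reasoning)


print(evaluate(2, 0, "positive").score)  # 1.0
print(evaluate(0, 0, "neutral").reasoning)
# Detected neutral sentiment (100% confidence); 0 positive, 0 negative words
```

Under this assumed confidence measure, any positive or negative detection already has confidence above 0.6, so the default threshold only changes the outcome for mixed neutral text or when threshold is raised, as in the negative example's 0.7.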
Performance
Sentiment matches output words against built-in lists (about 50 positive and 50 negative words) and makes no external API calls, so execution takes under 1 ms for typical outputs. For more nuanced sentiment analysis, consider the LLM Judge evaluator with a sentiment-focused prompt.