Human Review & Labeling
Automated evaluators catch recurring patterns, but some failure modes require human judgment. Review queues let you build a manual QA workflow in which team members label traces one by one, then export the approved ones as a golden test set.
When to Use Review Queues
- Building datasets from production traffic — Select interesting traces, review them, and export approved ones to a dataset for regression testing.
- Labeling for fine-tuning — Create labeled positive/negative examples from real traces to improve your model.
- Quality audits — Periodically review a sample of traces to validate that automated evaluators are catching real issues.
- Edge case triage — Route low-confidence or flagged traces to human reviewers for judgment calls.
Creating a Review Queue
Navigate to Dashboard → Project → Review and click Create Queue. Give the queue a name and, optionally, a description.
Queues have three statuses:
| Status | Description |
|---|---|
| ACTIVE | Accepting items and reviews |
| COMPLETED | All items reviewed; ready for export |
| ARCHIVED | No longer active, preserved for reference |
Adding Traces to a Queue
From the Traces page, select one or more traces and click Add to Review Queue. You can also add traces programmatically via the tRPC review.addItems procedure.
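As a rough illustration of the programmatic path, the sketch below batches trace IDs and sends them through a tRPC client. Only the procedure name review.addItems comes from this page. The input shape (queueId, traceIds), the batch size, the de-duplication step, and the assumption that the procedure is a mutation are all hypothetical, so check them against your actual router before relying on this.

```typescript
// Hypothetical input shape: the real schema of review.addItems is not documented here.
interface AddItemsInput {
  queueId: string;
  traceIds: string[];
}

// Minimal structural type for the slice of a tRPC client this sketch uses.
// Assumes review.addItems is exposed as a mutation.
interface ReviewClient {
  review: {
    addItems: {
      mutate(input: AddItemsInput): Promise<unknown>;
    };
  };
}

// Build one input per batch. Duplicate trace IDs are dropped first; that is a
// defensive choice in this sketch, not documented server behavior.
function buildAddItemsBatches(
  queueId: string,
  traceIds: string[],
  batchSize: number = 100, // assumed limit, adjust to your deployment
): AddItemsInput[] {
  if (batchSize < 1) {
    throw new Error("batchSize must be at least 1");
  }
  const unique = traceIds.filter((id, index) => traceIds.indexOf(id) === index);
  const batches: AddItemsInput[] = [];
  for (let i = 0; i < unique.length; i += batchSize) {
    batches.push({ queueId, traceIds: unique.slice(i, i + batchSize) });
  }
  return batches;
}

// Send the batches sequentially and return how many traces were submitted.
async function addTracesToQueue(
  client: ReviewClient,
  queueId: string,
  traceIds: string[],
  batchSize: number = 100,
): Promise<number> {
  let submitted = 0;
  for (const input of buildAddItemsBatches(queueId, traceIds, batchSize)) {
    await client.review.addItems.mutate(input);
    submitted += input.traceIds.length;
  }
  return submitted;
}
```

Sending batches one at a time keeps each request small and makes a failure easy to locate, at the cost of some throughput.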
Each trace becomes a ReviewItem with its own lifecycle:
| Status | Description |
|---|---|
| PENDING | Not yet reviewed |
| IN_REVIEW | Currently being reviewed |
| APPROVED | Marked as good; eligible for dataset export |
| REJECTED | Marked as bad; will not be exported |
| SKIPPED | Skipped by reviewer; remains for later |
Reviewing Traces
The review detail page offers two modes:
- List mode — A filterable table of all items in the queue. Click any item to see its trace details.
- Sequential mode — A split-pane view with the trace on the left and the review panel on the right. This mode is optimized for speed — review one trace, submit, and the next one loads automatically.
Keyboard Shortcuts (Sequential Mode)
| Key | Action |
|---|---|
| 1 | Set sentiment to POSITIVE |
| 2 | Set sentiment to NEUTRAL |
| 3 | Set sentiment to NEGATIVE |
| Enter | Submit review and advance to next item |
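If you are building similar shortcuts into your own tooling, the key table can be expressed as a small lookup function. This is only a sketch of the mapping; the ShortcutAction type and the actionForKey name are hypothetical and do not describe how the review page is implemented. The key strings follow the values a browser reports in KeyboardEvent.key.

```typescript
type Sentiment = "POSITIVE" | "NEUTRAL" | "NEGATIVE";

// Hypothetical action type for this sketch.
type ShortcutAction =
  | { kind: "setSentiment"; sentiment: Sentiment }
  | { kind: "submit" };

// Map a pressed key to a review action, or null when the key is not a shortcut.
function actionForKey(key: string): ShortcutAction | null {
  switch (key) {
    case "1":
      return { kind: "setSentiment", sentiment: "POSITIVE" };
    case "2":
      return { kind: "setSentiment", sentiment: "NEUTRAL" };
    case "3":
      return { kind: "setSentiment", sentiment: "NEGATIVE" };
    case "Enter":
      return { kind: "submit" };
    default:
      return null;
  }
}
```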
Each review captures a sentiment (POSITIVE, NEUTRAL, NEGATIVE), an optional label (free text), and optional notes. When you submit a review, a TraceAnnotation is automatically created on the trace with the same sentiment, label, and notes.
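The relationship between a review and its annotation can be pictured as a direct field copy. The sketch below assumes field names (traceId, sentiment, label, notes) for both records; the actual TraceAnnotation schema is not shown on this page, so treat the interfaces as illustrative only.

```typescript
type Sentiment = "POSITIVE" | "NEUTRAL" | "NEGATIVE";

// Hypothetical shapes: the field names are assumptions made for this sketch.
interface Review {
  traceId: string;
  sentiment: Sentiment; // required
  label?: string;       // optional free text
  notes?: string;       // optional
}

interface TraceAnnotation {
  traceId: string;
  sentiment: Sentiment;
  label?: string;
  notes?: string;
}

// Narrow an arbitrary string to one of the three sentiment values.
function parseSentiment(value: string): Sentiment {
  if (value === "POSITIVE" || value === "NEUTRAL" || value === "NEGATIVE") {
    return value;
  }
  throw new Error("Unknown sentiment: " + value);
}

// Copy sentiment, label, and notes from the review onto a new annotation.
function annotationFromReview(review: Review): TraceAnnotation {
  return {
    traceId: review.traceId,
    sentiment: review.sentiment,
    label: review.label,
    notes: review.notes,
  };
}
```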
Exporting to a Dataset
Once you have reviewed items, click Export to Dataset to create a dataset item from every APPROVED item in the queue. This gives you a curated set of golden examples built from real production traffic.
Only approved items are exported. Rejected and skipped items are excluded.
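The export rule amounts to a filter on item status. The sketch below shows that filter along with a per-status count you might use to preview an export. The ReviewItem fields other than status are hypothetical, and the functions illustrate the rule rather than the product's actual export code.

```typescript
type ReviewItemStatus = "PENDING" | "IN_REVIEW" | "APPROVED" | "REJECTED" | "SKIPPED";

// Hypothetical item shape: only the status values come from the documentation.
interface ReviewItem {
  id: string;
  traceId: string;
  status: ReviewItemStatus;
}

// Keep only the items that are eligible for dataset export.
function selectExportable(items: ReviewItem[]): ReviewItem[] {
  return items.filter((item) => item.status === "APPROVED");
}

// Count items per status, for example to preview what an export will include.
function countByStatus(items: ReviewItem[]): Record<ReviewItemStatus, number> {
  const counts: Record<ReviewItemStatus, number> = {
    PENDING: 0,
    IN_REVIEW: 0,
    APPROVED: 0,
    REJECTED: 0,
    SKIPPED: 0,
  };
  for (const item of items) {
    counts[item.status] += 1;
  }
  return counts;
}
```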
Permissions
| Action | Required Role |
|---|---|
| View queues and items | MEMBER+ |
| Create queues, add items | MEMBER+ |
| Submit reviews | MEMBER+ |
| Update/delete queues | ADMIN+ |
| Export to dataset | MEMBER+ |
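Reading MEMBER+ as "MEMBER or any higher role", the table can be modeled as a minimum-role lookup. The sketch below assumes a role ordering of VIEWER, MEMBER, ADMIN, OWNER. Only MEMBER and ADMIN appear on this page, so VIEWER and OWNER, the action names, and the canPerform helper are hypothetical.

```typescript
// Assumed ordering from least to most privileged. VIEWER and OWNER are
// placeholders for whatever roles sit below MEMBER and above ADMIN.
const ROLE_ORDER = ["VIEWER", "MEMBER", "ADMIN", "OWNER"] as const;
type Role = (typeof ROLE_ORDER)[number];

// Hypothetical action names mirroring the rows of the permissions table.
type ReviewAction =
  | "viewQueues"
  | "createQueue"
  | "addItems"
  | "submitReview"
  | "updateQueue"
  | "deleteQueue"
  | "exportToDataset";

// Minimum role per action, taken from the table above.
const MINIMUM_ROLE: Record<ReviewAction, Role> = {
  viewQueues: "MEMBER",
  createQueue: "MEMBER",
  addItems: "MEMBER",
  submitReview: "MEMBER",
  updateQueue: "ADMIN",
  deleteQueue: "ADMIN",
  exportToDataset: "MEMBER",
};

// True when the role is at or above the minimum required for the action.
function canPerform(role: Role, action: ReviewAction): boolean {
  return ROLE_ORDER.indexOf(role) >= ROLE_ORDER.indexOf(MINIMUM_ROLE[action]);
}
```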