Changelog
What's new in 2Signal.
Deployment regression detection
Track deployments and automatically detect regressions across versions. Tag traces with a deployment ID from your CI/CD pipeline, then compare eval pass rates, error rates, latency, and cost between any two deployments with statistical significance testing (Welch's t-test and a two-proportion z-test). Automatic alerts fire when a new deployment regresses on any metric. All five SDKs (Python, TypeScript, Go, Java, Ruby) support the new deploymentId config parameter. A new Deployments tab in the dashboard shows a timeline of deployments with delta badges and an interactive comparison panel.
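For readers unfamiliar with the two tests named above, here is a minimal standard-library sketch of the statistics involved. It is not 2Signal's implementation; the function names are hypothetical. Welch's t-test would suit continuous metrics such as latency or cost, and the two-proportion z-test would suit rates such as eval pass rate or error rate.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Turning t into a p-value needs the Student t CDF, which the standard
    library does not provide (scipy.stats.ttest_ind with equal_var=False
    is one option).
    """
    va = variance(a) / len(a)  # squared standard error of sample a
    vb = variance(b) / len(b)  # squared standard error of sample b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def two_proportion_z(success_a, n_a, success_b, n_b):
    """Return (z statistic, two-sided p-value) using the pooled proportion."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value
```

Calling two_proportion_z with the pass counts and totals of two deployments gives a p-value that can be compared against a chosen significance level such as 0.05.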
Run batch evaluations from the dashboard
Datasets can now be tested against evaluators directly from the dashboard — no CLI required. A new 'Run Batch Evaluation' dialog on the Datasets page lets you pick an evaluator and kick off a batch eval with one click. Eval run results are viewable in a new detail page with per-item scores, pass/fail labels, reasoning, and input/output previews. Results can be exported as JSON or CSV. Also fixed a bug where the batch eval worker passed empty output to evaluators, and restored the Guardrails card on the project overview.
Agent Category Expansion: 6 new evaluators, 4 span types, full SDK parity
Major platform expansion to support 7 new AI agent categories: Voice/Phone, Multi-Agent Orchestration, DevOps/SRE, Legal/Compliance, HR/Recruiting, E-commerce, and Finance. Added 6 new evaluators (Bias Detection, Compliance Check, Factual Accuracy, Response Time SLA, PII Detection, Workflow Adherence), 4 new span types (Delegation, Voice, Human Handoff, Guardrail), and 4 new alert metrics (PII Leak Rate, Bias Score Avg, SLA Breach Rate, Workflow Deviation Rate). All 5 SDKs now include voice and multi-agent helpers plus a LangGraph wrapper for Python. SDK wrapper parity achieved with 25 new LLM wrappers across TypeScript, Go, Java, and Ruby. Dashboard additions include a multi-agent graph visualization, voice latency chart, compliance scorecard with CSV export, and one-click evaluator preset templates for each agent category.
SDK Wrapper Parity: All SDKs now support Cohere, Google, Groq, Mistral & more
Brought the TypeScript, Go, Java, and Ruby SDKs to parity with the Python SDK's LLM wrapper coverage. TypeScript SDK gained 8 new wrappers (Cohere, Google, Groq, Mistral, LangChain, LlamaIndex, CrewAI, AutoGen). Go SDK added 6 wrappers (OpenAI, Anthropic, Cohere, Google, Groq, Mistral). Java SDK added 6 wrappers (Anthropic, Cohere, Google, Groq, Mistral, LangChain4j). Ruby SDK added 5 wrappers (Anthropic, Cohere, Google, Groq, Mistral). All wrappers auto-trace LLM calls with model, token usage, and cost tracking.
Export & Sharing: CSV/JSON Export, Public Links, Notion Embedding
Export traces, datasets, and eval runs as CSV or JSON from any page. Create shareable public links with configurable expiry (1h, 24h, 7d, 30d, or never) for traces, datasets, and eval runs. Public pages render in a clean, minimal layout that works as Notion /embed blocks or any iframe. Share links can be revoked at any time. Full RBAC: viewers can export, members can create share links, admins can revoke. All export and sharing actions are audit-logged.
Custom Dashboards, Saved Views & Dashboard Improvements
Build fully customizable dashboards with drag-and-drop widget reordering (powered by dnd-kit), resizable widgets (1x1, 2x1, 1x2, 2x2), and 17 widget types including new Token Usage, Success Rate, and Model Usage charts. Save filtered trace views for quick access. Overview page now features a time range selector (7d/14d/30d/90d), KPI trend deltas comparing current vs previous period, and polished chart styling. Also fixes: playground output now displays correctly, span tree scrolling works for Model Parameters, and tRPC mutations no longer block on query batches.
Prompt A/B Testing
Compare prompt template versions with traffic splitting and statistical significance. Create experiments with weighted variants, track per-variant scores in real time, and get automated winner detection via Welch's t-test (p-value, 95% CI). Tests auto-complete when significance is reached. Full dashboard with KPI cards, variant performance table, statistical significance comparison panel, and built-in score simulation for testing.
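One common way to split traffic across weighted variants is deterministic hashing, so the same key always lands on the same variant. The sketch below illustrates that general technique only. The changelog does not say how 2Signal assigns variants, and assign_variant and its arguments are hypothetical names.

```python
import hashlib


def assign_variant(user_key, variants):
    """Deterministically map user_key to a variant name according to weights.

    variants is a list of (name, weight) pairs; weights need not sum to 1.
    """
    total = sum(weight for _, weight in variants)
    digest = hashlib.sha256(user_key.encode("utf-8")).digest()
    # Use 53 bits of the hash so the fraction is an exact float in [0, 1).
    fraction = (int.from_bytes(digest[:8], "big") >> 11) / 2 ** 53
    point = fraction * total
    cumulative = 0.0
    for name, weight in variants:
        cumulative += weight
        if point < cumulative:
            return name
    return variants[-1][0]  # guard against floating-point rounding
```

Hashing a stable key (a user or session ID, for instance) keeps each caller on one variant for the life of the experiment, which avoids mixing variant effects within a single conversation.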
Prompt Template Dashboard
Create prompt templates directly from the dashboard with automatic detection of {{variable}} placeholders. Click any template row to expand and view all versions with full prompt content, commit messages, span usage counts, and one-click copy.
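Placeholder detection of this kind can be approximated with a regular expression. This is a minimal sketch, assuming placeholders are identifier-like names with optional surrounding whitespace; the exact syntax the dashboard accepts is not specified here, and extract_variables is a hypothetical name.

```python
import re

# Assumed placeholder syntax: {{name}} or {{ name }}, identifier characters only.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def extract_variables(template):
    """Return placeholder names in order of first appearance, without duplicates."""
    seen = []
    for name in PLACEHOLDER.findall(template):
        if name not in seen:
            seen.append(name)
    return seen
```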
Java SDK v0.1.0
New Java SDK (Maven: com.twosignal:twosignal-sdk) with singleton client, daemon thread batch exporter, ThreadLocal context propagation, SpanHandle.run() for scoped tracing, cost calculation, and OpenAI wrapper. Java 11+ compatible.
Ruby SDK v0.1.0
New Ruby SDK (gem: twosignal) with zero runtime dependencies, daemon thread exporter, Thread.current context propagation, block-based span tracing, observe method decorator, and OpenAI wrapper. Ruby 3.0+ compatible.
10 new evaluators for comprehensive AI agent testing
Added EXACT_MATCH, LENGTH, STARTS_WITH, ENDS_WITH, LEVENSHTEIN, SENTIMENT, TOXICITY, GROUNDEDNESS, TOOL_CALL_VALIDATION, and PROMPT_INJECTION evaluators. Includes LLM-based hallucination detection and prompt injection defense.
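As an illustration of what a Levenshtein-style evaluator measures, here is a minimal sketch of edit distance with a normalized score. The normalization to a 0..1 range is an assumption made for the example; how the LEVENSHTEIN evaluator scores or thresholds its output is not stated above.

```python
def levenshtein(a, b):
    """Edit distance via dynamic programming, keeping only two rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalize the distance to a 0..1 score; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```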
Stripe SDK validation
Added server-side validation for Stripe SDK integration to ensure billing events are properly verified before processing.
Rate limiting for REST API
Implemented per-API-key rate limiting using a Redis sliding window. Prevents abuse and ensures fair usage across all projects.
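The sliding-window idea can be sketched in memory as follows. This is an illustration of the algorithm only, not the production limiter: a Redis-backed version would typically keep the timestamps in a sorted set per key rather than in a local deque, and the class name and parameters are hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within the last `window_seconds`."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # api_key -> timestamps of accepted requests

    def allow(self, api_key):
        now = self.clock()
        hits = self._hits[api_key]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Unlike a fixed window that resets on a boundary, this approach counts requests over the trailing interval, so a burst straddling two fixed windows cannot exceed the limit.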
Contact page
Added a contact sales page to the marketing site with form validation.
Python SDK v0.1.0
Initial release of the Python SDK with @observe decorator, OpenAI and Anthropic wrappers, background batch export, and automatic cost tracking.
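The general pattern behind a tracing decorator can be shown with a toy example. This does not use or reproduce the 2Signal SDK: traced and RECORDED_SPANS are hypothetical stand-ins, and a real SDK would hand spans to a background exporter instead of a list.

```python
import functools
import time

RECORDED_SPANS = []  # hypothetical in-memory sink standing in for an exporter


def traced(func):
    """Record the name, duration, and any error of each call to func."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        span = {"name": func.__name__, "error": None}
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            span["error"] = repr(exc)
            raise  # tracing must not swallow the caller's exception
        finally:
            span["duration_s"] = time.perf_counter() - start
            RECORDED_SPANS.append(span)

    return wrapper
```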
7 built-in evaluators
Shipped LLM Judge, Contains, Regex Match, JSON Schema, Similarity, Latency, and Cost evaluators. Evaluators run async via BullMQ workers.
REST API for trace ingestion
POST /api/v1/traces endpoint for batch trace ingestion with S3 persistence, usage tracking, and async database writes.
Platform launch
Initial launch of 2Signal with trace ingestion, evaluation engine, dashboard, and Python SDK support for OpenAI and Anthropic.