Prompt A/B Testing

A/B testing lets you compare prompt template versions by splitting traffic between them, recording a score for each variant, and checking with Welch's t-test whether the difference in scores is statistically significant.

How It Works

  1. Create a test linking two or more prompt template versions, each with a traffic weight (weights must sum to 100).
  2. Start the test. SDK calls to GET /api/v1/ab-test?name=... return a randomly selected variant based on weights.
  3. As your agent runs, record scores for each variant via POST /api/v1/ab-test with the variant ID and score.
  4. 2Signal computes running statistics (mean, variance) and performs Welch's t-test for statistical significance.
  5. When every variant has at least 30 scores and the difference is statistically significant (p < 0.05), the test auto-completes with a winner recommendation.
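
Steps 1, 2, and 5 can be sketched in a few lines of Python. This is an illustration only, not 2Signal's server code: the function names pick_variant and should_auto_complete, the (variant_id, weight) pair format, and the use of integer weights are all assumptions made for the example. The 30-score minimum and the p < 0.05 threshold come from this guide.

```python
import random


def pick_variant(variants, rng=random):
    """Pick one variant at random, in proportion to its traffic weight.

    `variants` is a list of (variant_id, weight) pairs (hypothetical format).
    Weights are assumed to be integers that sum to 100.
    """
    total = sum(weight for _, weight in variants)
    if total != 100:
        raise ValueError(f"weights must sum to 100, got {total}")
    ids = [variant_id for variant_id, _ in variants]
    weights = [weight for _, weight in variants]
    # random.choices performs weighted sampling with replacement.
    return rng.choices(ids, weights=weights, k=1)[0]


def should_auto_complete(score_counts, p_value, min_scores=30, alpha=0.05):
    """Return True once every variant has enough scores and p < alpha."""
    enough_data = all(count >= min_scores for count in score_counts)
    return enough_data and p_value < alpha


# Example: send roughly 70% of requests to "v1" and 30% to "v2".
chosen = pick_variant([("v1", 70), ("v2", 30)])
```

Because selection is random per request, the observed split only approaches the configured weights as the number of requests grows.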

Test Lifecycle

Status      Description
DRAFT       Created but not yet running; configure variants and weights
RUNNING     Actively splitting traffic and collecting scores
STOPPED     Paused; can be resumed
COMPLETED   Finished; winner determined or manually completed
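
If your own code tracks test state, the lifecycle can be modeled as a small state machine. The sketch below is an assumption-laden illustration: the AbTestStatus enum and can_transition helper are hypothetical names, and the transition map includes only the moves this guide implies (start, pause, resume, complete). Whether other moves are allowed, such as completing a stopped test, is not stated here.

```python
from enum import Enum


class AbTestStatus(Enum):
    """The four statuses described in the lifecycle table."""
    DRAFT = "DRAFT"
    RUNNING = "RUNNING"
    STOPPED = "STOPPED"
    COMPLETED = "COMPLETED"


# Assumed transition map: only moves implied by the guide are listed.
ALLOWED_TRANSITIONS = {
    AbTestStatus.DRAFT: {AbTestStatus.RUNNING},          # start the test
    AbTestStatus.RUNNING: {AbTestStatus.STOPPED,         # pause
                           AbTestStatus.COMPLETED},      # auto or manual completion
    AbTestStatus.STOPPED: {AbTestStatus.RUNNING},        # resume
    AbTestStatus.COMPLETED: set(),                       # terminal
}


def can_transition(current, target):
    """Return True if moving from `current` to `target` is in the assumed map."""
    return target in ALLOWED_TRANSITIONS[current]
```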

Statistical Significance

Results include Welch's t-test statistics:

  • p-value: Probability of seeing a difference at least this large if the variants actually performed the same (significant at p < 0.05)
  • 95% confidence interval: Range that is likely to contain the true difference in mean score between variants
  • Absolute and relative difference: How much better the winning variant's mean score is, in score units and as a percentage
  • Winner recommendation: Which variant to keep
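
To make these numbers concrete, here is a minimal sketch of how they could be derived for two variants using only the Python standard library. It is not 2Signal's implementation. RunningStats and welch_summary are hypothetical names; running mean and variance use Welford's online algorithm, which is one common choice. The p-value and confidence interval use a normal approximation to the t-distribution, which is reasonable at 30+ scores per variant but not exact; for exact values, scipy.stats.ttest_ind_from_stats with equal_var=False is an option.

```python
import math
from statistics import NormalDist


class RunningStats:
    """Running mean and sample variance via Welford's online algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def add(self, score):
        self.n += 1
        delta = score - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (score - self.mean)

    @property
    def variance(self):
        # Sample variance (n - 1 denominator); 0.0 until two scores exist.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def welch_summary(a, b, alpha=0.05):
    """Compare variant b against variant a using Welch's t-test quantities."""
    if a.n < 2 or b.n < 2:
        raise ValueError("each variant needs at least two scores")
    se2_a = a.variance / a.n
    se2_b = b.variance / b.n
    se = math.sqrt(se2_a + se2_b)
    if se == 0:
        raise ValueError("both variants have zero variance")
    diff = b.mean - a.mean
    t = diff / se
    # Welch-Satterthwaite degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (a.n - 1) + se2_b ** 2 / (b.n - 1))
    # ASSUMPTION: normal approximation instead of the exact t-distribution.
    z = NormalDist()
    p_approx = 2 * (1 - z.cdf(abs(t)))
    margin = z.inv_cdf(1 - alpha / 2) * se
    return {
        "t": t,
        "df": df,
        "p_approx": p_approx,
        "ci": (diff - margin, diff + margin),
        "absolute_diff": diff,
        "relative_diff": diff / a.mean if a.mean else None,
    }
```

Welch's t-test is used rather than Student's because it does not assume the two variants have equal variance or equal sample sizes, which rarely holds when traffic weights differ.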

SDK Integration

import twosignal

client = twosignal.TwoSignal(api_key="your-key")

# Get the variant to use for this request
variant = client.get_ab_test_variant("my-prompt-test")
prompt = variant["content"]

# ... run your agent with this prompt ...

# Record a score for this variant
client.record_ab_test_score(
    variant_id=variant["variant_id"],
    score=0.85
)

Dashboard

The A/B test detail page shows KPIs per variant (impressions, mean score, score count), a variant performance comparison table, and a statistical significance panel with the t-test results.

You can also record scores manually or simulate test data from the dashboard, for example to try out a test setup.
