LLM Playground
The Playground lets you test prompts against LLM providers directly from the dashboard. Iterate on prompts, compare responses across models, and run evaluators on outputs — without writing any code.
Getting Started
Navigate to Dashboard → Project → Playground. Before you can use it, you need to add at least one LLM provider key.
Adding LLM Provider Keys
Go to Project Settings or use the Playground setup prompt. Provider keys are encrypted with AES-256-GCM and stored per-project. Only project admins can manage keys.
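The encryption scheme described above can be sketched generically. This is an illustrative example of AES-256-GCM key storage, not the platform's actual implementation; the helper names, the use of the project ID as associated data, and the nonce-plus-ciphertext storage format are all assumptions.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_provider_key(master_key: bytes, provider_key: str, project_id: str) -> bytes:
    """Encrypt an LLM provider key, binding it to its project via AAD."""
    aesgcm = AESGCM(master_key)
    nonce = os.urandom(12)                       # 96-bit nonce, unique per encryption
    ciphertext = aesgcm.encrypt(nonce, provider_key.encode(), project_id.encode())
    return nonce + ciphertext                    # store nonce alongside ciphertext

def decrypt_provider_key(master_key: bytes, blob: bytes, project_id: str) -> str:
    """Decrypt a stored blob; fails if the project ID (AAD) does not match."""
    nonce, ciphertext = blob[:12], blob[12:]
    aesgcm = AESGCM(master_key)
    return aesgcm.decrypt(nonce, ciphertext, project_id.encode()).decode()

master = AESGCM.generate_key(bit_length=256)     # 256-bit master key
blob = encrypt_provider_key(master, "sk-test-123", "proj_42")
print(decrypt_provider_key(master, blob, "proj_42"))  # sk-test-123
```

Using the project ID as associated data means a ciphertext copied between projects fails to decrypt, which is one common way to enforce per-project key isolation.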
Supported providers include OpenAI, Anthropic, Google, Mistral, Cohere, and Groq.
Running a Prompt
Enter your system prompt and user message, select a model, and click Run. The response streams back in real time via Server-Sent Events (SSE). Token counts, latency, and estimated cost appear alongside the output.
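On the wire, the streamed response arrives as SSE events. A minimal sketch of accumulating such a stream into the final text, assuming each `data:` line carries a JSON chunk with a `delta` field and the stream ends with a `[DONE]` sentinel (both are assumptions about the wire format, following a common convention):

```python
import json

def parse_sse_chunks(raw: str) -> str:
    """Accumulate text deltas from a raw SSE response body."""
    text = []
    for line in raw.splitlines():
        if not line.startswith("data:"):
            continue                              # skip blank lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":                   # assumed end-of-stream sentinel
            break
        chunk = json.loads(payload)
        text.append(chunk.get("delta", ""))
    return "".join(text)

stream = (
    'data: {"delta": "Hello"}\n'
    'data: {"delta": ", world"}\n'
    "data: [DONE]\n"
)
print(parse_sse_chunks(stream))  # Hello, world
```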
Comparing Models
Use the Compare feature to send the same prompt to multiple models simultaneously. Responses appear side-by-side so you can evaluate quality, speed, and cost tradeoffs at a glance.
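Fanning one prompt out to several models is straightforward to sketch with concurrent requests. The `run_model` coroutine is injected here to keep the example provider-agnostic and self-contained; in a real client it would call each provider's API:

```python
import asyncio

async def compare(prompt: str, models: list[str], run_model) -> dict[str, str]:
    """Send the same prompt to several models concurrently."""
    responses = await asyncio.gather(*(run_model(m, prompt) for m in models))
    return dict(zip(models, responses))

# Stub provider call for illustration only
async def fake_run(model: str, prompt: str) -> str:
    await asyncio.sleep(0)                        # stand-in for network latency
    return f"{model} answer"

results = asyncio.run(compare("Hi", ["gpt-4o", "claude-sonnet"], fake_run))
print(results["gpt-4o"])  # gpt-4o answer
```

Because the requests run concurrently rather than sequentially, overall latency is bounded by the slowest model, not the sum of all of them.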
Running Evaluators
After generating a response, click Evaluate to run any of your project's configured evaluators against the playground output. This is useful for quickly testing whether a prompt change improves or regresses evaluator scores.
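To illustrate what a deterministic evaluator looks like, here is a hypothetical word-budget check. The evaluator name, result shape, and scoring formula are invented for this sketch; your project's configured evaluators define their own:

```python
def max_words_evaluator(output: str, limit: int = 50) -> dict:
    """Hypothetical deterministic check: does the output stay within a word budget?"""
    n = len(output.split())
    return {
        "name": "max_words",
        "passed": n <= limit,
        "score": min(1.0, limit / max(n, 1)),     # 1.0 when within budget
    }

result = max_words_evaluator("A short, on-budget answer.", limit=50)
print(result["passed"])  # True
```

Checks like this need no model call, which is why deterministic evaluators can return instantly while LLM-based ones incur a round trip.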
How It Works
- The playground sends requests via POST /api/v1/playground/stream, which proxies to the configured LLM provider using your encrypted API key.
- Responses stream back via Server-Sent Events (SSE) for real-time display.
- Evaluation runs use the same eval engine as production traces — deterministic evaluators run instantly, while LLM-based evaluators call OpenAI.
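A request to the stream endpoint might look like the following. The field names here are illustrative assumptions about the request schema, not a documented contract:

```json
{
  "model": "gpt-4o",
  "system_prompt": "You are a helpful assistant.",
  "messages": [
    { "role": "user", "content": "Summarize this ticket in two sentences." }
  ],
  "temperature": 0.7
}
```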
Permissions
Any project member (MEMBER+) can use the playground. Only ADMIN+ can manage LLM provider keys.