
Creating a dataset
From traces
The fastest way to build a dataset is from existing traces:

- Open Traces and filter to the runs you want to evaluate
- Select one or more trace rows using the checkboxes
- Click Add to dataset → choose an existing dataset or create a new one
By uploading CSV
Upload a CSV file with columns matching the dataset schema. The columns are:

| Column | Description |
|---|---|
| input | The prompt or instruction sent to the agent |
| output | The agent’s response to evaluate |
| expected | (Optional) The ground truth answer for comparison evaluators |
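
As a rough illustration of a file in this shape, the sketch below builds CSV text with Python's standard `csv` module. The column names come from the table above; the row contents and the `build_dataset_csv` helper are made up for the example, and it assumes an empty cell is acceptable for the optional `expected` column.

```python
import csv
import io

# Column names taken from the schema table; the rows are invented examples.
FIELDNAMES = ["input", "output", "expected"]

rows = [
    {"input": "What is 2 + 2?", "output": "4", "expected": "4"},
    # "expected" is optional, so this row leaves it empty (an assumption
    # about how the uploader treats missing ground truth).
    {"input": "Summarize the ticket", "output": "Customer asks for a refund.", "expected": ""},
]


def build_dataset_csv(rows):
    """Return CSV text: a header row followed by one line per dataset row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = build_dataset_csv(rows)
print(csv_text)
```

Saving `csv_text` to a `.csv` file gives something that can be uploaded. Using `csv.DictWriter` rather than joining strings by hand means prompts containing commas, quotes, or newlines are quoted correctly.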
Manually
Add rows one at a time using the Add row button. Useful for small curated datasets of known edge cases.
Running evaluations
Select a dataset and click Run evaluation. Choose an evaluation template and configure:

- Evaluator: the LLM judge and prompt template to use
- Sample size: evaluate all rows or a random sample
- Concurrency: how many rows to evaluate in parallel
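
To make the last two settings concrete, here is a minimal conceptual sketch of what sampling and concurrency mean for a run. It is not the product's implementation: `judge`, `run_evaluation`, and the `seed` parameter are hypothetical names, and the judge is a simple exact-match check standing in for an LLM call.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def judge(row):
    """Hypothetical stand-in for an LLM judge: exact match against `expected`."""
    return {"input": row["input"], "passed": row["output"] == row["expected"]}


def run_evaluation(rows, sample_size=None, concurrency=4, seed=0):
    """Evaluate all rows, or a random sample, using `concurrency` worker threads."""
    if sample_size is not None and sample_size < len(rows):
        # Seeded so the same sample is drawn on every call.
        rows = random.Random(seed).sample(rows, sample_size)
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # map() yields results in the same order as the rows it was given.
        return list(pool.map(judge, rows))


# Invented dataset: odd-numbered rows match their expected answer, even ones do not.
dataset = [
    {"input": f"q{i}", "output": str(i), "expected": str(i if i % 2 else -1)}
    for i in range(10)
]

sampled_results = run_evaluation(dataset, sample_size=5, concurrency=2)
full_results = run_evaluation(dataset)
```

A smaller sample gives a cheaper, noisier estimate, while higher concurrency shortens wall-clock time at the cost of more simultaneous judge calls.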
Dataset versioning
Each dataset has a version history. When you add or remove rows, the previous version is preserved. Evaluation runs are tied to a specific dataset version so results remain reproducible.
Next steps
- Evaluations — run and review evaluation results
- Simulations — test prompt changes against a dataset before deploying

