SynAE

A Framework To Evaluate Synthetic Agent Benchmarks
Step 1 - Upload Original And Synthetic Benchmarks
SynAE Input Format
Data | Tool Calls | Output | Attribute1 | Attribute2
All benchmark datasets (original and synthetic) must have the same columns.
Data is required, along with at least one of Tool Calls or Output (or both). Attribute columns are dataset-specific and optional; use them to label or categorize traces.
If an import format is selected below, uploaded trace files will be automatically converted to the SynAE format before evaluation.
Learn more about the SynAE input format and benchmark-specific configuration →
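The column rules above can be checked locally before uploading. The following is a minimal sketch, not part of SynAE itself: the helper names `check_synae_header` and `read_header` are hypothetical, `Difficulty` stands in for a dataset-specific attribute column, and "same columns" is assumed to mean the same set of column names regardless of order.

```python
import csv
import io


def check_synae_header(columns):
    """Return a list of problems with a column header; empty means it passes."""
    problems = []
    if "Data" not in columns:
        problems.append("missing required column: Data")
    if "Tool Calls" not in columns and "Output" not in columns:
        problems.append("need at least one of: Tool Calls, Output")
    return problems


def read_header(csv_text):
    """Read only the header row of a CSV supplied as a string."""
    return next(csv.reader(io.StringIO(csv_text)), [])


# Toy data for illustration; "Difficulty" is a hypothetical attribute column.
original = "Data,Tool Calls,Output,Difficulty\nBook a flight,search_flights(),Booked,easy\n"
synthetic = "Data,Tool Calls,Output,Difficulty\nRent a car,search_cars(),Reserved,easy\n"

orig_cols = read_header(original)
syn_cols = read_header(synthetic)

header_problems = check_synae_header(orig_cols)
# Assumption: column order is ignored when comparing original and synthetic.
columns_match = set(orig_cols) == set(syn_cols)
print(header_problems, columns_match)
```

Running a check like this first may catch a missing or misnamed column before the datasets reach the evaluation backend.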
Synthetic benchmark datasets (CSV): add one, or several to compare
Benchmark-specific configuration
Step 2 - Select SynAE Metrics For Evaluation
Metrics in SynAE are grouped into three categories:
Fidelity: how closely the synthetic data matches the statistical properties of the original
Validity: how consistent the synthetic tool calls and outputs are with the given instructions
Diversity: how varied the synthetic data is across samples, indicating coverage of the benchmark
Learn more about SynAE metrics →
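To make the fidelity and diversity categories concrete, here is a toy sketch using two simple proxies. These are illustrative assumptions only and are not the metrics SynAE computes: `mean_length_gap` compares average word counts between original and synthetic samples as a crude fidelity proxy, and `distinct_ratio` measures the fraction of unique samples as a crude diversity proxy. Validity is omitted because it depends on an LLM judging consistency with the instructions.

```python
from statistics import mean


def mean_length_gap(original, synthetic):
    """Toy fidelity proxy: absolute gap in mean word count (0 means identical)."""
    orig_len = mean(len(text.split()) for text in original)
    syn_len = mean(len(text.split()) for text in synthetic)
    return abs(orig_len - syn_len)


def distinct_ratio(samples):
    """Toy diversity proxy: fraction of samples that are unique."""
    return len(set(samples)) / len(samples)


# Hypothetical Data column values, for illustration only.
original_data = ["book a flight to Paris", "cancel my hotel"]
synthetic_data = ["rent a car in Rome", "rent a car in Rome", "find a train"]

gap = mean_length_gap(original_data, synthetic_data)
ratio = distinct_ratio(synthetic_data)
print(gap, ratio)
```

A smaller gap suggests the synthetic data is closer to the original on this one statistic, and a ratio near 1 suggests few duplicated samples; real metrics would look at far richer properties.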
LLM API configuration for validity metrics
Required only if Validity metrics are enabled above. The API key is forwarded to the evaluation backend and is not stored.
Step 3 - Analyze SynAE Results
Upload your datasets in Step 1, configure metrics in Step 2, then click Run evaluation to evaluate them through the SynAE backend. Evaluation runs on the first 100 rows of each dataset. Alternatively, load a previously computed results JSON.
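Because only the first 100 rows of each dataset are evaluated, it can help to preview exactly which rows those are. The sketch below is one way to do that locally with the standard library; the function name `first_rows` is hypothetical, and it assumes "rows" means data rows after the header.

```python
import csv
import io
from itertools import islice


def first_rows(csv_text, limit=100):
    """Return up to `limit` data rows (header excluded) as dictionaries."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return list(islice(reader, limit))


# Build a toy 150-row dataset to show the truncation.
lines = ["Data,Output"] + [f"task {i},result {i}" for i in range(150)]
preview = first_rows("\n".join(lines))
print(len(preview), preview[0]["Data"], preview[-1]["Data"])
```

If the rows that matter most sit beyond row 100, consider reordering or sampling the dataset before uploading.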
Run evaluation
Evaluate your synthetic datasets against the original via the SynAE backend. Results appear automatically when done.
or
Load results
Upload a previously computed results JSON.