Suites of Trials with adversarial Judges and a Lead-ready Scorecard by task type.
Run models and roles on a task.
Exact, rubric, visual, and more.
The cheapest capable agent.