GLOSSARY · DEFINITION

scenario evaluation

A structured test harness that runs a model or agent on a set of representative tasks and scores its performance on each.
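As an illustration only, a harness of this kind might be sketched in Python as follows. Every name here (`Scenario`, `run_scenarios`, `toy_agent`, the sample scenarios) is hypothetical and made up for this example, and the pass/fail scoring rule is an assumption; real harnesses often use graded or rubric-based scores instead.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Scenario:
    """One representative task (hypothetical structure for this sketch)."""
    name: str
    prompt: str
    check: Callable[[str], bool]  # returns True when the output is acceptable


def run_scenarios(agent: Callable[[str], str], scenarios: List[Scenario]) -> Dict[str, object]:
    """Run the agent on every scenario and report per-scenario results plus a pass rate."""
    results: Dict[str, bool] = {}
    for scenario in scenarios:
        try:
            output = agent(scenario.prompt)
            results[scenario.name] = bool(scenario.check(output))
        except Exception:
            # Assumption: an agent crash counts as a failed scenario.
            results[scenario.name] = False
    passed = sum(results.values())
    pass_rate = passed / len(scenarios) if scenarios else 0.0
    return {"results": results, "pass_rate": pass_rate}


def toy_agent(prompt: str) -> str:
    """Stand-in for a real model or agent: it just upper-cases the prompt."""
    return prompt.upper()


SCENARIOS = [
    Scenario("uppercases_text", "hello", lambda out: out == "HELLO"),
    Scenario("preserves_length", "abc", lambda out: len(out) == 3),
    Scenario("adds_greeting", "world", lambda out: out.startswith("Hi")),
]

report = run_scenarios(toy_agent, SCENARIOS)
print(report["results"])
print(f"pass rate: {report['pass_rate']:.2f}")
```

Keeping each scenario's check next to its prompt makes the suite easy to extend: adding a new representative task means appending one more entry to the list, and the aggregate score updates automatically.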