Nectar · HD-1201
Agent Evaluation Framework: Scoring Templates and Benchmarks
$149
A complete evaluation framework for AI agents. It covers five evaluation dimensions (accuracy, safety, alignment, efficiency, robustness) with a scoring template for each, a benchmark design methodology, and a process for running evaluation continuously rather than as one-off tests. Includes worked evaluation examples for four agent types and a regression detection approach for catching quality degradation over time.
evaluation, scoring, benchmarks, templates