Model performance
Quantitative performance is measured against a representative evaluation set. Metrics include task-specific accuracy, precision and recall, calibration, latency, cost per call, and stability across runs. Results are reported with confidence intervals and compared against a baseline of credible alternatives.
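
As a rough illustration of reporting a metric with a confidence interval and a baseline, the sketch below computes accuracy, precision, and recall for binary labels and attaches a percentile bootstrap interval. Everything in it is an assumption made for illustration: the function names (precision_recall, accuracy, bootstrap_ci), the toy labels, the always-positive baseline, and the choice of the percentile bootstrap, which is only one of several ways to obtain an interval.

```python
import random


def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for metric(y_true, y_pred).

    Resamples evaluation examples with replacement; a fixed seed keeps
    the reported interval reproducible across runs.
    """
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Toy evaluation set (hypothetical labels, for illustration only).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_model = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
y_baseline = [1] * len(y_true)  # trivial baseline: always predict positive

for name, preds in [("model", y_model), ("baseline", y_baseline)]:
    acc = accuracy(y_true, preds)
    lower, upper = bootstrap_ci(y_true, preds, accuracy)
    prec, rec = precision_recall(y_true, preds)
    print(f"{name}: accuracy={acc:.2f} (95% CI {lower:.2f} to {upper:.2f}), "
          f"precision={prec:.2f}, recall={rec:.2f}")
```

With an evaluation set this small the intervals are wide, which is itself informative: a gap between the model and the baseline that sits inside the interval is weak evidence of a real difference. The same pattern could be extended to latency or cost per call by passing a different metric function.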
