AI Feature Evaluation Framework Prompt
Prompt
Design an evaluation framework for an AI feature.

Feature: [describe]
Inputs and outputs: [describe]
What 'good' looks like: [how would an expert judge a great output?]
What 'bad' looks like: [examples of failures]
Scale: [outputs per day]

Please design an eval framework with:
1. Automated metrics (no human review needed)
2. Human eval criteria: rubric with scoring dimensions
3. Golden set: how to build reference input/output pairs
4. Sample size per evaluation cycle
5. Decision rule: at what quality threshold do you ship, watch, or roll back?
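
Example (not part of the prompt): to illustrate the kind of decision rule item 5 asks for, here is a minimal Python sketch. It assumes each sampled output is graded pass/fail and compares a conservative estimate of the pass rate against two thresholds. The function names (`wilson_lower_bound`, `decide`), the thresholds (0.90 and 0.80), and the choice of a Wilson score interval are all assumptions made for illustration; a real framework would set them from the feature's own quality bar and sample size.

```python
import math


def wilson_lower_bound(passes: int, n: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for a pass rate.

    z = 1.96 corresponds to a roughly 95% two-sided interval.
    """
    if n == 0:
        return 0.0  # no evidence yet, so assume the worst
    p = passes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom


def decide(passes: int, n: int,
           ship_at: float = 0.90,          # hypothetical threshold
           rollback_below: float = 0.80    # hypothetical threshold
           ) -> str:
    """Map one evaluation cycle's results to ship / watch / roll back."""
    lower = wilson_lower_bound(passes, n)
    if lower >= ship_at:
        return "ship"
    if lower < rollback_below:
        return "roll back"
    return "watch"


if __name__ == "__main__":
    # 200 sampled outputs per cycle is an arbitrary illustrative sample size.
    for passes in (196, 176, 140):
        print(passes, "of 200 passed:", decide(passes, 200))
```

Using the interval's lower bound rather than the raw pass rate ties the decision to sample size (item 4): a small sample widens the interval and pushes borderline results toward "watch" instead of "ship".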
Used by
Developers, Data Analysts