
AI Feature Evaluation Framework Prompt

Prompt

Design an evaluation framework for an AI feature.

Feature: [describe]
Inputs and outputs: [describe]
What 'good' looks like: [how would an expert judge a great output?]
What 'bad' looks like: [examples of failures]
Scale: [outputs per day]

Please design an eval framework with:
1. Automated metrics (no human review needed)
2. Human eval criteria: rubric with scoring dimensions
3. Golden set: how to build reference input/output pairs
4. Sample size: how many outputs to review per evaluation cycle
5. Decision rule: at what quality threshold do you ship, watch, or roll back?
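
As a rough illustration of items 4 and 5, the sketch below sizes a review sample and turns a pass count into a ship / watch / roll back call. It is one possible approach, not part of the prompt: the function names, the 95% confidence level (z = 1.96), and the 0.90 / 0.80 thresholds are all placeholder assumptions to replace with values that fit the feature.

```python
import math


def sample_size(margin: float, expected_pass_rate: float = 0.5, z: float = 1.96) -> int:
    """Outputs to review so the measured pass rate lands within +/- margin.

    Uses the normal approximation for a proportion; expected_pass_rate=0.5
    is the most conservative (largest-sample) assumption.
    """
    p = expected_pass_rate
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


def wilson_lower_bound(passes: int, n: int, z: float = 1.96) -> float:
    """Lower end of the Wilson score interval for the true pass rate."""
    if n == 0:
        return 0.0
    p_hat = passes / n
    denom = 1 + z * z / n
    centre = p_hat + z * z / (2 * n)
    spread = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (centre - spread) / denom


def decide(passes: int, n: int, ship_at: float = 0.90, rollback_below: float = 0.80) -> str:
    """Map a review cycle's results to a decision.

    ship_at and rollback_below are hypothetical thresholds, compared against
    the conservative (lower-bound) estimate of the pass rate.
    """
    lower = wilson_lower_bound(passes, n)
    if lower >= ship_at:
        return "ship"
    if lower < rollback_below:
        return "roll back"
    return "watch"


if __name__ == "__main__":
    n = sample_size(margin=0.05)
    print("review", n, "outputs per cycle")
    print(decide(passes=340, n=n))
```

Comparing the lower bound rather than the raw pass rate means a small or noisy sample defaults to "watch" instead of "ship", which is a deliberately cautious design choice.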

Used by

Developers, Data Analysts