AI Agent Performance Audit Prompt
Prompt
You are a quality assurance specialist for AI-assisted support. Audit a sample of AI agent conversations and score performance against defined criteria.
[PASTE: 10–20 AI agent conversation transcripts]
[PASTE: Scoring rubric — resolution rate, answer accuracy, tone appropriateness, escalation decision quality, session length]
[PASTE: Ground truth answers or policy documents for accuracy checking]
YOUR TASK:
1. Score each conversation on all rubric dimensions (0–100 per dimension)
2. Calculate overall AI agent quality score and identify the weakest dimension
3. Document specific failure examples for each underperforming dimension
4. Recommend training or configuration changes to address the top 3 weaknesses
5. Identify any conversations where the AI answer created a compliance or reputational risk
OUTPUT: {conversation_scores, overall_quality_score_by_dimension, failure_examples, improvement_recommendations, risk_flags}
Why it works
Dimension-level scoring isolates whether failures are training problems, flow problems, or knowledge base problems — each needing a different intervention.
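As a minimal sketch of the aggregation step, the snippet below averages per-conversation scores by dimension, derives an overall quality score, and surfaces the weakest dimension. The conversation IDs, dimension names, and scores are hypothetical placeholders, not part of the prompt itself.

```python
from statistics import mean

# Hypothetical audit results: each conversation scored 0-100 per rubric dimension.
scores = {
    "conv_01": {"resolution": 90, "accuracy": 85, "tone": 95, "escalation": 60, "length": 80},
    "conv_02": {"resolution": 70, "accuracy": 88, "tone": 92, "escalation": 55, "length": 75},
    "conv_03": {"resolution": 95, "accuracy": 80, "tone": 90, "escalation": 65, "length": 85},
}

dimensions = ["resolution", "accuracy", "tone", "escalation", "length"]

# Average each dimension across all audited conversations.
by_dimension = {d: mean(c[d] for c in scores.values()) for d in dimensions}

# Overall score is the mean of the dimension averages; the weakest
# dimension (lowest average) is where the intervention should target.
overall = mean(by_dimension.values())
weakest = min(by_dimension, key=by_dimension.get)
print(by_dimension, round(overall, 1), weakest)
```

With these placeholder numbers, escalation decision quality comes out lowest, which would point toward a flow or policy-configuration fix rather than knowledge base work.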
Watch out for
Manual audit samples may not be representative. Stratify sampling across topic categories and escalation/containment outcomes to avoid selection bias.
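One way to stratify, sketched below: bucket conversations by (topic, outcome) and draw a fixed number from each bucket so no category dominates the audit sample. The conversation records and bucket sizes here are illustrative assumptions, not prescribed by the prompt.

```python
import random
from collections import defaultdict

# Hypothetical conversation log: (conversation_id, topic, outcome).
conversations = [
    ("c01", "billing", "contained"), ("c02", "billing", "escalated"),
    ("c03", "shipping", "contained"), ("c04", "shipping", "contained"),
    ("c05", "returns", "escalated"), ("c06", "returns", "contained"),
    ("c07", "billing", "contained"), ("c08", "shipping", "escalated"),
]

def stratified_sample(convs, per_stratum, seed=0):
    """Draw up to per_stratum conversations from each (topic, outcome) cell."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for conv in convs:
        strata[(conv[1], conv[2])].append(conv)
    sample = []
    for cell in strata.values():
        sample.extend(rng.sample(cell, min(per_stratum, len(cell))))
    return sample

audit_sample = stratified_sample(conversations, per_stratum=1)
```

A fixed seed keeps the sample reproducible across audit runs; in practice you would draw more than one conversation per cell when volume allows.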
Used by
Customer Success Managers
IT & Ops Teams