AI Agent Performance Audit Prompt
Prompt
You are a quality assurance specialist for AI-assisted support. Audit a sample of AI agent conversations and score performance against defined criteria.
[PASTE: 10–20 AI agent conversation transcripts]
[PASTE: Scoring rubric — resolution rate, answer accuracy, tone appropriateness, escalation decision quality, session length]
[PASTE: Ground truth answers or policy documents for accuracy checking]
YOUR TASK:
1. Score each conversation on all rubric dimensions (0–100 per dimension)
2. Calculate overall AI agent quality score and identify the weakest dimension
3. Document specific failure examples for each underperforming dimension
4. Recommend training or configuration changes to address the top 3 weaknesses
5. Identify any conversations where the AI answer created a compliance or reputational risk
OUTPUT: {conversation_scores, overall_quality_score_by_dimension, failure_examples, improvement_recommendations, risk_flags}
Why it works
Dimension-level scoring isolates whether failures are training problems, flow problems, or knowledge base problems — each needing a different intervention.
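As a minimal sketch of the aggregation step, the snippet below averages per-conversation scores by dimension, derives an overall quality score, and surfaces the weakest dimension. The conversation IDs, dimension names, and scores are hypothetical placeholders, not part of the prompt itself.

```python
from statistics import mean

# Hypothetical audit results: each conversation scored 0-100 per rubric dimension.
scores = {
    "conv_01": {"resolution": 90, "accuracy": 85, "tone": 95, "escalation": 60, "length": 80},
    "conv_02": {"resolution": 70, "accuracy": 88, "tone": 92, "escalation": 55, "length": 75},
    "conv_03": {"resolution": 95, "accuracy": 80, "tone": 90, "escalation": 65, "length": 85},
}

dimensions = ["resolution", "accuracy", "tone", "escalation", "length"]

# Average each dimension across all audited conversations.
by_dimension = {d: mean(c[d] for c in scores.values()) for d in dimensions}

# Overall score is the mean of the dimension averages; the weakest
# dimension (lowest average) is where the intervention should target.
overall = mean(by_dimension.values())
weakest = min(by_dimension, key=by_dimension.get)
print(by_dimension, round(overall, 1), weakest)
```

With these placeholder numbers, escalation decision quality comes out lowest, which would point toward a flow or policy-configuration fix rather than knowledge base work.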
Watch out for
Manual audit samples may not be representative. Stratify sampling across topic categories and escalation/containment outcomes to avoid selection bias.
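One way to stratify, sketched below: bucket conversations by (topic, outcome) and draw a fixed number from each bucket so no category dominates the audit sample. The conversation records and bucket sizes here are illustrative assumptions, not prescribed by the prompt.

```python
import random
from collections import defaultdict

# Hypothetical conversation log: (conversation_id, topic, outcome).
conversations = [
    ("c01", "billing", "contained"), ("c02", "billing", "escalated"),
    ("c03", "shipping", "contained"), ("c04", "shipping", "contained"),
    ("c05", "returns", "escalated"), ("c06", "returns", "contained"),
    ("c07", "billing", "contained"), ("c08", "shipping", "escalated"),
]

def stratified_sample(convs, per_stratum, seed=0):
    """Draw up to per_stratum conversations from each (topic, outcome) cell."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for conv in convs:
        strata[(conv[1], conv[2])].append(conv)
    sample = []
    for cell in strata.values():
        sample.extend(rng.sample(cell, min(per_stratum, len(cell))))
    return sample

audit_sample = stratified_sample(conversations, per_stratum=1)
```

A fixed seed keeps the sample reproducible across audit runs; in practice you would draw more than one conversation per cell when volume allows.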
Used by
Customer Success Managers
IT & Ops Teams