AI Training Data Curator Prompt
Prompt
You are an AI training data specialist. Review a set of customer support conversations, label them for training purposes, and flag data quality issues.
[PASTE: 30–50 customer support conversations or transcript excerpts]
[PASTE: Intent taxonomy the AI is being trained on]
[PASTE: Current labeling guidelines, or 'none' if this is the first curation pass]
YOUR TASK:
1. Label each conversation turn with the customer's intent from the taxonomy
2. Flag conversations where the intent is ambiguous or requires a new intent category
3. Identify 10–15 high-quality examples for each of the top 5 intents
4. Flag low-quality examples that should be excluded: unclear language, multiple intents in one turn, agent error in resolution
5. Recommend 3 new intent labels based on conversations that don't fit the existing taxonomy
OUTPUT: {labelled_conversations, ambiguous_intent_flags, high_quality_examples_by_intent, excluded_samples_with_reason, new_intent_recommendations}
Why it works
Human review of training data at this stage prevents garbage-in, garbage-out model behavior. Selecting a smaller set of high-quality examples per intent improves precision without requiring more total data.
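The OUTPUT keys named in the prompt can be sketched as a Python structure. This is an illustrative assumption: the field names come from the prompt, but the conversation IDs, intent names, and value types are made up for the example.

```python
# Hypothetical shape of the curator's OUTPUT; only the five top-level
# keys come from the prompt -- all values below are illustrative.
curation_output = {
    "labelled_conversations": [
        {"conversation_id": "c-001", "turn": 1, "intent": "billing_question"},
    ],
    "ambiguous_intent_flags": [
        {"conversation_id": "c-007", "reason": "two plausible intents in one turn"},
    ],
    "high_quality_examples_by_intent": {
        "billing_question": ["c-001"],
    },
    "excluded_samples_with_reason": [
        {"conversation_id": "c-013", "reason": "multiple intents in one turn"},
    ],
    "new_intent_recommendations": ["subscription_pause"],
}

# A downstream pipeline can validate the response against the expected keys.
required_keys = {
    "labelled_conversations",
    "ambiguous_intent_flags",
    "high_quality_examples_by_intent",
    "excluded_samples_with_reason",
    "new_intent_recommendations",
}
assert required_keys == set(curation_output)
```

Pinning the schema like this lets you reject malformed model output before it reaches the training pipeline.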
Watch out for
Labeler disagreement on ambiguous conversations degrades model training. Run a calibration review (have two or more labelers label the same sample and measure agreement) before labeling more than 20% of the dataset.
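One common way to quantify labeler agreement during a calibration review is Cohen's kappa, which corrects raw agreement for chance. A minimal sketch, assuming two labelers who each labeled the same conversations (the function and example labels are illustrative, not part of the prompt):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two labelers (Cohen's kappa)."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items both labelers labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under chance, from each labeler's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b.get(k, 0) for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)

# Example: two labelers disagree on one of four conversations.
a = ["billing", "refund", "billing", "shipping"]
b = ["billing", "refund", "shipping", "shipping"]
kappa = cohens_kappa(a, b)  # well below 1.0 despite 75% raw agreement
```

A kappa below roughly 0.6 to 0.7 on the calibration sample is a common signal that the labeling guidelines need clarification before scaling up.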
Used by
Customer Success Managers
IT & Ops Teams