
A/B Testing Framework Prompt

Prompt

You are a product analyst building the A/B testing framework.

Business context:
[DESCRIBE: Traffic volume (needed to determine feasible test durations), current testing tool (Optimizely/LaunchDarkly/custom), types of tests run (pricing/UX/onboarding/messaging), decision-making process for shipping winners, any current testing anti-patterns observed]

Build the framework:
1) Hypothesis format — "We believe [change] will cause [outcome] because [rationale]"
2) Success metrics — primary metric (what the test is trying to move) + guardrail metrics (what must not get worse)
3) Sample size and duration — minimum detectable effect / statistical significance level / power; calculate test duration needed
4) Segmentation — who is included in the test? Confirm randomization is correct.
5) Decision criteria — exactly what results trigger a "ship it" vs. "iterate" vs. "revert" decision

Output: A/B testing framework. Hypothesis template. Sample size calculator guidance. Decision criteria table. Anti-pattern list to avoid.

Why it works

The minimum detectable effect and required sample size calculation is the most technically important element of an A/B testing framework: tests run without statistical power planning are almost always underpowered and produce false conclusions. Defining the decision criteria before running the test (not after seeing results) prevents the common anti-pattern of 'peeking' at results and declaring a winner as soon as the data looks good. A prioritised testing roadmap keeps teams focused on the highest-impact variables rather than testing random elements.
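The peeking problem can be made concrete with a small A/A simulation (both arms identical, so any 'winner' is a false positive). The simulation parameters below are illustrative assumptions; the point is only that checking repeatedly and stopping at the first significant result inflates the false-positive rate well above the nominal 5%.

```python
import random
from math import sqrt

def z_stat(conv_a, conv_b, n):
    """Two-proportion z-statistic for equal-sized arms."""
    p_a, p_b = conv_a / n, conv_b / n
    pooled = (conv_a + conv_b) / (2 * n)
    se = sqrt(2 * pooled * (1 - pooled) / n)
    return 0.0 if se == 0 else (p_a - p_b) / se

random.seed(42)
SIMS, N, LOOKS, CRIT = 500, 2000, 10, 1.96  # 10 interim looks, nominal alpha = 0.05
peek_fp = fixed_fp = 0
for _ in range(SIMS):
    # A/A test: both arms draw from the same 5% conversion rate.
    a = [random.random() < 0.05 for _ in range(N)]
    b = [random.random() < 0.05 for _ in range(N)]
    peeked = False
    for look in range(1, LOOKS + 1):
        n = N * look // LOOKS
        if abs(z_stat(sum(a[:n]), sum(b[:n]), n)) > CRIT:
            peeked = True  # analyst "ships the winner" at the first significant peek
            break
    peek_fp += peeked
    fixed_fp += abs(z_stat(sum(a), sum(b), N)) > CRIT
print(peek_fp / SIMS, fixed_fp / SIMS)
```

The fixed-horizon analysis stays near the nominal 5% false-positive rate, while stop-at-first-significant-peek is substantially higher, which is exactly why the decision criteria must be committed to before launch.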

Watch out for

A/B testing at low traffic volumes produces statistically unreliable results even when the framework is technically correct. Before committing to an A/B testing programme, calculate the minimum traffic required to detect a meaningful effect size within a reasonable timeframe. Many SaaS companies simply lack the traffic to run product tests with statistical validity; they are better served by qualitative research (user interviews, session recordings) than by underpowered quantitative testing.
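The feasibility check described above amounts to inverting the sample size formula: given the traffic you actually have, what is the smallest lift you could detect? A sketch under assumed numbers (1,000 eligible visitors per week, a 4-week test window, 5% baseline, normal approximation with pooled variance at the baseline):

```python
from math import sqrt
from statistics import NormalDist

def detectable_mde(baseline, n_per_arm, alpha=0.05, power=0.8):
    """Smallest absolute lift detectable with the given per-arm sample size
    (normal approximation, variance pooled at the baseline rate)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z * sqrt(2 * baseline * (1 - baseline) / n_per_arm)

# Illustrative low-traffic SaaS: 1,000 visitors/week, 4 weeks, 50/50 split.
n_per_arm = 1000 * 4 // 2
mde = detectable_mde(baseline=0.05, n_per_arm=n_per_arm)
print(f"{mde:.3f}")  # absolute lift; relative lift = mde / baseline
```

With these numbers the smallest detectable effect is roughly a 2-percentage-point absolute lift on a 5% baseline, i.e. close to a 40% relative improvement. Few product changes move metrics that much, which is the quantitative case for switching to qualitative research at this scale.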

Used by

Data Analysts, Marketers