How big does your creative test need to be?
A real power-analysis calculator for creative testing. Inputs: baseline conversion rate, the lift you want to detect, and your statistical confidence. Outputs: conversions per side, visitor volume, spend estimate, and days to complete. The math most teams skip - and the reason most tests are under-powered.
Why most creative tests are under-powered
Calling a winner on 30-50 conversions is the most common creative-testing mistake. The variance is too high at those volumes - what looks like a 30% lift on the dashboard is often just noise.
This calculator runs the standard two-proportion z-test power analysis. Plug in your baseline conversion rate, the lift you'd need to act on, and your confidence level. The output tells you exactly how many conversions you need per side before you can call anything.
Practical rule of thumb: for most DTC accounts (1-3% CVR, 20-30% MDE), you need roughly 5,000-15,000 visitors per side at α=0.10. Spending less? Lower your expectations - you're running directional tests, not statistical ones.
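The calculation behind numbers like these fits in a few lines of Python. A minimal sketch of the standard two-proportion z-test sample-size formula - the function name and defaults are illustrative, not the calculator's actual code:

```python
import math
from statistics import NormalDist

def visitors_per_side(baseline_cvr: float, relative_mde: float,
                      alpha: float = 0.10, power: float = 0.80) -> int:
    """Sample size per variant for a two-sided two-proportion z-test."""
    p1 = baseline_cvr
    p2 = baseline_cvr * (1 + relative_mde)         # rate you'd need to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance quantile
    z_beta = NormalDist().inv_cdf(power)           # power quantile
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)

# 2% baseline CVR, 20% relative MDE, α=0.10 -> roughly 16-17k visitors per side
n = visitors_per_side(0.02, 0.20)
```

Conversions per side is then roughly n × CVR - a few hundred at these settings - and because the MDE term is squared in the denominator, halving the MDE roughly quadruples the required sample.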
Inputs
Tell us about your test
Adjust the sliders and dropdowns. Outputs update live.
Result
What this test will actually need
22.3k conversions per side
1.5M total visitors
$4.5M estimated spend
595 days to complete
Too long - increase traffic or relax MDE
Scenarios
How MDE changes everything
Smaller MDE = much larger test. Total visitors needed across 6 different MDE scenarios.
Reading this chart: if your scenario row goes red (60+ days), the test is impractical at your current traffic level. Either run for longer, accept a bigger MDE, or relax your significance level from 0.05 to 0.10.
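The "relax from 0.05 to 0.10" lever is easy to check numerically. A hedged sketch of the scenario math - the 2% baseline CVR, 1,500 visitors/day, and MDE grid are made-up example values, not the chart's actual data:

```python
import math
from statistics import NormalDist

def total_visitors(baseline: float, rel_mde: float,
                   alpha: float, power: float = 0.80) -> int:
    """Total visitors across both sides, two-sided two-proportion z-test."""
    p2 = baseline * (1 + rel_mde)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = baseline * (1 - baseline) + p2 * (1 - p2)
    return math.ceil(2 * z ** 2 * variance / (p2 - baseline) ** 2)

daily_traffic = 1_500  # hypothetical account
for mde in (0.10, 0.15, 0.20, 0.25, 0.30, 0.40):
    strict = total_visitors(0.02, mde, alpha=0.05)
    relaxed = total_visitors(0.02, mde, alpha=0.10)
    days = relaxed / daily_traffic
    flag = "RED: 60+ days" if days > 60 else "ok"
    print(f"{mde:.0%} MDE: {strict:>8,} @ 0.05 | {relaxed:>8,} @ 0.10 "
          f"-> {days:.0f} days ({flag})")
```

At these settings, relaxing α from 0.05 to 0.10 cuts the required sample by roughly 20% - useful, but nowhere near the 4x swing you get from accepting a larger MDE.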
- Does: Standard two-proportion z-test power analysis. Same math as Evan Miller's calculator and most A/B-testing platforms.
- Doesn't: Account for sequential testing, early stopping, multiple comparisons (more than 2 variants), or Bayesian priors. If you're running multivariate or checking daily, use a proper experiment platform (Marpipe, Optimizely, Statsig).
- Doesn't: Calculate incrementality. In-platform A/B is not the same as a geo-holdout. For high-AOV or considered-purchase categories, validate your dashboard winner with a follow-up incrementality test.
- Spend estimate is rough: assumes constant CPC and conversion rate. Real-world variance is wider. Treat as an order-of-magnitude estimate, not a budget commitment.
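The spend-estimate caveat above is just two multiplications. A minimal sketch of the arithmetic; the $3 CPC and ~2,525 visitors/day are assumptions chosen to be consistent with the example result card, not real account data:

```python
import math

def rough_budget(total_visitors: int, cpc: float, daily_visitors: int):
    """Order-of-magnitude spend and duration; assumes CPC and traffic stay flat."""
    spend = total_visitors * cpc                       # visitors ~ paid clicks
    days = math.ceil(total_visitors / daily_visitors)  # constant-traffic estimate
    return spend, days

spend, days = rough_budget(1_500_000, cpc=3.00, daily_visitors=2_525)
# spend ≈ $4.5M, days ≈ 595 - the scale of the example result card
```

In practice CPC drifts with frequency, auction pressure, and seasonality, so the real figure lands in a band around this - hence "order-of-magnitude estimate, not a budget commitment."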
Test the winning concept. Then ship 25 variants of it.
Sample-size discipline tells you when to call a winner. Shuttergen turns that winner into the next 25 brand-safe variants you can test - hours, not weeks. The strategist stays in the loop; the production grind goes away.
Try Shuttergen free
Sources
What we read to build this
Sample size sorted. Now ship the variants.
Shuttergen turns one winning concept into 25 brand-safe variants - the production half of the testing loop.
Start free