A/B test statistical significance: how long should you run a test?
The fastest way to waste an A/B test is to call it too early. A few extra conversions on the variation feels like a win, so you ship it, then watch the lift evaporate. Statistical significance is what protects you from that trap. Here is what it means and how long to actually run your tests.

TL;DR: Quick Answer
Run an A/B test until it reaches 90-95% statistical significance, or for at least four weeks if it is sitting above 70%, with a two-week minimum per variable to cover at least two full weekly cycles. Statistical significance is the confidence that your result reflects a real difference, not random chance. Sample size, not calendar time, is what gets you there, so low-traffic sites must run tests longer. Never peek and stop early, and never lower your confidence threshold to manufacture a winner.
Key takeaways
- Statistical significance at 95% means that, if the two versions truly performed the same, a gap this large would show up by chance only about 5% of the time
- Run tests to 90-95% significance, or four weeks if stalled above 70%, with a two-week minimum per variable
- Significance depends on the number of conversions, not just calendar time, so low-traffic sites run longer
- Peeking at results and stopping early locks in a result that was never stable
- Test one variable at a time so a result has a clear, actionable cause
- Seasonal spikes like Black Friday distort baseline behaviour, so avoid those windows or run long enough that they wash out
Most A/B tests fail not because the idea was wrong, but because the test was read wrong. A variation shows an early lead, someone gets excited, and the test is called before the data has settled. Statistical significance is the discipline that stops you acting on noise, and getting it right is the difference between a real conversion lift and a number that quietly disappears the month after you ship. This article is part of the wider conversion rate optimisation process.
What does statistical significance mean in A/B testing?
Statistical significance is the confidence that your test result reflects a real difference, not random chance. A result at 95% significance means that, if the two versions truly performed the same, a gap this large would show up by chance only about 5% of the time. It is the threshold that tells you a variation genuinely won, rather than just appearing to win on a small sample.
Every test has natural variation. Two identical pages will rarely convert at exactly the same rate over any given week. Significance testing quantifies whether the gap between your control and variation is large enough, across enough visitors, to trust. Without it, you are making permanent decisions on noise. For the foundations, see our A/B testing guide for South Africa.
| Significance level | What it means | Recommended action |
|---|---|---|
| Below 70% | Result is unstable and largely noise | Keep running, do not act |
| 70-90% | Suggestive but not conclusive | Run to four weeks, then make a reasoned call |
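| 90% | A gap this size would appear by chance about 10% of the time if nothing had really changed | Reasonable for lower-risk changes |
| 95% | A gap this size would appear by chance about 5% of the time if nothing had really changed | Standard threshold for high-stakes changes |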
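To make the percentages in the table concrete, here is a minimal sketch of how a significance figure can be estimated from raw visitor and conversion counts. It assumes a two-sided, pooled two-proportion z-test, which is one common textbook approach; testing platforms may use different methods (one-sided, Bayesian or sequential), so their numbers will not necessarily match. The function name `significance` and the sample counts are illustrative, not taken from any tool.

```python
from math import sqrt
from statistics import NormalDist


def significance(control_visitors, control_conversions,
                 variant_visitors, variant_conversions):
    """Return the two-sided confidence (1 - p-value) that the two rates differ.

    Assumption: pooled two-proportion z-test with a normal approximation.
    """
    rate_control = control_conversions / control_visitors
    rate_variant = variant_conversions / variant_visitors

    # Pooled rate: the conversion rate if both versions were really the same.
    pooled = (control_conversions + variant_conversions) / (
        control_visitors + variant_visitors
    )
    std_error = sqrt(
        pooled * (1 - pooled) * (1 / control_visitors + 1 / variant_visitors)
    )
    if std_error == 0:
        return 0.0  # no conversions (or all conversions): nothing to compare

    z = (rate_variant - rate_control) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return 1 - p_value


if __name__ == "__main__":
    # Hypothetical counts: same direction of lift, very different sample sizes.
    print(f"Large sample: {significance(5000, 200, 5000, 250):.1%}")
    print(f"Small sample: {significance(50, 2, 50, 3):.1%}")
```

With these made-up counts, the large sample clears the 95% bar while the small sample stays well below 70%, even though the variation is ahead in both cases. That is the "appearing to win on a small sample" problem in numbers.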
How long should you run an A/B test?
Run an A/B test until it reaches 90-95% statistical significance (the convention used by testing platforms like VWO and Optimizely), or for at least four weeks if it is sitting above 70%. As a minimum, run every test for two full weeks per variable to cover at least two complete weekly cycles. This captures the natural differences between weekdays, weekends, and paydays.
The rule of thumb balances confidence against patience. Reaching 90-95% significance is the cleanest signal to act on. If a test stalls around 70-80% and will not climb, running it to the four-week mark gives you enough accumulated data to make a reasonable call. Anything shorter than two weeks risks letting a single unusual day, such as a viral post or a payday spike, distort the result. This sits inside the broader conversion rate optimisation process.
“The most expensive habit in CRO is calling a test on day three because the variation is up. We set the duration and the significance threshold before the test launches, then we leave it alone. A two-week minimum per variable is not bureaucracy, it is what stops a single payday from rewriting your conversion strategy.”
Cobus van der Westhuizen, Founder & Digital Strategist, Juicy Designs, reviewed and verified March 2026
Run an A/B test to 90-95% statistical significance, or four weeks if it is stuck above 70%, with a two-week minimum per variable. The two-week floor covers at least two complete weekly cycles, capturing weekday, weekend and payday differences. Significance fluctuates wildly before enough data accumulates, so the duration and threshold should be fixed before launch. Source: Juicy Designs CRO practice, plus VWO and Optimizely significance conventions.
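The duration rule above can be written down as a small decision function, which is a handy way to fix the thresholds before launch. This is a sketch of the rule as stated in this article, not a feature of any testing platform. The function name `next_step`, the `high_stakes` flag and the returned labels are hypothetical.

```python
MIN_DAYS = 14        # two full weekly cycles per variable
STALL_DAYS = 28      # four-week mark for tests stuck above 70%
STALL_FLOOR = 0.70   # below this, the result is treated as noise


def next_step(days_running, significance, high_stakes=False):
    """Apply the article's rule of thumb to a single-variable test.

    significance is a fraction between 0 and 1 (for example 0.92 for 92%).
    """
    target = 0.95 if high_stakes else 0.90

    # Never act before the two-week minimum, however good it looks.
    if days_running < MIN_DAYS:
        return "keep running"

    # Target reached after the minimum duration: safe to act.
    if significance >= target:
        return "act on result"

    # Stalled above 70% at four weeks: enough data for a judgement call.
    if days_running >= STALL_DAYS and significance > STALL_FLOOR:
        return "make a reasoned call"

    return "keep running"


if __name__ == "__main__":
    print(next_step(3, 0.99))    # early spike inside the two-week floor
    print(next_step(28, 0.80))   # stalled at the four-week mark
```

Writing the rule as code forces the awkward cases into the open, such as a day-three spike to 99%, which the two-week floor deliberately ignores.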
Why sample size matters for low-traffic sites
Sample size matters because significance depends on the number of conversions, not just calendar time. A low-traffic site needs to run tests for longer to accumulate enough visitors and conversions to reach confidence. A page with 200 visits a month simply cannot produce a trustworthy result in two weeks.
If your traffic is modest, you have a few options. Test higher up the funnel where volume is greatest, such as your homepage or main landing page. Test bolder changes that produce bigger effects, since large differences reach significance faster than small ones. Or extend the test window and accept that fewer, slower tests are the honest reality of lower traffic. Do not compensate by lowering your confidence threshold; that just manufactures false winners. Driving more qualified traffic through SEO and Google Ads also shortens the time every future test needs to reach significance.
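To see why traffic and effect size drive test length, here is a rough sample-size sketch. It assumes the standard normal-approximation formula for comparing two proportions, a two-sided test, and 80% statistical power. The power figure is an assumption added for this example and is not a threshold from this article. The function names and the 3% baseline rate are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def visitors_per_variant(baseline_rate, relative_lift,
                         confidence=0.95, power=0.80):
    """Estimate visitors needed in EACH variant to detect the given lift.

    Assumptions: two-sided test, normal approximation, 80% power by default.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2

    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def weeks_needed(visitors_per_week, baseline_rate, relative_lift, **options):
    """Weeks to fill both variants, never less than the two-week minimum."""
    total = 2 * visitors_per_variant(baseline_rate, relative_lift, **options)
    return max(2, ceil(total / visitors_per_week))


if __name__ == "__main__":
    # Hypothetical page converting at 3%.
    print("Subtle change (+20%): ", visitors_per_variant(0.03, 0.20), "per variant")
    print("Bold change (+100%):  ", visitors_per_variant(0.03, 1.00), "per variant")
    print("Weeks at 50 visits/week for +20%:", weeks_needed(50, 0.03, 0.20))
```

Under these assumptions, detecting a 20% relative lift on a 3% baseline takes roughly 14,000 visitors per variant, which a page with 200 visits a month would need years to collect. A bolder change that doubles the rate needs only a small fraction of that, which is why bigger swings are the practical choice on low-traffic pages.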
Why shouldn’t you peek at results and stop early?
You shouldn’t stop early because significance fluctuates wildly before enough data accumulates, and “peeking” lets you cherry-pick a moment that looks like a win. Early in a test, a variation can show 99% significance one day and 60% the next. Stopping the moment it looks good locks in a result that was never stable.
This is one of the most common and costly A/B testing mistakes. Decide your test duration and significance threshold before you launch, then leave it alone until those conditions are met. Checking progress is fine; acting on a premature spike is not. Remove emotion from the decision and let the predetermined data thresholds make the call, not your excitement about an early lead.
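The cost of acting on a premature spike can be illustrated with a small simulation. The sketch below runs A/A tests, where both versions are identical, so every "winner" is a false positive. It compares stopping the first time any daily check crosses 95% with reading the result once on day 28. The traffic level, conversion rate, seed and function names are all made up for illustration, and the significance calculation assumes a simple pooled two-proportion z-test.

```python
import random
from math import sqrt
from statistics import NormalDist


def confidence(n_a, conv_a, n_b, conv_b):
    """Two-sided confidence from a pooled two-proportion z-test (assumed)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_error == 0:
        return 0.0
    z = abs(conv_b / n_b - conv_a / n_a) / std_error
    return 1 - 2 * (1 - NormalDist().cdf(z))


def simulate_aa_tests(trials=300, days=28, daily_visitors=100,
                      rate=0.05, threshold=0.95, seed=1):
    """Return (false-win rate with daily peeking, false-win rate at the end)."""
    rng = random.Random(seed)
    peeking_wins = 0
    final_wins = 0
    for _ in range(trials):
        visitors = conv_a = conv_b = 0
        crossed_at_some_point = False
        level = 0.0
        for _day in range(days):
            # Both arms share the same true rate: any gap is pure noise.
            visitors += daily_visitors
            conv_a += sum(rng.random() < rate for _ in range(daily_visitors))
            conv_b += sum(rng.random() < rate for _ in range(daily_visitors))
            level = confidence(visitors, conv_a, visitors, conv_b)
            if level >= threshold:
                crossed_at_some_point = True
        peeking_wins += crossed_at_some_point
        final_wins += level >= threshold
    return peeking_wins / trials, final_wins / trials


if __name__ == "__main__":
    peeking, fixed = simulate_aa_tests()
    print(f"False winners if you stop at the first 95% reading: {peeking:.0%}")
    print(f"False winners if you read once on day 28:           {fixed:.0%}")
```

Because day 28 is also one of the daily checks, the peeking rate can never be lower than the fixed-duration rate, and in practice it comes out several times higher. Looking at the dashboard is harmless; it is the stopping that does the damage.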
What are the most common A/B testing mistakes?
The most common mistakes are calling tests too early, testing too many variables at once, and ignoring seasonality. Each one corrupts your data: early calls capture noise, multiple variables make it impossible to know what worked, and seasonal spikes like Black Friday distort baseline behaviour and inflate or hide true effects.
Test one variable at a time so a result has a clear cause. If you change the headline and the button and the layout together, a lift tells you nothing actionable. Watch the calendar too: running a test across Black Friday, a long weekend, or a major payday means your control and variation are measured under abnormal conditions. Either avoid those windows or run long enough that they wash out. Above all, decide on data, not on which version you personally prefer. For a structured pre-launch check, use our CRO audit for South Africa.
Frequently asked questions
Is 90% statistical significance good enough to act on?
90% is a reasonable threshold for lower-risk changes. It means a gap this size would still appear by chance about 10% of the time if the versions truly performed the same. For high-stakes changes affecting revenue or major pages, aim for 95%. If a test is above 70% but will not reach 90%, running it to the four-week mark gives enough data to make a sensible, defensible decision.
Can I run multiple A/B tests at the same time?
Yes, but only on separate, non-overlapping pages or audiences so they do not contaminate each other. Running two tests on the same page or funnel step makes it impossible to attribute results. For most South African sites with moderate traffic, running tests sequentially produces cleaner, more trustworthy answers.
Should I pause A/B tests during Black Friday?
Usually yes. Black Friday and other seasonal peaks produce abnormal visitor behaviour that distorts both your control and variation, so results will not generalise to normal periods. Either pause tests during these windows, or run them long enough afterwards that the seasonal spike becomes a small part of the overall dataset.
