
A/B test statistical significance: how long should you run a test?

Run an A/B test until it reaches 90-95% statistical significance, or for at least four weeks if it is sitting above 70%, with a two-week minimum per variable. Statistical significance is the confidence that your result reflects a real difference, not random chance, and it is what protects you from calling a test too early.

The fastest way to waste an A/B test is to call it too early. A few extra conversions on the variation feels like a win, so you ship it, then watch the lift evaporate. Statistical significance is what protects you from that trap. Here is what it means and how long to actually run your tests.

9 March 2026 · By Cobus van der Westhuizen · 6 min read


Cobus van der Westhuizen, CEO, Juicy Designs · 15+ years of experience

UPDATED March 30, 2026

REVIEWED BY Wynand van der Westhuizen

FACT CHECKED BY Lenata Oosthuizen



Sources: VWO: A/B testing statistical significance | Optimizely: statistical significance

TL;DR: Quick Answer

Run an A/B test until it reaches 90-95% statistical significance, or for at least four weeks if it is sitting above 70%, with a two-week minimum per variable to cover at least two full weekly cycles. Statistical significance is the confidence that your result reflects a real difference, not random chance. Sample size, not calendar time, is what gets you there, so low-traffic sites must run tests longer. Never peek and stop early, and never lower your confidence threshold to manufacture a winner.

Key takeaways

Statistical significance at 95% means a gap this large would show up by luck only about 5% of the time if the two versions truly performed the same
Run tests to 90-95% significance, or four weeks if stalled above 70%, with a two-week minimum per variable
Significance depends on the number of conversions, not just calendar time, so low-traffic sites run longer
Peeking at results and stopping early locks in a result that was never stable
Test one variable at a time so a result has a clear, actionable cause
Seasonal spikes like Black Friday distort baseline behaviour, so avoid those windows or run long enough that they wash out

Most A/B tests fail not because the idea was wrong, but because the test was read wrong. A variation shows an early lead, someone gets excited, and the test is called before the data has settled. Statistical significance is the discipline that stops you acting on noise, and getting it right is the difference between a real conversion lift and a number that quietly disappears the month after you ship. This article is part of the wider conversion rate optimisation process.

What does statistical significance mean in A/B testing?

Statistical significance is the confidence that your test result reflects a real difference, not random chance. A result at 95% significance means that, if the control and variation truly performed the same, a gap this large would show up by chance only about 5% of the time. It is the threshold that tells you a variation genuinely won, rather than just appearing to win on a small sample.

Every test has natural variation. Two identical pages will rarely convert at exactly the same rate over any given week. Significance testing quantifies whether the gap between your control and variation is large enough, across enough visitors, to trust. Without it, you are making permanent decisions on noise. For the foundations, see our A/B testing guide for South Africa.

A/B test significance thresholds and what to do
Significance level	What it means	Recommended action
Below 70%	Result is unstable and largely noise	Keep running, do not act
70-90%	Suggestive but not conclusive	Run to four weeks, then make a reasoned call
90%	A gap this large would appear about 10% of the time with no real difference	Reasonable for lower-risk changes
95%	A gap this large would appear about 5% of the time with no real difference	Standard threshold for high-stakes changes
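
To make the thresholds above concrete, here is a minimal Python sketch of one common way to arrive at a significance figure: a two-sided two-proportion z-test. This is an assumption for illustration, not necessarily the method your testing platform uses, since tools such as VWO and Optimizely may apply different statistical models. The function name and the visitor counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def significance(control_visitors, control_conversions,
                 variant_visitors, variant_conversions):
    """Two-sided two-proportion z-test (illustrative, not a platform's exact method).

    Returns (z, p_value, confidence), where confidence = 1 - p_value.
    """
    p_control = control_conversions / control_visitors
    p_variant = variant_conversions / variant_visitors

    # Pooled rate: the best estimate of the shared conversion rate
    # under the assumption that there is no real difference.
    pooled = (control_conversions + variant_conversions) / (
        control_visitors + variant_visitors
    )
    std_error = sqrt(
        pooled * (1 - pooled) * (1 / control_visitors + 1 / variant_visitors)
    )
    if std_error == 0:
        # No conversions (or all conversions) in both arms: nothing to compare.
        return 0.0, 1.0, 0.0

    z = (p_variant - p_control) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, 1 - p_value


# Hypothetical numbers: the same 4% vs 5% conversion rates at two sample sizes.
_, _, small_sample = significance(500, 20, 500, 25)
_, _, large_sample = significance(5000, 200, 5000, 250)
print(f"500 visitors per version:  {small_sample:.0%}")
print(f"5000 visitors per version: {large_sample:.0%}")
```

With identical conversion rates, the smaller sample lands well below 70% while the larger one clears 95%, which is why the same apparent lift can be noise on one site and a real result on another.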

How long should you run an A/B test?

Run an A/B test until it reaches 90-95% statistical significance (the convention used by testing platforms like VWO and Optimizely), or for at least four weeks if it is sitting above 70%. As a minimum, run every test for two full weeks per variable to cover at least two complete weekly cycles. This captures the natural differences between weekdays, weekends, and paydays.

The rule of thumb balances confidence against patience. Reaching 90-95% significance is the cleanest signal to act on. If a test stalls around 70-80% and will not climb, running it to the four-week mark gives you enough accumulated data to make a reasonable call. Anything shorter than two weeks risks letting a single unusual day, such as a viral post or a payday spike, distort the result. This sits inside the broader conversion rate optimisation process.

“The most expensive habit in CRO is calling a test on day three because the variation is up. We set the duration and the significance threshold before the test launches, then we leave it alone. A two-week minimum per variable is not bureaucracy, it is what stops a single payday from rewriting your conversion strategy.”
Cobus van der Westhuizen, Founder & Digital Strategist, Juicy Designs, reviewed and verified March 2026

Run an A/B test to 90-95% statistical significance, or four weeks if it is stuck above 70%, with a two-week minimum per variable. The two-week floor covers at least two complete weekly cycles, capturing weekday, weekend and payday differences. Significance fluctuates wildly before enough data accumulates, so the duration and threshold should be fixed before launch. Source: Juicy Designs CRO practice, plus VWO and Optimizely significance conventions.
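
The rule of thumb in this section can be written down as a small decision function, which is one way to fix the duration and threshold before launch. This is a sketch only: the function name, parameter names and return strings are hypothetical, and the defaults simply mirror the figures given above (two-week minimum, four-week stall point, 70% floor).

```python
def decide(confidence, days_running, target=0.95,
           stall_floor=0.70, min_days=14, stall_days=28):
    """Apply the article's stopping rule to a test's current state.

    confidence   -- current significance as a fraction, e.g. 0.93
    days_running -- full days the test has been live
    target       -- threshold chosen before launch (0.90 to 0.95)
    """
    # Never act before two complete weekly cycles, however good it looks.
    if days_running < min_days:
        return "keep running"
    # Target reached after the minimum duration: the cleanest signal.
    if confidence >= target:
        return "call the test"
    # Stalled above 70% at the four-week mark: make a reasoned call.
    if days_running >= stall_days and confidence >= stall_floor:
        return "make a reasoned call"
    # Below 70%, or not yet at four weeks: the result is not settled.
    return "keep running"


print(decide(0.99, days_running=3))                # early spike, still "keep running"
print(decide(0.96, days_running=14))               # "call the test"
print(decide(0.80, days_running=28))               # "make a reasoned call"
print(decide(0.91, days_running=21, target=0.90))  # "call the test" for a lower-risk change
```

Passing the threshold as a parameter keeps the choice between 90% and 95% explicit and made up front, rather than adjusted once results start coming in.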

Why sample size matters for low-traffic sites

Sample size matters because significance depends on the number of conversions, not just calendar time. A low-traffic site needs to run tests for longer to accumulate enough visitors and conversions to reach confidence. A page with 200 visits a month simply cannot produce a trustworthy result in two weeks.

If your traffic is modest, you have a few options. Test higher up the funnel where volume is greatest, such as your homepage or main landing page. Test bolder changes that produce bigger effects, since large differences reach significance faster than small ones. Or extend the test window and accept that fewer, slower tests are the honest reality of lower traffic. Do not compensate by lowering your confidence threshold, because that just manufactures false winners. Driving more qualified traffic through SEO and Google Ads also shortens the time every future test needs to reach significance.
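
To see why low traffic stretches test duration, a rough sample-size estimate helps. The sketch below uses the standard normal-approximation formula for comparing two conversion rates. The 80% statistical power default is an assumption that does not come from this article, and the function names, baseline rate and lift figures are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def visitors_per_variant(baseline_rate, relative_lift,
                         confidence=0.95, power=0.80):
    """Approximate visitors needed in EACH version to detect the given lift.

    power=0.80 is an assumed convention, not a figure from the article.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


def weeks_needed(visitors_per_week, n_per_variant, variants=2, min_weeks=2):
    """Whole weeks to collect the sample, never less than the two-week floor."""
    return max(min_weeks, ceil(n_per_variant * variants / visitors_per_week))


# Hypothetical page: 3% baseline conversion rate.
subtle = visitors_per_variant(0.03, relative_lift=0.20)
bold = visitors_per_variant(0.03, relative_lift=0.50)
print("Visitors per version, 20% lift:", subtle)
print("Visitors per version, 50% lift:", bold)
print("Weeks at 5000 visitors a week:", weeks_needed(5000, subtle))
print("Weeks at 50 visitors a week:", weeks_needed(50, subtle))
```

Under these assumptions a 20% relative lift on a 3% baseline needs roughly 14,000 visitors per version, so a page with around 200 visits a month would take years to get there. The bolder change needs far fewer visitors, which is the arithmetic behind testing bigger changes on low-traffic sites.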

Why shouldn’t you peek at results and stop early?

You shouldn’t stop early because significance fluctuates wildly before enough data accumulates, and “peeking” lets you cherry-pick a moment that looks like a win. Early in a test, a variation can show 99% significance one day and 60% the next. Stopping the moment it looks good locks in a result that was never stable.

This is one of the most common and costly A/B testing mistakes. Decide your test duration and significance threshold before you launch, then leave it alone until those conditions are met. Checking progress is fine; acting on a premature spike is not. Remove emotion from the decision and let the predetermined data thresholds make the call, not your excitement about an early lead.
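
The cost of stopping at the first good-looking number can be shown with a small simulation of A/A tests, where both versions share the same true conversion rate and any "winner" is a false positive. This is an illustrative sketch: the traffic figures, conversion rate and seed are made up, and the significance figure is a simple two-proportion z-test, which may differ from what your testing tool reports.

```python
import random
from math import sqrt
from statistics import NormalDist


def confidence(n_a, conv_a, n_b, conv_b):
    """Two-sided two-proportion z-test, returned as 1 - p_value."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_error == 0:
        return 0.0
    z = abs(conv_b / n_b - conv_a / n_a) / std_error
    return 1 - 2 * (1 - NormalDist().cdf(z))


def simulate_aa_tests(runs=300, days=20, daily_visitors=100,
                      rate=0.10, threshold=0.95, seed=1):
    """Return (peeking_rate, single_look_rate) of false winners.

    peeking_rate     -- share of runs that crossed the threshold on ANY day
    single_look_rate -- share of runs above the threshold on the final day only
    """
    rng = random.Random(seed)
    peeking_wins = 0
    final_wins = 0
    for _ in range(runs):
        visitors = conv_a = conv_b = 0
        crossed_on_some_day = False
        for _ in range(days):
            visitors += daily_visitors
            # Both versions convert at the same true rate.
            conv_a += sum(rng.random() < rate for _ in range(daily_visitors))
            conv_b += sum(rng.random() < rate for _ in range(daily_visitors))
            conf = confidence(visitors, conv_a, visitors, conv_b)
            crossed_on_some_day = crossed_on_some_day or conf >= threshold
        peeking_wins += crossed_on_some_day
        final_wins += conf >= threshold
    return peeking_wins / runs, final_wins / runs


peeking_rate, single_look_rate = simulate_aa_tests()
print(f"False winners when stopping at the first spike: {peeking_rate:.0%}")
print(f"False winners when judged once at the end:      {single_look_rate:.0%}")
```

In runs like this, the stop-at-first-spike rate typically comes out several times higher than the single-look rate, even though neither version is actually better. Checking the dashboard is harmless; it is acting on the spike that inflates the error rate.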

What are the most common A/B testing mistakes?

The most common mistakes are calling tests too early, testing too many variables at once, and ignoring seasonality. Each one corrupts your data: early calls capture noise, multiple variables make it impossible to know what worked, and seasonal spikes like Black Friday distort baseline behaviour and inflate or hide true effects.

Test one variable at a time so a result has a clear cause. If you change the headline and the button and the layout together, a lift tells you nothing actionable. Watch the calendar too: running a test across Black Friday, a long weekend, or a major payday means your control and variation are measured under abnormal conditions. Either avoid those windows or run long enough that they wash out. Above all, decide on data, not on which version you personally prefer. For a structured pre-launch check, use our CRO audit for South Africa.
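
A calendar check like the one described above is easy to automate before launch. The sketch below flags a planned test window that is shorter than two weeks, does not cover whole weekly cycles, or overlaps a seasonal peak. The function names and example dates are hypothetical, the whole-week check is an assumption that goes slightly beyond the two-week minimum, and the list of peak days is something you would supply yourself.

```python
from datetime import date, timedelta


def black_friday(year):
    """Black Friday: the day after the fourth Thursday of November."""
    nov1 = date(year, 11, 1)
    # weekday(): Monday is 0, so Thursday is 3.
    first_thursday = nov1 + timedelta(days=(3 - nov1.weekday()) % 7)
    return first_thursday + timedelta(weeks=3, days=1)


def window_issues(start, end, blackout_days=()):
    """Return a list of problems with a planned test window (empty if none)."""
    issues = []
    length = (end - start).days + 1  # inclusive of both dates
    if length < 14:
        issues.append("shorter than the two-week minimum")
    if length % 7 != 0:
        issues.append("does not cover whole weekly cycles")
    for day in blackout_days:
        if start <= day <= end:
            issues.append(f"overlaps seasonal peak on {day.isoformat()}")
    return issues


peaks = [black_friday(2025)]
print(window_issues(date(2025, 11, 17), date(2025, 11, 30), peaks))
print(window_issues(date(2026, 2, 2), date(2026, 2, 15), peaks))
```

An empty list means the window passes these three checks only; it says nothing about whether the test will reach significance in that time.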

Frequently asked questions

Is 90% statistical significance good enough to act on?

90% is a reasonable threshold for lower-risk changes. It means a gap this large would appear about 10% of the time even if there were no real difference. For high-stakes changes affecting revenue or major pages, aim for 95%. If a test is above 70% but will not reach 90%, running it to the four-week mark gives enough data to make a sensible, defensible decision.


Can I run multiple A/B tests at the same time?

Yes, but only on separate, non-overlapping pages or audiences so they do not contaminate each other. Running two tests on the same page or funnel step makes it impossible to attribute results. For most South African sites with moderate traffic, running tests sequentially produces cleaner, more trustworthy answers.


Should I pause A/B tests during Black Friday?

Usually yes. Black Friday and other seasonal peaks produce abnormal visitor behaviour that distorts both your control and variation, so results will not generalise to normal periods. Either pause tests during these windows, or run them long enough afterwards that the seasonal spike becomes a small part of the overall dataset.


Cobus founded Juicy Designs in 2015 and has spent over a decade running conversion experiments and SEO programmes for South African businesses across automotive, entertainment, professional services, retail and insurance. He personally oversees CRO and testing strategy for Juicy Designs client accounts and reviews every article published on this site for factual accuracy and current market relevance.

Founder of Juicy Designs, established 2015
64+ South African clients, 4.9-star Google rating
Google Ads certified practitioner
Google Analytics 4 certified
Specialist in SEO, paid media & conversion-focused web design
Reviewed and updated March 2026
