Understand how Optimize calculates confidence in test optimization results.
Good to know
Stat. sig. only applies to test optimizations — it isn’t used in AI Optimize or Personalize.
Statistical significance — or stat. sig. — helps you understand whether the results from a test optimization are likely caused by your changes or just random chance. It’s only used in test optimizations and is calculated after enough data has been collected.
What is statistical significance?
Statistical significance is a confidence metric used in data analysis. It answers a key question: “How likely is it that the observed difference between variations is real — not just noise from a random spike or dip in traffic?”
In Optimize, stat. sig. is calculated using standard statistical models based on sample size, conversion behavior, and the size of the lift between your test variation and the control.
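To make those inputs concrete, here is a minimal sketch of one standard technique for turning conversion counts into a confidence percentage: a two-sided two-proportion z-test. This is only an illustration of the general idea, not the exact model Optimize uses; the function name, parameters, and the 0-100 scale are assumptions made up for the example.

```python
from math import erf, sqrt

def significance(control_conversions, control_visitors,
                 variant_conversions, variant_visitors):
    """Hypothetical helper: two-sided two-proportion z-test, returned as a
    0-100 confidence score. A textbook approach, not Optimize's internal model."""
    p_control = control_conversions / control_visitors
    p_variant = variant_conversions / variant_visitors
    # Pooled conversion rate under the null hypothesis of "no real difference".
    pooled = (control_conversions + variant_conversions) / (control_visitors + variant_visitors)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / control_visitors + 1 / variant_visitors))
    if standard_error == 0:
        return 0.0
    z = (p_variant - p_control) / standard_error
    # Standard normal CDF via the error function; a larger |z| means the observed
    # lift is harder to explain as random visitor behavior.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return (1 - p_value) * 100
```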
How stat. sig. works in Optimize
Once your test optimization has collected enough data, Optimize automatically calculates a confidence score. This tells you how likely it is that your variation’s performance — positive or negative — is the result of the changes you made, rather than random visitor behavior.
Stat. sig. always compares variations to No Change, which acts as the control. If a variation shows a 20% lift with 90% statistical significance, that means the system is 90% confident that the improvement is real — and not due to chance.
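As a rough illustration of that scenario, the hypothetical `significance` helper sketched above lands near 90% when No Change converts at 5% and the variation at 6% (a 20% relative lift) with about 2,800 visitors in each group. The counts are invented for the example.

```python
# Hypothetical counts: 5% vs. 6% conversion (a 20% relative lift),
# with 2,800 visitors in each group.
print(significance(140, 2800, 168, 2800))  # ~90, i.e. roughly 90% confidence
```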
How to interpret stat. sig. scores
- Higher stat. sig. means more confidence that your variation caused the results.
- Lower stat. sig. means more uncertainty. The results might change as more data comes in.
- Less than 1% stat. sig. means there isn’t enough data yet to make a reliable judgment. Expect results to fluctuate.
What impacts statistical significance?
Several factors affect how quickly stat. sig. is reached and how confident the results are:
Time — tests need to run long enough to smooth out short-term spikes or dips in traffic. For example, a 30-day test is typically more reliable than a 7-day test.
Sample size — the more visitors in each variation group, the more reliable the data. Tests with only a few dozen visitors per variation are less likely to reach stat. sig. than those with thousands.
Effect size — the size of the difference between your variation and the control matters. Bigger differences typically reach stat. sig. faster, while small lifts take more time to validate (see the sketch below).
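The sketch below shows how sample size and effect size play out, still using the hypothetical `significance` helper and made-up numbers from the examples above.

```python
# Same 5% -> 6% lift, but only 300 visitors per group: far from significant.
print(significance(15, 300, 18, 300))      # ~41, the lift could easily be noise

# A larger lift (5% -> 7.5%) with 2,800 visitors per group: very high confidence.
print(significance(140, 2800, 210, 2800))  # ~99.99
```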