Lesson 6: Calculation frequency

When you set up an experiment, you need to decide when you want results to be displayed.

Deliver results continuously or upon conclusion

There are two options for when to calculate results:

  • Continuously, to see new results every hour or day while the experiment is live.
  • Upon Conclusion, to see results when you end the experiment.

With both settings, an experiment runs until you decide to stop it. The main difference is that with results delivered Upon Conclusion, you can't view the results for your metrics while the experiment is running; they appear when you end it. With results calculated Continuously, you can follow the results during the course of the experiment and end it whenever you want.

If you're wondering why you wouldn't always choose to view results continuously, stay tuned! The next sections explain the trade-off.

Two strategies to avoid being fooled by randomness

Imagine that you run an experiment that truly has no impact whatsoever on any metric. You just split users into two groups, measure some outcome, and calculate the difference between the groups every day. Even though the experiment didn't change anything, you can expect the results to fluctuate over time by random chance alone. If you check the results every day, on some days you might see a difference so large that you wrongly conclude that the experiment impacts the metric. Such results are called false positives. Every extra check is another chance to find a falsely significant result, so the more often you look, the higher the overall risk. To avoid being fooled by randomness and drawing the wrong conclusions from a false positive, you can use either of two strategies to keep this risk under control.
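
To make the risk concrete, here is a minimal simulation sketch of the scenario above. It is an illustration under assumptions that are not part of this lesson: outcomes are standard normal with a known variance, each group gets 100 new users per day for 14 days, and significance is judged with a plain two-sided z-test at the 5% level. The function name and all parameters are hypothetical.

```python
import math
import random
from statistics import NormalDist


def simulate_aa_false_positive_rate(n_experiments=2000, n_days=14,
                                    users_per_day=100, alpha=0.05, seed=0):
    """Simulate experiments with no true effect (A/A tests).

    Returns a pair of false positive rates:
    (significant on at least one daily check, significant at the final check).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # uncorrected threshold
    daily_sd = math.sqrt(users_per_day)
    peeking_hits = 0
    final_hits = 0

    for _ in range(n_experiments):
        sum_a = sum_b = 0.0
        n_per_group = 0
        significant_on_any_day = False
        z = 0.0
        for _day in range(n_days):
            # Assumption: each user's outcome is standard normal, so a day's
            # total for one group is normal with variance users_per_day.
            # Drawing the daily total directly keeps the simulation fast.
            sum_a += rng.gauss(0.0, daily_sd)
            sum_b += rng.gauss(0.0, daily_sd)
            n_per_group += users_per_day
            # z-statistic for the difference in means with known variance 1.
            z = (sum_a - sum_b) / math.sqrt(2 * n_per_group)
            if abs(z) > z_crit:
                significant_on_any_day = True
        peeking_hits += significant_on_any_day
        final_hits += abs(z) > z_crit  # only the last day's result counts

    return peeking_hits / n_experiments, final_hits / n_experiments


peeking_rate, single_look_rate = simulate_aa_false_positive_rate()
print(f"False positive rate with daily checks: {peeking_rate:.3f}")
print(f"False positive rate with one final check: {single_look_rate:.3f}")
```

Under these assumptions, the rate for the single final check should land close to the nominal 5%, while the rate for daily checks with the same uncorrected threshold should come out several times higher.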

  • Calculate the results upon conclusion and use standard statistical tests

    Before starting your experiment, you define a point in time when you plan to calculate the results. To make sure that you have a good chance of finding an impact, you calculate what sample size you need to reliably detect an effect of a certain size. By only calculating the results this one time, you have only one chance to be fooled by a false positive result. This method is the most efficient way to minimize the risks of false positives.

  • Calculate the results continuously and use sequential tests

    You calculate results every day or every hour, and correct for the increased risk of being fooled by randomness. The statistical methods used, known as sequential tests, correct for the multiple peeks at the data by using a stricter threshold to conclude significance. With this approach, you can check the results daily without hesitation. Because the tests use a stricter threshold for calling significance, the impact needs to be larger to be reliably detected. This means that for an experiment with results updated continuously, you either need to increase the sample size, or accept that you lose some sensitivity to detect changes. Sequential tests trade off sensitivity for faster results.
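
The sample size calculation and the cost of a stricter threshold can be sketched as follows. This is an illustration, not the method any particular platform uses: it applies the textbook approximation for a two-sided z-test on a difference in means, and it uses a Bonferroni-style threshold (the significance level divided by 14 planned checks) as a deliberately conservative stand-in for a sequential correction. Real sequential tests use different, usually less strict, boundaries. The function name, effect size, and standard deviation are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(min_effect, std_dev, alpha=0.05, power=0.8):
    """Approximate users needed per group for a two-sided z-test.

    min_effect: smallest absolute difference in means you want to detect.
    std_dev:    standard deviation of the metric (assumed equal in both groups).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * std_dev / min_effect) ** 2)


# One calculation at the end, standard 5% significance level.
fixed_n = sample_size_per_group(min_effect=0.1, std_dev=1.0)

# Assumption: 14 daily checks, approximated by dividing alpha by 14.
# This is a conservative placeholder, not an actual sequential test.
strict_n = sample_size_per_group(min_effect=0.1, std_dev=1.0, alpha=0.05 / 14)

print(f"Per-group sample size, single calculation: {fixed_n}")
print(f"Per-group sample size, stricter threshold: {strict_n}")
```

The second number is larger than the first: to keep the same chance of detecting the same effect under a stricter threshold, you need more users. If you keep the sample size fixed instead, you lose sensitivity.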

Automatic sequential monitoring

Modern experimentation platforms run automatic sequential monitoring checks regardless of the evaluation frequency you choose. This means that you do not have to choose to calculate results continuously to sleep well at night—the platform monitors your experiment for deterioration in all your metrics.

What you should choose

The trade-off differs from experiment to experiment. Sequential tests let you stop an experiment early, at the expense of some certainty. If speed is more important than precise estimates of the treatment effect, continuously updating the results is the better choice.

For a given experiment length, say two weeks, calculating the results only once at the end gives more precise estimates. If you are going to run the experiment for a fixed amount of time regardless, select Upon Conclusion to maximize your chances of finding effects.
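
For a fixed experiment length, one rough way to compare the two options is the smallest effect each can reliably detect with the same number of users. The sketch below assumes a two-sided z-test on a difference in means, a hypothetical 7,000 users per group over two weeks, and a significance level of 0.01 as a placeholder for the stricter threshold of a sequential test. The actual threshold depends on the sequential method and is not specified in this lesson.

```python
import math
from statistics import NormalDist


def minimum_detectable_effect(n_per_group, std_dev, alpha=0.05, power=0.8):
    """Smallest absolute difference in means detectable with the given power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * std_dev * math.sqrt(2 / n_per_group)


n_per_group = 7000  # hypothetical: 500 users per group per day for 14 days

# Results calculated once, upon conclusion.
mde_once = minimum_detectable_effect(n_per_group, std_dev=1.0)

# Assumption: alpha=0.01 stands in for a sequential test's stricter threshold.
mde_strict = minimum_detectable_effect(n_per_group, std_dev=1.0, alpha=0.01)

print(f"Detectable effect, single calculation: {mde_once:.4f}")
print(f"Detectable effect, stricter threshold: {mde_strict:.4f}")
```

With the same users and the same duration, the single calculation can detect a smaller effect, which is why Upon Conclusion is the better fit when the run time is fixed anyway.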