Who is the Confidence Bootcamp for?

The bootcamp is designed for anyone who wants to improve their experimentation skills. Courses are tailored for data scientists, analysts, engineers, product managers, and leaders — whether you are running your first A/B test or scaling an experimentation program across your organization.

Is the bootcamp free?

Yes, the Confidence Bootcamp is completely free. All 11 courses, 90+ lessons, and resources are available at no cost. You can start learning immediately without creating an account, though signing in lets you track your progress across devices.

What does the bootcamp cover?

The bootcamp covers the full experimentation lifecycle: A/B testing fundamentals, hypothesis formulation, interpreting experiment results, metrics design, sample size calculation, feature flags, and building an experimentation culture. It includes 11 courses with over 90 lessons built by the Confidence team at Spotify.

How long does the bootcamp take to complete?

The full bootcamp takes approximately 20 hours to complete across all 11 courses. Individual courses range from 30 minutes to 3 hours. You can learn at your own pace and pick the courses most relevant to your role.

Do I need prior experience with A/B testing or statistics?

No prior experience is required. The bootcamp starts with foundational courses like Intro to Experimentation and progressively covers more advanced topics like sequential testing and variance reduction. Each course clearly indicates which roles it is designed for.

Who created the Confidence Bootcamp?

The Confidence Bootcamp was created by the Confidence team at Spotify, the same team that builds the experimentation and feature flagging platform used across Spotify. The content reflects real-world experimentation practices used at one of the world's largest digital products.

Lesson 6: Calculation frequency

Summary

Calculating results only once ("Upon Conclusion") is the most statistically efficient way to set up your experiments. You see the results only when the experiment ends, but because the data is tested a single time, no correction for repeated looks is needed and sensitivity is higher.
Calculating results continuously sacrifices some sensitivity in exchange for faster results. Select this if getting a rough estimate early on is important to you.

When you set up an experiment, you need to decide when you want results to be displayed.

Deliver results continuously or upon conclusion

There are two options for when to calculate results:

Continuously display results to see new results every hour or day when the experiment is live.
Upon Conclusion to see results when you end the experiment.

For both settings, all experiments run until you decide to stop them. The largest difference between the two options is that in experiments with results delivered Upon Conclusion, you can't view the results for your metrics during the experiment. The results show up when you end your experiment. For experiments using results calculated Continuously, you can follow the experiment results during the course of the experiment and decide to end it whenever you want.

If you're wondering why not always choose to view results continuously—stay tuned! We'll explain that shortly.

Two strategies to avoid being fooled by randomness

Imagine that you run an experiment that truly has no impact whatsoever on any metric. You just split users into two groups, measure some outcome, and calculate the difference between the groups every day. Even if the experiment didn't actually change anything, you can expect to see results fluctuate over time, just by random chance. If you check the results every day, you might on some days see a difference so large that you wrongly assume the experiment impacts the metric. Such results are called false positives. Checking the results every day gives you multiple chances to find a falsely significant result. To avoid being fooled by randomness and drawing the wrong conclusion from a false positive, you can use either of two strategies to keep this risk under control.
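
The effect of daily peeking can be illustrated with a small simulation. The sketch below is not part of Confidence; the function name, the 14-day duration, the 200 users per group per day, and the unit-variance outcome are all assumptions chosen for illustration. It simulates many experiments with no true effect, applies an uncorrected 5% two-sided z-test every day, and compares how often at least one daily check looks significant with how often the single final check does.

```python
import math
import random
from statistics import NormalDist


def peeking_false_positive_rate(n_sims=2000, days=14, users_per_day=200,
                                alpha=0.05, seed=1):
    """Simulate experiments with no true effect (hypothetical setup).

    Returns (share flagged on at least one daily check,
             share flagged on the final check only).
    """
    rng = random.Random(seed)
    # Uncorrected two-sided threshold, about 1.96 for alpha = 0.05.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Each day adds users_per_day unit-variance outcomes per group, so the
    # daily difference in group sums has variance 2 * users_per_day.
    daily_sd = math.sqrt(2 * users_per_day)

    any_day_hits = 0
    final_day_hits = 0
    for _ in range(n_sims):
        diff_sum = 0.0
        flagged = False
        z = 0.0
        for day in range(1, days + 1):
            diff_sum += rng.gauss(0.0, daily_sd)
            # z-statistic for the cumulative difference after `day` days.
            z = diff_sum / (daily_sd * math.sqrt(day))
            if abs(z) > z_crit:
                flagged = True
        any_day_hits += flagged
        final_day_hits += abs(z) > z_crit
    return any_day_hits / n_sims, final_day_hits / n_sims


peek_rate, final_rate = peeking_false_positive_rate()
print(f"flagged on any day: {peek_rate:.3f}, flagged at the end: {final_rate:.3f}")
```

Under these assumptions, the final-check rate stays close to the nominal 5%, while the any-day rate is several times higher, which is the risk the two strategies below are meant to control.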

Calculate results upon conclusion and use standard statistical tests

Before starting your experiment, you define a point in time when you plan to calculate the results. To make sure that you have a good chance of finding an impact, you calculate what sample size you need to reliably detect an effect of a certain size. By only calculating the results this one time, you have only one chance to be fooled by a false positive result. This method is the most efficient way to minimize the risks of false positives.
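
As a rough illustration of the sample size step, the sketch below uses the standard normal-approximation formula for a two-sided, two-sample comparison of means with equal group sizes. It is a minimal sketch, not the calculation Confidence performs; the function name and the example values (an effect of 0.1 standard deviations, 5% significance level, 80% power) are assumptions.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect, sd, alpha=0.05, power=0.8):
    """Users needed per group to detect `effect` (hypothetical helper).

    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / effect)^2,
    the normal approximation for a two-sided two-sample test.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_power = NormalDist().inv_cdf(power)          # desired power
    n = 2 * ((z_alpha + z_power) * sd / effect) ** 2
    return math.ceil(n)


# Illustrative values only: detect a 0.1 shift in a metric with sd 1.0.
print(sample_size_per_group(effect=0.1, sd=1.0))
```

Because the effect size enters squared in the denominator, halving the effect you want to detect roughly quadruples the required sample size.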

Calculate the results continuously and use sequential tests

You calculate results every day or every hour, and correct for the increased risk of being fooled by randomness. The statistical methods used, known as sequential tests, correct for the multiple peeks at the data by using a stricter threshold to conclude significance. With this approach, you can check the results daily without hesitation. Because the tests use a stricter threshold for calling significance, the impact needs to be larger to be reliably detected. This means that for an experiment with results updated continuously, you either need to increase the sample size, or accept that you lose some sensitivity to detect changes. Sequential tests trade off sensitivity for faster results.
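
To see how a stricter threshold costs sensitivity, the sketch below compares the power of a single uncorrected test with the power of the same test at a threshold corrected for 14 daily looks. The correction used here is a plain Bonferroni split (alpha divided by the number of looks), chosen only because it is simple. It is a conservative stand-in and an assumption of this example, not the sequential test Confidence uses; purpose-built sequential tests typically lose less sensitivity. The function name and all numeric values are hypothetical.

```python
import math
from statistics import NormalDist


def power_at_threshold(effect, sd, n_per_group, alpha):
    """Approximate power of a two-sided two-sample z-test (hypothetical helper).

    Ignores the negligible chance of rejecting in the wrong direction.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    # Expected z-statistic when the true difference equals `effect`.
    z_effect = effect / (sd * math.sqrt(2 / n_per_group))
    return nd.cdf(z_effect - z_crit)


alpha = 0.05
looks = 14  # assumed: one look per day for two weeks

# One test at the end, at the full significance level.
single_look = power_at_threshold(0.1, 1.0, n_per_group=1570, alpha=alpha)
# Same final sample, but tested at the Bonferroni-corrected threshold.
corrected = power_at_threshold(0.1, 1.0, n_per_group=1570, alpha=alpha / looks)

print(f"power, single look: {single_look:.2f}")
print(f"power at final look, corrected threshold: {corrected:.2f}")
```

With these illustrative numbers, power at the final look drops from roughly 80% to roughly 45%. The corrected figure only counts detections at the last look, so it understates the overall power of checking daily, but it shows the direction of the trade-off: to recover the lost sensitivity you would need a larger sample.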


Automatic sequential monitoring

Modern experimentation platforms run automatic sequential monitoring checks regardless of the evaluation frequency you choose. This means that you do not have to choose to calculate results continuously to sleep well at night—the platform monitors your experiment for deterioration in all your metrics.

In Confidence

Confidence monitors your experiment using sequential tests for all checks, regardless of the evaluation frequency you choose.

What you should choose

The trade-off differs from experiment to experiment. Experiments with continuously calculated results give you the opportunity to stop early, at the expense of some certainty. If speed is more important than precise estimates of the treatment effect, continuously updating the results is the better choice.

For a given experiment length, say two weeks, calculating the results only once at the end gives more precise estimates. If you are going to run the experiment for a fixed amount of time regardless, select Upon Conclusion to maximize your chances of finding effects.

Example

At Spotify, most experiments use Upon Conclusion and only display results when the experiment ends. To abort experiments early because of errors or negative user experiences, experimenters rely on Confidence's monitoring of their experiments. For the product decision, most teams want at least two weeks of data before deciding whether to ship.

Reader exercise

What setup should you select if you need to be able to see results every day?

Upon Conclusion

Continuous

Upon Conclusion or Continuous, both give results every day