Who is the Confidence Bootcamp for?

The bootcamp is designed for anyone who wants to improve their experimentation skills. Courses are tailored for data scientists, analysts, engineers, product managers, and leaders — whether you are running your first A/B test or scaling an experimentation program across your organization.

Is the bootcamp free?

Yes, the Confidence Bootcamp is completely free. All 11 courses, 90+ lessons, and resources are available at no cost. You can start learning immediately without creating an account, though signing in lets you track your progress across devices.

What does the bootcamp cover?

The bootcamp covers the full experimentation lifecycle: A/B testing fundamentals, hypothesis formulation, interpreting experiment results, metrics design, sample size calculation, feature flags, and building an experimentation culture. It includes 11 courses with over 90 lessons built by the Confidence team at Spotify.

How long does the bootcamp take to complete?

The full bootcamp takes approximately 20 hours to complete across all 11 courses. Individual courses range from 30 minutes to 3 hours. You can learn at your own pace and pick the courses most relevant to your role.

Do I need prior experience with A/B testing or statistics?

No prior experience is required. The bootcamp starts with foundational courses like Intro to Experimentation and progressively covers more advanced topics like sequential testing and variance reduction. Each course clearly indicates which roles it is designed for.

Who created the Confidence Bootcamp?

The Confidence Bootcamp was created by the Confidence team at Spotify, the same team that builds the experimentation and feature flagging platform used across Spotify. The content reflects real-world experimentation practices used at one of the world's largest digital products.

Lesson 3: Baseline Mean, Variance, and the MDE

Summary

This lesson explains how the outcome metric affects the sample size required to power an experiment, and what the Minimum Detectable Effect (MDE) is. It covers:

The baseline variance of the outcome metric.
The baseline mean of the outcome metric.
What the Minimum Detectable Effect (MDE) and the relative MDE are.

Baseline variance

The "baseline variance" is just a fancy way of saying "the variance of the outcome metric under no treatment". The baseline variance directly affects the risk management of the experiment, that is, how much data is needed to keep the false positive and false negative rates at their chosen levels, and therefore the sample size calculation.

As we will see in the following sections, the baseline variance goes directly into the sample size calculation formula, and the required sample size is proportional to it. If the variance increases by 10%, then the sample size required to detect a certain MDE with a certain power also increases by 10%.
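To make the proportionality concrete, here is a minimal sketch assuming the textbook approximation for a two-sided, two-sample z-test with equal group sizes and a shared variance. The function name `sample_size_per_group` and the example numbers are hypothetical illustrations, not part of the Confidence platform.

```python
from statistics import NormalDist


def sample_size_per_group(variance: float, absolute_mde: float,
                          alpha: float = 0.05, power: float = 0.8) -> float:
    """Approximate per-group sample size for a two-sided, two-sample z-test.

    Assumes (hypothetically) equal group sizes and the same baseline
    variance in both groups.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    # Sample size grows linearly with the variance and shrinks with MDE squared.
    return 2 * variance * (z_alpha + z_power) ** 2 / absolute_mde ** 2


base = sample_size_per_group(variance=4.0, absolute_mde=0.1)
higher = sample_size_per_group(variance=4.4, absolute_mde=0.1)  # variance +10%
print(f"ratio: {higher / base:.3f}")  # prints "ratio: 1.100"
```

Because the variance enters the formula linearly, the ratio of the two sample sizes equals the ratio of the two variances, whatever MDE, significance level, or power you choose.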

Baseline mean and the MDE

Refresher on MDE

If you need a refresher on the concept of MDE, check out Lesson 6 in the hypothesis testing course.

Baseline mean

The baseline mean is just another word for "the average value of the outcome metric under no treatment." The baseline mean is important because the relative MDE is translated into an absolute MDE using the baseline mean.

Relation between relative and absolute MDE
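The relation can be written as a short formula. The notation here is our own, not taken from the course: $\mu_0$ is the baseline mean, $\sigma_0^2$ the baseline variance, and $n$ the required sample size under the standard z-test approximation.

```latex
\text{MDE}_{\text{abs}} = \text{MDE}_{\text{rel}} \times \mu_0,
\qquad
n \propto \frac{\sigma_0^2}{\text{MDE}_{\text{abs}}^2}
      = \frac{\sigma_0^2}{\left(\text{MDE}_{\text{rel}}\,\mu_0\right)^2}
```

For a fixed relative MDE, a smaller baseline mean gives a smaller absolute MDE, which in turn requires a larger sample.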

Baseline means close to zero

If the baseline mean is very small, then even a large relative MDE might correspond to a practically irrelevant absolute MDE. This is particularly common for binary metrics with very low rates.

Consider the following example:

The baseline mean of crash rates is 0.0001. That is, one out of 10,000 users experiences a crash on average. If we want to detect a 10% relative MDE, then the absolute MDE is 0.00001. So we want to detect if the rate goes from 0.0001 to 0.00011, or, in other words, if one more user per 100,000 users experiences a crash on average.

Since this is a very small change, we need a large sample size to detect it.
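The crash-rate example can be worked through in a few lines. This is a sketch under stated assumptions: a two-sided z-test at 5% significance and 80% power, equal group sizes, and the Bernoulli variance $p(1 - p)$ of the baseline rate used for both groups (a common simplification). The function name is hypothetical.

```python
from statistics import NormalDist


def binary_sample_size_per_group(baseline_rate: float, relative_mde: float,
                                 alpha: float = 0.05, power: float = 0.8) -> float:
    # Translate the relative MDE into an absolute MDE using the baseline mean.
    absolute_mde = baseline_rate * relative_mde
    # Variance of a binary (Bernoulli) metric under no treatment.
    variance = baseline_rate * (1 - baseline_rate)
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return 2 * variance * z_sum ** 2 / absolute_mde ** 2


# Crash rate of 0.0001 and a 10% relative MDE, as in the example above.
n = binary_sample_size_per_group(baseline_rate=0.0001, relative_mde=0.10)
print(f"{n:,.0f} users per group")
```

Under these assumptions the result is roughly 15.7 million users per group, which shows how a modest-sounding 10% relative change on a rare event translates into a very large experiment.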

Absolute versus Relative MDE

At this point you might wonder why we use the relative MDE to express the effect we want to be able to detect if we aren't aware of the baseline mean. This is the right question to ask! There are two reasons why the relative effect is often used. First, it is a convenient way to understand impact on metrics whose absolute values are hard to interpret or build intuition for, such as minutes played at Spotify. Second, and largely for that reason, most experimentation tools let the user specify the MDE on a relative scale. Which scale makes most sense is mainly a matter of preference. In either case, understanding this relation, and always considering the baseline mean, helps any experimenter make sense of sample size calculations.

Reader exercise

How does the baseline variance of the outcome metric affect the sample size required for an experiment?

Higher baseline variance decreases the required sample size

Baseline variance doesn't affect the required sample size

Higher baseline variance increases the required sample size

Baseline variance only affects experiments with large MDEs

Reader exercise

What happens if the baseline mean of the outcome metric is very small?

A large relative MDE might still correspond to a practically irrelevant absolute MDE

The absolute MDE becomes irrelevant, regardless of the relative MDE

The sample size required will always be small

The baseline variance becomes more important than the baseline mean
