Who is the Confidence Bootcamp for?

The bootcamp is designed for anyone who wants to improve their experimentation skills. Courses are tailored for data scientists, analysts, engineers, product managers, and leaders — whether you are running your first A/B test or scaling an experimentation program across your organization.

Is the bootcamp free?

Yes, the Confidence Bootcamp is completely free. All 11 courses, 90+ lessons, and resources are available at no cost. You can start learning immediately without creating an account, though signing in lets you track your progress across devices.

What does the bootcamp cover?

The bootcamp covers the full experimentation lifecycle: A/B testing fundamentals, hypothesis formulation, interpreting experiment results, metrics design, sample size calculation, feature flags, and building an experimentation culture. It includes 11 courses with over 90 lessons built by the Confidence team at Spotify.

How long does the bootcamp take to complete?

The full bootcamp takes approximately 20 hours to complete across all 11 courses. Individual courses range from 30 minutes to 3 hours. You can learn at your own pace and pick the courses most relevant to your role.

Do I need prior experience with A/B testing or statistics?

No prior experience is required. The bootcamp starts with foundational courses like Intro to Experimentation and progressively covers more advanced topics like sequential testing and variance reduction. Each course clearly indicates which roles it is designed for.

Who created the Confidence Bootcamp?

The Confidence Bootcamp was created by the Confidence team at Spotify, the same team that builds the experimentation and feature flagging platform used across Spotify. The content reflects real-world experimentation practices used at one of the world's largest digital products.

Lesson 2: A Refresher on Alpha and Power

Summary

This lesson is a brief summary of lessons 5 and 6 from the Hypothesis Testing course, to make sure you have what you need to understand the sample size calculation course.

Possible outcomes in experiments

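In an experiment, a treatment effect either exists or it doesn't, and you either detect it or you don't. This gives us four possible outcomes: a true positive (an effect exists and is detected), a false negative (an effect exists but is not detected), a false positive (no effect exists but one is detected), and a true negative (no effect exists and none is detected).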

Across many experiments, these four outcomes will occur with some rates. That is, if we run 100 experiments, some number of them will end up in each of the four quadrants.

In hypothesis testing, alpha is used as a parameter to control the rate of false positive results among the experiments that have no effect, and power is used to control the rate of true positive results among the experiments where there is an effect. Alpha is the intended false positive rate, and 1 - power is the intended false negative rate, so together alpha and power set the intended error rates of the test.
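To make the four rates concrete, here is a minimal sketch of the expected number of experiments in each quadrant. The share of experiments with a true effect (20%) and the values of alpha and power are hypothetical, and the sketch assumes that every true effect is at least as large as the MDE, so that power applies to all of them.

```python
def expected_outcomes(n_experiments, share_with_effect, alpha, power):
    """Expected count of experiments in each of the four quadrants.

    Assumes every experiment with an effect has an effect of at least
    the MDE, so its detection probability equals the power.
    """
    with_effect = n_experiments * share_with_effect
    without_effect = n_experiments - with_effect
    return {
        "true_positive": with_effect * power,
        "false_negative": with_effect * (1 - power),
        "false_positive": without_effect * alpha,
        "true_negative": without_effect * (1 - alpha),
    }


# Hypothetical inputs: 100 experiments, 20% of which have a true effect.
outcomes = expected_outcomes(100, share_with_effect=0.2, alpha=0.05, power=0.8)
for name, count in outcomes.items():
    print(f"{name}: {count:.1f}")
```

Note that alpha only applies to the experiments without an effect and power only to those with an effect, which is why the two rates are multiplied by different group sizes.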

Our goal with experimentation is to control the rates of incorrect and correct results. We can trade off between the rates of false positives and false negatives by changing the alpha, power, and sample size of our test. Several other factors also affect how risk is handled in experiments, and we will cover them in future courses. But for now, let's not get ahead of ourselves.

By using a statistically valid test with a certain alpha, and a sample size large enough for a certain MDE to achieve a certain power, we can:

Bound the proportion of experiments without an effect that falsely detect an effect to be lower than or equal to alpha.
Bound the proportion of experiments with an effect of MDE (or larger) that correctly detect that effect to be larger than or equal to power.
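The two bounds above can be checked numerically. The sketch below assumes a two-sided Z-test on a difference in means with two equally sized groups and a known, shared standard deviation, and it uses the textbook sample size formula for that setting, which is an assumption and not necessarily the exact formula used in this course. The function names and the design values (alpha, power, MDE, sigma) are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def required_n_per_group(mde, sigma, alpha, power):
    """Per-group sample size from the textbook two-sample Z-test formula."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * (sigma * (z_alpha + z_power) / mde) ** 2)


def rejection_probability(true_effect, sigma, n_per_group, alpha):
    """Probability that the two-sided Z-test rejects the null hypothesis."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    standard_error = sigma * math.sqrt(2 / n_per_group)
    shift = true_effect / standard_error
    # Sum the probability mass in the upper and lower rejection regions.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


# Hypothetical design values.
alpha, power, mde, sigma = 0.05, 0.80, 0.1, 1.0
n = required_n_per_group(mde, sigma, alpha, power)

false_positive_rate = rejection_probability(0.0, sigma, n, alpha)
true_positive_rate = rejection_probability(mde, sigma, n, alpha)

print(f"n per group: {n}")
print(f"false positive rate with no effect: {false_positive_rate:.4f}")
print(f"true positive rate at the MDE: {true_positive_rate:.4f}")
```

With no true effect the rejection probability equals alpha, and with a true effect equal to the MDE it is at least the chosen power, because the sample size is rounded up.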

Video recap

If you haven't already, watch this 4-minute and 31-second video to quickly review what we've learned so far:

Win rate across all experiments

Having powered tests does not bound the true positive rate across all experiments you run. It only bounds the true positive rate for the subset of experiments that have a true treatment effect of MDE or larger.

In practice, some experiments will have a non-zero effect smaller than the MDE for which we have designed the test. In those experiments, our chance to detect the treatment effect will be smaller than power.

The best we can do is to make sure that we select MDEs that map to the smallest effect size that is practically relevant for our business. By powering all experiments to detect that effect, we can ensure that our true positive rate is at least power for all experiments in which the true effect is of a relevant size.
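As a rough illustration of why power only bounds the true positive rate at the MDE or above, the sketch below computes the detection probability for true effects that are fractions of the MDE. It assumes a two-sided Z-test with equal group sizes and a known standard deviation. The numbers are hypothetical: the per-group sample size of 1570 is roughly what the textbook formula gives for alpha = 0.05, power = 0.8, sigma = 1, and MDE = 0.1.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_at(true_effect, sigma, n_per_group, alpha):
    """Detection probability of a two-sided Z-test for a given true effect."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = true_effect / (sigma * math.sqrt(2 / n_per_group))
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


# Hypothetical design: sized for an MDE of 0.1 at alpha 0.05 and power 0.8.
alpha, sigma, mde, n_per_group = 0.05, 1.0, 0.1, 1570

fractions = [0.25, 0.5, 0.75, 1.0, 1.5]
power_by_fraction = {
    f: power_at(f * mde, sigma, n_per_group, alpha) for f in fractions
}

for fraction, achieved in power_by_fraction.items():
    print(f"true effect = {fraction:.2f} x MDE -> power {achieved:.2f}")
```

The detection probability falls quickly as the true effect shrinks below the MDE, which is why the MDE should correspond to the smallest effect that is practically relevant.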

The nonlinearity of alpha and power

It is important to understand how the alpha and power parameters affect the sample size. Because we use a Z-test for evaluating experiments in most cases, a normal distribution underlies the relationship between the required sample size, alpha, and power. This means that the required sample size does not change linearly with alpha or power, which makes it much harder to reason about how the required sample size responds to a change in either parameter.

The Alpha z-value

In the sample size calculation, the alpha parameter comes into the equation via a z-value. Although this is the same type of z-score that we have discussed in previous lessons, here we will not focus on the rationale for the z-value being in the equation, but rather on the relation between alpha and the z-value.

Alpha enters the sample size formula via a z-value because we are using a Z-test that is based on the asymptotic normality of the difference-in-means sample estimator. It is good to know that the relation between alpha and z-alpha is nonlinear. This implies that changing alpha by a fixed amount will change the required sample size by different amounts depending on the alpha you had to begin with. Lowering alpha from 0.02 to 0.01 will increase the required sample size more than lowering it from 0.1 to 0.09.

Note that the asymptotic normality that the Z-test (and therefore the sample size calculations) is based on doesn't require the underlying data to be normally distributed. Instead, it's the difference-in-means estimator that needs to be approximately normally distributed under the null hypothesis, which it is for many underlying data distributions thanks to the central limit theorem. Learn more about the distribution of the difference-in-means estimator in the Hypothesis Testing course.
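The nonlinear relation between alpha and its z-value can be sketched with Python's standard library. The sketch assumes a two-sided test, so the z-value is the 1 - alpha/2 quantile of the standard normal distribution, and it assumes that the required sample size is proportional to the squared sum of the alpha and power z-values, as in the textbook Z-test formula. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_alpha(alpha):
    """Critical z-value of a two-sided Z-test at significance level alpha."""
    return STD_NORMAL.inv_cdf(1 - alpha / 2)


def relative_sample_size(alpha, power=0.8):
    """Sample size up to a constant factor: (z_alpha + z_power) squared."""
    return (z_alpha(alpha) + STD_NORMAL.inv_cdf(power)) ** 2


for a in (0.10, 0.09, 0.02, 0.01):
    print(f"alpha = {a:.2f} -> z = {z_alpha(a):.3f}")

# The same 0.01 step in alpha, taken from two different starting points.
increase_from_small_alpha = relative_sample_size(0.01) - relative_sample_size(0.02)
increase_from_large_alpha = relative_sample_size(0.09) - relative_sample_size(0.10)

print(f"0.02 -> 0.01 adds {increase_from_small_alpha:.2f} (relative units)")
print(f"0.10 -> 0.09 adds {increase_from_large_alpha:.2f} (relative units)")
```

Both steps lower alpha by 0.01, but the step that starts from the smaller alpha increases the relative sample size several times more.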

The plot below shows how the z-value changes with alpha.

The power z-value

The same relation holds for how power comes into the sample size formula.
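A minimal sketch of the same nonlinearity for power, assuming that power enters the textbook formula as the standard normal quantile at the power level and that the required sample size is proportional to the squared sum of the two z-values. The function names and the chosen power levels are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_power(power):
    """z-value for power: the standard normal quantile at the power level."""
    return STD_NORMAL.inv_cdf(power)


def relative_sample_size(power, alpha=0.05):
    """Sample size up to a constant factor: (z_alpha + z_power) squared."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return (z_alpha + z_power(power)) ** 2


for p in (0.80, 0.85, 0.90, 0.95):
    print(f"power = {p:.2f} -> z = {z_power(p):.3f}")

# The same 0.05 step in power, taken from two different starting points.
increase_from_low_power = relative_sample_size(0.85) - relative_sample_size(0.80)
increase_from_high_power = relative_sample_size(0.95) - relative_sample_size(0.90)

print(f"0.80 -> 0.85 adds {increase_from_low_power:.2f} (relative units)")
print(f"0.90 -> 0.95 adds {increase_from_high_power:.2f} (relative units)")
```

Raising power by the same amount costs more sample size the closer the starting power already is to 1.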

What is the primary goal of experimentation in terms of risks?

What does a statistically valid test with a certain alpha guarantee?

What happens in experiments where the true effect is smaller than the MDE?

What changes the required sample size the most (in absolute numbers): increasing alpha by one unit or decreasing it by one unit?