Who is the Confidence Bootcamp for?

The bootcamp is designed for anyone who wants to improve their experimentation skills. Courses are tailored for data scientists, analysts, engineers, product managers, and leaders — whether you are running your first A/B test or scaling an experimentation program across your organization.

Is the bootcamp free?

Yes, the Confidence Bootcamp is completely free. All 11 courses, 90+ lessons, and resources are available at no cost. You can start learning immediately without creating an account, though signing in lets you track your progress across devices.

What does the bootcamp cover?

The bootcamp covers the full experimentation lifecycle: A/B testing fundamentals, hypothesis formulation, interpreting experiment results, metrics design, sample size calculation, feature flags, and building an experimentation culture. It includes 11 courses with over 90 lessons built by the Confidence team at Spotify.

How long does the bootcamp take to complete?

The full bootcamp takes approximately 20 hours to complete across all 11 courses. Individual courses range from 30 minutes to 3 hours. You can learn at your own pace and pick the courses most relevant to your role.

Do I need prior experience with A/B testing or statistics?

No prior experience is required. The bootcamp starts with foundational courses like Intro to Experimentation and progressively covers more advanced topics like sequential testing and variance reduction. Each course clearly indicates which roles it is designed for.

Who created the Confidence Bootcamp?

The Confidence Bootcamp was created by the Confidence team at Spotify, the same team that builds the experimentation and feature flagging platform used across Spotify. The content reflects real-world experimentation practices used at one of the world's largest digital products.

Lesson 5: False positive rate and alpha

Summary

This page teaches you about the concept of false positive results and the rate at which they appear in experiments. You learn:

What a false positive result is
What the false positive rate is
What alpha is and how it relates to the false positive rate

A false positive result

A false positive result, often simply called a "false positive," occurs when we find a statistically significant treatment effect in an experiment even though the treatment has no effect. Another term for a false positive result is a "type I error."

In most experiments, we test the mean difference between the treatment groups. In that case, a false positive result occurs when we observe a mean difference between the treatment and control groups that is large enough to be statistically significant, even though the treatment had no effect on the outcome.

Every user has some value on the outcome metric, even if the treatment has no effect. Some users have large values relative to the population mean and some have small values. When we randomly split users into treatment and control, there is always a risk that, purely by chance, most of the users with large values end up in the treatment group rather than the control group.

User assignment simulator

In this simulator, eight users are randomly assigned to treatment or control with balanced allocation (4 users per group). The treatment has no effect, so any significant mean difference is a false positive.

Users and their outcome values: Alice (10), Bob (15), Charlie (5), David (20), Eve (8), Frank (12), Grace (18), Henry (6).
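To see how large a purely chance imbalance can get, the sketch below enumerates every balanced split of the eight users from the user assignment simulator (4 per group) and computes the treatment-minus-control mean difference for each. It assumes, as the simulator does, that the treatment has no effect, so each user's outcome is the listed value regardless of group. The helper names are illustrative, not part of any Confidence API.

```python
from itertools import combinations

# Outcome values from the user assignment simulator. The treatment has no
# effect, so each user's value is the same whichever group they land in.
USERS = {"Alice": 10, "Bob": 15, "Charlie": 5, "David": 20,
         "Eve": 8, "Frank": 12, "Grace": 18, "Henry": 6}


def balanced_splits(users):
    """Yield every (treatment, control) split with half the users in each group."""
    names = sorted(users)
    for treatment in combinations(names, len(names) // 2):
        control = tuple(n for n in names if n not in treatment)
        yield treatment, control


def mean_difference(users, treatment, control):
    """Treatment mean minus control mean of the outcome values."""
    mean_t = sum(users[n] for n in treatment) / len(treatment)
    mean_c = sum(users[n] for n in control) / len(control)
    return mean_t - mean_c


diffs = [mean_difference(USERS, t, c) for t, c in balanced_splits(USERS)]
print(len(diffs), "possible splits")         # 70 possible splits
print("smallest difference:", min(diffs))    # -9.0
print("largest difference:", max(diffs))     # 9.0
```

Even though nothing differs between the groups, the split that puts David, Grace, Bob, and Frank in treatment produces a mean difference of 9 by chance alone. A statistical test has to judge how unusual an observed difference is relative to all the splits random assignment could have produced.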

The decision from one experiment will be right or wrong

Although random treatment assignment is what makes it possible to quantify the variation, it also means that we can never be certain whether the observed result of a single experiment is true.

What we can do is bound the rate of false positives across many experiments. Statistical tests are constructed to limit how often they produce a wrong result over repeated experiments.

False positive rate

Valid statistical tests quantify the variability of the test statistic under the null hypothesis of no treatment effect and use it to bound the rate of false positive results. Only the most extreme imbalances, those that would occur in at most a fraction alpha of experiments if there were no effect, are declared significant.
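For a two-sided z-test, bounding the false positive rate at alpha comes down to choosing a critical value that a standard normal test statistic exceeds in absolute value with probability alpha when there is no effect. A minimal sketch using Python's standard-library `statistics.NormalDist`, assuming a two-sided test and a standard normal null distribution; the function names are our own:

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def two_sided_critical_value(alpha):
    """|z| above this value is declared significant in a two-sided z-test."""
    return STANDARD_NORMAL.inv_cdf(1 - alpha / 2)


def null_rejection_probability(critical_value):
    """Probability that |Z| exceeds the critical value when there is no effect."""
    return 2 * (1 - STANDARD_NORMAL.cdf(critical_value))


for alpha in (0.01, 0.05, 0.10):
    z_crit = two_sided_critical_value(alpha)
    print(f"alpha={alpha:.2f}  critical |z|={z_crit:.3f}  "
          f"P(|Z| > critical | no effect)={null_rejection_probability(z_crit):.3f}")
```

For alpha = 0.05 this gives the familiar critical value of about 1.96: only the 5% most extreme values of the test statistic under the null count as significant.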

A desirable property of a statistical test is that its false positive rate is bounded at a level the experimenter controls.

Note

The false positive rate is the rate at which we find false positives in experiments where there is no effect.
For example, if we run 100 experiments where the treatment has no effect on the outcome metric, and we find that 10 have a significant effect, the proportion of experiments where we find a significant effect (10/100 = 10%) is the false positive rate of this test.
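The false positive rate in the note is a simple proportion. A tiny sketch, assuming we have recorded for each no-effect experiment whether its test came out significant; the list below only recreates the hypothetical 10-out-of-100 example:

```python
def false_positive_rate(significant_flags):
    """Share of no-effect experiments whose test was (falsely) significant."""
    flags = list(significant_flags)
    if not flags:
        raise ValueError("need at least one experiment")
    return sum(flags) / len(flags)


# 100 hypothetical experiments with no true effect, 10 of them significant.
results = [True] * 10 + [False] * 90
print(false_positive_rate(results))  # 0.1
```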

Alpha (the intended false positive rate)

Alpha is a parameter of a statistical test that sets the intended upper bound on the false positive rate. We say that a statistical test is valid if its false positive rate over repeated experiments with no effect is lower than or equal to alpha. In other words, valid statistical tests let us bound the proportion of experiments in which there is no true effect but we falsely find one.

At this point, you might be wondering why we cannot simply set alpha to zero to avoid all false positives. The reason is that we also want to be able to find true effects when they exist, and unfortunately there is a trade-off between the false positive rate and our ability to find true effects, which we return to in the next lesson.

False positive rate simulator

Now we can put what we have learned together and simulate experiments to see how often we find false positives with a given alpha.
In this simulator, we are using a Z-test and are drawing large random samples. Since the sample size is large, the distribution of the test statistic under the null hypothesis is approximately normal and therefore the false positive rate should be close to alpha across many random experiments.


If we could run the simulation with infinitely many experiments, the false positive rate would converge to exactly alpha.
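A simulator of this kind can be sketched in plain Python. This is not the code behind the interactive simulator; it assumes normally distributed outcomes drawn from the same distribution in both groups (so the null hypothesis is true), a two-sided z-test that uses the sample variances, and illustrative values for the sample sizes, number of experiments, and seed.

```python
import math
import random
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def mean_and_variance(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def z_test_p_value(treatment, control):
    """Two-sided z-test p-value for a difference in means (large samples)."""
    mean_t, var_t = mean_and_variance(treatment)
    mean_c, var_c = mean_and_variance(control)
    se = math.sqrt(var_t / len(treatment) + var_c / len(control))
    z = (mean_t - mean_c) / se
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


def simulate_false_positive_rate(alpha, n_experiments=2000, n_per_group=200, seed=1):
    """Share of no-effect experiments in which the z-test is significant."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_experiments):
        # Both groups share one distribution, so any significant result is false.
        treatment = [rng.gauss(10.0, 3.0) for _ in range(n_per_group)]
        control = [rng.gauss(10.0, 3.0) for _ in range(n_per_group)]
        if z_test_p_value(treatment, control) < alpha:
            false_positives += 1
    return false_positives / n_experiments


for alpha in (0.01, 0.05, 0.10):
    rate = simulate_false_positive_rate(alpha)
    print(f"alpha={alpha:.2f}  simulated false positive rate={rate:.3f}")
```

Each run gives a rate near, but usually not exactly at, alpha; the gap shrinks as `n_experiments` grows.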

Notes for Nerds

Conservative tests

The test used in the simulation would reach exactly the intended false positive rate if we simulated a large enough number of experiments. However, for a test to be valid, it's enough that the false positive rate is lower than or equal to alpha.

A statistical test that has a false positive rate substantially lower than alpha is called a conservative test. Generally speaking, it is good to avoid conservative tests as they give the experimenter less control over the risk management of the experiments.
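One common source of conservativeness is discreteness. As an illustration of our own (not the test used in the simulator), take an exact randomization test on the eight users from the user assignment simulator: the two-sided p-value of a split is the share of all 70 equally likely balanced splits whose absolute mean difference is at least as large, and the split is called significant when that p-value is at most alpha. Because only 70 splits are possible, the achievable false positive rates move in steps of 1/70. The function names are hypothetical.

```python
from itertools import combinations

# Outcome values from the user assignment simulator; the treatment has no effect.
USERS = {"Alice": 10, "Bob": 15, "Charlie": 5, "David": 20,
         "Eve": 8, "Frank": 12, "Grace": 18, "Henry": 6}


def all_mean_differences(users):
    """Treatment-minus-control mean difference for every balanced split."""
    names = sorted(users)
    half = len(names) // 2
    total = sum(users.values())
    diffs = []
    for treatment in combinations(names, half):
        treated_sum = sum(users[n] for n in treatment)
        diffs.append(treated_sum / half - (total - treated_sum) / (len(names) - half))
    return diffs


def exact_false_positive_rate(users, alpha):
    """Share of equally likely splits that the randomization test calls significant."""
    diffs = all_mean_differences(users)
    n = len(diffs)
    rejected = 0
    for d in diffs:
        # Two-sided p-value: share of splits at least as extreme as this one.
        p_value = sum(abs(other) >= abs(d) for other in diffs) / n
        rejected += p_value <= alpha
    return rejected / n


alpha = 0.05
fpr = exact_false_positive_rate(USERS, alpha)
print(f"alpha={alpha}  actual false positive rate={fpr:.4f}")
```

With alpha = 0.05 only the two most extreme splits are rejected, so the actual false positive rate is 2/70 (about 0.029). The test is valid, but conservative.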

Intended vs actual false positive rate

Note that we say "intended" false positive rate. If we use a statistical test incorrectly, the actual false positive rate might not in fact be bounded by alpha.
A classic example of test misuse that inflates the false positive rate is using a fixed-sample hypothesis test to peek at the data multiple times. In this case, the false positive rate is not bounded by the alpha of the test, because the test only bounds the false positive rate at alpha when it is performed once, at the end of the experiment, not when it is performed repeatedly.
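The peeking problem is easy to reproduce in a small simulation. The sketch below assumes standard normal outcomes with no treatment effect, a known variance of 1 (so the z-statistic is exact), and ten equally spaced looks; the batch size, number of experiments, and seed are illustrative. It compares applying a fixed-sample two-sided z-test once at the end with applying it at every look and declaring a win at the first significant result.

```python
import math
import random
from statistics import NormalDist


def peeking_experiment(rng, z_crit, looks=10, batch=50):
    """Return (significant at the final look, significant at any look)."""
    sum_t = sum_c = 0.0
    n = 0
    z = 0.0
    any_look_significant = False
    for _ in range(looks):
        # Add a new batch of users to each group; both groups are N(0, 1).
        sum_t += sum(rng.gauss(0.0, 1.0) for _ in range(batch))
        sum_c += sum(rng.gauss(0.0, 1.0) for _ in range(batch))
        n += batch
        # Known variance 1 per user, so Var(mean_t - mean_c) = 2 / n.
        z = (sum_t / n - sum_c / n) / math.sqrt(2.0 / n)
        # An experimenter who stops at the first significant look reports this.
        if abs(z) > z_crit:
            any_look_significant = True
    return abs(z) > z_crit, any_look_significant


def compare_peeking(alpha=0.05, n_experiments=2000, seed=7):
    """False positive rates for testing once versus testing at every look."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    final_only = peeked = 0
    for _ in range(n_experiments):
        final_sig, any_sig = peeking_experiment(rng, z_crit)
        final_only += final_sig
        peeked += any_sig
    return final_only / n_experiments, peeked / n_experiments


once, repeatedly = compare_peeking()
print(f"tested once at the end: {once:.3f}")
print(f"tested at every look:  {repeatedly:.3f}")
```

Testing once at the end gives a rate close to alpha, while testing at every look gives a rate several times larger, even though every individual test uses the same alpha.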

Read more about the issue with peeking on standard statistical hypothesis tests in the blog post on sequential tests.
