Who is the Confidence Bootcamp for?

The bootcamp is designed for anyone who wants to improve their experimentation skills. Courses are tailored for data scientists, analysts, engineers, product managers, and leaders — whether you are running your first A/B test or scaling an experimentation program across your organization.

Is the bootcamp free?

Yes, the Confidence Bootcamp is completely free. All 11 courses, 90+ lessons, and resources are available at no cost. You can start learning immediately without creating an account, though signing in lets you track your progress across devices.

What does the bootcamp cover?

The bootcamp covers the full experimentation lifecycle: A/B testing fundamentals, hypothesis formulation, interpreting experiment results, metrics design, sample size calculation, feature flags, and building an experimentation culture. It includes 11 courses with over 90 lessons built by the Confidence team at Spotify.

How long does the bootcamp take to complete?

The full bootcamp takes approximately 20 hours to complete across all 11 courses. Individual courses range from 30 minutes to 3 hours. You can learn at your own pace and pick the courses most relevant to your role.

Do I need prior experience with A/B testing or statistics?

No prior experience is required. The bootcamp starts with foundational courses like Intro to Experimentation and progressively covers more advanced topics like sequential testing and variance reduction. Each course clearly indicates which roles it is designed for.

Who created the Confidence Bootcamp?

The Confidence Bootcamp was created by the Confidence team at Spotify, the same team that builds the experimentation and feature flagging platform used across Spotify. The content reflects real-world experimentation practices used at one of the world's largest digital products.

Lesson 3: Why you need randomized controlled experiments

Summary

In this lesson, you learn about the role of randomization in experiments. Randomized experiments are also known as randomized controlled trials.

Randomized treatment assignment:

Makes treatment groups similar in all aspects besides which treatment they receive.
Makes it possible to interpret the treatment effect as the causal effect of the treatment on the outcome.

Example: The effectiveness of a weight-loss program

Person standing on a bathroom scale, illustrating the weight-loss program experiment example

Let's imagine that you own a gym, and you want to offer a weight-loss program to your customers. You want to know how effective your program is, so you design an experiment.

Experiment design 1: Uncontrolled trial

You stand at the entrance of your gym and look for volunteers to participate in your program so you can measure its effectiveness. You weigh the people who want to join your program before they enroll. After the 6-week program, which consists of exercise schedules and dietary advice, you weigh them again. You calculate the average difference in weight before and after the program, and find that people lost 3 kg on average. You celebrate a great success and start advertising your program!

The problem with an uncontrolled trial

You don't know what would have happened if people hadn't enrolled in your program. All participants wanted to lose weight, and maybe they would have done so without your program. People's weight fluctuates over time. People who had just gained some weight (for example, after the holidays) may be more motivated to sign up. They may also lose that weight again simply because they returned to their normal lifestyle. If you want to know the effectiveness of your program, then you need to compare your program with a situation without it.
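The effect of weight fluctuation can be illustrated with a small simulation. This is a minimal sketch, not part of the original lesson: the baseline weights, the size of the fluctuations, and the 1.5 kg sign-up threshold are all made-up assumptions. The simulated program has no effect at all, yet the volunteers still appear to lose weight, because only people who had recently gained weight signed up.

```python
import random
import statistics


def simulate_uncontrolled_trial(n_people=10_000, seed=1):
    """Average before/after change for volunteers in a program with zero true effect."""
    rng = random.Random(seed)
    changes = []
    for _ in range(n_people):
        baseline = rng.gauss(80, 10)         # stable long-run weight in kg (assumed)
        before = baseline + rng.gauss(0, 2)  # weight today: baseline plus a temporary fluctuation
        if before - baseline < 1.5:
            continue                         # only people who recently gained weight volunteer (assumed)
        after = baseline + rng.gauss(0, 2)   # six weeks later: a fresh fluctuation, no program effect
        changes.append(after - before)
    return statistics.mean(changes)


average_change = simulate_uncontrolled_trial()
print(f"Average change among volunteers: {average_change:.2f} kg")
```

The printed average is clearly negative even though the program does nothing in this simulation. A before/after comparison alone can't separate this natural return to normal from a real treatment effect.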

Experiment design 2: You need a control group

You stand at the entrance of your gym and look for volunteers to participate in your program so you can measure its effectiveness. You enroll the people who want to join. You ask the people who don't want to participate to be part of a control group. You weigh both groups before and after the program. After the program, you calculate the change in weight for each group, and then calculate the difference between the two groups. You find that your treatment group lost more weight than the control group! You celebrate a great success and start advertising your program!

The problem with observational control groups

Your treatment and control groups are not comparable. The treatment group wanted to lose weight, and the control group didn't. The control group may contain people who joined the gym to become stronger, and who may even have gained weight from building muscle! This mechanism is called selection bias, and it happens when groups are selected in a way that biases the result. Selection bias makes the groups incomparable and invalidates the comparison between them. To avoid selection bias, you need a method of assigning people to the treatment and control groups that can't have any correlation with the outcome that you plan to measure.
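A small simulation can make selection bias concrete. This is a sketch under invented assumptions, not data from a real gym: half of the members want to lose weight and would lose about 1 kg on their own, while the other half want to build muscle and would gain about 0.5 kg. The program's true effect is set to zero, and members choose their own group.

```python
import random
import statistics


def self_selected_estimate(n_members=10_000, true_effect=0.0, seed=2):
    """Estimated program effect when members choose their own group."""
    rng = random.Random(seed)
    treatment, control = [], []
    for _ in range(n_members):
        wants_to_lose_weight = rng.random() < 0.5
        if wants_to_lose_weight:
            # These members would lose weight anyway, and they are the ones who enroll.
            natural_change = rng.gauss(-1.0, 2.0)
            treatment.append(natural_change + true_effect)
        else:
            # These members train for strength and end up in the control group.
            natural_change = rng.gauss(0.5, 2.0)
            control.append(natural_change)
    return statistics.mean(treatment) - statistics.mean(control)


estimate = self_selected_estimate()
print(f"Estimated effect with self-selection: {estimate:.2f} kg (true effect: 0.00 kg)")
```

The estimate lands near -1.5 kg, which is only the pre-existing difference between the two kinds of members. Because group membership correlates with the outcome, the comparison reports an effect that the program doesn't have.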

Cartoon illustrating sampling bias: a presenter shows survey results where 99.8% say they love responding to surveys, captioned "We received 500 responses and found that people love responding to surveys"

Experiment design 3: Randomized controlled trial

You want to get an unbiased estimate of the effectiveness of your weight-loss program. For this, you need to compare people who took the program to a control group that is comparable in all other relevant aspects, except for the fact that they have not taken the program. For this example, you could do the following instead. You again stand at the entrance of your gym and ask people if they are interested in participating in your weight-loss program. If they say "no", then they don't participate in the trial. If they say "yes", you weigh them and then flip a coin. Based on the coin flip, you either enroll them right away, or you tell them "The program starts in 6 weeks, come back then!" This way you remove any selection bias. The random assignment makes sure that, on average, the groups are similar across all characteristics except for the treatment that you give them. Of course, you still need to make sure that you can collect the data from everybody in the treatment and the control group after 6 weeks!
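The coin-flip design can be sketched in the same way. The numbers here are illustrative assumptions: every volunteer wants to lose weight and would lose about 1 kg without help, and the program is given a true effect of -2 kg. Because a coin flip decides who starts now and who waits, the two groups differ on average only in the treatment.

```python
import random
import statistics


def randomized_trial_estimate(n_volunteers=10_000, true_effect=-2.0, seed=3):
    """Estimated program effect when a coin flip assigns volunteers to groups."""
    rng = random.Random(seed)
    treatment, control = [], []
    for _ in range(n_volunteers):
        natural_change = rng.gauss(-1.0, 2.0)  # change without the program (assumed)
        if rng.random() < 0.5:                 # the coin flip
            treatment.append(natural_change + true_effect)  # enrolled right away
        else:
            control.append(natural_change)     # told to come back in 6 weeks
    return statistics.mean(treatment) - statistics.mean(control)


estimate = randomized_trial_estimate()
print(f"Estimated effect with random assignment: {estimate:.2f} kg (true effect: -2.00 kg)")
```

The estimate comes out close to the true effect. The weight that volunteers would have lost anyway appears in both groups, so it cancels out in the difference.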

Randomized controlled trials

Experiments, like A/B tests and rollouts, split users into two or more groups by random assignment. The random assignment makes sure that the groups are, on average, similar in all aspects except for the change you want to test. For example, if you randomly split all Spotify users into two groups, the two groups should be very similar in terms of dimensions like demographics, connection speed, and music taste. One group gets the status of a "treatment" group and receives the new feature. The other group, the control group, receives the default experience. You can then observe the users over time while they receive the two different experiences, and measure some outcome of interest, for example churn, daily activity, or the number of minutes played. At the end of the experiment, you run a statistical test to check whether the difference between the groups is larger than what you would expect to see from chance alone if the change had no effect.
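One way to make "larger than what you would expect from chance alone" concrete is a permutation test. This is only an illustrative choice of test, not necessarily the one used by Confidence or described later in the course, and the minutes-played data is simulated under made-up assumptions. The idea is to shuffle the group labels many times and count how often a random split produces a difference at least as large as the observed one.

```python
import random
import statistics


def permutation_test(treatment, control, n_permutations=2_000, seed=4):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = statistics.fmean(treatment) - statistics.fmean(control)
    pooled = list(treatment) + list(control)
    n_treatment = len(treatment)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        shuffled_diff = (statistics.fmean(pooled[:n_treatment])
                         - statistics.fmean(pooled[n_treatment:]))
        if abs(shuffled_diff) >= abs(observed):
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Simulated minutes played per user (hypothetical numbers, not real Spotify data).
data_rng = random.Random(0)
control_minutes = [data_rng.gauss(60, 15) for _ in range(200)]
treatment_minutes = [data_rng.gauss(70, 15) for _ in range(200)]

observed, p_value = permutation_test(treatment_minutes, control_minutes)
print(f"Observed difference: {observed:.1f} minutes, p-value: {p_value:.4f}")
```

A small p-value means that random relabeling rarely produces a difference as large as the observed one, which is evidence that the treatment, and not chance, explains the gap. This reasoning only holds because the groups were formed by random assignment in the first place.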

Reader exercise

What is the purpose of randomizing the treatment assignment in a controlled experiment?

To ensure that the results are significant.

To limit the time effects in the analysis.

To make the groups comparable in all aspects other than which treatment they received.