Who is the Confidence Bootcamp for?

The bootcamp is designed for anyone who wants to improve their experimentation skills. Courses are tailored for data scientists, analysts, engineers, product managers, and leaders — whether you are running your first A/B test or scaling an experimentation program across your organization.

Is the bootcamp free?

Yes, the Confidence Bootcamp is completely free. All 11 courses, 90+ lessons, and resources are available at no cost. You can start learning immediately without creating an account, though signing in lets you track your progress across devices.

What does the bootcamp cover?

The bootcamp covers the full experimentation lifecycle: A/B testing fundamentals, hypothesis formulation, interpreting experiment results, metrics design, sample size calculation, feature flags, and building an experimentation culture. It includes 11 courses with over 90 lessons built by the Confidence team at Spotify.

How long does the bootcamp take to complete?

The full bootcamp takes approximately 20 hours to complete across all 11 courses. Individual courses range from 30 minutes to 3 hours. You can learn at your own pace and pick the courses most relevant to your role.

Do I need prior experience with A/B testing or statistics?

No prior experience is required. The bootcamp starts with foundational courses like Intro to Experimentation and progressively covers more advanced topics like sequential testing and variance reduction. Each course clearly indicates which roles it is designed for.

Who created the Confidence Bootcamp?

The Confidence Bootcamp was created by the Confidence team at Spotify, the same team that builds the experimentation and feature flagging platform used across Spotify. The content reflects real-world experimentation practices used at one of the world's largest digital products.

Lesson 1: Why you should experiment

Summary

In this lesson, you learn why you should experiment. Run experiments to:

Objectively test your own biased assumptions
Avoid accidentally breaking things while trying to improve the product
Innovate fast by abandoning bad ideas early
Establish a causal link between a product change and an outcome

We experiment because we know that we have biases

As humans, we tend to look for evidence that supports what we already believe, a phenomenon known as confirmation bias. To make matters worse, we also have a tendency to overvalue the products that we built ourselves (also known as the IKEA effect). This means that if we want to know the true value of product changes for our users, we have to be very careful to measure the impact in an unbiased and objective way, to avoid having our own beliefs fool us.

We experiment to avoid accidental breakage

Every time we change something about our product, we run the risk of accidentally causing negative side effects. This could be an increase in latency or crash rates caused by a new feature. For a mature product such as Spotify, it is much easier to unintentionally break the user experience than to improve it. Without experimentation, small undetected decreases in performance can add up and have a detrimental combined impact on the overall user experience.

We run experiments to innovate fast and abandon bad ideas early

The most important thing for most companies, Spotify included, is not to ship a lot of changes, but to ship the right changes. Holding back a harmful change is as important as releasing a beneficial one. Without systematically testing our assumptions on real users in a real-life setting, we risk investing a lot of development resources in product changes that appeared promising at first but didn't actually improve the user experience.

Experiments allow us to draw causal conclusions

Let's say that we are looking for ways to reduce churn for Spotify premium users. We could do an analysis that compares users who churned with users who didn't. One result of such an analysis could be that users who didn't churn experienced more app crashes than users who churned. Does this mean that increasing the number of app crashes would reduce churn? Of course not. People who use the app a lot are more likely to experience a crash, and are also less likely to churn.
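The crash example can be reproduced with a small simulation. The sketch below is illustrative only: the usage split, crash probabilities, and churn probabilities are made-up assumptions, and crashes are given no causal effect on churn by construction. Even so, users who crashed churn less overall, purely because heavy usage drives both outcomes.

```python
import random


def simulate_users(n_users=20000, seed=0):
    """Simulate users whose usage level drives both crashes and churn.

    All probabilities are hypothetical. Crashes never influence churn here.
    Returns a list of (heavy_user, crashed, churned) tuples.
    """
    rng = random.Random(seed)
    users = []
    for _ in range(n_users):
        heavy_user = rng.random() < 0.5  # the hidden common cause
        # Heavy users spend more time in the app, so they see more crashes.
        crash_prob = 0.30 if heavy_user else 0.05
        # Heavy users are also less likely to churn.
        churn_prob = 0.05 if heavy_user else 0.30
        crashed = rng.random() < crash_prob
        churned = rng.random() < churn_prob
        users.append((heavy_user, crashed, churned))
    return users


def churn_rate(users):
    """Share of users in the list who churned."""
    return sum(churned for _, _, churned in users) / len(users)


users = simulate_users()
crashed_users = [u for u in users if u[1]]
no_crash_users = [u for u in users if not u[1]]

# Naive comparison: crashes look "protective" against churn.
print("churn, crashed:   ", round(churn_rate(crashed_users), 3))
print("churn, no crashes:", round(churn_rate(no_crash_users), 3))

# Within heavy users only, the apparent effect of crashes disappears.
heavy_crashed = [u for u in users if u[0] and u[1]]
heavy_no_crash = [u for u in users if u[0] and not u[1]]
print("heavy users, crashed:   ", round(churn_rate(heavy_crashed), 3))
print("heavy users, no crashes:", round(churn_rate(heavy_no_crash), 3))
```

Comparing users within the same usage level removes most of the apparent effect, which is what we would expect when the correlation comes from a common cause rather than from crashes themselves.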

Now let's imagine that we built a new feature, and we hope that it reduces churn for Spotify premium users. In theory, we could just roll out the feature to everyone, check how many people are using it, and then see if people who use the feature are less likely to churn. But would this tell us if the feature actually reduces churn? No. Because just as with app crashes, a correlation between more feature usage and less churn would not imply a causal link. To objectively measure the value of our new feature, we need to find a way to isolate the impact of the feature from everything else that can impact our metric of choice. The gold standard method for doing this is called a "randomized controlled trial."

Randomized controlled trials

Experiments split users into two (or more) groups by random assignment. Random assignment ensures that the groups are, on average, similar in all respects except for the change we want to test. If we randomly split all Spotify users into two groups, the two groups should be very similar in dimensions like demographics, connection speed, and music taste. One group becomes the "treatment" group and receives the new feature. The other group, the "control" group, receives the default experience. We can then observe the users over time while they receive the two different experiences, and measure some outcome of interest, for example churn, daily activity, or the number of minutes played. At the end of the experiment, we run a statistical test to assess whether the difference between the groups is larger than what we would expect to see from chance alone if the change had no effect.
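As a rough illustration of these two steps, the sketch below randomly assigns users to groups and then compares churn rates with a two-sided two-proportion z-test. The lesson does not say which statistical test is used, so the z-test is an assumption chosen for simplicity. The function names and the churn counts are hypothetical.

```python
import math
import random
from statistics import NormalDist


def assign_groups(user_ids, seed=42):
    """Randomly split users into two equally sized groups."""
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]  # (control, treatment)


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Returns (z, p_value), where z is based on group B minus group A.
    """
    p_a, p_b = events_a / n_a, events_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


control, treatment = assign_groups(range(10_000))

# Hypothetical outcome: 600 churned users in control, 520 in treatment.
z, p_value = two_proportion_z_test(
    events_a=600, n_a=len(control),
    events_b=520, n_b=len(treatment),
)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value means a difference this large would be unlikely if the feature had no effect. Because the groups were formed at random, the difference can then be attributed to the feature rather than to pre-existing differences between users.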

The cost of experiments

Experiments aren't free. The main costs involved are that:

It takes time to set up an experiment, wait for users to be exposed, and analyze the results.
If the change that you test is as beneficial as you hope, then the users in the control group miss out on the improved experience until the end of the experiment.
If a change makes the user experience worse, then some users receive a worse experience for as long as the experiment runs.

The cost of not running experiments

You don't know if users respond to the product change in the way that you expect.
You might have negatively impacted your users in unexpected ways. If you roll out many changes without A/B testing them on real users, there might be negative impacts on system performance, crash rates, and more that you fail to detect. Taken together, they can add up and seriously impact the user experience.
Without testing your assumptions on real users, you risk investing resources into product changes that appear promising, but don't actually improve the experience in a real-life setting.
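To make the point about small regressions adding up concrete, here is a minimal arithmetic sketch. It assumes, purely for illustration, that each shipped change lowers a metric by 0.5% and that the effects compound multiplicatively. Neither number comes from the lesson.

```python
def combined_impact(relative_changes):
    """Compound a sequence of relative metric changes into one overall change."""
    factor = 1.0
    for change in relative_changes:
        factor *= 1.0 + change
    return factor - 1.0


# Hypothetical: ten untested changes, each lowering the metric by 0.5%.
overall = combined_impact([-0.005] * 10)
print(f"Combined change: {overall:.2%}")
```

Under these assumptions, ten regressions that are each too small to notice combine into a drop of nearly 5%.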

Learn more

Watch this video to see examples of different types of experiments for various common use cases.

Reader exercise

Why should all changes that affect end-users be tested with A/B tests and/or rollouts?

To ensure that our changes have the effects we intended and detect unexpected side effects that might harm our end users and thereby our business.

To inform other parts of the company what we are working on to ensure transparency.

Experimentation is important in itself, because there could be no learning without it.