Who is the Confidence Bootcamp for?

The bootcamp is designed for anyone who wants to improve their experimentation skills. Courses are tailored for data scientists, analysts, engineers, product managers, and leaders — whether you are running your first A/B test or scaling an experimentation program across your organization.

Is the bootcamp free?

Yes, the Confidence Bootcamp is completely free. All 11 courses, 90+ lessons, and resources are available at no cost. You can start learning immediately without creating an account, though signing in lets you track your progress across devices.

What does the bootcamp cover?

The bootcamp covers the full experimentation lifecycle: A/B testing fundamentals, hypothesis formulation, interpreting experiment results, metrics design, sample size calculation, feature flags, and building an experimentation culture. It includes 11 courses with over 90 lessons built by the Confidence team at Spotify.

How long does the bootcamp take to complete?

The full bootcamp takes approximately 20 hours to complete across all 11 courses. Individual courses range from 30 minutes to 3 hours. You can learn at your own pace and pick the courses most relevant to your role.

Do I need prior experience with A/B testing or statistics?

No prior experience is required. The bootcamp starts with foundational courses like Intro to Experimentation and progressively covers more advanced topics like sequential testing and variance reduction. Each course clearly indicates which roles it is designed for.

Who created the Confidence Bootcamp?

The Confidence Bootcamp was created by the Confidence team at Spotify, the same team that builds the experimentation and feature flagging platform used across Spotify. The content reflects real-world experimentation practices used at one of the world's largest digital products.

Lesson 12: A/B tests and rollouts

Summary

Use A/B tests to identify the winning variant, and use rollouts to ship the winner. For technical changes, such as major refactors and migrations, use rollouts to reduce risk and to be able to roll back instantly.

A/B tests and rollouts are two tools for product evaluation. Although similar in some ways, they are usually used for different stages of evaluation.

A/B tests

The main characteristics of A/B tests are that they

Can have more than two variants
Can have both success metrics and guardrail metrics
Have a fixed allocation of the total population
Allow for different evaluation frequencies for calculating results

Use A/B tests to

Decide a winner among two or more variants of a product
Explore and learn about how different settings affect user behavior

A/B tests are flexible and rich product evaluation tools. They help you ensure that the winning variant is better than the losing variants for the business, by allowing you to consider a complete set of success and guardrail metrics.

Rollouts

The main characteristics of a rollout are that it

Can only have two variants, of which one is the current default and the other is the variant that you want to roll out
Can only have guardrail metrics
Has an allocation that you can gradually increase
Always displays results continuously

Use rollouts to

Gradually ship a variant while monitoring important guardrail metrics
Gradually ship technical changes to the system, for example major refactors and migrations

A great benefit of using rollouts to ship changes is that the change you roll out is behind a feature flag, which makes it easy to roll back. If you start rolling something out and get an alert that the rollout harms the end-user experience, reverting to the earlier experience is only a button click away. This saves engineers at Spotify a lot of time and agony, and has made rollouts the default way for engineers to release changes.
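
The mechanics of a gradual, reversible rollout can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Confidence implements rollouts: the `Rollout` class, the `bucket` helper, the flag name, and the hash-based assignment are all hypothetical. Each user is mapped to a stable number between 0 and 1, and the user sees the new variant only when that number falls below the current allocation. Increasing the allocation therefore only adds users, and setting it to zero sends everyone back to the default.

```python
import hashlib


def bucket(user_id: str, flag_name: str) -> float:
    """Map a user to a stable number in [0, 1) for a given flag (hypothetical helper)."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    # The first 8 hex characters form a 32-bit integer; scale it into [0, 1).
    return int(digest[:8], 16) / 2**32


class Rollout:
    """Hypothetical two-variant rollout: a current default and a new variant."""

    def __init__(self, flag_name: str) -> None:
        self.flag_name = flag_name
        self.allocation = 0.0  # share of users who receive the new variant

    def set_allocation(self, share: float) -> None:
        """Set the share of users exposed to the new variant."""
        if not 0.0 <= share <= 1.0:
            raise ValueError("allocation must be between 0 and 1")
        self.allocation = share

    def rollback(self) -> None:
        """Send every user back to the default variant."""
        self.allocation = 0.0

    def variant(self, user_id: str) -> str:
        """Return the variant this user sees at the current allocation."""
        if bucket(user_id, self.flag_name) < self.allocation:
            return "new"
        return "default"


# Illustrative usage with made-up user IDs.
rollout = Rollout("new-checkout")
users = [f"user-{i}" for i in range(10_000)]

rollout.set_allocation(0.1)   # start with roughly 10% of users
exposed_first = {u for u in users if rollout.variant(u) == "new"}

rollout.set_allocation(0.5)   # increase to roughly 50%
exposed_later = {u for u in users if rollout.variant(u) == "new"}

# Users exposed at 10% stay exposed at 50%, so nobody flips back and forth.
print(exposed_first <= exposed_later)

rollout.rollback()            # the "button click" that reverts the change
print(all(rollout.variant(u) == "default" for u in users))
```

Because the assignment depends only on the user, the flag name, and the allocation, no per-user state needs to be stored in this sketch. A production system would also need targeting, logging of exposures, and metric monitoring, which are out of scope here.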

Combine A/B tests and rollouts for important changes

Significant results from experiments with small sample sizes tend to overestimate the treatment effect. Some online experimenters propose that you replicate the results from such experiments by rerunning the experiment on other users. In practice, it is hard to know which experiments are underpowered. One way to think about it is that you should scrutinize unexpected results harder, and replicate them before you believe them. A practical way to confirm the results of an A/B test is to ship the winner with a rollout. The metric results in the rollout work as a replication of the A/B test results, so you can be even more certain that you are making the right decision.

At Spotify, most A/B tests that identify a winning variant use a rollout to ship that variant, which means that most results are replicated while the variant is released to everyone.
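
The overestimation described in this section can be illustrated with a small simulation. The sketch below is an assumption-laden toy model, not an analysis from the lesson: outcomes are assumed to be normally distributed with unit variance, the true effect of 0.1 and the sample sizes are made-up numbers, and significance is judged with a simple z-test that treats the variance as known. The function name `mean_significant_estimate` is hypothetical. The simulation draws many experiments with the same true effect, keeps only those that are significantly positive, and averages their estimated effects.

```python
import random
import statistics


def mean_significant_estimate(true_effect, n_per_group, n_experiments=20_000, seed=1):
    """Simulate experiments and summarize those with a significantly positive result.

    Returns (mean estimated effect among significant experiments,
             share of experiments that were significant).
    """
    rng = random.Random(seed)
    # Standard error of a difference in means for two groups of equal size
    # with unit-variance outcomes (an assumption of this sketch).
    se = (2 / n_per_group) ** 0.5
    significant = []
    for _ in range(n_experiments):
        # The estimated effect is the true effect plus sampling noise.
        estimate = rng.gauss(true_effect, se)
        # Significantly positive at the conventional 1.96 threshold.
        if estimate / se > 1.96:
            significant.append(estimate)
    return statistics.mean(significant), len(significant) / n_experiments


TRUE_EFFECT = 0.1  # illustrative value

small_mean, small_share = mean_significant_estimate(TRUE_EFFECT, n_per_group=50)
large_mean, large_share = mean_significant_estimate(TRUE_EFFECT, n_per_group=5_000)

print(f"small experiments: share significant={small_share:.3f}, mean estimate={small_mean:.3f}")
print(f"large experiments: share significant={large_share:.3f}, mean estimate={large_mean:.3f}")
```

With 50 users per group the standard error is 0.2, so an estimate has to exceed 1.96 × 0.2 ≈ 0.39 to be significant. Every significant small experiment therefore reports an effect almost four times the true effect of 0.1 or more. With 5,000 users per group nearly all experiments are significant, and the average significant estimate stays close to the true effect. Shipping the winner with a rollout gives a fresh estimate that is not filtered by the significance threshold in this way.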

In Confidence

In Confidence, A/B tests and rollouts are the two main experiment types. Rollouts are behind feature flags, making it easy to revert with a single click if something goes wrong. Read more about A/B tests and rollouts in the documentation.

Reader exercise

You should choose a rollout instead of an A/B test when you want to:

Find a winner among two or more variants of a product.

Gradually ship technical changes to the system, for example major refactors and migrations.

Explore and learn about how different settings affect user behavior.