Who is the Confidence Bootcamp for?

The bootcamp is designed for anyone who wants to improve their experimentation skills. Courses are tailored for data scientists, analysts, engineers, product managers, and leaders — whether you are running your first A/B test or scaling an experimentation program across your organization.

Is the bootcamp free?

Yes, the Confidence Bootcamp is completely free. All 11 courses, 90+ lessons, and resources are available at no cost. You can start learning immediately without creating an account, though signing in lets you track your progress across devices.

What does the bootcamp cover?

The bootcamp covers the full experimentation lifecycle: A/B testing fundamentals, hypothesis formulation, interpreting experiment results, metrics design, sample size calculation, feature flags, and building an experimentation culture. It includes 11 courses with over 90 lessons built by the Confidence team at Spotify.

How long does the bootcamp take to complete?

The full bootcamp takes approximately 20 hours to complete across all 11 courses. Individual courses range from 30 minutes to 3 hours. You can learn at your own pace and pick the courses most relevant to your role.

Do I need prior experience with A/B testing or statistics?

No prior experience is required. The bootcamp starts with foundational courses like Intro to Experimentation and progressively covers more advanced topics like sequential testing and variance reduction. Each course clearly indicates which roles it is designed for.

Who created the Confidence Bootcamp?

The Confidence Bootcamp was created by the Confidence team at Spotify, the same team that builds the experimentation and feature flagging platform used across Spotify. The content reflects real-world experimentation practices used at one of the world's largest digital products.

Lesson 3: How to measure impact with success and guardrail metrics

Summary

Use success metrics to capture what you want to improve. Use guardrail metrics to capture what you don't want to affect negatively.

When you run an experiment, such as an A/B test or a rollout, the ultimate goal is to learn about the impact of the change you made. To know what the impact is, you need to measure the outcome on a relevant set of metrics. The metrics you select can serve different purposes, and even be subject to different statistical tests. This page describes the two main types of metrics you can use to measure impact, and how to select them.

The two types of metrics are:

Success metrics. Metrics that you aim to improve with your change.
Guardrail metrics. Metrics that you don't expect to improve, but that you want to make sure you don't have a negative impact on.

Success and guardrail metrics

Success metrics are the metrics that you aim to improve with your change. They're what you use to prove that your change had a positive impact.

Alongside success metrics, you should also select guardrail metrics. Guardrail metrics help you make sure that your change doesn't have a negative impact on other aspects of your product. This means that the hypothesis for an experiment includes two criteria: one for the success metric and one for the guardrail metric. You need evidence for both criteria to support the decision to launch the change.

Let's look at some examples of success and guardrail metrics.

Example: Checkout flow

You run an A/B test with an improvement to the checkout flow of your e-commerce website. Your goal is to make the checkout flow more efficient so that your visitors spend less time in the checkout flow. You want to make sure that the improvement in the checkout flow doesn't come at the expense of the number of purchases.

Success metric: Average time to completed checkout per visitor.
Guardrail metric: Number of purchases per visitor.
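As a rough illustration, the two checkout metrics could be computed per test group along these lines. This is a minimal sketch: the Visitor record, its field names, and the sample data are hypothetical, and averaging checkout time only over visitors who completed checkout is an assumption, not something the lesson prescribes.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Visitor:
    group: str                         # "control" or "treatment"
    checkout_seconds: Optional[float]  # None if the visitor never completed checkout
    purchases: int


def summarize(visitors, group):
    """Compute the success and guardrail metric for one test group."""
    rows = [v for v in visitors if v.group == group]
    if not rows:
        raise ValueError(f"no visitors in group {group!r}")
    # Assumption: only completed checkouts contribute to the time metric.
    times = [v.checkout_seconds for v in rows if v.checkout_seconds is not None]
    return {
        # Success metric: average time to completed checkout.
        "avg_checkout_seconds": mean(times) if times else None,
        # Guardrail metric: number of purchases per visitor.
        "purchases_per_visitor": sum(v.purchases for v in rows) / len(rows),
    }


# Hypothetical data for illustration only.
visitors = [
    Visitor("control", 120.0, 1),
    Visitor("control", None, 0),
    Visitor("control", 80.0, 1),
    Visitor("treatment", 90.0, 1),
    Visitor("treatment", 70.0, 2),
    Visitor("treatment", None, 0),
]

control_summary = summarize(visitors, "control")
treatment_summary = summarize(visitors, "treatment")
```

Comparing the two summaries only gives point estimates. Deciding whether the differences are real still requires a statistical test on each metric.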

Example: Search algorithm

Your hypothesis is that your new Spotify search algorithm gives users better podcast recommendations. You want to measure the impact of the new algorithm on podcast consumption, and you want to make sure that the change doesn't reduce music listening.

Success metric: The average number of podcast minutes played per user.
Guardrail metric: The average number of music minutes played per user.

Example: Dating app

You have a dating app that requires new users to complete their profile before they can interact with others. You run a test where your hypothesis is that showing a dialog with onboarding advice increases the number of users that complete their setup. The dialog uses new technology, and you want to make sure you don't introduce any bugs.

Success metric: Share of users that complete their profile setup.
Guardrail metric: Number of crashes per user.

Recommendation

When you start out with experimentation, it is a good idea to simply select some guardrail metrics and start experimenting. Once you have the hang of it, you can make guardrail metrics even more valuable to your decision making by specifying non-inferiority margins (NIMs). Learn more about the tests used for guardrail metrics in the lesson from the Advance your experimentation course.
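To give a feel for what a non-inferiority margin does, here is a minimal sketch of a generic one-sided test based on a normal approximation. It is a textbook-style illustration, not necessarily the test Confidence runs. The function name, the absolute margin, and the sample data are hypothetical, and the sketch assumes a metric where higher values are better.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def is_non_inferior(control, treatment, margin, alpha=0.05):
    """Return True if treatment is non-inferior to control (higher = better).

    margin is the largest acceptable drop, in the metric's own units
    (a hypothetical, absolute NIM chosen for illustration).
    """
    diff = mean(treatment) - mean(control)
    # Standard error of the difference in means (unequal-variance form).
    se = sqrt(variance(treatment) / len(treatment) + variance(control) / len(control))
    # One-sided lower confidence bound for the difference.
    lower_bound = diff - NormalDist().inv_cdf(1 - alpha) * se
    # Non-inferior when even the lower bound stays above the tolerated drop.
    return lower_bound > -margin


# Hypothetical per-user values of a guardrail metric for each group.
control = [1.0, 2.0, 3.0, 4.0, 5.0] * 20
treatment = [1.0, 2.0, 3.0, 4.0, 5.0] * 20

wide_margin_ok = is_non_inferior(control, treatment, margin=0.5)    # True
narrow_margin_ok = is_non_inferior(control, treatment, margin=0.1)  # False
```

With identical data in both groups, the wide margin passes and the narrow one fails: a tighter margin needs more data before the test can rule out a regression of that size.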

Example

You are trying to increase engagement in your product with a new variant that aims to increase engagement in a certain view, say A. To ensure that an increase in engagement in view A doesn't come at the expense of engagement in a competing view B, you should use the engagement in view A as the success metric and the engagement in view B as the guardrail metric. If the new variant increases the engagement in view A and doesn't decrease the engagement in view B, the variant is successful and should be shipped.

How success and guardrail metrics together define a successful variant

For a change to be worth shipping, at least one success metric must have improved significantly, while all guardrail metrics must not have regressed beyond acceptable limits. Both conditions must hold: a win on the success metric doesn't override a failure on a guardrail.
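The rule above can be sketched as a small decision function. The MetricResult type, its field names, and the metric names are hypothetical, and the sketch assumes each metric's statistical test has already been reduced to a pass or fail outcome.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    name: str
    # Success metric: True if it improved significantly.
    # Guardrail metric: True if it did not regress beyond the acceptable limit.
    passed: bool


def should_ship(success_metrics, guardrail_metrics):
    """At least one success metric must pass, and every guardrail must pass."""
    any_success = any(m.passed for m in success_metrics)
    all_guardrails_hold = all(m.passed for m in guardrail_metrics)
    return any_success and all_guardrails_hold


# A win on the success metric does not override a failed guardrail.
decision = should_ship(
    [MetricResult("podcast_minutes_per_user", True)],
    [MetricResult("music_minutes_per_user", False)],
)  # False
```

Note that with an empty list of guardrail metrics, `all` returns True, so the decision then rests on the success metrics alone.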

In Confidence

Confidence uses both success and guardrail metrics to identify a successful variant in A/B tests. Read more about how success and guardrail metrics feed into the overall recommendation for a decision.

Reader exercise

What is a guardrail metric used for?

It is used to guard that the success metric is significantly superior.

It is used to prove that a metric has improved significantly as a consequence of the treatment.

Guardrail metrics are used to find evidence that the treatment has not negatively impacted a metric by more than a certain amount.

Reader exercise

What is a correct statement?

Choose guardrail metrics or success metrics, never both.

Success metrics are better than guardrail metrics for most experiments. But in some cases, guardrail metrics are more important.

Success metrics and guardrail metrics capture different aspects of overall success, and usually both should be used.
