Who is the Confidence Bootcamp for?

The bootcamp is designed for anyone who wants to improve their experimentation skills. Courses are tailored for data scientists, analysts, engineers, product managers, and leaders — whether you are running your first A/B test or scaling an experimentation program across your organization.

Is the bootcamp free?

Yes, the Confidence Bootcamp is completely free. All 11 courses, 90+ lessons, and resources are available at no cost. You can start learning immediately without creating an account, though signing in lets you track your progress across devices.

What does the bootcamp cover?

The bootcamp covers the full experimentation lifecycle: A/B testing fundamentals, hypothesis formulation, interpreting experiment results, metrics design, sample size calculation, feature flags, and building an experimentation culture. It includes 11 courses with over 90 lessons built by the Confidence team at Spotify.

How long does the bootcamp take to complete?

The full bootcamp takes approximately 20 hours to complete across all 11 courses. Individual courses range from 30 minutes to 3 hours. You can learn at your own pace and pick the courses most relevant to your role.

Do I need prior experience with A/B testing or statistics?

No prior experience is required. The bootcamp starts with foundational courses like Intro to Experimentation and progressively covers more advanced topics like sequential testing and variance reduction. Each course clearly indicates which roles it is designed for.

Who created the Confidence Bootcamp?

The Confidence Bootcamp was created by the Confidence team at Spotify, the same team that builds the experimentation and feature flagging platform used across Spotify. The content reflects real-world experimentation practices used at one of the world's largest digital products.

Lesson 8: Variance reduction in experiment results

Summary

In this lesson, you learn what variance reduction is and why it causes the means shown in Confidence to differ slightly from raw group averages. The treatment variant effect is interpreted in the same way as without variance reduction. The adjustment simply makes the estimate more precise.

When you look at the results for a metric in Confidence, the control variant mean and treatment variant mean shown may not be exactly the same as the raw averages for those groups. This is because variance reduction is active by default for most metrics. Understanding what this adjustment does and what it does not do is important for reading results correctly.

The core idea: use pre-experiment data to reduce noise

Every metric has natural variation. Some users will use a feature a lot; others will barely touch it. Much of this variation has nothing to do with the experiment. It reflects pre-existing differences between users that existed before the experiment started.

Variance reduction works by using each user's pre-experiment behavior on the metric to predict and cancel out this pre-existing variation. Specifically, Confidence looks at how each user behaved on the metric before they entered the experiment, and uses that data to produce a more precise estimate of the treatment variant effect.

Think of it this way: if you know that certain users were already heavy users before the experiment, you can account for that when estimating whether the treatment variant changed their behavior. Without this adjustment, that pre-existing variation adds noise to your estimate. With it, much of that noise is removed.

Note

The pre-experiment data used for variance reduction must come from before the user entered the experiment, so it cannot be influenced by the treatment variant. This is what makes the adjustment valid.
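To make the idea concrete, here is a minimal simulation sketch. It uses the classical single-coefficient (CUPED-style) adjustment on made-up data, so it illustrates the principle rather than the exact method Confidence runs. The group size, the metric distribution, and the true effect of 1.0 are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000  # users per group (illustrative assumption)

def simulate_group(effect):
    # Pre-experiment metric: stable differences between users.
    pre = rng.normal(100.0, 30.0, size=n)
    # In-experiment metric: largely carried over from before, plus noise and any effect.
    post = 0.8 * pre + rng.normal(20.0, 15.0, size=n) + effect
    return pre, post

pre_c, post_c = simulate_group(effect=0.0)  # control
pre_t, post_t = simulate_group(effect=1.0)  # treatment

# One adjustment coefficient for both groups: the within-group slope of post on pre.
pre_centered = np.concatenate([pre_c - pre_c.mean(), pre_t - pre_t.mean()])
post_centered = np.concatenate([post_c - post_c.mean(), post_t - post_t.mean()])
theta = (pre_centered @ post_centered) / (pre_centered @ pre_centered)

# Subtract the part of each user's outcome that their pre-experiment value predicts.
grand_pre_mean = np.concatenate([pre_c, pre_t]).mean()
adj_c = post_c - theta * (pre_c - grand_pre_mean)
adj_t = post_t - theta * (pre_t - grand_pre_mean)

raw_effect = post_t.mean() - post_c.mean()
adj_effect = adj_t.mean() - adj_c.mean()

# Variance of the difference in means, before and after the adjustment.
raw_var = post_c.var(ddof=1) / n + post_t.var(ddof=1) / n
adj_var = adj_c.var(ddof=1) / n + adj_t.var(ddof=1) / n
variance_reduction = 1.0 - adj_var / raw_var

print(f"raw effect estimate:      {raw_effect:.2f}")
print(f"adjusted effect estimate: {adj_effect:.2f}")
print(f"variance reduction:       {variance_reduction:.0%}")
```

Both estimates target the same effect. The adjusted one simply has less noise around it, because the variation that pre-experiment behavior explains has been removed.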

What changes and what does not

Because of the adjustment, the control variant and treatment variant means shown in Confidence are not the raw group averages. They are slightly shifted versions that have been adjusted to account for pre-existing differences between groups.

However, the estimated treatment variant effect and the way you interpret it remain the same. The point estimate (the relative % change) and the confidence interval are still your best estimate of the observed treatment variant effect. The adjustment makes them more precise, not different in meaning.

The relative change is calculated as the adjusted treatment mean minus the adjusted control mean, divided by the unadjusted control variant mean, so the percentage you see is still relative to the actual (unadjusted) control baseline.

Example

A metric shows a control variant mean of 195.8 and a treatment variant mean of 196.5 in Confidence. These are adjusted values. The raw group averages might have been 196.1 and 196.7. The relative change (+0.36%) is the adjusted difference of 0.7 divided by the raw control mean of 196.1. Together with its confidence interval, it is a more precise estimate of the treatment variant effect than you would get from the raw averages alone.
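The arithmetic behind the example can be checked with a few lines of Python. The function name `relative_change` is made up for this sketch; it simply encodes the rule stated above (adjusted difference divided by the unadjusted control mean).

```python
def relative_change(adj_treatment_mean, adj_control_mean, raw_control_mean):
    """Adjusted difference in means, expressed relative to the raw control baseline."""
    return (adj_treatment_mean - adj_control_mean) / raw_control_mean

change = relative_change(
    adj_treatment_mean=196.5,
    adj_control_mean=195.8,
    raw_control_mean=196.1,
)
print(f"{change:+.2%}")  # +0.36%
```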

The variance reduction percentage

The variance reduction for a metric tells you how much of the original variance was removed by the adjustment. A variance reduction of 60% means that the adjusted estimate has 60% less variance than the raw estimate. That is roughly equivalent to running the experiment without the adjustment on 2.5 times as many users, because 1 / (1 - 0.60) = 2.5.

A high variance reduction means the pre-experiment data was strongly predictive of post-experiment behavior. A variance reduction of 0% means no adjustment was applied.
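The link between the variance reduction percentage and the equivalent number of users is a one-line calculation. The sketch below assumes the usual relationship that the variance of a mean is inversely proportional to sample size. The helper name `sample_size_multiplier` is hypothetical.

```python
def sample_size_multiplier(variance_reduction):
    """How many times more users an unadjusted test needs for the same precision."""
    if not 0.0 <= variance_reduction < 1.0:
        raise ValueError("variance_reduction must be in [0, 1)")
    return 1.0 / (1.0 - variance_reduction)

for vr in (0.2, 0.4, 0.6, 0.8):
    print(f"{vr:.0%} -> {sample_size_multiplier(vr):.2f}x")
# 20% -> 1.25x
# 40% -> 1.67x
# 60% -> 2.50x
# 80% -> 5.00x
```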

In Confidence

In Confidence, you can see the variance reduction percentage for each metric in the Detailed results view.

Use the interactive below to see how variance reduction narrows the confidence interval compared to no adjustment, for the same sample size and metric noise.

Variance reduction and CI width

The top bar shows the CI without variance reduction; the bottom bar shows it after variance reduction. Both are centred on the same observed treatment effect. Use the direction toggle to set which way the metric should move.

[Interactive: variance reduction and CI width. Controls: metric direction toggle; sample size per group (100 to 10,000, default 300 users); metric standard deviation σ (10, low noise, to 100, high noise; default 50 units); variance reduction (0%, disabled, to 80%; default 0%). Displayed range: -15% to +15%.]

At the default settings the point estimate is +4.2% and both bars show the same result, because variance reduction is set to 0%. With high confidence, the true effect is between -1.5% and +9.9%. Since zero is in the interval, we cannot conclude whether the treatment improved or worsened this metric. The result for this metric is inconclusive: collect more data, or if you have reached the required sample size, end the experiment.

Try the following:

Set variance reduction to 0%. This is the CI you would get without any adjustment.
Increase variance reduction to 60%. Notice how the CI narrows: that is extra precision from pre-experiment data, with no additional users.
Now try increasing the sample size instead. Both approaches narrow the CI; variance reduction is the free version.
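If you cannot run the interactive, you can approximate the second step by hand. The sketch below starts from the interactive's default result (point estimate +4.2%, interval from -1.5% to +9.9%) and assumes that the interval width scales with the square root of the remaining variance while the point estimate stays put. That scaling is the standard normal-approximation behavior, not a documented detail of the interactive. The function name is hypothetical.

```python
import math

def ci_with_variance_reduction(point, lower, upper, variance_reduction):
    """Shrink an interval around its point estimate by sqrt(1 - variance_reduction)."""
    factor = math.sqrt(1.0 - variance_reduction)
    return point - (point - lower) * factor, point + (upper - point) * factor

# Default interval from the interactive, then the same data with 60% variance reduction.
lo, hi = ci_with_variance_reduction(4.2, -1.5, 9.9, variance_reduction=0.60)
print(f"{lo:+.1f}% to {hi:+.1f}%")  # +0.6% to +7.8%
```

Under these assumptions the interval no longer contains zero, so the same data that was inconclusive without the adjustment becomes a significant result with it.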

Why this matters

Variance reduction is one of the main reasons sample sizes in Confidence can be smaller than in tools that do not use this technique. It improves the precision of every estimate without requiring more users. When you see a narrow confidence interval for a metric, variance reduction is often a contributing factor.

You do not need to think about variance reduction when reading results. Just know that the adjustment is there, it makes estimates more reliable, and you interpret the numbers the same way you would without it.

Reader exercise

When variance reduction is active, the means shown for a metric in Confidence are...

The raw group averages, unchanged

Adjusted values that account for pre-experiment differences between users, producing a more precise estimate

Predictions from a machine learning model

Averages computed only from users who were highly active before the experiment

Reader exercise

How should you interpret the relative % change for a metric when variance reduction is active?

You should apply a correction factor before interpreting it

You cannot interpret it directly. You need to look at the unadjusted raw means instead

In the same way as without variance reduction. It is the estimated treatment variant effect, just measured more precisely

As a lower bound on the observed treatment variant effect, since variance reduction tends to underestimate effects

Notes for nerds

The regression adjustment

The variance reduction method used in Confidence is covered in detail in the variance reduction lesson in the intro to metrics course, and its effect on required sample sizes is covered in the sample size calculation III course.

In short, the method fits separate regressions of the post-experiment outcome on the pre-experiment variable for each group, then adjusts the treatment variant effect estimate accordingly. The classical CUPED formulation instead uses a single adjustment coefficient for both groups. In large samples, fitting separate regressions per group is never worse, and it is strictly better when users respond differently to treatment, which is the typical case, unless the experiment is perfectly balanced 50/50, in which case the two are equivalent (Negi and Wooldridge, 2020).
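A minimal sketch of the separate-regressions idea, on simulated data, might look like the following. It fits one straight line per group and evaluates both lines at the pooled pre-experiment mean. The data-generating numbers, the 80/20 split, and the function name are assumptions for illustration, and the sketch is not a description of Confidence's production code.

```python
import numpy as np

def adjusted_means_separate(pre_c, post_c, pre_t, post_t):
    """Regression-adjusted group means using one fitted line per group."""
    pooled_pre_mean = np.concatenate([pre_c, pre_t]).mean()
    # np.polyfit returns coefficients from highest degree down: (slope, intercept).
    slope_c, intercept_c = np.polyfit(pre_c, post_c, deg=1)
    slope_t, intercept_t = np.polyfit(pre_t, post_t, deg=1)
    # Predict each group's outcome for a user with the pooled pre-experiment mean.
    adj_control = intercept_c + slope_c * pooled_pre_mean
    adj_treatment = intercept_t + slope_t * pooled_pre_mean
    return adj_control, adj_treatment

rng = np.random.default_rng(1)
n_control, n_treatment = 40_000, 10_000  # deliberately unbalanced (assumption)

pre_c = rng.normal(100.0, 30.0, size=n_control)
pre_t = rng.normal(100.0, 30.0, size=n_treatment)
# The slope differs between groups: users respond differently to treatment.
post_c = 20.0 + 0.8 * pre_c + rng.normal(0.0, 15.0, size=n_control)
post_t = 5.0 + 1.0 * pre_t + rng.normal(0.0, 15.0, size=n_treatment)

adj_control, adj_treatment = adjusted_means_separate(pre_c, post_c, pre_t, post_t)
adjusted_effect = adj_treatment - adj_control
raw_effect = post_t.mean() - post_c.mean()

print(f"raw means:      {post_c.mean():.2f} vs {post_t.mean():.2f}")
print(f"adjusted means: {adj_control:.2f} vs {adj_treatment:.2f}")
print(f"raw effect: {raw_effect:.2f}, adjusted effect: {adjusted_effect:.2f}")
```

In this simulation the true average effect is 5.0 by construction. The adjusted means differ slightly from the raw averages, which is the same behavior you see in the results view.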

Bounds on covariate selection

In principle, any pre-experiment covariate can be included in the regression to reduce variance further. In practice, the pre-experiment measurement of the metric itself is hard to beat—and the gains from going beyond it are bounded. Even the most sophisticated feature engineering can narrow the confidence interval by at most a further 29% beyond what the simple pre-experiment metric already achieves (Ting and Hung, 2023).

Adjusted control means in multi-variant experiments

One consequence of computing the regression adjustment separately for each comparison is that, in experiments with multiple treatment variants, the variance-adjusted control variant mean shown for a given comparison uses only the data from the variants involved in that comparison. The adjusted control variant mean can therefore differ slightly between comparisons. This is expected and correct. It does not indicate an error in the data.
