Lesson 8: Variance reduction

When you look at the results for a metric in Confidence, the control variant mean and treatment variant mean shown may not be exactly the same as the raw averages for those groups. This is because variance reduction is active by default for most metrics. Understanding what this adjustment does and what it does not do is important for reading results correctly.

The core idea: use pre-experiment data to reduce noise

Every metric has natural variation. Some users will use a feature a lot; others will barely touch it. Much of this variation has nothing to do with the experiment. It reflects pre-existing differences between users that existed before the experiment started.

Variance reduction works by using each user's pre-experiment behavior on the metric to predict and cancel out this pre-existing variation. Specifically, Confidence looks at how each user behaved on the metric before they entered the experiment, and uses that data to produce a more precise estimate of the treatment variant effect.

Think of it this way: if you know that certain users were already heavy users before the experiment, you can account for that when estimating whether the treatment variant changed their behavior. Without this adjustment, that pre-existing variation adds noise to your estimate. With it, much of that noise is removed.

What changes and what does not

Because of the adjustment, the control variant and treatment variant means shown in Confidence are not the raw group averages. They are slightly shifted versions that have been adjusted to account for pre-existing differences between groups.

However, what the estimated treatment variant effect measures, and how you interpret it, stay the same. The point estimate (the relative % change) and the confidence interval are still your best estimate of the true treatment variant effect. The adjustment makes them more precise without changing what they mean.

The relative change is calculated as the adjusted treatment mean minus the adjusted control mean, divided by the unadjusted control variant mean, so the percentage you see is still relative to the actual (unadjusted) control baseline.
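The same calculation written as a one-line function. This is a minimal sketch. The function name and the example means are hypothetical.

```python
def relative_change(adj_treatment_mean, adj_control_mean, raw_control_mean):
    """Difference of the variance-adjusted means, relative to the
    unadjusted control baseline."""
    return (adj_treatment_mean - adj_control_mean) / raw_control_mean


# Hypothetical means, for illustration only.
print(f"{relative_change(10.6, 10.1, 10.0):+.1%}")  # → +5.0%
```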

The variance reduction percentage

The variance reduction for a metric tells you how much of the original variance was removed by the adjustment. A variance reduction of 60% means that the adjusted estimate has 60% less variance than the raw estimate. The variance of an estimate is inversely proportional to the number of users, so this is roughly equivalent to having 2.5 times as many users (1 / (1 - 0.6) = 2.5) without the adjustment.
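The arithmetic behind the 2.5x figure can be sketched as follows. The fraction of variance that remains is 1 - VR, and the equivalent sample-size multiplier is the reciprocal of that. Assuming a normal-approximation CI, the CI width scales with the standard error, which is the square root of the variance. The helper name `vr_equivalents` is hypothetical.

```python
import math


def vr_equivalents(variance_reduction):
    """Translate a variance-reduction fraction into an equivalent
    sample-size multiplier and a CI-width factor."""
    remaining = 1.0 - variance_reduction       # fraction of variance left
    sample_size_multiplier = 1.0 / remaining   # variance scales as 1/n
    ci_width_factor = math.sqrt(remaining)     # CI width scales with the standard error
    return sample_size_multiplier, ci_width_factor


mult, width = vr_equivalents(0.60)
print(f"{mult:.1f}x users, CI {1 - width:.0%} narrower")  # → 2.5x users, CI 37% narrower
```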

A high variance reduction means the pre-experiment data was strongly predictive of post-experiment behavior. A variance reduction of 0% means no adjustment was applied.

Use the interactive below to see how variance reduction narrows the confidence interval compared to no adjustment, for the same sample size and metric noise.

Variance reduction and CI width (interactive)

The top bar shows the CI without variance reduction; the bottom bar shows it after variance reduction. Both are centred on the same observed treatment effect.

Controls:
  • Direction toggle: sets which way the metric should move.
  • Effect size: -15% to +15%.
  • Sample size: 100 to 10,000.
  • Metric noise: 10 (low) to 100 (high).
  • Variance reduction: 0% (disabled) to 80%.

At the default settings, with variance reduction at 0%, both bars show a +4.2% effect with a CI from -1.5% to +9.9%. Because zero is in the interval, the result is not significant and it is inconclusive whether the treatment improved or worsened the metric.

Try the following:

  • Set variance reduction to 0%. This is the CI you would get without any adjustment.
  • Increase variance reduction to 60%. Notice how the CI narrows: that is extra precision from pre-experiment data, with no additional users.
  • Now try increasing the sample size instead. Both approaches narrow the CI; variance reduction is the free version.

Why this matters

Variance reduction is one of the main reasons sample sizes in Confidence can be smaller than in tools that do not use this technique. It improves the precision of every estimate without requiring more users. When you see a narrow confidence interval for a metric, variance reduction is often a contributing factor.
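The same factor carries over to sample size planning. Consider the textbook normal-approximation formula for comparing two group means. The required number of users per group is proportional to the metric variance, so a variance reduction of VR multiplies the requirement by (1 - VR).

This is a generic sketch, not necessarily how Confidence computes sample sizes. The function name and the inputs `sigma` (metric standard deviation) and `mde` (minimum detectable effect, absolute) are hypothetical.

```python
import math
from statistics import NormalDist


def required_n_per_group(sigma, mde, variance_reduction=0.0, alpha=0.05, power=0.8):
    """Normal-approximation sample size per group for a two-sided test of a
    difference in means, with the metric variance scaled by (1 - VR)."""
    z = NormalDist().inv_cdf
    z_total = z(1 - alpha / 2) + z(power)
    variance = sigma**2 * (1 - variance_reduction)
    return math.ceil(2 * z_total**2 * variance / mde**2)


# Hypothetical inputs: standard deviation 10, minimum detectable effect 1.
n_raw = required_n_per_group(sigma=10, mde=1)
n_vr = required_n_per_group(sigma=10, mde=1, variance_reduction=0.6)
print(n_raw, n_vr)  # → 1570 628
```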

You do not need to think about variance reduction when reading results. Just know that the adjustment is there, it makes estimates more reliable, and you interpret the numbers the same way you would without it.

Notes for nerds

The regression adjustment

The variance reduction method used in Confidence is covered in detail in the variance reduction lesson in the intro to metrics course, and its effect on required sample sizes is covered in the sample size calculation III course. In short, the method fits separate regressions of the post-experiment outcome on the pre-experiment variable for each group, then adjusts the treatment variant effect estimate accordingly. The classical CUPED formulation uses a single adjustment coefficient for both groups.

Fitting separate regressions per group is asymptotically never worse. It is strictly better when the relationship between pre- and post-experiment behavior differs between groups, which is the typical case because users respond differently to treatment. The exception is a perfectly balanced 50/50 experiment, where the two approaches are equivalent (Negi and Wooldridge, 2020).

Bounds on covariate selection

In principle, any pre-experiment covariate can be included in the regression to reduce variance further. In practice, the pre-experiment measurement of the metric itself is hard to beat—and the gains from going beyond it are bounded. Even the most sophisticated feature engineering can narrow the confidence interval by at most a further 29% beyond what the simple pre-experiment metric already achieves (Ting and Hung, 2023).

Adjusted control means in multi-variant experiments

One consequence of this regression adjustment is that, in experiments with multiple treatment variants, the variance-adjusted control variant mean shown for a given comparison uses only the data from the variants involved in that specific comparison. This means the adjusted control variant mean can differ slightly between comparisons. This is expected and correct, and does not indicate an error in the data.