Lesson 3: Baseline Mean, Variance, and the MDE
This lesson explains how properties of the outcome metric affect the sample size required to power an experiment, and what the Minimum Detectable Effect (MDE) is. It covers:
- The baseline variance of the outcome metric.
- The baseline mean of the outcome metric.
- What the Minimum Detectable Effect (MDE) and the relative MDE are.
Baseline variance
The "baseline variance" is just a fancy way of saying "the variance of the outcome metric under no treatment". The baseline variance directly affects the risk management of the experiment, and therefore the sample size calculation: the noisier the metric, the more data is needed to reach the intended power.
As we will see in the following sections, the baseline variance goes directly into the sample size calculation formula. If the variance increases by 10%, then the sample size required to detect a given MDE with a given power also increases by 10%.
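As a rough illustration of this proportionality, the sketch below assumes the common normal-approximation formula for a two-sided, two-sample test with equal group sizes. The formula presented later in this course may differ in its details, and the function name and example numbers are hypothetical.

```python
from statistics import NormalDist


def sample_size_per_group(variance, absolute_mde, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided, two-sample z-test.

    Assumes equal group sizes and the same variance in both groups.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    return 2 * (z_alpha + z_beta) ** 2 * variance / absolute_mde ** 2


# Hypothetical metric: variance 4.0, absolute MDE 0.1.
n_base = sample_size_per_group(variance=4.0, absolute_mde=0.1)
# Same metric with 10% more variance.
n_more = sample_size_per_group(variance=4.4, absolute_mde=0.1)

ratio = n_more / n_base
print(f"Sample size ratio: {ratio:.2f}")
```

Because the variance enters this formula as a plain multiplicative factor, the ratio of the two sample sizes equals the ratio of the two variances, regardless of the chosen alpha, power, or MDE.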
Baseline mean and the MDE
Refresher on MDE
If you need a refresher on the concept of MDE, check out Lesson 6 in the hypothesis testing course or watch this video:
Baseline mean
The baseline mean is just another term for "the average value of the outcome metric under no treatment." The baseline mean is important because the relative MDE is translated into an absolute MDE using the baseline mean: the absolute MDE equals the relative MDE multiplied by the baseline mean.
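The translation from a relative to an absolute MDE can be sketched in a few lines. The function name and the example metric (a baseline of 50 minutes with a 2% relative MDE) are hypothetical and only serve as an illustration.

```python
def absolute_mde(baseline_mean, relative_mde):
    """Convert a relative MDE (0.02 means 2%) into the metric's own units."""
    return baseline_mean * relative_mde


# Hypothetical metric: baseline mean of 50 minutes, relative MDE of 2%.
mde_minutes = absolute_mde(baseline_mean=50.0, relative_mde=0.02)
print(f"Absolute MDE: {mde_minutes} minutes")
```

The same 2% relative MDE would correspond to a different absolute MDE for any other baseline mean, which is why the two always need to be read together.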
Baseline means close to zero
If the baseline mean is very small, then even a large relative MDE might correspond to a practically irrelevant absolute MDE. This is particularly common for binary metrics with very low rates.
Consider the following example:
The baseline mean of crash rates is 0.0001. That is, one out of 10,000 users experiences a crash on average. If we want to detect a 10% relative MDE, then the absolute MDE is 0.00001. So we want to detect if the rate goes from 0.0001 to 0.00011, or, in other words, if one more user per 100,000 users experiences a crash on average.
Since this is a very small change, we need a large sample size to detect it.
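To get a feel for how large "large" is, the sketch below plugs the crash-rate example into the common normal-approximation formula for a two-sided, two-sample test. This is a simplification under stated assumptions: equal group sizes, alpha of 0.05, power of 0.8, and the Bernoulli variance p(1 - p) at the baseline rate used for both groups. The function name and the comparison metric with a 10% baseline rate are hypothetical.

```python
from statistics import NormalDist


def binary_sample_size_per_group(baseline_rate, relative_mde, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a binary metric.

    Assumes a two-sided, two-sample z-test with equal group sizes and
    uses the baseline Bernoulli variance p * (1 - p) for both groups.
    """
    variance = baseline_rate * (1 - baseline_rate)
    absolute_mde = baseline_rate * relative_mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 2 * (z_alpha + z_beta) ** 2 * variance / absolute_mde ** 2


# Crash rate from the example: baseline 0.0001, relative MDE 10%.
n_crash = binary_sample_size_per_group(baseline_rate=0.0001, relative_mde=0.10)
# Hypothetical comparison: same relative MDE, but a baseline rate of 10%.
n_common = binary_sample_size_per_group(baseline_rate=0.10, relative_mde=0.10)

print(f"Crash rate metric:    {n_crash:,.0f} users per group")
print(f"10% baseline metric:  {n_common:,.0f} users per group")
```

Under these assumptions the crash-rate metric needs on the order of 15 to 16 million users per group, while the metric with a 10% baseline rate needs roughly 14 thousand, even though the relative MDE is the same in both cases.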
Absolute versus Relative MDE
At this point you might wonder why we use the relative MDE to express the effect we want to be able to detect if we aren't even aware of the baseline mean. This is the right question to ask! There are two reasons why the relative effect is often used. First, it is a nice way to understand impact for metrics whose absolute values are hard to interpret or build intuition for, such as minutes played at Spotify. Second, because of the first reason, most experimentation tools let the user specify the MDE on a relative scale. Which scale makes the most sense is mainly a matter of preference. In any case, understanding this relation, and the importance of always considering the baseline mean, is helpful for all experimenters seeking to understand sample size calculations.