Lesson 1: Binary metrics
This lesson explains how binary metrics differ from continuous metrics in sample size calculations. For binary metrics, the variance under the alternative hypothesis is known; for continuous metrics, it is not.
Variance of a binary metric
Binary metrics, such as whether a user clicked a button or made a purchase, have specific properties. The variance of a binary metric is a deterministic function of the mean (proportion of ones) of the metric.
Let p represent the proportion of ones in the metric. Then the variance of the metric is Variance = p * (1 - p).
For example, assume we have a metric measuring whether a user clicked a button. The mean of the metric is the proportion of users who clicked the button.
- If the mean is p = 0.5, the variance is 0.5 * (1 - 0.5) = 0.25.
- If the mean is p = 0.1, the variance is 0.1 * (1 - 0.1) = 0.09.
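The two cases above can be reproduced with a few lines of Python. This is a minimal sketch; the function name `binary_variance` is chosen here for illustration and does not come from any particular library.

```python
def binary_variance(p: float) -> float:
    """Variance of a 0/1 metric whose mean (proportion of ones) is p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return p * (1.0 - p)


# The two click-rate examples from the text.
for p in (0.5, 0.1):
    print(f"p = {p}: variance = {binary_variance(p):.2f}")
# p = 0.5: variance = 0.25
# p = 0.1: variance = 0.09
```

The variance is largest at p = 0.5 and shrinks toward zero as p approaches 0 or 1.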
Variance in the treatment group
For continuous metrics, the best guess of the variance in the treatment group is typically the variance in the control group. This is not because we expect the two variances to be equal, but because it is hard to make an informed guess about how a treatment effect of size MDE (Minimum Detectable Effect) would affect the variance.
For binary metrics, the situation is different. Since the variance of a binary metric is a function of its mean, we can determine the variance in the treatment group under a treatment effect of size MDE.
Let p represent the proportion of ones in the control group. Then the variance in the treatment group is Treatment Variance = (p + MDE) * (1 - (p + MDE)).
For example, if p = 0.3 and the MDE is 0.1, then the treatment group variance would be (0.3 + 0.1) * (1 - (0.3 + 0.1)) = 0.4 * 0.6 = 0.24.
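The same calculation can be sketched in Python. The function name `treatment_variance` is illustrative, and the sketch assumes the MDE is an absolute change in the proportion, as in the example above (0.3 becomes 0.4).

```python
def treatment_variance(p_control: float, mde: float) -> float:
    """Variance of a binary metric in the treatment group, assuming the
    treatment shifts the control proportion by an absolute effect of size mde."""
    p_treatment = p_control + mde
    if not 0.0 <= p_treatment <= 1.0:
        raise ValueError("p_control + mde must be a proportion between 0 and 1")
    return p_treatment * (1.0 - p_treatment)


# The example from the text: p = 0.3, MDE = 0.1.
print(f"{treatment_variance(0.3, 0.1):.2f}")  # 0.24
```

For comparison, the control group variance in this example is 0.3 * (1 - 0.3) = 0.21, so reusing the control variance for the treatment group would understate it.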
How is the variance of a binary metric calculated?
How does the calculation of variance differ between binary and continuous metrics?
How is the variance in the treatment group for a binary metric determined under a treatment effect of size MDE?
Note for nerds
For guardrail metrics, the alternative hypothesis used in sample size calculations assumes that the proportion in the treatment group is the same as in the control group.
Technically, the variance in the treatment group used in the sample size calculation should therefore vary depending on whether the metric is a guardrail metric or a success metric.
In practice, the difference is often small enough to ignore, but Confidence's sample size calculator corrects for this.
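To make the distinction concrete, here is a rough sketch of how the assumed treatment group variance could differ between the two metric types. It is not Confidence's implementation; the function name and the `is_guardrail` flag are hypothetical and used only for illustration.

```python
def assumed_treatment_variance(p_control: float, mde: float, is_guardrail: bool) -> float:
    """Treatment group variance assumed in a sample size calculation.

    Guardrail metric: the treatment proportion is assumed equal to the
    control proportion, so mde does not enter the variance.
    Success metric: the treatment proportion is assumed to be p_control + mde.
    """
    p_treatment = p_control if is_guardrail else p_control + mde
    return p_treatment * (1.0 - p_treatment)


p, mde = 0.3, 0.1
print(f"success metric:   {assumed_treatment_variance(p, mde, is_guardrail=False):.2f}")  # 0.24
print(f"guardrail metric: {assumed_treatment_variance(p, mde, is_guardrail=True):.2f}")   # 0.21
```

The gap between the two values narrows as the MDE gets smaller relative to p, which is why the difference is often negligible in practice.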