Lesson 8: Sample size

Sample size for experiments

Sample size is the number of units (most often users) included in an experiment. A larger sample size makes the experiment more sensitive, so it can detect smaller differences between the treatment and control groups. Each metric is assigned a hypothetical effect size: the minimum detectable effect (MDE) for success metrics and the non-inferiority margin (NIM) for guardrail metrics.

If you want to master sample size calculations, take the Sample size calculation - level I course.


A smaller MDE or NIM requires a larger sample size. Adding more metrics also increases the required sample size, and so does higher variance in a metric. Metrics that measure a quantity per user (for example, Minutes played per user) usually require a larger sample size than metrics that measure the Share of users who [completed some action].
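To make these relationships concrete, here is a minimal sketch based on the textbook two-sided z-test formula for comparing two group means. It is an illustration only and not necessarily the calculation the product uses. The function name, the default significance level and power, the choice to express the MDE relative to the baseline mean, and all example numbers are assumptions.

```python
import math
from statistics import NormalDist


def required_sample_size_per_group(baseline_mean, std_dev, relative_mde,
                                   alpha=0.05, power=0.8):
    """Textbook per-group sample size for a two-sided two-sample z-test.

    Hypothetical helper: the MDE is assumed to be relative to the baseline mean.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    absolute_effect = relative_mde * baseline_mean
    n = 2 * (z_alpha + z_power) ** 2 * std_dev ** 2 / absolute_effect ** 2
    return math.ceil(n)


# Made-up share metric: 50% of users complete the action.
share_std = math.sqrt(0.5 * (1 - 0.5))
n_share = required_sample_size_per_group(0.5, share_std, relative_mde=0.02)

# Made-up per-user metric: 30 minutes played on average, heavily skewed.
n_minutes = required_sample_size_per_group(30.0, 90.0, relative_mde=0.02)

# Halving the MDE roughly quadruples the required sample size.
n_share_half_mde = required_sample_size_per_group(0.5, share_std, relative_mde=0.01)

print(n_share, n_minutes, n_share_half_mde)
```

In this formula the required sample size grows with the variance and with the inverse square of the effect size, which is why the noisier per-user metric and the smaller MDE both need more users.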

The sample size calculation tells you how many units your experiment needs. This is often called the required sample size. If the sample size you expect to reach is at least as large as the required sample size, you can be confident that you will collect enough user data to see whether your new feature had the intended effect.

If the required sample size turns out to be unrealistically large (for example, twice the sample size you expect to reach), go back and edit the settings of your experiment. For example, select fewer metrics, or adjust your expectations for what you can reliably detect and aim for a larger MDE or NIM.
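One way to adjust your expectations is to turn the question around and ask what effect you could reliably detect with the sample size you expect to reach. The sketch below inverts the textbook two-sided z-test formula. The function name, the defaults, the relative MDE convention, and the numbers are assumptions for illustration, not the product's own calculation.

```python
import math
from statistics import NormalDist


def smallest_detectable_relative_effect(n_per_group, baseline_mean, std_dev,
                                        alpha=0.05, power=0.8):
    """Smallest relative effect a two-sided two-sample z-test can detect.

    Hypothetical helper: the effect is expressed relative to the baseline mean.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    absolute_effect = (z_alpha + z_power) * std_dev * math.sqrt(2 / n_per_group)
    return absolute_effect / baseline_mean


# Made-up example: a share metric at 20%, with 10,000 expected users per group.
expected_n_per_group = 10_000
achievable_mde = smallest_detectable_relative_effect(
    expected_n_per_group, baseline_mean=0.2, std_dev=math.sqrt(0.2 * 0.8)
)
print(f"Achievable relative MDE: {achievable_mde:.1%}")
```

If the achievable effect is larger than the MDE or NIM you had in mind, that is a sign to raise the MDE or NIM, or to reconsider which metrics to include.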

The results show the total required sample size alongside a per-metric breakdown. The metric with the highest required sample size determines the total, making it easy to spot the bottleneck and decide if you need to drop or adjust a metric.
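The relationship between the breakdown and the total can be sketched in a few lines. The function name, the metric names, and the numbers below are hypothetical and only illustrate the idea that the largest per-metric requirement sets the total.

```python
def summarize_required_sample_sizes(per_metric):
    """Return the total required sample size and the metric that drives it.

    per_metric maps a metric name to its required sample size (hypothetical input).
    """
    if not per_metric:
        raise ValueError("at least one metric is required")
    bottleneck = max(per_metric, key=per_metric.get)
    return {"total": per_metric[bottleneck], "bottleneck": bottleneck}


# Made-up per-metric requirements.
breakdown = {
    "Share of users who completed onboarding": 40_000,
    "Minutes played per user": 260_000,
    "Share of users who churned (guardrail)": 90_000,
}
summary = summarize_required_sample_sizes(breakdown)
print(summary["bottleneck"], summary["total"])
```

Here the per-user metric is the bottleneck, so dropping it or giving it a larger MDE would lower the total the most.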