What is the required sample size?
In this lesson you learn what the required sample size is, and why it's important to calculate it before running an experiment.
What is the required sample size?
When you run an experiment, you can collect too little data or too much, by running the experiment on too few or too many of your users. The number of users in your experiment is called the sample size of the experiment.
A sample that is too small can lead to inconclusive results, while a sample that is too large wastes resources. The required sample size is the number of users you need in your experiment to achieve a certain level of precision in its results.
It might be clear enough why it's bad to run experiments with unnecessarily large samples, but what exactly does "inconclusive results" mean? Results are inconclusive when the experiment shows no significant effect of the change you made, but there is so much uncertainty in the results that you can't say for sure whether there is an effect or not. Perhaps you missed an existing effect due to all the noise in the data.
One way to think about the required sample size is as the sample size you need to reach a certain level of precision in your effect estimates. If you have a large sample, you will have high precision (tight confidence intervals). If you have a small sample, you will have low precision (wide confidence intervals).
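To make the link between sample size and precision concrete, here is a minimal sketch of how the half-width of a confidence interval for a mean shrinks as the sample grows. It assumes a normal approximation and a known standard deviation; the function name `ci_half_width` and the example numbers are illustrative and not taken from this lesson.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(std_dev: float, n: int, confidence: float = 0.95) -> float:
    """Half-width of a normal-approximation confidence interval for a mean.

    Assumes the standard deviation is known (illustrative simplification).
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * std_dev / sqrt(n)


# Hypothetical metric with standard deviation 1.0.
for n in (100, 1_000, 10_000):
    print(f"n={n:>6}: +/- {ci_half_width(1.0, n):.4f}")
```

Under these assumptions the interval width scales with one over the square root of the sample size, so halving the width takes roughly four times as many users.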
How do you know what the right amount of certainty is? You find it by deciding on a set of parameters that bound the risk of reaching the incorrect conclusion from the experiment. The following lessons will teach you what those parameters are and how, once they are specified, they determine the required sample size. You decide what risks of incorrect conclusions you are willing to take, and then calculate the required sample size to reach that level of precision.
Why do you need to calculate the required sample size?
It's possible to calculate the current powered effect during an experiment. This makes it possible to stop an experiment when the sample size is just right. However, there are several reasons why it's better to calculate the required sample size before running the experiment.
If you don't calculate the required sample size before running an experiment, you don't know whether the sample size you can reach is sufficient for the level of precision you need to make a decision based on the experiment.
This is an important realization: The required sample size calculation tells you what sample size you need to reach a certain precision. That doesn't mean that you can reach that sample size. For example, it's not uncommon to select risk parameters such that the required sample size is larger than the total number of users that your product has. This is frustrating, because it means that you can't learn what you wanted to learn with the precision you'd hoped for. But at least the required sample size calculation lets you know that before you start the experiment, which means that you don't have to waste time and resources going after the impossible.
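The feasibility check described above can be sketched in a few lines. This example uses a common textbook approximation for a two-group comparison of means (two-sided z-test, equal group sizes); it is not necessarily the calculation used in this course. The parameter names `alpha`, `power`, and `min_effect`, and all the numbers, are assumptions made for illustration.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size_per_group(
    std_dev: float, min_effect: float, alpha: float = 0.05, power: float = 0.8
) -> int:
    """Textbook approximation: n = 2 * ((z_alpha + z_beta) * sd / effect)^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # false positive risk
    z_beta = NormalDist().inv_cdf(power)           # 1 - false negative risk
    return ceil(2 * ((z_alpha + z_beta) * std_dev / min_effect) ** 2)


# Hypothetical product with 50,000 users and a very small effect of interest.
available_users = 50_000
needed = 2 * required_sample_size_per_group(std_dev=1.0, min_effect=0.01)
feasible = needed <= available_users

print(f"needed={needed}, available={available_users}, feasible={feasible}")
```

With these made-up inputs the required sample is several times larger than the user base, which is exactly the situation you want to discover before the experiment starts rather than after.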
The other aspect of calculating the required sample size before starting the experiment is that it helps you plan the experiment. In many cases, you can change aspects of an experiment to require a smaller sample size. For example, you can change the duration of the experiment, the number of variations you test, or the level of risk you are willing to take. By knowing the required sample size, you can make these decisions in an informed way.
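As a rough illustration of this kind of planning, the sketch below compares the total users needed under two levels of risk and two numbers of groups. It reuses the textbook two-group z-test approximation, assumes equal group sizes, and ignores any multiple-testing correction; the helper names `per_group_size` and `total_users` and all the numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def per_group_size(std_dev: float, min_effect: float, alpha: float, power: float) -> int:
    """Approximate users per group for a two-sided comparison of means."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) * std_dev / min_effect) ** 2)


def total_users(per_group: int, n_groups: int) -> int:
    """Total users when every group (control plus variations) is the same size."""
    return per_group * n_groups


# Stricter risk levels require more users per group than relaxed ones.
strict = per_group_size(1.0, 0.05, alpha=0.01, power=0.9)
relaxed = per_group_size(1.0, 0.05, alpha=0.05, power=0.8)

for label, size in (("strict", strict), ("relaxed", relaxed)):
    for n_groups in (2, 3):
        print(f"{label:>7}, {n_groups} groups: {total_users(size, n_groups)} users")
```

Seeing the options side by side makes the trade-off explicit: dropping a variation or accepting more risk both reduce the number of users the experiment needs.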
What does 'inconclusive results' mean in the context of an experiment?
Why is it crucial to calculate the required sample size before running an experiment?
What does it mean if the calculated required sample size is larger than the total number of users for a product?
Notes for nerds
It's not obvious that calculating the powered effect as a stopping rule during an experiment is fine from a statistical inference perspective. Calculating the current powered effect requires peeking at the data, which could lead to inference issues, just like peeking at the data to stop the experiment when the p-value is low.
However, looking at the current powered effect means only 'peeking' at the sample variance, which turns out to be fine. See our paper on precision based experimental designs for more information.