Lesson 4: Number of comparisons

The impact of multiple comparisons

The most common pattern in product A/B tests is to compare all treatment groups against a control group. This means there are as many comparisons as there are treatment groups being tested.

In principle, it is also possible to compare all treatment groups against each other. The number of comparisons then equals the number of pairs of treatments: with k treatments, that is k(k - 1)/2.
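To make the two counting rules concrete, here is a minimal sketch. The function names are hypothetical and only illustrate the arithmetic described above.

```python
from math import comb


def comparisons_vs_control(num_treatments: int) -> int:
    """Each treatment is compared against the control once."""
    return num_treatments


def pairwise_comparisons(num_treatments: int) -> int:
    """Every treatment is compared against every other treatment once."""
    return comb(num_treatments, 2)


# With 4 treatments: 4 comparisons against control, 6 treatment pairs.
print(comparisons_vs_control(4))
print(pairwise_comparisons(4))
```

The pairwise count grows much faster than the count against control, which is one reason the control-based pattern is more common.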

The number of comparisons affects the required sample size: the more comparisons, the more samples are required. This is because the probability of making at least one Type I error (false positive) increases with the number of comparisons. To counter this, we lower the alpha level used for each comparison. A lower alpha makes each test stricter, so more samples are needed to keep the same power.
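The sketch below illustrates how an adjusted alpha feeds into sample size. It assumes a Bonferroni correction (alpha divided by the number of comparisons), which is one common adjustment and not necessarily the one used in this course, and it assumes a two-sided two-sample z-test with a standardized effect size. All function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def bonferroni_alpha(alpha: float, num_comparisons: int) -> float:
    """Assumed adjustment: split alpha evenly across the comparisons."""
    return alpha / num_comparisons


def sample_size_per_group(alpha: float, power: float, effect_size: float) -> int:
    """Approximate n per group for a two-sided two-sample z-test.

    effect_size is the standardized difference (difference / standard deviation).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# One comparison versus four comparisons at the same overall alpha.
n_single = sample_size_per_group(0.05, 0.8, 0.2)
n_adjusted = sample_size_per_group(bonferroni_alpha(0.05, 4), 0.8, 0.2)
print(n_single, n_adjusted)
```

Under these assumptions, the adjusted design needs more samples per group than the single-comparison design, which is the effect described above.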

The intuition behind this adjustment is that the more tests we run, the more chances there are to find a significant result by random chance. For example, if we run an experiment with 100 treatments and an alpha of 10%, even if no treatment has any effect, we would expect to see 10 treatments with a (false positive) significant result just by chance.
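The arithmetic in this example can be sketched as follows. The expected count is simply the number of comparisons times alpha. The second function additionally assumes the tests are independent, which is a simplification when all treatments share one control group. Both function names are hypothetical.

```python
def expected_false_positives(num_comparisons: int, alpha: float) -> float:
    """Expected number of false positives when no treatment has any effect."""
    return num_comparisons * alpha


def chance_of_any_false_positive(num_comparisons: int, alpha: float) -> float:
    """Probability of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** num_comparisons


# The example from the text: 100 treatments, alpha of 10%.
print(expected_false_positives(100, 0.10))
print(chance_of_any_false_positive(100, 0.10))
```

With 100 comparisons at an alpha of 10%, at least one false positive is close to certain under the independence assumption.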

Notes for nerds

Some people wonder what to do if more than one treatment is significantly better than the control group. This is a deep question. You can test the treatments against each other to see if one is better than the other. However, the difference between the treatment groups is likely smaller than the difference between them and the control group. At the same sample size, this makes the power to detect a difference between treatments lower than the power to detect a difference between a treatment and the control group.
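To see how power drops as the difference shrinks, here is a rough sketch. It assumes a two-sided two-sample z-test with equal group sizes and a standardized effect size. The function name and the numbers in the usage lines are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def power_two_sample(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample z-test."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected size of the test statistic under the assumed effect.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Same sample size, a larger and a smaller standardized difference.
print(power_two_sample(0.20, 393))  # e.g. treatment vs control
print(power_two_sample(0.05, 393))  # e.g. treatment vs treatment
```

Under these assumptions, a sample size that gives good power for the larger difference gives much lower power for the smaller one.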

There are more advanced methods for comparing many groups, such as Tukey's and Scheffé's methods for comparisons among treatments, and Dunnett's test for comparing several treatments against a control. However, we don't recommend using these methods due to the complexity involved in learning how to use them. Instead, you should gather stakeholders to decide which of the significant treatments to implement based on factors such as:

  • Complexity
  • Cost
  • Future extensibility