
What is a Sample Size?

Sample size is the number of experimental units (typically users) needed in an A/B test to detect a given effect with a specified level of confidence and power. Too few users and the experiment can't distinguish real effects from noise. Too many and you've used traffic that could have powered another test.

Getting sample size right is the most consequential pre-experiment decision. An experiment with inadequate sample size produces ambiguous results regardless of how good the hypothesis, implementation, or metrics are. At Spotify, where over 10,000 experiments run per year and 58 teams shared the mobile home screen for 520 experiments in 2025, sample size planning directly determines how many experiments the organization can run concurrently. Every test that runs longer than necessary, whether because it was undersized and had to be extended or rerun, or because it was sized for a smaller MDE than the team actually needed, consumes bandwidth that could have gone to another team's experiment.

How is sample size calculated?

Sample size depends on four parameters:

Significance level (alpha). The false positive rate you'll tolerate, typically 0.05.

Statistical power. The probability of detecting a real effect, typically 0.80 or higher.

Minimum detectable effect (MDE). The smallest change in the metric you want to be able to detect. Smaller MDEs require more users.

Metric variance. How much the metric varies across users. Higher variance means more noise, which means more users to cut through it.

For a two-sample z-test on a continuous metric, the per-group sample size is roughly:

n = (Z_alpha + Z_beta)^2 * 2 * sigma^2 / delta^2

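where Z_alpha and Z_beta are the standard normal quantiles for the chosen significance level and power (about 1.96 and 0.84 for a two-sided test at alpha = 0.05 and power = 0.80), sigma^2 is the metric variance, and delta is the MDE expressed in the metric's own units. The formula makes the trade-offs explicit: halving the MDE quadruples the required sample. Doubling the variance doubles it. These aren't independent knobs. They're constraints that bind against each other.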
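The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the calculation Confidence performs: the function name `sample_size_per_group` and the example values are hypothetical, and it assumes a two-sided test (so Z_alpha is taken at 1 - alpha/2), equal variance in both groups, and an MDE expressed in absolute units.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(mde, variance, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sample z-test on a continuous metric.

    mde      -- absolute minimum detectable effect (delta)
    variance -- per-user metric variance (sigma^2), assumed equal in both groups
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # assumption: two-sided test
    z_beta = z.inv_cdf(power)
    return ceil((z_alpha + z_beta) ** 2 * 2 * variance / mde ** 2)


# Hypothetical metric with variance 25 and an MDE of 0.5 units
base = sample_size_per_group(mde=0.5, variance=25.0)
half_mde = sample_size_per_group(mde=0.25, variance=25.0)
double_var = sample_size_per_group(mde=0.5, variance=50.0)
print(base, half_mde, double_var)
```

Comparing the three results shows the scaling described above: `half_mde` is about four times `base`, and `double_var` is about twice `base`.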

Confidence provides sample size calculators that account for the specific statistical method being used. Sequential testing methods like Group Sequential Tests require a larger maximum sample size than a fixed-horizon test (typically 20-30% more) because they spend some statistical budget on interim analyses. The calculator shows this overhead and lets teams decide whether the ability to stop early is worth the additional maximum sample.

What reduces the required sample size?

Variance reduction. CUPED (Controlled-experiment Using Pre-Existing Data) uses pre-experiment metric values to remove predictable noise. Confidence applies the Negi-Wooldridge full regression estimator, which is more precise than the original CUPED formulation. Variance reduction of ~50% is common for metrics with stable user-level baselines, which translates to roughly halving the required sample size.
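As a rough illustration of how a pre-experiment covariate removes variance, the sketch below applies the original CUPED adjustment to synthetic data. It is not the Negi-Wooldridge full regression estimator mentioned above, and the data, seed, and function names are assumptions made for the example. The synthetic pre- and post-period values are built so that their squared correlation is about 0.5.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def cuped_adjust(post, pre):
    """Original CUPED: subtract theta * (pre - mean(pre)) from each post value."""
    mean_pre, mean_post = mean(pre), mean(post)
    cov = sum((x - mean_pre) * (y - mean_post) for x, y in zip(pre, post)) / (len(pre) - 1)
    theta = cov / variance(pre)
    return [y - theta * (x - mean_pre) for x, y in zip(pre, post)]


rng = random.Random(0)
# Synthetic users: post-period value = pre-period value + independent noise
pre = [rng.gauss(10, 3) for _ in range(20000)]
post = [x + rng.gauss(0, 3) for x in pre]

adjusted = cuped_adjust(post, pre)
reduction = 1 - variance(adjusted) / variance(post)
print(f"variance reduction: {reduction:.0%}")
```

With this setup the reduction comes out close to one half, and because required sample size is proportional to variance, the sample needed shrinks by about the same fraction. The adjustment leaves the mean of the metric unchanged.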

Trigger analysis. If only 10% of users encounter the changed feature, including all assigned users in the analysis dilutes the effect by 10x. Restricting the analysis to triggered users recovers the undiluted effect and dramatically reduces the sample needed to detect it.
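The arithmetic behind the trigger example can be sketched as follows. This is a simplified model with a hypothetical helper name: it assumes the metric variance is the same among triggered users as among all assigned users, which will not hold exactly in practice.

```python
def relative_sample_needed(trigger_rate):
    """Assigned users needed, relative to a test where every user is triggered.

    Returns (all_users_analysis, triggered_only_analysis).
    Assumes equal metric variance in both populations (a simplification).
    """
    # Analyzing everyone: the effect shrinks by trigger_rate,
    # and required n scales with 1 / effect^2.
    all_users = 1 / trigger_rate ** 2
    # Analyzing triggered users only: the effect is undiluted,
    # but only a trigger_rate share of assigned users enters the analysis.
    triggered = 1 / trigger_rate
    return all_users, triggered


all_users, triggered = relative_sample_needed(0.10)
print(all_users, triggered, all_users / triggered)
```

Under these assumptions, a 10% trigger rate means the all-users analysis needs roughly 100 times the baseline sample while the triggered analysis needs roughly 10 times as many assigned users, a tenfold saving.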

Bolder implementations. A treatment that produces a 5% lift needs a quarter of the sample required to detect a 2.5% lift. Testing the "Maximum Viable Change" first (the loudest version of the idea that still works as a user experience) is often the cheapest way to reduce required sample size.

Better metrics. Some metrics are inherently lower-variance than others. Revenue per user has higher variance than conversion rate in most products. Choosing a metric with lower natural variance, or capping extreme values, reduces sample requirements.
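To see why capping extreme values helps, the sketch below caps a synthetic heavy-tailed metric at its 99th percentile and compares variances. The lognormal distribution, the seed, the 99% threshold, and the function names are all assumptions chosen for illustration. Note that capping also changes what the metric measures, so the lower variance comes with a trade-off.

```python
import random


def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def cap(values, quantile=0.99):
    """Replace every value above the given empirical quantile with that quantile."""
    ordered = sorted(values)
    threshold = ordered[int(quantile * (len(ordered) - 1))]
    return [min(v, threshold) for v in values]


rng = random.Random(1)
# Synthetic "revenue per user": most users spend little, a few spend a lot
revenue = [rng.lognormvariate(0, 1.5) for _ in range(50000)]
capped = cap(revenue, quantile=0.99)

ratio = variance(capped) / variance(revenue)
print(f"capped variance is {ratio:.0%} of the original")
```

Because required sample size is proportional to variance, the printed ratio is also the approximate fraction of the original sample that the capped metric would need for the same absolute MDE.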

What goes wrong when sample size is too small or too large?

Undersized experiments produce results that can't be interpreted. The confidence interval is wide enough to include both meaningful improvement and meaningful harm. The team spent the engineering effort to build, instrument, and monitor the experiment and got nothing actionable in return.
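One way to see this before launch is to compute how wide the confidence interval will be at a given sample size. The sketch below uses a z approximation for a difference in means; the function name, the variance of 25, the MDE of 0.5, and the sample sizes are hypothetical values chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(n_per_group, variance, alpha=0.05):
    """Half-width of a two-sided CI for a difference in means (z approximation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sqrt(2 * variance / n_per_group)


mde = 0.5        # hypothetical effect size the team cares about
variance = 25.0  # hypothetical per-user metric variance

for n in (200, 1570, 6280):
    half_width = ci_half_width(n, variance)
    # With an observed difference near zero the interval is (-half_width, +half_width).
    # If half_width exceeds the MDE, it covers both a meaningful gain and a meaningful loss.
    verdict = "ambiguous" if half_width > mde else "informative"
    print(n, round(half_width, 3), verdict)
```

At 200 users per group the interval is wider than the MDE in both directions, which is the uninterpretable outcome described above. Quadrupling the sample halves the interval width.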

Oversized experiments waste traffic. If you only needed 50,000 users but ran 200,000, you've used four times the traffic the test required. At 50,000 new users per week, for example, that is three extra weeks of bandwidth that another experiment could have used. At organizations running many concurrent tests, this matters.

The discipline is in doing the calculation before the experiment launches and committing to the plan. Confidence flags experiments where the projected runtime exceeds the team's planned window or where the MDE is unrealistically small for the available traffic, helping teams avoid both failure modes before they start.
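A pre-launch check of this kind can be sketched as a simple runtime projection. This is not how Confidence implements its flags: the function names, parameters, and the assumption of a steady weekly inflow of new eligible users are all hypothetical.

```python
from math import ceil


def projected_weeks(n_per_group, groups, weekly_eligible_users, allocation=1.0):
    """Weeks needed to reach the target sample, assuming steady weekly traffic."""
    total_needed = n_per_group * groups
    return ceil(total_needed / (weekly_eligible_users * allocation))


def check_plan(n_per_group, groups, weekly_eligible_users, planned_weeks, allocation=1.0):
    """Compare the projected runtime against the team's planned window."""
    weeks = projected_weeks(n_per_group, groups, weekly_eligible_users, allocation)
    return {"projected_weeks": weeks, "fits_window": weeks <= planned_weeks}


# 50,000 users in total (25,000 per group) at 50,000 eligible users per week
right_sized = check_plan(25_000, 2, 50_000, planned_weeks=2)
# 200,000 users in total at the same traffic
oversized = check_plan(100_000, 2, 50_000, planned_weeks=2)
print(right_sized, oversized)
```

Running the projection before launch makes the cost of a too-small MDE or a too-large sample visible as calendar time, which is usually the constraint teams plan around.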

Related terms

Statistical Power (Statistical Methods). Statistical power is the probability that an experiment will detect a real effect when one exists.

Minimum Detectable Effect (MDE) (Statistical Methods). The minimum detectable effect (MDE) is the smallest treatment effect an experiment is designed to reliably detect at a given significance level and power.

Significance Level (Alpha) (Statistical Methods). The significance level, commonly called alpha, is the maximum false positive rate you're willing to accept in an experiment.

Variance (Statistical Methods). Variance is a measure of how much a metric's values spread out across users.

Variance Reduction (Statistical Methods). Variance reduction is a set of statistical techniques that tighten the confidence intervals of an A/B test without requiring more traffic.

CUPED (Statistical Methods). CUPED (Controlled-experiment Using Pre-Existing Data) is a variance reduction method that uses data from before an experiment started to remove predictable noise from metric estimates, producing ti...

Experiment Bandwidth (Core Experimentation). Experiment bandwidth is an organization's capacity to run concurrent experiments, constrained by available traffic, metric infrastructure, statistical rigor, and team coordination.

© 2026 Spotify

The Confidence name and logo are registered trademarks of Spotify.