Lesson 8: Sample size

Summary

Sample size calculations help you understand the number of users you need to include in your experiment to detect the effect of your new feature. This lesson explains how to calculate the sample size for your experiment and interpret the results.

Sample size for experiments

Sample size refers to the number of units (most often users) included in an experiment. A larger sample size increases the sensitivity of the experiment and lets you detect smaller differences between the treatment and control groups. Each metric has a hypothetical effect size: the minimum detectable effect (MDE) for success metrics, and the non-inferiority margin (NIM) for guardrail metrics.

If you want to master sample size calculations, take the Sample size calculation - level I course.

A smaller MDE or NIM requires a larger sample size. Adding more metrics also increases the required sample size. The variance of each metric matters too: the higher the variance, the larger the required sample size. Metrics that measure a quantity per user (for example, Minutes played per user) usually require a larger sample size than metrics that measure the Share of users who [completed some action].
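To make these relationships concrete, here is a minimal sketch that uses the textbook approximation for a two-sided, two-sample z-test with equal group sizes. It is not the calculation Confidence runs. The function name `required_sample_size_per_group`, the default `alpha` and `power`, and all input values are assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size_per_group(sigma, mde_abs, alpha=0.05, power=0.8):
    """Approximate users per group for a two-sided, two-sample z-test.

    sigma:   assumed standard deviation of the metric (same in both groups)
    mde_abs: minimum detectable effect as an absolute difference in means
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return ceil(2 * (z_alpha + z_power) ** 2 * sigma ** 2 / mde_abs ** 2)


# Hypothetical inputs: same metric, different MDE and variance.
n_large_mde = required_sample_size_per_group(sigma=10.0, mde_abs=1.0)
n_small_mde = required_sample_size_per_group(sigma=10.0, mde_abs=0.5)
n_high_var = required_sample_size_per_group(sigma=20.0, mde_abs=1.0)

print(n_large_mde, n_small_mde, n_high_var)
```

Under this approximation, halving the MDE or doubling the standard deviation each roughly quadruples the required sample size, because both enter the formula as squared terms.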

The sample size calculation tells you the sample size that your experiment needs, often called the required sample size. If the sample size you expect to reach is larger than the required sample size, you can be confident that you will collect enough user data to see whether your new feature had the intended effect.

In Confidence

Confidence calculates the required sample size based on your experiment setup. Click the Calculate icon on the right sidebar once you've configured your metrics and MDE. The calculation uses historical data and doesn't start your experiment or expose any users.

If the required sample size turns out to be unrealistically large (for example, twice the sample size you expect to reach), go back and edit the settings of your experiment. For example, select fewer metrics, or adjust your expectations for what you can reliably detect and aim for a larger MDE or NIM.
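One way to adjust your expectations is to turn the question around: given the sample size you can realistically reach, what is the smallest effect you could reliably detect? The sketch below inverts the same textbook two-sample z-test approximation. The function name `detectable_effect` and every input value are hypothetical, and this is not how Confidence computes its results.

```python
from math import sqrt
from statistics import NormalDist


def detectable_effect(sigma, n_per_group, alpha=0.05, power=0.8):
    """Approximate smallest absolute difference in means that a two-sided,
    two-sample z-test can detect with the given users per group."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) * sigma * sqrt(2 / n_per_group)


# Hypothetical inputs: the same metric at two reachable sample sizes.
mde_small_sample = detectable_effect(sigma=10.0, n_per_group=1_000)
mde_large_sample = detectable_effect(sigma=10.0, n_per_group=4_000)

print(round(mde_small_sample, 3), round(mde_large_sample, 3))
```

In this approximation, quadrupling the sample size only halves the detectable effect, which is why aiming for a larger MDE or NIM is often more practical than trying to reach many more users.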

The results show the total required sample size alongside a per-metric breakdown. The metric with the highest required sample size determines the total, making it easy to spot the bottleneck and decide if you need to drop or adjust a metric.

[Figure: Required sample size]

In this example, the crash rate metric has a much higher required sample size than the checkout metric. Increasing its MDE or NIM would bring the total down. Learn more about adjusting the required sample size.
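The per-metric breakdown can be read as a simple rule: the metric with the highest required sample size sets the total. The sketch below mirrors that description with made-up numbers. The metric names, the values, and `expected_sample_size` are all hypothetical, and the code is an illustration rather than Confidence's implementation.

```python
# Hypothetical per-metric required sample sizes (not real results).
required_by_metric = {
    "checkout_rate": 40_000,
    "crash_rate": 250_000,
}

# Hypothetical number of users you expect to reach during the experiment.
expected_sample_size = 120_000

# The metric with the highest requirement determines the total.
bottleneck = max(required_by_metric, key=required_by_metric.get)
total_required = required_by_metric[bottleneck]

# The experiment is feasible only if you expect to reach the total.
is_feasible = expected_sample_size >= total_required

print(f"Bottleneck metric: {bottleneck}")
print(f"Total required sample size: {total_required}")
print(f"Feasible with expected sample size: {is_feasible}")
```

With these numbers the crash rate metric is the bottleneck and the experiment is not feasible as configured, so you would drop that metric or increase its MDE or NIM before recalculating.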


© Copyright 2026. All rights reserved.
