Lesson 3: Baseline Mean, Variance, and the MDE

Summary

This lesson explains how properties of the outcome metric affect the sample size required to power an experiment, and introduces the Minimum Detectable Effect (MDE). It covers:

  • The baseline variance of the outcome metric.
  • The baseline mean of the outcome metric.
  • What the Minimum Detectable Effect (MDE) and the relative MDE are.

Baseline variance

The 'baseline variance' is just a fancy way of saying "the variance of the outcome metric under no treatment". The baseline variance directly affects how many users are needed to keep the experiment's false positive and false negative risks at their intended levels, and therefore the sample size calculation.

As we will see in the following sections, the baseline variance goes directly into the sample size calculation formula: the required sample size is proportional to the variance. If the variance increases by 10%, then the sample size required to detect a given MDE with a given power also increases by 10%.
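
To make the proportionality concrete, here is a minimal sketch using the standard textbook approximation for a two-sided, two-sample z-test with equal group sizes. The function name `sample_size_per_group`, the default alpha and power, and the example variance and MDE values are assumptions chosen for illustration; the formula used by any particular experimentation tool may differ.

```python
from statistics import NormalDist


def sample_size_per_group(variance, absolute_mde, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided, two-sample z-test.

    Assumes equal group sizes and the same variance in both groups.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the false positive rate
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return 2 * (z_alpha + z_power) ** 2 * variance / absolute_mde ** 2


# Hypothetical metric: variance 4.0, absolute MDE 0.1
baseline = sample_size_per_group(variance=4.0, absolute_mde=0.1)
inflated = sample_size_per_group(variance=4.0 * 1.10, absolute_mde=0.1)

print(f"ratio: {inflated / baseline:.2f}")  # prints "ratio: 1.10"
```

Because the variance appears only as a multiplicative factor, the ratio does not depend on the alpha, power, or MDE chosen, as long as they are held fixed.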


Baseline mean and the MDE

Refresher on MDE

If you need a refresher on the concept of MDE, check out Lesson 6 in the hypothesis testing course.

Baseline mean

The baseline mean is just another way of saying "the average value of the outcome metric under no treatment." The baseline mean is important because a relative MDE is translated into an absolute MDE by multiplying it by the baseline mean.

Relation between relative and absolute MDE
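
The relation can be written as a single multiplication. This is a sketch reconstructed from the definitions in this lesson, with the quantities spelled out in words rather than in any official notation:

```latex
\text{absolute MDE} = \text{relative MDE} \times \text{baseline mean}
```

For instance, a relative MDE of 10% on a baseline mean of 0.0001 gives an absolute MDE of $0.10 \times 0.0001 = 0.00001$.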

Baseline means close to zero

If the baseline mean is very small, then even a large relative MDE might correspond to a practically irrelevant absolute MDE. This is particularly common for binary metrics with very low rates.

Consider the following example:

The baseline mean of crash rates is 0.0001. That is, one out of 10,000 users experiences a crash on average. If we want to detect a 10% relative MDE, then the absolute MDE is 0.00001. So we want to detect if the rate goes from 0.0001 to 0.00011, or, in other words, if one more user per 100,000 users experiences a crash on average.

Since this is a very small change, we need a large sample size to detect it.
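
The following sketch puts a rough number on "large" for the crash rate example. It assumes a two-sided, two-sample z-test with equal group sizes, a 5% alpha, and 80% power, and uses p(1 - p) as the variance of a binary metric. These settings and the variable names are illustrative assumptions, not values taken from the lesson.

```python
from statistics import NormalDist

baseline_mean = 0.0001  # crash rate from the example
relative_mde = 0.10     # 10% relative MDE

absolute_mde = relative_mde * baseline_mean
variance = baseline_mean * (1 - baseline_mean)  # variance of a binary metric

# Assumed settings: two-sided alpha of 5% and 80% power
z_alpha = NormalDist().inv_cdf(1 - 0.05 / 2)
z_power = NormalDist().inv_cdf(0.8)

# Standard textbook approximation for the per-group sample size
n_per_group = 2 * (z_alpha + z_power) ** 2 * variance / absolute_mde ** 2

print(f"absolute MDE: {absolute_mde:.5f}")
print(f"users per group: {n_per_group:,.0f}")
```

Under these assumptions the result is on the order of 15 to 16 million users per group, which illustrates why small absolute effects on rare events are so expensive to detect.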

Absolute versus Relative MDE

At this point you might wonder why we use the relative MDE to express the effect we want to be able to detect, given that it cannot be interpreted without knowing the baseline mean. This is the right question to ask! There are two reasons why the relative effect is often used. First, it is a convenient way to understand impact on metrics whose absolute values are hard to interpret or build intuition for, such as minutes played at Spotify. Second, because of the first reason, most experimentation tools let the user specify the MDE on a relative scale. Which scale makes the most sense is mainly a matter of preference. In any case, understanding this relation, and always considering the baseline mean, is helpful for all experimenters seeking to understand sample size calculations.


Reader exercise

How does the baseline variance of the outcome metric affect the sample size required for an experiment?

Reader exercise

What happens if the baseline mean of the outcome metric is very small?


© Copyright 2026. All rights reserved.
