What is a Treatment Effect?

A treatment effect is the measured difference in a metric between the treatment group (users who see a change) and the control group (users who see the current experience). It answers the most fundamental question in experimentation: did this change move the number? Because A/B tests use random assignment, the observed difference can be attributed to the change itself, not to pre-existing differences between users.

Treatment effects matter because they're the unit of evidence in product experimentation. Every ship-or-don't-ship decision, every guardrail check, every learning a team extracts from an experiment comes down to whether the treatment effect on a given metric is large enough, small enough, or directionally informative. At Spotify, where teams run over 10,000 experiments per year, each experiment produces treatment effects across dozens of metrics. The 42% rollback rate exists because Confidence surfaces treatment effects on guardrail metrics that would otherwise go unnoticed.

How is a treatment effect calculated?

The simplest treatment effect estimate is the difference-in-means estimator: subtract the average metric value in the control group from the average in the treatment group. If treatment users stream an average of 48 minutes per day and control users stream 46 minutes, the treatment effect estimate is +2 minutes.
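The difference-in-means estimator can be sketched in a few lines of Python. This is a minimal illustration, not how Confidence computes it: the function name and the per-user values are hypothetical, chosen so that the group averages match the 48 and 46 minutes in the example above.

```python
from statistics import mean


def difference_in_means(treatment, control):
    """Average of the treatment group minus average of the control group."""
    return mean(treatment) - mean(control)


# Hypothetical daily streaming minutes per user (illustrative values only).
treatment_minutes = [50, 46, 48, 47, 49]  # averages to 48
control_minutes = [44, 47, 46, 45, 48]    # averages to 46

effect = difference_in_means(treatment_minutes, control_minutes)
print(f"Estimated treatment effect: {effect:+.1f} minutes")
```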

That point estimate alone isn't enough. You also need a confidence interval that tells you the range of plausible true values, and a p-value that tells you how likely you'd be to observe a difference at least this large if the change had no real effect. Confidence computes all three automatically for every metric in an experiment, using the statistical methods (CUPED variance reduction, sequential testing, multiple testing corrections) that make the estimate as precise and trustworthy as the data allows.
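To make the three quantities concrete, here is a rough sketch of a fixed-horizon, two-sided z-test using a normal approximation. It is an assumption-laden illustration only: it does not apply CUPED, sequential testing, or multiple testing corrections, and it is not a description of Confidence's implementation. The function name and the simulated data are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance


def effect_with_uncertainty(treatment, control, alpha=0.05):
    """Return (point estimate, confidence interval, two-sided p-value)."""
    effect = mean(treatment) - mean(control)
    # Standard error of a difference between two independent sample means.
    se = sqrt(variance(treatment) / len(treatment)
              + variance(control) / len(control))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    interval = (effect - z_crit * se, effect + z_crit * se)
    # Probability of a difference at least this extreme if the true effect is zero.
    p_value = 2 * (1 - NormalDist().cdf(abs(effect / se)))
    return effect, interval, p_value


# Simulated daily streaming minutes (hypothetical data, fixed seed).
rng = random.Random(42)
treatment_minutes = [rng.gauss(48, 20) for _ in range(5000)]
control_minutes = [rng.gauss(46, 20) for _ in range(5000)]

effect, interval, p_value = effect_with_uncertainty(treatment_minutes, control_minutes)
print(f"effect={effect:+.2f}, "
      f"95% CI=({interval[0]:+.2f}, {interval[1]:+.2f}), p={p_value:.4f}")
```

With this kind of test, the interval excludes zero exactly when the p-value falls below the chosen alpha, so the two outputs tell a consistent story.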

Why can treatment effects be misleading?

Three things commonly distort treatment effect estimates.

Dilution from non-exposed users. If you change the checkout flow but include all users in the analysis, users who never reached checkout dilute the effect. The true treatment effect on exposed users gets averaged down toward zero. Trigger analysis solves this by restricting the analysis to users who actually encountered the change. But the effect size estimated on triggered users doesn't generalize directly to the full population: it answers "what was the effect on people who saw it?" not "what would happen if everyone saw it?"
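A small worked example shows how dilution plays out. The counts below are hypothetical, and the sketch assumes that users who never reach checkout are unaffected by the change and that both groups reach checkout at the same rate.

```python
# Hypothetical counts for a checkout-flow experiment (illustrative only).
assigned_per_group = 1000       # users assigned to each group
reached_checkout = 200          # users per group who actually saw checkout
control_conversions = 100       # purchases in control
treatment_conversions = 110     # purchases in treatment

# Effect measured on every assigned user (diluted by non-exposed users).
all_users_effect = (treatment_conversions / assigned_per_group
                    - control_conversions / assigned_per_group)

# Effect measured only on users who reached checkout (trigger analysis).
triggered_effect = (treatment_conversions / reached_checkout
                    - control_conversions / reached_checkout)

exposure_rate = reached_checkout / assigned_per_group

print(f"All assigned users: {all_users_effect * 100:+.1f} percentage points")
print(f"Triggered users:    {triggered_effect * 100:+.1f} percentage points")
print(f"Exposure rate:      {exposure_rate:.0%}")
```

Under these assumptions the all-user effect is the triggered effect multiplied by the exposure rate, which is why a change seen by a small share of users can look close to zero in an undiluted analysis.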

Underpowered experiments. When an experiment doesn't have enough users or runs for too little time, the confidence interval around the treatment effect is wide. A wide interval that spans zero means you can't tell a real positive effect from no effect, or from a real negative one. The experiment consumes traffic and time without producing a clear answer. Confidence's power analysis helps teams size experiments before they start, so the treatment effect estimate will be precise enough to act on.
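As a rough guide to what sizing an experiment involves, the sketch below uses the textbook sample-size formula for a two-sided comparison of two means with equal group sizes and a known standard deviation. It is a generic approximation, not Confidence's power analysis, and the function name and input values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def users_per_group(mde, std_dev, alpha=0.05, power=0.80):
    """Approximate users needed in each group to detect an effect of size `mde`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) * std_dev / mde) ** 2)


# Hypothetical inputs: detect a 2-minute change when the metric's
# standard deviation is 20 minutes.
n_for_2_minutes = users_per_group(mde=2, std_dev=20)
n_for_1_minute = users_per_group(mde=1, std_dev=20)

print(f"Users per group for a 2-minute effect: {n_for_2_minutes}")
print(f"Users per group for a 1-minute effect: {n_for_1_minute}")
```

Halving the effect you want to detect roughly quadruples the required sample, which is why small expected effects call for much larger or longer experiments.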

Metric choice. The same change can show a positive treatment effect on one metric and a negative effect on another. A feature that increases engagement might also increase load time. The treatment effect is always relative to the metric you're measuring, which is why Confidence's decision framework distinguishes success metrics (what you're trying to improve) from guardrail metrics (what you're trying not to break).

What's the difference between a treatment effect and an effect size?

These terms overlap but aren't identical. A treatment effect is the raw difference between treatment and control on a specific metric in a specific experiment: "+2 minutes of daily streaming" or "-0.3 percentage points on crash rate." An effect size is often standardized to make comparisons across experiments and metrics possible. Cohen's d, the difference in means divided by the pooled standard deviation, is a widely used standardized effect size.
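The distinction can be illustrated by computing both quantities on the same data. This is a minimal sketch with hypothetical values: the raw effect keeps the metric's units, while Cohen's d divides it by the pooled standard deviation and is unitless.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treatment, control):
    """Difference in means divided by the pooled standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_variance = (((n_t - 1) * variance(treatment)
                        + (n_c - 1) * variance(control))
                       / (n_t + n_c - 2))
    return (mean(treatment) - mean(control)) / sqrt(pooled_variance)


# Hypothetical daily streaming minutes per user (illustrative values only).
treatment_minutes = [50, 46, 48, 47, 49]
control_minutes = [44, 47, 46, 45, 48]

raw_effect = mean(treatment_minutes) - mean(control_minutes)
standardized_effect = cohens_d(treatment_minutes, control_minutes)

print(f"Raw treatment effect: {raw_effect:+.1f} minutes")
print(f"Cohen's d:            {standardized_effect:.2f}")
```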

In practice, most product teams work with raw treatment effects because the business cares about the actual units: minutes, conversion percentage points, revenue per user. Standardized effect sizes become useful when planning experiments (deciding on a minimum detectable effect) or when comparing the relative magnitude of changes across metrics with different scales.

Related terms

Core Experimentation
Average Treatment Effect (ATE)

The average treatment effect (ATE) is the mean difference in outcomes between the treatment group and the control group, averaged across the entire experimental population.

Core Experimentation
Control Group

The control group is the set of users in an experiment who see the unchanged, current experience.

Statistical Methods
Statistical Significance

Statistical significance is the determination that an observed difference between experiment groups is unlikely to have occurred by chance alone.

Statistical Methods
Confidence Interval

A confidence interval is a range of values that, at a given confidence level, is expected to contain the true treatment effect.

Statistical Methods
Variance Reduction

Variance reduction is a set of statistical techniques that tighten the confidence intervals of an A/B test without requiring more traffic.

Experiment Analysis
Trigger Analysis

Trigger analysis is an experiment analysis technique that restricts the evaluation to users who actually encountered the changed feature, rather than analyzing every user assigned to the experiment.

Statistical Methods
Minimum Detectable Effect (MDE)

The minimum detectable effect (MDE) is the smallest treatment effect an experiment is designed to reliably detect at a given significance level and power.

© 2026 Spotify

The Confidence name and logo are registered trademarks of Spotify.