Experiment Analysis

What is Dilution?

Dilution is the weakening of an observed treatment effect that occurs when users who were never exposed to the changed feature are included in the experiment analysis. If a change only affects users who reach a specific screen or code path, analyzing the full assigned population averages the real effect across both affected and unaffected users, pulling the estimated impact toward zero.

Dilution is one of the most common reasons experiments return inconclusive results. The change may genuinely work for the people who encounter it, but the signal gets lost in the noise of everyone who doesn't.

Why does dilution happen?

Most product experiments assign users to treatment or control at the moment of randomization, which typically happens before the user ever reaches the feature being tested. A user might be assigned to the treatment group when they open the app, even though the change only appears on a settings page that 15% of users visit in any given week.

The remaining 85% contribute their metric values to the analysis even though they experienced no difference between treatment and control. Their data is pure noise from the perspective of the treatment effect, but it counts in the denominator. The result: an effect that might be a 10% lift among exposed users looks like a 1.5% lift across the full population, often too small to reach statistical significance.
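The arithmetic above can be sketched directly. Under a simplified model where unexposed users contribute exactly zero treatment effect and baselines are equal across groups, the population-level lift is just the lift among exposed users scaled by the exposure rate:

```python
# Simplified dilution model: unexposed users contribute zero effect,
# so the population-level lift shrinks by the exposure rate.
def diluted_lift(lift_among_exposed: float, exposure_rate: float) -> float:
    """Approximate population-level relative lift under dilution."""
    return lift_among_exposed * exposure_rate

# A 10% lift that only 15% of assigned users ever see:
print(round(diluted_lift(0.10, 0.15), 4))  # → 0.015, i.e. a 1.5% lift overall
```

This is an approximation, not how any particular platform computes results, but it makes the core point: halving the exposure rate halves the observable effect.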

How do you detect and address dilution?

The clearest signal is a gap between the trigger analysis result (computed only over users who actually encountered the change) and the full-population, intent-to-treat result. If trigger analysis shows a meaningful effect but the intent-to-treat analysis is flat, dilution is the likely explanation.
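The gap is easy to reproduce in a small simulation. The sketch below (a toy model, with made-up parameters: a 15% exposure rate and a real 10% effect for exposed treatment users) runs both analyses on the same synthetic population:

```python
import random
import statistics

random.seed(7)

def simulate_user(in_treatment: bool, exposure_rate: float = 0.15):
    """Return (exposed, metric) for one user; baseline metric ~ N(1, 0.2)."""
    exposed = random.random() < exposure_rate
    metric = random.gauss(1.0, 0.2)
    if in_treatment and exposed:
        metric *= 1.10  # the real 10% effect, felt only by exposed users
    return exposed, metric

control = [simulate_user(False) for _ in range(20_000)]
treatment = [simulate_user(True) for _ in range(20_000)]

def lift(t_vals, c_vals):
    return statistics.fmean(t_vals) / statistics.fmean(c_vals) - 1

# Intent-to-treat: every assigned user counts, exposed or not.
itt = lift([m for _, m in treatment], [m for _, m in control])

# Trigger analysis: only users who actually reached the feature.
trig = lift([m for e, m in treatment if e], [m for e, m in control if e])

print(f"ITT lift:     {itt:.1%}")   # roughly 1.5%
print(f"Trigger lift: {trig:.1%}")  # roughly 10%
```

The same underlying effect produces two very different headline numbers, which is exactly the gap to look for.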

Confidence runs both analyses automatically for every experiment, so teams can spot dilution without extra configuration. When dilution is the issue, the response depends on the goal:

  • If you need the full-population effect to be meaningful, the feature needs higher reach. Dilution is telling you the change works but not enough users encounter it.
  • If you need to understand whether the mechanism works, the trigger analysis result is the one to trust. The per-exposed-user effect is real; it's just not visible at the population level.
  • If power is the concern, techniques like CUPED variance reduction can help by tightening confidence intervals, but they don't eliminate the fundamental dilution problem. You're still averaging a real effect with a lot of zeros.