What is Observational Bias?

Observational bias is systematic error introduced when the data collection or analysis process produces results that consistently differ from the truth.

In experimentation, it most commonly appears when users aren't randomly assigned to groups, when measurement differs between treatment and control, or when the analysis selects a non-representative subset of the data.

Randomized experiments exist specifically to eliminate observational bias. When bias enters an experiment through broken instrumentation, differential measurement, or post-hoc analysis choices, it undermines the entire purpose of running the test.

What forms does observational bias take in experiments?

Selection bias occurs when the process that determines which users enter the analysis is correlated with the outcome. Comparing users who opted into a feature against those who didn't tells you about the users, not the feature. Self-selected groups differ systematically: early adopters are more engaged, more technical, and more tolerant of rough edges. Any comparison between these groups confounds the treatment effect with the selection effect.
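To make the confounding concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption chosen for illustration: the latent `engagement` trait, the conversion rates, and the 2-point true effect are made up and do not come from any real experiment. It compares an opt-in comparison against a randomized one on the same kind of synthetic users.

```python
import random


def simulate_lifts(n_users=50_000, true_effect=0.02, seed=0):
    """Return (opt_in_lift, randomized_lift) measured on synthetic users."""
    rng = random.Random(seed)
    opt_in = {True: [], False: []}      # keyed by "used the feature"
    randomized = {True: [], False: []}  # keyed by "assigned to treatment"

    for _ in range(n_users):
        engagement = rng.random()             # hypothetical latent trait in [0, 1)
        base_rate = 0.05 + 0.20 * engagement  # engaged users convert more regardless

        # Opt-in: more engaged users are more likely to choose the feature.
        chose = rng.random() < engagement
        opt_in[chose].append(rng.random() < base_rate + (true_effect if chose else 0.0))

        # Randomized: a coin flip, independent of engagement.
        assigned = rng.random() < 0.5
        randomized[assigned].append(
            rng.random() < base_rate + (true_effect if assigned else 0.0)
        )

    def lift(groups):
        treated, untreated = groups[True], groups[False]
        return sum(treated) / len(treated) - sum(untreated) / len(untreated)

    return lift(opt_in), lift(randomized)


opt_in_lift, randomized_lift = simulate_lifts()
print(f"opt-in comparison lift:     {opt_in_lift:.3f}")
print(f"randomized comparison lift: {randomized_lift:.3f}")
```

Under these assumptions the opt-in lift should come out well above the true effect, because it mixes the feature's effect with the engagement gap between the people who chose it and the people who didn't. The randomized lift should land near the true effect.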

Measurement bias occurs when the instrumentation captures data differently across variants. If the treatment variant logs events at a different point in the code path than control, completion rates can differ for reasons that have nothing to do with user behavior. A team at Spotify discovered that a 2% lift in a checkout experiment disappeared once they fixed a logging discrepancy: the treatment had been firing its conversion event slightly earlier in the flow than control.
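The mechanism can be sketched with a toy funnel. The step probabilities and the two logging points below are hypothetical and are not a reconstruction of the Spotify incident. Both calls replay exactly the same simulated users (same seed); the only difference is where the conversion event fires.

```python
import random

# Hypothetical funnel: probability of passing each step, identical for all users.
STEP_PASS_PROB = [0.50, 0.80, 0.97]  # reach checkout, submit payment, payment confirmed


def logged_conversion_rate(n_users, log_at_step, seed=0):
    """Share of users whose conversion event fires when logging happens at `log_at_step`."""
    rng = random.Random(seed)
    logged = 0
    for _ in range(n_users):
        steps_passed = 0
        for p in STEP_PASS_PROB:
            if rng.random() >= p:
                break  # user abandons the funnel here
            steps_passed += 1
        logged += steps_passed >= log_at_step
    return logged / n_users


# Control logs after payment is confirmed; treatment logs one step earlier.
control_rate = logged_conversion_rate(100_000, log_at_step=3)
treatment_rate = logged_conversion_rate(100_000, log_at_step=2)
print(f"control:   {control_rate:.4f}")
print(f"treatment: {treatment_rate:.4f}")
```

Because user behavior is identical by construction, any gap between the two rates is produced entirely by the instrumentation.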

Survivorship bias appears when your analysis only includes users who completed the full experiment period, excluding those who dropped off. If the treatment causes more drop-offs (or fewer) than control, the surviving population is no longer representative of the users who were assigned, and the comparison is biased.
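A small simulation can show how a completers-only analysis invents an effect. The drop-off model below is an assumption made for illustration: the treatment has no effect on purchases at all, but it pushes low-engagement users out of the experiment more often.

```python
import random


def simulate_dropoff(n_users=200_000, seed=0):
    """Return (completers_only_lift, all_assigned_lift); the true purchase effect is zero."""
    rng = random.Random(seed)
    everyone = {"control": [], "treatment": []}
    completers = {"control": [], "treatment": []}

    for _ in range(n_users):
        variant = "treatment" if rng.random() < 0.5 else "control"
        engagement = rng.random()                             # hypothetical latent trait
        purchased = rng.random() < 0.05 + 0.40 * engagement   # same model in both variants

        # Assumed drop-off model: treatment drives away low-engagement users.
        drop_prob = 0.10 + (0.60 * (1 - engagement) if variant == "treatment" else 0.0)
        stayed = rng.random() >= drop_prob

        everyone[variant].append(purchased)
        if stayed:
            completers[variant].append(purchased)

    def lift(groups):
        rate = lambda xs: sum(xs) / len(xs)
        return rate(groups["treatment"]) - rate(groups["control"])

    return lift(completers), lift(everyone)


completers_lift, all_assigned_lift = simulate_dropoff()
print(f"completers only: {completers_lift:+.4f}")
print(f"all assigned:    {all_assigned_lift:+.4f}")
```

Analyzing everyone who was assigned should give a lift near zero, which is the truth in this setup. Restricting to completers should show a positive lift, because the treatment's survivors skew toward engaged users.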

Attrition bias is the specific case where users leave the experiment at different rates across variants. This often manifests as a sample ratio mismatch, which is why Confidence checks for SRM automatically on every experiment.
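As a rough illustration of what an SRM check does, here is a chi-square goodness-of-fit test for two groups, written with the Python standard library only. This is a sketch and not a description of how Confidence implements its check; the function name and the example counts are made up.

```python
import math


def srm_p_value(n_control, n_treatment, expected_treatment_share=0.5):
    """P-value of a chi-square goodness-of-fit test against the intended split."""
    total = n_control + n_treatment
    expected_treatment = total * expected_treatment_share
    expected_control = total - expected_treatment
    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_treatment - expected_treatment) ** 2 / expected_treatment)
    # With two groups there is one degree of freedom, so the chi-square
    # tail probability reduces to erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


print(srm_p_value(5_000, 5_050))    # small imbalance, plausible under chance
print(srm_p_value(50_000, 48_500))  # imbalance far larger than chance would explain
```

A very small p-value says the observed split is unlikely under the intended allocation, which is a reason to investigate attrition or instrumentation before trusting the experiment's results. The threshold to alert on is a policy choice and is not specified here.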

How do randomized experiments prevent observational bias?

Randomization ensures that, in expectation, every user characteristic is balanced across groups. There's no selection process that could favor one group over another. The only systematic difference between treatment and control is the treatment itself.
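One common way to implement assignment that no user or analyst can influence is a salted hash of the user ID. The sketch below assumes that scheme for illustration; the salt, the ID format, and the "power user" trait are all hypothetical. It then checks that a pre-existing trait ends up roughly balanced across groups.

```python
import hashlib


def assign_variant(user_id, experiment_salt, treatment_share=0.5):
    """Deterministically map a user to a variant using a salted hash."""
    digest = hashlib.sha256(f"{experiment_salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1]
    return "treatment" if bucket < treatment_share else "control"


def power_user_share_by_variant(n_users=20_000, experiment_salt="checkout-test"):
    """Check the balance of one pre-existing trait across the two groups."""
    counts = {"control": [0, 0], "treatment": [0, 0]}  # [users, power users]
    for i in range(n_users):
        variant = assign_variant(f"user-{i}", experiment_salt)
        counts[variant][0] += 1
        counts[variant][1] += (i % 10 == 0)  # made-up trait: every tenth user
    return {v: power / users for v, (users, power) in counts.items()}


print(power_user_share_by_variant())
```

The assignment depends only on the hash, never on the trait, so the share of power users should be close in both groups. Balance holds in expectation; any single experiment will still show small chance differences.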

But randomization only prevents bias at the point of assignment. Bias can still enter through:

  • Differential attrition after assignment
  • Asymmetric instrumentation between variants
  • Analysis choices made after seeing the data (the garden of forking paths)
  • Post-treatment filtering that conditions on affected outcomes

This is why a complete experimentation platform doesn't just randomize. It also checks for sample ratio mismatches, logs exposure symmetrically for both variants, and runs pre-specified analyses. Confidence automates these checks because each one closes a specific channel through which bias could enter.
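The cost of analysis choices made after seeing the data can be sketched with simulated A/A tests, where there is no real effect by construction. The setup below is an assumption for illustration: ten equally sized segments, a 20% conversion rate, and a pooled two-proportion z-test. It compares one pre-specified test on all users against the habit of scanning every segment and reporting whichever one looks significant.

```python
import math
import random


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def conversions(rng, n, rate):
    """Number of converting users out of n, each converting with probability `rate`."""
    return sum(rng.random() < rate for _ in range(n))


def false_positive_rates(n_experiments=400, n_segments=10, users_per_cell=200,
                         rate=0.2, alpha=0.05, seed=0):
    """Simulate A/A tests; return (pre_specified_rate, any_segment_rate)."""
    rng = random.Random(seed)
    pre_specified_hits = forked_hits = 0
    n_total = n_segments * users_per_cell
    for _ in range(n_experiments):
        cells = [(conversions(rng, users_per_cell, rate),
                  conversions(rng, users_per_cell, rate))
                 for _ in range(n_segments)]
        # Pre-specified analysis: a single test on all users.
        total_a = sum(a for a, _ in cells)
        total_b = sum(b for _, b in cells)
        pre_specified_hits += two_proportion_p_value(total_a, n_total, total_b, n_total) < alpha
        # Forking paths: test every segment and count a "win" if any is significant.
        forked_hits += any(
            two_proportion_p_value(a, users_per_cell, b, users_per_cell) < alpha
            for a, b in cells
        )
    return pre_specified_hits / n_experiments, forked_hits / n_experiments


pre_specified, any_segment = false_positive_rates()
print(f"pre-specified analysis false positive rate: {pre_specified:.2f}")
print(f"any-segment analysis false positive rate:   {any_segment:.2f}")
```

With ten independent tests at a 5% threshold, the chance that at least one is significant is roughly 1 - 0.95^10, or about 40%, even though nothing changed between the groups. The pre-specified analysis should stay near the nominal 5%.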

How is observational bias different from random error?

Random error (noise) averages out with larger samples. Observational bias does not. If your logging is broken in a way that overcounts conversions in treatment by 0.3%, that bias persists no matter how many users you add. More data makes the biased estimate more precise, not more accurate.
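The difference can be seen in a short simulation. It assumes an A/A setup with a 10% true conversion rate in both groups and treats the 0.3% overcount as 0.3 percentage points added to the treatment's logged rate; both of those readings are assumptions made for this sketch.

```python
import math
import random


def biased_lift_by_sample_size(sizes=(1_000, 10_000, 100_000, 1_000_000),
                               true_rate=0.10, logging_bias=0.003, seed=0):
    """A/A setup with an overcounting treatment logger; returns (n, lift, std_error) rows."""
    rng = random.Random(seed)
    rows = []
    for n in sizes:
        control = sum(rng.random() < true_rate for _ in range(n)) / n
        # Treatment users behave identically, but the logger overcounts conversions.
        treatment = sum(rng.random() < true_rate + logging_bias for _ in range(n)) / n
        std_error = math.sqrt(control * (1 - control) / n + treatment * (1 - treatment) / n)
        rows.append((n, treatment - control, std_error))
    return rows


for n, lift, std_error in biased_lift_by_sample_size():
    print(f"n per group = {n:>9,}  measured lift = {lift:+.4f}  std error = {std_error:.4f}")
```

The standard error shrinks as the sample grows, while the measured lift settles around the logging bias instead of the true value of zero. At large sample sizes the biased result ends up looking statistically convincing.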

This distinction matters because underpowered experiments and biased experiments look similar on the surface (both produce unreliable results), but the fix is completely different. More sample size solves noise. It does not solve bias.

Related terms

Experiment Analysis
Confounding Variables

A confounding variable is a factor that influences both the treatment assignment and the outcome being measured, creating a spurious association that can be mistaken for a causal effect.

Experiment Analysis
Sample Ratio Mismatch

A sample ratio mismatch (SRM) occurs when the observed number of users in each experiment group differs from the intended allocation ratio by more than chance alone would explain.

Experiment Analysis
Garden of Forking Paths

The garden of forking paths refers to the many implicit analytical choices a researcher or analyst makes during an experiment's lifecycle, each of which could have gone differently and each of whic...

© 2026 Spotify

The Confidence name and logo are registered trademarks of Spotify.