A confounding variable is a factor that influences both the treatment assignment and the outcome being measured, creating a spurious association that can be mistaken for a causal effect. In a well-designed A/B test, randomization eliminates confounding by ensuring that every variable other than the treatment is balanced across groups on average. Confounding becomes a problem when randomization breaks down, when analysis conditions on post-treatment variables, or when results from observational data are interpreted as causal.
Understanding confounders is what separates causal claims from mere correlation. Two metrics can move together without one causing the other, and confounding is usually the reason.
How does randomization control for confounders?
Random assignment is the single most effective defense against confounding. When users are randomly allocated to treatment and control, every characteristic that could influence the outcome (device type, engagement level, geography, time of day) is distributed equally across groups in expectation. Any systematic difference in the metric can then be attributed to the treatment because, on average, the groups are identical in every other respect.
This is why A/B tests produce causal evidence and observational comparisons generally don't. If you compare users who adopted a feature voluntarily against users who didn't, the groups differ in ways beyond the feature itself. Users who adopt new features tend to be more engaged, more tech-savvy, and more active. Those pre-existing differences confound the comparison.
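A small simulation makes the contrast concrete. Everything below is illustrative: the feature has zero true effect on the outcome, and a latent `engagement` variable drives both adoption and the outcome. The randomized comparison recovers roughly zero, while the self-selected comparison shows a large spurious lift.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Latent engagement drives the outcome regardless of the feature.
engagement = rng.normal(0, 1, n)
# The feature has zero true effect: outcome depends only on engagement plus noise.
outcome = 2.0 * engagement + rng.normal(0, 1, n)

# Randomized assignment: treatment is independent of engagement.
randomized = rng.random(n) < 0.5
print("randomized diff:   ", outcome[randomized].mean() - outcome[~randomized].mean())

# Self-selected adoption: engaged users are more likely to adopt the feature.
adopted = rng.random(n) < 1 / (1 + np.exp(-engagement))
print("observational diff:", outcome[adopted].mean() - outcome[~adopted].mean())
```

The observational difference is entirely driven by the engagement confounder, which is exactly the pre-existing difference described above.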
Confidence's assignment uses a deterministic hash of user ID and a salt, which guarantees random allocation without storing state. The randomization is the foundation that makes everything else in the analysis valid.
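As a rough sketch of how salted, deterministic bucketing works in general (the function name, bucket count, and hashing details below are illustrative assumptions, not Confidence's actual implementation):

```python
import hashlib

def assign_variant(user_id: str, salt: str, variants=("control", "treatment")) -> str:
    """Deterministically map a user to a variant using a salted hash.

    The same (user_id, salt) pair always lands in the same bucket, so no
    assignment state needs to be stored, yet the buckets are effectively
    uniformly distributed across users.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000          # 10,000 fine-grained buckets
    return variants[bucket * len(variants) // 10_000]

print(assign_variant("user-123", "experiment-42"))  # stable across calls
```

Using a fresh salt per experiment keeps assignments independent across experiments while remaining reproducible within one.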
When does confounding sneak into experiments?
Even in randomized experiments, confounding can appear through a few mechanisms:
Post-treatment conditioning. If your analysis filters to users who completed a specific action (like making a purchase), and the treatment affects whether users take that action, you've introduced a confound. The filtered populations in treatment and control are no longer comparable because the treatment influenced who passed the filter (the simulation after this list shows the resulting bias).
Differential attrition. If more users drop out of the treatment group than control (or vice versa), the remaining users aren't comparable. The treatment group is now a selected subset, and any observed difference could reflect that selection rather than the treatment itself. This is closely related to sample ratio mismatch.
Interaction with other experiments. When a user is in multiple simultaneous experiments, the combined effects can create confounds if the experiments interact. Confidence's coordination features (Surfaces) help teams manage this by grouping experiments that share the same part of the product.
Time-varying confounders. If the treatment is rolled out gradually and something else changes during the rollout (a holiday, a competitor action, a platform update), the time trend confounds the treatment effect. A/B tests with concurrent controls avoid this because both groups experience the same external conditions.
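To make the post-treatment conditioning mechanism concrete, here is a minimal simulation with illustrative variable names and effect sizes. The treatment has no effect on spend, but because it raises the purchase rate, comparing spend among purchasers only produces a spurious difference.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000

treated = rng.random(n) < 0.5
quality = rng.normal(0, 1, n)                    # latent user quality

# Treatment makes purchase more likely; high-quality users also purchase more.
purchase_prob = 1 / (1 + np.exp(-(quality + 0.5 * treated)))
purchased = rng.random(n) < purchase_prob

# Spend depends only on quality: the true treatment effect on spend is zero.
spend = 10 + 5 * quality + rng.normal(0, 1, n)

# Comparing all randomized users recovers roughly zero, as it should.
print("all users:      ", spend[treated].mean() - spend[~treated].mean())

# Conditioning on the post-treatment "purchased" filter biases the estimate:
# control users who purchased are a more selected (higher-quality) subset.
t_buyers = treated & purchased
c_buyers = ~treated & purchased
print("purchasers only:", spend[t_buyers].mean() - spend[c_buyers].mean())
```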
How do you detect confounding in practice?
Pre-experiment balance checks (also called A/A analysis or covariate balance tests) compare treatment and control on metrics measured before the experiment started. If the groups differ on pre-treatment variables by more than chance would explain, something went wrong with randomization or instrumentation.
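A balance check can be as simple as a two-sample test on a metric measured before assignment. The sketch below uses made-up pre-experiment session counts and is not Confidence's internal check; under correct randomization the p-value should only be small at roughly the test's false positive rate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Sessions per user in the weeks before assignment (illustrative data).
pre_sessions_treatment = rng.poisson(5, 50_000)
pre_sessions_control = rng.poisson(5, 50_000)

# A tiny p-value on a pre-treatment metric signals broken randomization
# or instrumentation, since the treatment cannot have caused the difference.
t_stat, p_value = stats.ttest_ind(pre_sessions_treatment, pre_sessions_control)
print(f"balance check p-value: {p_value:.3f}")
```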
Sample ratio mismatch checks, which Confidence runs automatically, catch one common symptom: if the observed allocation ratio deviates from the intended ratio by more than chance can explain, the groups aren't balanced and confounding is likely.
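Confidence performs this check for you; the sketch below only illustrates the underlying chi-square test, using made-up counts and an assumed 50/50 split.

```python
from scipy import stats

# Observed user counts per group vs. the intended 50/50 split.
observed = [50_912, 49_088]
total = sum(observed)
expected = [total * 0.5, total * 0.5]

chi2, p_value = stats.chisquare(observed, f_exp=expected)
print(f"SRM check p-value: {p_value:.2e}")
# A very small p-value (a common threshold is p < 0.001) indicates a sample
# ratio mismatch: the realized split is implausible under the intended allocation.
```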
For observational analyses where randomization isn't possible, techniques like regression adjustment, propensity score matching, and instrumental variables can reduce but never fully eliminate confounding. The residual risk is why randomized experiments remain the standard for causal claims.
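For illustration only, here is what regression adjustment looks like in the easiest possible case, where the confounder happens to be fully observed (variable names and effect sizes are assumptions; real observational data rarely cooperates this well).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 50_000

engagement = rng.normal(0, 1, n)                         # observed confounder
adopted = (rng.random(n) < 1 / (1 + np.exp(-engagement))).astype(float)
outcome = 1.0 * adopted + 2.0 * engagement + rng.normal(0, 1, n)

# Naive comparison mixes the true effect (1.0) with the engagement difference.
print("naive diff:     ", outcome[adopted == 1].mean() - outcome[adopted == 0].mean())

# Regression adjustment controls for the measured confounder; it recovers the
# true effect here only because engagement is the sole confounder and is observed.
X = sm.add_constant(np.column_stack([adopted, engagement]))
model = sm.OLS(outcome, X).fit()
print("adjusted effect:", model.params[1])
```

If an unmeasured confounder remained, the adjusted estimate would still be biased, which is the residual risk the paragraph above refers to.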