"Online controlled experiment" is the formal term for an A/B test run in a live digital product. Users are randomly assigned to a control group (current experience) or one or more treatment groups (modified experience), and the difference in behavioral metrics between groups is measured to determine the causal effect of the change. The word "online" means the experiment runs on live traffic in real time, not that it happens on the internet.
This is the term you'll find in academic papers and industry research from Microsoft, Google, and Spotify. It connects product A/B testing to the broader tradition of randomized controlled trials in medicine and social science. The methodology is the same: randomization balances confounding variables across groups so that any observed difference between groups can be attributed to the change being tested, not to pre-existing differences in the populations.
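To make the assignment mechanics concrete, here is a minimal sketch assuming a hash-based bucketing scheme. The function name, salt, and 50/50 split are illustrative assumptions, not Confidence's actual implementation or API.

```python
# Sketch of deterministic randomized assignment: hash the user ID together
# with an experiment-specific salt so each user lands in a stable,
# pseudo-random bucket. Illustrative only, not Confidence's implementation.
import hashlib

def assign(user_id: str, experiment_salt: str, treatment_share: float = 0.5) -> str:
    """Assign a user to 'control' or 'treatment', stable across calls."""
    digest = hashlib.sha256(f"{experiment_salt}:{user_id}".encode()).hexdigest()
    # Map the hash onto [0, 1) and compare against the intended treatment share.
    bucket = int(digest, 16) / 16 ** len(digest)
    return "treatment" if bucket < treatment_share else "control"

print(assign("user-123", "homepage-redesign-v1"))  # same answer every time
```

Hashing rather than drawing a fresh random number keeps each user's assignment consistent across sessions while still producing a random split across the population.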
How does an online controlled experiment differ from offline testing?
Offline evaluation uses historical data to simulate what would have happened under a different product experience. Online controlled experiments measure what actually happens when real users interact with the change.
The distinction matters because user behavior in production is hard to predict from logged data. A recommendation algorithm that looks better on historical metrics might perform worse when users actually interact with it, because the act of showing different content changes user behavior in ways historical data can't capture. This is why Spotify runs over 10,000 online experiments per year rather than relying solely on offline model evaluation. The Spotify Search team's maturity arc explicitly describes moving from offline evaluation to online controlled experiments as the team's practice matured.
Offline evaluation is still valuable for screening: it can quickly eliminate ideas that are clearly worse before they consume experiment bandwidth. But the final validation of any product change that affects user experience requires an online controlled experiment.
What makes the "controlled" part important?
The "controlled" in online controlled experiment refers to two things: the control group and the controlled conditions.
The control group provides the counterfactual. It tells you what would have happened if you hadn't made the change. Without a control group, you're comparing "after the change" to "before the change," and any time-based trend (seasonality, marketing campaigns, competitor actions) gets mixed into your result.
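A toy simulation with made-up numbers shows the problem: a before/after comparison absorbs the background trend, while a concurrent treatment-versus-control comparison recovers only the effect of the change.

```python
# Hypothetical numbers: an unrelated upward trend inflates a before/after
# comparison but cancels out when treatment is compared to a concurrent control.
import random

random.seed(0)
TRUE_EFFECT = 0.01       # the change lifts conversion by 1 percentage point
TREND_PER_WEEK = 0.02    # a seasonal lift that affects every user

def mean_conversion(week: int, treated: bool, n_users: int = 10_000) -> float:
    """Average conversion rate for a simulated group of users."""
    base = 0.10 + TREND_PER_WEEK * week + (TRUE_EFFECT if treated else 0.0)
    return sum(base + random.gauss(0, 0.01) for _ in range(n_users)) / n_users

before = mean_conversion(week=0, treated=False)      # everyone, pre-launch
after = mean_conversion(week=1, treated=True)        # everyone, post-launch
control = mean_conversion(week=1, treated=False)     # concurrent control
treatment = mean_conversion(week=1, treated=True)    # concurrent treatment

print(f"before/after estimate:      {after - before:+.4f}")       # ~ +0.03 = trend + effect
print(f"treatment/control estimate: {treatment - control:+.4f}")  # ~ +0.01 = effect only
```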
Controlled conditions mean that the only difference between groups is the change being tested. Randomization handles this statistically: by assigning users randomly, you ensure the groups are equivalent in expectation on every dimension, observed and unobserved. If the groups aren't equivalent, the experiment is confounded. Confidence's sample ratio mismatch detection catches one common source of confounding: when the observed split between groups doesn't match the intended split, indicating that something in the assignment or logging is broken.
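A common way to detect sample ratio mismatch is a chi-square goodness-of-fit test comparing observed group counts to the counts the configured split would produce. The sketch below illustrates that approach with hypothetical counts; it is not necessarily how Confidence computes the check.

```python
# Sample ratio mismatch check: compare observed group sizes against the
# intended allocation with a chi-square goodness-of-fit test.
from scipy.stats import chisquare

observed = [50_421, 49_102]        # users actually seen in control, treatment
intended_split = [0.5, 0.5]        # the configured 50/50 allocation
total = sum(observed)
expected = [share * total for share in intended_split]

stat, p_value = chisquare(f_obs=observed, f_exp=expected)
if p_value < 0.001:
    # A very small p-value means the split is unlikely under the intended
    # allocation: assignment or logging is probably broken.
    print(f"Possible SRM: p = {p_value:.2e}; investigate before trusting results")
else:
    print(f"No SRM detected (p = {p_value:.3f})")
```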
Why does the terminology matter?
Using the term "online controlled experiment" isn't just academic precision. It clarifies what the method actually requires. Many things get called "experiments" in product development that aren't controlled experiments: launching a feature and watching metrics go up, comparing metrics across user segments who self-selected into different behaviors, running a survey after a product change. None of these are controlled experiments because none of them use randomization to establish a valid counterfactual.
When Confidence refers to "experiments" in the platform, it means online controlled experiments: randomized assignment, concurrent control and treatment groups, and statistical analysis of the difference. This matters because the causal claims you can make from a controlled experiment are fundamentally stronger than what any observational analysis can provide.
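As a rough sketch of what "statistical analysis of the difference" involves, the example below runs a standard two-sample comparison on a hypothetical per-user metric. Real platforms layer variance reduction, sequential or multiple-testing corrections, and other safeguards on top of this basic comparison.

```python
# Two-sample comparison of a per-user metric between treatment and control.
# Data is simulated; the metric and effect size are hypothetical.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
control = rng.normal(loc=10.0, scale=3.0, size=5_000)     # e.g. minutes listened per user
treatment = rng.normal(loc=10.2, scale=3.0, size=5_000)   # hypothetical +0.2 lift

stat, p_value = ttest_ind(treatment, control, equal_var=False)  # Welch's t-test
lift = treatment.mean() - control.mean()
print(f"estimated lift: {lift:+.2f} minutes, p = {p_value:.4f}")
```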