The control group is the set of users in an experiment who see the unchanged, current experience. They receive no treatment. Their metrics serve as the baseline against which the treatment group is compared, providing the counterfactual: what would have happened if you hadn't made the change?
Without a control group, there is no experiment. You're just launching a feature and watching what happens. If engagement goes up after a launch, you can't tell whether the feature caused it or whether something else happened at the same time: a marketing campaign, a seasonal trend, a competitor outage. The control group eliminates that ambiguity. Because it experiences the exact same time period and external conditions as the treatment group, any difference between the groups is attributable to the change itself. This is the core logic behind every A/B test Confidence runs, from small feature tweaks to experiments spanning Spotify's 750 million users.
How is the control group assigned?
Users are assigned to the control group through randomization. In Confidence, a hash of the user's ID and an experiment-specific salt maps each user to either control or treatment. Because the hash is deterministic, the same user always lands in the same group, across sessions and devices, without storing the assignment anywhere.
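A minimal sketch of this kind of deterministic bucketing looks like the following (the hash function, bucket count, and names are illustrative, not Confidence's actual implementation):

```python
import hashlib

def assign_variant(user_id: str, experiment_salt: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'control' or 'treatment'.

    The same (user_id, experiment_salt) pair always hashes to the same bucket,
    so no assignment needs to be stored. The per-experiment salt keeps bucket
    positions independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment_salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000  # 10,000 equal-width buckets
    return "treatment" if bucket < treatment_share * 10_000 else "control"

# The same user gets the same variant on every call.
assert assign_variant("user-42", "homepage-redesign-v1") == assign_variant("user-42", "homepage-redesign-v1")
```

Because nothing about the assignment is stored, any service that knows the user ID and the experiment salt can recompute the variant and get the same answer.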
Randomization ensures the control and treatment groups are statistically equivalent at the start of the experiment. On average, the two groups will have the same distribution of user ages, activity levels, geographies, device types, and every other characteristic. This isn't guaranteed for any individual experiment (random fluctuations happen), but Confidence's sample ratio mismatch (SRM) detection automatically checks whether the group sizes match the intended ratio. A significant SRM indicates something went wrong with the randomization or the logging, and the experiment results shouldn't be trusted.
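An SRM check is, at its core, a goodness-of-fit test of the observed group counts against the intended ratio. A hedged sketch (a chi-squared test is the standard choice here; the 0.001 threshold is a common convention, not necessarily what Confidence uses):

```python
from scipy.stats import chisquare

def srm_check(control_count: int, treatment_count: int,
              expected_ratio: tuple[float, float] = (0.5, 0.5),
              alpha: float = 0.001) -> bool:
    """Return True if the observed split is consistent with the intended ratio."""
    total = control_count + treatment_count
    expected = [share * total for share in expected_ratio]
    _, p_value = chisquare([control_count, treatment_count], f_exp=expected)
    return p_value >= alpha  # a very small p-value signals a likely SRM

# 50/50 experiment that collected 10,210 vs 9,790 users: plausible noise?
print(srm_check(10_210, 9_790))  # True here; a larger imbalance would fail the check
```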
What makes a good control group?
The control group should experience exactly what users would experience if the experiment didn't exist. This sounds obvious but has practical implications.
Don't change the control experience mid-experiment. If a bug fix or unrelated feature change goes out to all users, including the control group, during the experiment, you've changed the baseline. The treatment effect you measure is now relative to a moving target. Confidence tracks experiment timelines and flags changes that overlap with running experiments.
Include all assigned users, not just active ones. The control group should contain every user assigned to it, whether or not they were active during the experiment. Dropping inactive users creates a selection bias: the remaining "active" users in control might differ systematically from the remaining "active" users in treatment. This is the intention-to-treat principle inherited from randomized controlled trials.
Use the same metric definitions for both groups. If the treatment group's metric is computed slightly differently (say, with a different session definition or attribution window), the comparison is invalid. Confidence computes metrics identically for control and treatment from the same warehouse queries, eliminating this class of error.
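Both of these points come down to how the metric query is built: start from the assignment table so every assigned user is included, and apply one shared metric definition to both groups in the same computation. A sketch with hypothetical table and column names:

```python
import pandas as pd

# Hypothetical tables: every assigned user, and per-user activity during the experiment.
assignments = pd.DataFrame({
    "user_id": ["a", "b", "c", "d"],
    "variant": ["control", "control", "treatment", "treatment"],
})
activity = pd.DataFrame({
    "user_id": ["a", "c"],   # users "b" and "d" were never active
    "streams": [12, 30],
})

# Intention-to-treat: start from the assignment table and keep every assigned user,
# counting inactive ones as zero instead of dropping them.
per_user = assignments.merge(activity, on="user_id", how="left").fillna({"streams": 0})

# One metric definition, computed identically for both groups.
print(per_user.groupby("variant")["streams"].mean())
```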
When do you need a larger control group?
In a standard A/B test, a 50/50 split between control and treatment maximizes statistical power per user. But there are situations where you'd want an asymmetric split.
When multiple experiments run simultaneously, teams sometimes share a control group. If three experiments each test a different treatment against the same control, you save traffic by allocating more users to a single control group and fewer to each treatment. Spotify's Surface concept in Confidence coordinates this: when multiple teams experiment on the same product area, the platform manages shared control allocation so teams don't accidentally create overlapping experiments with different control definitions.
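The saving is easy to see with rough arithmetic (the numbers below are purely illustrative, and this ignores that comparisons against a shared control are statistically correlated):

```python
def users_needed(n_per_treatment: int, experiments: int,
                 shared_control_size: int | None = None) -> int:
    """Total users needed for `experiments` treatments of n_per_treatment users each.

    With separate controls, every treatment gets its own equally sized control.
    With a shared control, one control group (often larger than any single
    treatment arm) serves all comparisons.
    """
    if shared_control_size is None:
        return 2 * experiments * n_per_treatment
    return experiments * n_per_treatment + shared_control_size

print(users_needed(100_000, 3))                               # separate controls: 600,000
print(users_needed(100_000, 3, shared_control_size=150_000))  # shared control:   450,000
```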
When the treatment carries risk, a 90/10 or 95/5 split limits exposure. The downside is reduced power: with only 5% of users in treatment, you need roughly five times the total traffic to achieve the same power as a 50/50 split. This is the trade-off between safety and speed that the experiment design has to balance.
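That multiplier comes from the variance of the difference in means: with a fraction p of traffic in treatment, it scales as 1/(p(1−p)), which is smallest at p = 0.5. A small sketch of the arithmetic (standard two-sample variance algebra, not anything Confidence-specific):

```python
def traffic_multiplier(treatment_share: float) -> float:
    """How much more total traffic an asymmetric split needs to match the power
    of a 50/50 split, assuming equal outcome variance in both arms."""
    p = treatment_share
    return (0.5 * 0.5) / (p * (1 - p))

print(round(traffic_multiplier(0.10), 2))  # 90/10 split: ~2.78x
print(round(traffic_multiplier(0.05), 2))  # 95/5 split:  ~5.26x
```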