The treatment group is the set of users in an experiment who see the changed experience. They're exposed to the new feature, the redesigned flow, or the modified algorithm that the experiment is testing. The difference in metrics between the treatment group and the control group is the treatment effect: the causal impact of the change.
In Confidence, the treatment group is defined by a feature flag variant. When the experiment starts, the flag's assignment logic routes a percentage of users to the treatment variant. Those users get the new experience. Everyone else gets the control (the current experience). The comparison between these two groups is what turns a product change from a guess into evidence. At Spotify, where 42% of experiments are rolled back after guardrail metrics detect regressions, the treatment group is where those regressions first become visible.
How are users assigned to the treatment group?
Assignment works through deterministic hashing. Confidence takes the user's ID and an experiment-specific salt, runs them through a hash function, and maps the output to a variant. If the hash falls within the treatment range (for example, the lower 50% of the hash space in a 50/50 split), the user sees the treatment.
This approach has three properties that matter for experiment validity.
Deterministic. The same user always gets the same assignment, across sessions, devices, and days. No state needs to be stored. Confidence's feature flags evaluate in-process at 10 to 50 microseconds with no network call.
Random. User IDs have no systematic relationship to the hash output, so the assignment is effectively random. In expectation, the treatment group is statistically equivalent to the control group on every characteristic, observable or not.
Experiment-specific. The salt changes per experiment, so a user who's in the treatment group for one experiment might be in the control group for another. This prevents the same users from always being "early adopters" of every change, which would bias the measured treatment effects across the organization's experiment portfolio.
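The scheme above can be sketched in a few lines. This is an illustrative implementation, not Confidence's actual code: the function name, the choice of SHA-256, and the bucket layout are assumptions, but the three properties (deterministic, random, experiment-specific) follow from any salted-hash design like it.

```python
import hashlib

def assign_variant(user_id: str, salt: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to a variant via salted hashing (sketch)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    # Map the hash to a point in [0, 1). The same (user, salt) pair always
    # lands on the same point, so no assignment state needs to be stored.
    bucket = int(digest[:15], 16) / 16**15
    return "treatment" if bucket < treatment_share else "control"

# Deterministic: the same user gets the same assignment on every call.
assert assign_variant("user-123", "exp-checkout") == assign_variant("user-123", "exp-checkout")
# Experiment-specific: a different salt re-randomizes the same user,
# so assign_variant("user-123", "exp-search") may differ.
```

Because the hash output is uniform over the hash space, setting `treatment_share` to 0.5 splits traffic roughly 50/50 without any coordination between servers.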
What is the treatment effect?
The treatment effect is the difference in a metric between the treatment and control groups. If the treatment group's conversion rate is 4.2% and the control group's is 4.0%, the estimated treatment effect is 0.2 percentage points.
The key word is "estimated." The observed difference includes both the true treatment effect and random noise. Statistical significance testing determines whether the observed difference is large enough, relative to the noise, to be unlikely under the assumption that the treatment had no effect (the null hypothesis).
In Confidence, the treatment effect is reported alongside a confidence interval that quantifies the uncertainty. A treatment effect of 0.2 percentage points with a 95% confidence interval of [0.05, 0.35] means you can be reasonably confident the true effect is positive. A treatment effect of 0.2 with a confidence interval of [-0.1, 0.5] spans zero, so the data is consistent with no effect at all.
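The arithmetic behind an effect estimate and its interval can be sketched with the standard normal approximation for a difference in proportions. This is a textbook sketch, not Confidence's exact statistical method; the function name and sample sizes are illustrative.

```python
import math

def treatment_effect_ci(conv_t, n_t, conv_c, n_c, z=1.96):
    """Effect estimate and 95% CI for a difference in conversion rates,
    using the normal approximation (illustrative, not Confidence's method)."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    effect = p_t - p_c
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    return effect, (effect - z * se, effect + z * se)

# 4.2% vs 4.0% conversion, as in the example above, with 100k users per group.
effect, (lo, hi) = treatment_effect_ci(4200, 100_000, 4000, 100_000)
# effect is 0.002 (0.2 percentage points); whether the interval excludes
# zero depends on the sample size, which is the point of the "estimated" caveat.
```

Run the same numbers with 10,000 users per group and the interval widens to span zero: the same observed 0.2-point difference stops being distinguishable from noise.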
Can an experiment have more than one treatment group?
Yes. An A/B/n test has one control group and multiple treatment groups, each seeing a different variant. This is useful when you want to compare several alternatives simultaneously rather than testing them one at a time.
Each additional treatment group requires its own share of traffic. In a test with one control and three treatments, each group gets roughly 25% of users. The power per comparison drops accordingly, so you either need more traffic or a longer experiment to detect the same effect size.
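The traffic cost can be made concrete with a standard sample-size formula for a two-proportion test. This is a textbook normal-approximation sketch, not Confidence's power calculator; the function name and defaults (80% power, 5% two-sided significance) are assumptions.

```python
import math

def n_per_group(p_base, delta, z_alpha=1.96, z_beta=0.8416):
    """Approximate users needed per group to detect an absolute lift `delta`
    over baseline rate `p_base` at 80% power and 5% two-sided alpha.
    Textbook sketch, not Confidence's power calculator."""
    p_bar = p_base + delta / 2  # average rate across the two groups
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / delta ** 2)

# Detecting a 0.2pp lift on a 4.0% baseline takes roughly 150k users per group.
n = n_per_group(0.04, 0.002)
# With one control and three treatments, each group receives ~25% of traffic,
# so filling that quota takes about twice as long as in a 50/50 split.
```

The per-group requirement doesn't change with the number of variants; what changes is how fast each group accumulates users, which is why A/B/n tests need more total traffic or more time.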
Confidence supports A/B/n experiments natively and applies multiple testing corrections to account for the additional comparisons. Without these corrections, comparing three treatments to one control at a 5% significance level would produce a roughly 14% chance of at least one false positive.
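The "roughly 14%" figure is the family-wise error rate for three independent comparisons, and a correction works by tightening the per-comparison threshold. The Šidák correction below is one standard way to do this; Confidence's actual correction method may differ.

```python
# With m independent comparisons each tested at alpha, the chance of at
# least one false positive is 1 - (1 - alpha)^m.
alpha, m = 0.05, 3
fwer = 1 - (1 - alpha) ** m          # ~0.143: the "roughly 14%" above

# The Sidak correction (one standard approach; Confidence's method may
# differ) picks a per-comparison level that restores a 5% family-wise rate.
alpha_sidak = 1 - (1 - 0.05) ** (1 / m)   # ~0.017 per comparison
```

Tightening each comparison's threshold like this trades a little per-comparison power for control of the overall false-positive rate, which is the trade-off multiple testing corrections formalize.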