A split test is another name for an A/B test. The term emphasizes the mechanism: splitting traffic between two or more variants and comparing the outcomes. In practice, "split test," "A/B test," and "online controlled experiment" all describe the same thing: randomly assigning users to groups, showing each group a different experience, and measuring whether the difference in outcomes is statistically significant.
The term "split test" is more common in marketing and conversion rate optimization contexts, while "A/B test" dominates product engineering and "online controlled experiment" appears in academic literature. Confidence uses "A/B test" as the primary term across its platform and documentation, but if you're coming from a CRO or marketing background, the concepts are identical.
Why do different communities use different names?
The terminology split reflects different histories. Direct-response marketers were running split tests on mailing lists and landing pages long before tech companies formalized A/B testing in product development. In that world, the "split" in split testing referred literally to splitting a mailing list in half and sending each half a different version.
When web companies adopted the same methodology for product changes, "A/B test" became the standard term because it generalized beyond marketing. The "A" and "B" label the variants rather than describing the mechanism. Google, Microsoft, and Spotify all use "A/B test" in their engineering organizations. The academic community prefers "online controlled experiment" because it connects to the broader literature on randomized controlled trials and causal inference.
All three terms describe the same statistical design: randomize users into groups, vary one thing between the groups, measure the causal effect. If someone asks you to run a "split test," you're running an A/B test.
How does the traffic split work in practice?
The "split" in split testing is handled by a feature flag system. When a user hits the product, a hash function takes the user's ID and an experiment-specific salt and maps it to a variant. This assignment is deterministic: the same user always gets the same variant, with no state to store and no network call at evaluation time. Confidence's feature flags evaluate in-process at 10 to 50 microseconds.
The most common split ratio is 50/50: half the users see the control, half see the treatment. This maximizes statistical power for a given sample size. But split ratios can be adjusted. A 90/10 split limits exposure to an untested change, at the cost of requiring more total traffic to reach the same power. Rollouts in Confidence start with small percentages (1%, 5%) and ramp up gradually, monitoring guardrail metrics at each stage.
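The power cost of an uneven split can be quantified with a rough rule of thumb. Assuming a comparison of means with equal variance in both groups and a fixed minimum detectable effect, the variance of the estimated difference scales with 1/(N·p·(1−p)), where p is the treatment fraction, so the required total sample size scales with 1/(p·(1−p)). The numbers below illustrate that approximation; they are not output from Confidence.

```python
def relative_sample_size(p: float) -> float:
    """Total traffic needed relative to a 50/50 split.

    Var(difference) ∝ 1/(N*p) + 1/(N*(1-p)) = 1/(N*p*(1-p)),
    so holding power constant means N must scale with 1/(p*(1-p)).
    """
    return (0.5 * 0.5) / (p * (1 - p))

print(relative_sample_size(0.5))  # 1.0   -> baseline
print(relative_sample_size(0.1))  # ~2.78 -> a 90/10 split needs roughly 2.8x the traffic
```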
The split ratio is not the same as the allocation percentage. You can allocate only 20% of your traffic to an experiment (leaving 80% unaffected) and then split that 20% evenly between control and treatment. This is useful when multiple teams run experiments simultaneously and need to share the user base without interfering with each other.
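One way to picture the difference is as two independent hash decisions: the first decides whether a user enters the experiment at all, the second assigns a variant to those who do. This hypothetical sketch reuses the assign_variant helper from above; the salts and percentages are made up.

```python
def evaluate(user_id: str) -> str | None:
    """Allocate 20% of traffic to the experiment, then split that 20% evenly."""
    # Stage 1: allocation. Only 20% of users enter the experiment;
    # the other 80% are left entirely unaffected.
    in_experiment = assign_variant(user_id, "checkout-redesign-v1-allocation",
                                   {"in": 0.2, "out": 0.8}) == "in"
    if not in_experiment:
        return None  # user sees the default experience

    # Stage 2: split. Users inside the experiment are divided 50/50.
    # Note the different salt, so the two decisions are independent.
    return assign_variant(user_id, "checkout-redesign-v1-variant",
                          {"control": 0.5, "treatment": 0.5})
```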