Core Experimentation

What is a multivariate test?

A multivariate test (MVT) is a randomized experiment that changes multiple variables simultaneously and measures both the individual effect of each variable and the interaction effects between them. Where an A/B test asks "does this one change improve the metric?", an MVT asks "which combination of changes produces the best outcome, and do these changes amplify or cancel each other?"

MVTs are common in marketing and web optimization contexts where teams want to test several page elements at once: headline copy, button color, hero image, and layout. In product experimentation, they're less common because the traffic requirements grow fast. If you're testing 3 variables with 2 options each, you need 8 variant combinations (2 × 2 × 2). Each combination needs enough users to produce an adequately powered result, so you're effectively running 8 parallel experiments. That's why Spotify's 300+ teams predominantly use A/B and A/B/n tests for product decisions and reserve MVTs for situations where understanding interactions is the primary goal.
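
To make the traffic math concrete, here is a minimal Python sketch that enumerates the cells of a three-factor MVT and estimates the per-cell sample size with the standard two-proportion power formula. The factor names, baseline conversion rate, and target lift are made-up illustration numbers, not figures from Confidence.

```python
# A minimal sketch of how MVT cell counts and traffic needs multiply.
# All factor names and rates below are hypothetical.
from itertools import product
from math import ceil

from scipy.stats import norm

factors = {
    "headline": ["original", "variant"],
    "button_color": ["original", "variant"],
    "hero_image": ["original", "variant"],
}

cells = list(product(*factors.values()))
print(f"{len(cells)} variant combinations")  # 2 * 2 * 2 = 8

# Per-cell sample size for a two-proportion z-test
# (alpha = 0.05 two-sided, power = 0.80), standard formula.
p1, p2 = 0.10, 0.11           # baseline vs. hoped-for conversion rate
z_a = norm.ppf(1 - 0.05 / 2)  # ~1.96
z_b = norm.ppf(0.80)          # ~0.84
n_per_cell = ceil(
    (z_a + z_b) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2
)
print(f"~{n_per_cell} users per cell, ~{n_per_cell * len(cells)} in total")
```

With these illustrative numbers, each cell needs roughly 15,000 users, so the eight-cell design needs well over 100,000 users in total before any combination is adequately powered.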

How does a multivariate test work?

An MVT works by creating every possible combination of the variables being tested. Suppose you're testing two variables: a new checkout button (original vs. redesigned) and a new pricing display (original vs. simplified). That gives you four combinations:

  1. Original button + original pricing (control)
  2. Redesigned button + original pricing
  3. Original button + simplified pricing
  4. Redesigned button + simplified pricing

Users are randomly assigned to one of the four combinations. After the experiment runs, the analysis decomposes the results into main effects (the individual contribution of each variable) and interaction effects (how the variables influence each other). An interaction effect means the impact of one change depends on whether the other change is also present.
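
Here is a hedged sketch of that decomposition for the checkout example above: it simulates random assignment to the four cells, then computes the button's main effect (its average lift across both pricing levels) and the interaction as a difference-in-differences. The true conversion rates baked into the simulation are invented for illustration.

```python
# A sketch of a 2x2 MVT: random assignment, then decomposition of
# cell-level conversion rates into a main effect and an interaction.
# The true rates below are made-up illustration numbers.
import random

random.seed(42)

def convert(button, pricing):
    # Hypothetical truth: the redesigned button only helps when the
    # simplified pricing is also shown.
    base = 0.10
    lift = 0.03 if (button == "redesigned" and pricing == "simplified") else 0.0
    return random.random() < base + lift

# Randomly assign users to one of the four combinations and tally results.
cells = {}
for _ in range(40_000):
    button = random.choice(["original", "redesigned"])
    pricing = random.choice(["original", "simplified"])
    n, conversions = cells.get((button, pricing), (0, 0))
    cells[(button, pricing)] = (n + 1, conversions + convert(button, pricing))

rate = {k: conversions / n for k, (n, conversions) in cells.items()}

# Main effect of the button: its average lift across both pricing levels.
button_effect = (
    (rate[("redesigned", "original")] - rate[("original", "original")])
    + (rate[("redesigned", "simplified")] - rate[("original", "simplified")])
) / 2

# Interaction: how much the button's lift differs between pricing levels
# (a difference-in-differences).
interaction = (
    (rate[("redesigned", "simplified")] - rate[("original", "simplified")])
    - (rate[("redesigned", "original")] - rate[("original", "original")])
)

print(f"button main effect: {button_effect:+.3f}")
print(f"interaction effect: {interaction:+.3f}")
```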

The analysis typically uses factorial designs from classical experimental design theory. Confidence supports multi-variant experiments and can be configured to test these factorial structures, though for most product experimentation the simpler A/B or A/B/n design is the better default.
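
For the analysis step itself, one common platform-agnostic approach is a regression with an interaction term, which recovers the main effects and the interaction in a single model. Below is a sketch using statsmodels' formula API on simulated data; the column names and rates are hypothetical, and this is not Confidence's API.

```python
# Analyzing a 2x2 factorial design with a logistic regression.
# Simulated data; all names and rates are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 40_000
df = pd.DataFrame({
    "button": rng.choice(["original", "redesigned"], size=n),
    "pricing": rng.choice(["original", "simplified"], size=n),
})

# Hypothetical truth: a lift only when both changes are present.
p = 0.10 + 0.03 * ((df["button"] == "redesigned")
                   & (df["pricing"] == "simplified"))
df["converted"] = (rng.random(n) < p).astype(int)

# `button * pricing` expands to both main effects plus their interaction.
model = smf.logit("converted ~ button * pricing", data=df).fit(disp=False)
print(model.summary())
```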

When should you use an MVT instead of an A/B test?

Use an MVT when two conditions are both true: you have enough traffic to power all the variant combinations, and you genuinely need to understand how changes interact.

If you only care about which individual change is best, run separate A/B tests or a single A/B/n test. A/B tests are simpler to design, analyze, and communicate to stakeholders. They require less traffic and produce clearer decisions.

MVTs earn their complexity when interaction effects matter. For example, if a new headline only works when paired with a specific call-to-action, an A/B test on each element separately would miss that interaction and could lead you to ship the headline without the call-to-action, producing no improvement. An MVT would detect that the headline's effect depends on the call-to-action.
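
A small worked example makes this failure mode concrete. Assume hypothetical true conversion rates where neither change helps on its own but the pair adds three percentage points:

```python
# How separate A/B tests can miss an interaction.
# Hypothetical true conversion rates per headline/CTA combination:
rates = {
    ("old_headline", "old_cta"): 0.10,
    ("new_headline", "old_cta"): 0.10,  # new headline alone: no effect
    ("old_headline", "new_cta"): 0.10,  # new CTA alone: no effect
    ("new_headline", "new_cta"): 0.13,  # together: +3 points
}

# A standalone A/B test of the headline (with the old CTA) sees nothing:
headline_ab = rates[("new_headline", "old_cta")] - rates[("old_headline", "old_cta")]
print(headline_ab)  # 0.0

# The MVT's interaction term (a difference-in-differences) surfaces the lift:
interaction = (
    (rates[("new_headline", "new_cta")] - rates[("old_headline", "new_cta")])
    - (rates[("new_headline", "old_cta")] - rates[("old_headline", "old_cta")])
)
print(interaction)  # 0.03
```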

In practice, interaction effects in product experiments are smaller and rarer than most teams expect. The Spotify Search team, which has documented its journey toward better experimentation practices, focuses on isolating individual changes precisely rather than testing combinations. Most product decisions don't require understanding interactions; they require understanding whether a single change helps or hurts.

What are the limitations of multivariate testing?

Traffic requirements grow multiplicatively. Each additional variable doubles the number of combinations (assuming two levels per variable). Three variables with two levels each: 8 combinations. Four variables with three levels each: 81 combinations. Most product teams don't have the traffic to power experiments with dozens of cells.

Results are harder to interpret. A/B test results are straightforward: the treatment is better, worse, or neutral. MVT results include main effects and interaction terms. Communicating "variable A helps, but only when variable B is also changed" to a product team requires more statistical literacy than most organizations have.

The experiment takes longer. Because traffic is split across more cells, reaching adequate statistical power takes proportionally longer. An A/B test that would run for two weeks might need six to eight weeks as an MVT with eight combinations, assuming the same total traffic.
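
The back-of-envelope scaling: with fixed total traffic, runtime grows roughly in proportion to the cell count, because each cell must independently reach the per-cell sample size. A tiny sketch, using the hypothetical two-week A/B baseline from above:

```python
# Rough runtime scaling under fixed total traffic. Each cell must reach
# the same per-cell sample size, so duration scales with cell count.
ab_weeks = 2                 # hypothetical A/B test runtime (2 cells)
ab_cells, mvt_cells = 2, 8
mvt_weeks = ab_weeks * (mvt_cells / ab_cells)
print(f"~{mvt_weeks:.0f} weeks for the 8-cell MVT")  # ~8 weeks
```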

For these reasons, Confidence is optimized for the A/B and A/B/n patterns that work for the vast majority of product experimentation. When you do need a factorial design, the platform supports it, but the default recommendation is to test one thing at a time and iterate.