What is Experiment Coordination?

Experiment coordination is the practice of managing interactions, priorities, and resource allocation across concurrent experiments. When multiple teams run experiments on the same product surface at the same time, their changes can interfere with each other, producing results that are difficult or impossible to interpret. Coordination prevents that interference without slowing teams down.

This becomes a real problem earlier than most organizations expect. You don't need Spotify-scale volume to hit it. Two teams running experiments on the same checkout flow, or the same homepage feed, can produce confounded results that lead to wrong decisions.

Why do concurrent experiments interfere?

The core issue is treatment interaction. If Team A changes the search ranking algorithm and Team B changes the search results UI in the same week, both experiments include users who are affected by both changes simultaneously. Team A's measured effect on click-through rate includes whatever Team B's UI change did to the same users, and vice versa.

In the simplest case, the interaction effects are small and roughly cancel out across experiments. But when experiments touch the same user journey or the same metric, interactions can be large enough to flip the sign of a result: a change that genuinely improves the experience appears to make it worse because a concurrent experiment is pulling the metric in the opposite direction.
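
To see how a sign flip can happen, here is a hypothetical worked example. The numbers below (the baseline click-through rate, each team's standalone effect, and the interaction term) are invented for illustration, and it assumes Team B's experiment splits users 50/50 independently of Team A's.

```python
# Hypothetical numbers, purely illustrative: how an interaction with a
# concurrent experiment can flip a measured effect.

baseline_ctr = 0.100        # CTR with neither change
effect_a = +0.020           # Team A's ranking change alone: +2pp
effect_b = +0.010           # Team B's UI change alone: +1pp
interaction = -0.060        # the two changes clash when combined: -6pp

# Expected CTR in each of the four cells (A off/on x B off/on).
ctr = {
    ("control", "control"): baseline_ctr,
    ("treated", "control"): baseline_ctr + effect_a,
    ("control", "treated"): baseline_ctr + effect_b,
    ("treated", "treated"): baseline_ctr + effect_a + effect_b + interaction,
}

# Team B splits users 50/50 independently of Team A, so half of Team A's
# control group and half of its treatment group also carry B's change.
a_control = 0.5 * ctr[("control", "control")] + 0.5 * ctr[("control", "treated")]
a_treated = 0.5 * ctr[("treated", "control")] + 0.5 * ctr[("treated", "treated")]

print(f"Team A's true standalone effect: {effect_a:+.3f}")
print(f"Team A's measured effect:        {a_treated - a_control:+.3f}")
# true: +0.020, measured: -0.010 -- the concurrent experiment flipped the sign
```

Because Team B's treated users are spread evenly across Team A's arms, half of the interaction term leaks into Team A's naive estimate: a genuine +2pp lift reads as a 1pp drop.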

The second issue is resource contention. Every experiment needs users. On a fixed-traffic product surface, running more experiments means either splitting traffic into smaller groups (reducing statistical power per experiment) or running experiments for longer (reducing throughput). Without coordination, teams compete for the same users without realizing it.
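
The trade-off is easy to quantify with the standard two-proportion sample-size approximation. In the sketch below, the daily traffic, baseline rate, and target lift are made-up numbers, and it assumes mutually exclusive experiments split the surface's traffic evenly.

```python
# A rough sketch of the fixed-traffic trade-off, using the standard
# two-proportion sample-size approximation. All inputs are illustrative.
from math import ceil
from statistics import NormalDist

def users_per_arm(p_base, p_treat, alpha=0.05, power=0.80):
    """Approximate users needed per arm to detect a shift from p_base to p_treat."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_treat * (1 - p_treat)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_treat - p_base) ** 2)

daily_users = 200_000              # total traffic on the surface per day
n = users_per_arm(0.10, 0.102)     # detect a +0.2pp lift on a 10% baseline
print(f"users per arm: {n:,}")

for concurrent in (1, 2, 4, 8):
    # Mutually exclusive experiments split the surface's traffic evenly.
    traffic_per_experiment = daily_users / concurrent
    days = ceil(2 * n / traffic_per_experiment)
    print(f"{concurrent} concurrent experiment(s): ~{days} days each")
```

Each doubling of mutually exclusive experiments roughly doubles how long every one of them has to run to reach the same power.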

How does Spotify coordinate experiments at scale?

Spotify's approach uses a concept called Surfaces. A Surface represents a product area where experiments happen: the mobile home screen, search, the podcast experience. Each Surface has properties that make coordination structural rather than bureaucratic.

Standardized metrics. Every experiment on a Surface shares a set of required metrics. This means Team A and Team B are measuring the same things, which makes it possible to detect when one team's experiment is affecting another team's results.

Overlap prevention. The assignment system uses a bucket-reuse hashing mechanism that controls which users are eligible for which experiments. Experiments that need to be mutually exclusive (because they modify the same component) are assigned non-overlapping buckets. Experiments that are safe to overlap share the full user pool; a sketch of the general idea appears after these properties.

Shared coordination overhead. Instead of each team independently negotiating with every other team about who gets which users, the Surface provides the coordination layer. Teams register their experiments with the Surface, and the system handles allocation.
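
Spotify has not published the full details of the bucket-reuse mechanism, so the sketch below only illustrates the general idea: deterministic, hash-based bucketing with disjoint bucket ranges for mutually exclusive experiments. The salt, bucket count, and range layout are invented for the example.

```python
# A minimal sketch of hash-based bucketing with disjoint bucket ranges.
# Not Spotify's or Confidence's actual implementation: the salt, bucket
# count, and ranges below are illustrative assumptions only.
import hashlib

TOTAL_BUCKETS = 1000

def bucket(user_id: str, salt: str = "surface:home") -> int:
    """Deterministically map a user to one of TOTAL_BUCKETS buckets."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % TOTAL_BUCKETS

# Two experiments that modify the same component get disjoint bucket ranges,
# so no user can be exposed to both. Experiments that are safe to overlap
# could instead use the full 0-999 range, each with its own salt.
exclusive_ranges = {
    "search-ranking-v2": range(0, 500),
    "search-results-ui": range(500, 1000),
}

def eligible_experiments(user_id: str):
    """Return the mutually exclusive experiments this user can enter."""
    b = bucket(user_id)
    return [name for name, r in exclusive_ranges.items() if b in r]

for uid in ("user-123", "user-456", "user-789"):
    print(uid, "->", eligible_experiments(uid))
```

Because the bucket is a pure function of the user ID and the salt, any service evaluating the same Surface assigns a given user to the same bucket without shared state, and disjoint ranges make exclusivity a structural property rather than a process one.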

The Spotify Home team provides a concrete example of what this looks like at maturity. 58 teams ran 520 experiments on the mobile home screen in 2025, averaging 10 new experiments every week. That volume is only possible because the coordination is structural, built into the platform, rather than relying on meetings and spreadsheets.

What happens without coordination?

The failure modes are predictable. Results become unreliable because interaction effects aren't accounted for. Teams lose trust in experimentation because "the numbers don't make sense." Coordination becomes bureaucratic: a review board decides who can run experiments when, creating a bottleneck that throttles experiment bandwidth. The organization's experiment velocity drops to whatever the review board can process.

At many companies, the response to coordination problems is to serialize experiments: only one experiment at a time per product area. This eliminates interference but destroys throughput. If each experiment runs for two weeks and you can only run one at a time, you get 26 experiments per year on that surface. Spotify runs 520 on one surface alone.

How does Confidence handle coordination?

Confidence's Surface concept brings the coordination approach Spotify built internally to every team on the platform. Surfaces group experiments by product area, enforce shared metric sets, and manage user allocation so teams can run concurrently without interfering. The bucket-reuse assignment system guarantees valid statistical inference even when experiments share users, and flags situations where exclusive assignment is required.

For teams just starting out, coordination may not be the immediate problem. But as experiment volume grows, the coordination question becomes unavoidable. Building on a platform that handles it structurally means you don't have to solve it yourself when the time comes.