Core Experimentation

What is a Product Builder?

A product builder is anyone who builds and ships product: engineers, product managers, designers, data scientists, and the increasingly blended roles between them. They're the people who decide what to change, write the code, design the experience, define the metrics, and make the call on whether to ship. In the context of experimentation, product builders are both the primary users and the primary beneficiaries of the evidence that experiments produce.

The term matters because experimentation tools have historically been built for specialists. The analyst ran the query. The data scientist checked the statistics. The engineer implemented the flag. The PM waited for results. That division made sense when experiments were rare and required custom statistical work. It doesn't hold when experimentation is the default way product changes get made. At Spotify, where 300+ teams run 10,000+ experiments per year, the product builder is the experimenter. Confidence is designed for that reality.

Why does experimentation need to work for product builders?

The alternative is that experimentation remains gated by specialists. Every experiment requires an analyst to set it up, a data scientist to validate the results, and a meeting to interpret the findings. That workflow creates a bottleneck that limits how many experiments the organization can run.

The math is straightforward. If you have 10 analysts supporting 50 product teams, each analyst supports 5 teams. If each team wants to run 3 experiments per sprint and each experiment requires 4 hours of analyst time, that's 150 experiments, or 600 hours of analyst time per sprint. At 40 hours per analyst per sprint, you have 400. One out of every three experiments doesn't happen.
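A quick sanity check of that arithmetic, using the hypothetical staffing numbers from the paragraph above:

```python
# Back-of-the-envelope analyst capacity check, using the hypothetical
# figures above: 10 analysts, 50 teams, 3 experiments per team per
# sprint, 4 analyst-hours per experiment, 40 analyst-hours per sprint.
analysts = 10
teams = 50
experiments_per_team = 3
hours_per_experiment = 4
hours_per_analyst = 40

demanded = teams * experiments_per_team * hours_per_experiment  # 600 hours
available = analysts * hours_per_analyst                        # 400 hours

unsupported = (demanded - available) / demanded
print(f"Experiments that can't be supported: {unsupported:.0%}")  # 33%
```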

The fix is making the experimentation platform do the work that analysts currently do by hand: computing metrics, running statistical tests, applying corrections, surfacing guardrail violations, and presenting results in a form that a PM with basic analytics knowledge can interpret. Confidence automates the statistical analysis, runs it inside the team's data warehouse, and applies the right defaults (CUPED, sequential testing, Bonferroni correction for success metrics) without requiring the product builder to configure them.
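To make one of those defaults concrete, here is a minimal sketch of a Bonferroni correction across success metrics. This is a generic illustration of the technique, not Confidence's implementation; the metric names and p-values are invented.

```python
# Minimal sketch of a Bonferroni correction across success metrics.
# Not Confidence's implementation; metric names and p-values are invented.
def bonferroni(p_values: dict[str, float], alpha: float = 0.05) -> dict[str, bool]:
    """Return, per metric, whether it is significant after correction.

    Bonferroni divides the overall significance level alpha by the
    number of comparisons, so each metric is tested at alpha / m.
    """
    m = len(p_values)
    threshold = alpha / m
    return {metric: p <= threshold for metric, p in p_values.items()}

# Three hypothetical success metrics from one experiment, tested at 0.05 / 3:
results = bonferroni({"conversion": 0.011, "retention_d7": 0.030, "revenue": 0.190})
print(results)  # only "conversion" passes (0.011 <= 0.0167)
```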

This doesn't eliminate the need for specialists. Complex experimental designs, custom estimands, and novel statistical questions still require expertise. But the 80% of experiments that are standard A/B tests with standard metrics shouldn't require specialist involvement.

How does the product builder role change with experimentation maturity?

In organizations new to experimentation, the product builder's relationship with experiments is transactional. They request an experiment, someone else sets it up, someone else analyzes it, and the product builder receives a recommendation.

In mature experimentation organizations, the product builder owns the full cycle. They formulate the hypothesis, configure the experiment, monitor the results, and make the ship decision. The platform supports them at every step but doesn't require them to hand off to a specialist.

Spotify's published writing on this progression is instructive. The Search team's experimentation maturity arc started with improving individual experiment quality (better hypotheses, better metric selection), then moved to cross-experiment coordination, then to measuring cumulative business impact with holdouts. At each stage, the product builder took on more ownership of the experimental process, and the platform removed more of the manual work that would have required specialist support.

The long-term outcome: product builders develop better intuition. Spotify's Experiments with Learning framework shows that across thousands of experiments, the win rate is ~12% but the learning rate is ~64%. Product builders who regularly run experiments and engage with the results (including the negative and null results) develop sharper product judgment. Their next hypothesis is better because the last ten experiments calibrated their understanding of what users actually respond to.

What should a product builder know about experimentation?

Three things matter most.

How to write a testable hypothesis. "We think X will improve Y because Z" forces clarity about what you expect and why. Without it, a positive result is ambiguous and a negative result is uninterpretable.
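One way to keep that discipline is to treat the template as structured data rather than free text, so every field has to be filled in before the experiment starts. This is a hypothetical sketch; the field names are illustrative, not a Confidence API.

```python
from dataclasses import dataclass

# Hypothetical structure for a testable hypothesis; not a Confidence API.
@dataclass
class Hypothesis:
    change: str              # X: what we're changing
    metric: str              # Y: what we expect to move
    rationale: str           # Z: why we expect it to move
    expected_direction: str  # "increase" or "decrease"

h = Hypothesis(
    change="Show recently played items on the home screen",
    metric="7-day content starts per user",
    rationale="Lower effort to resume listening should increase starts",
    expected_direction="increase",
)
```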

How to read experiment results. Understanding confidence intervals, what statistical significance means (and what it doesn't), and the difference between "no effect detected" and "no effect exists." Confidence presents results in a way that makes this interpretation straightforward, but the builder needs to know what to look for.
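Here is a minimal sketch of that distinction, using a normal-approximation confidence interval for a difference in means. The numbers are invented, and a real platform analysis does more than this (variance reduction, sequential adjustments, multiplicity corrections).

```python
import math

# Minimal sketch: 95% CI for a difference in means via a normal
# approximation. Invented numbers; a real platform analysis does more.
def diff_ci(mean_a, se_a, mean_b, se_b, z=1.96):
    diff = mean_b - mean_a
    se = math.sqrt(se_a**2 + se_b**2)
    return diff - z * se, diff + z * se

lo, hi = diff_ci(mean_a=10.0, se_a=0.4, mean_b=10.3, se_b=0.4)
print(f"95% CI for the effect: [{lo:.2f}, {hi:.2f}]")
# The interval [-0.81, 1.41] contains zero, so no effect was detected.
# But it also contains effects as large as +1.4, so it does not show
# that no effect exists; the experiment may simply be underpowered.
```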

When to trust the result and when to dig deeper. A result that contradicts your hypothesis isn't automatically wrong, and a result that confirms it isn't automatically right. Product builders who treat experiment results as one input alongside their domain knowledge, user research, and strategic context make better decisions than those who treat them as a verdict.