A primary metric is the main metric used to decide whether to ship an experiment's treatment to all users. It's the metric that answers the experiment's core question: did this change make the product better in the way we hypothesized? In practice, "primary metric" and "success metric" are often used interchangeably. The distinction, where one exists, is one of emphasis: "primary" highlights that this is the single most important metric among potentially several success metrics.
When an experiment tracks multiple success metrics, the primary metric is the one that gets the final word. If the primary metric shows a statistically significant improvement and the other success metrics are neutral, the team ships. If the other success metrics improve but the primary is flat, the result is ambiguous at best.
Why does designating a primary metric matter?
Pre-registering a primary metric before the experiment starts is what separates rigorous experimentation from post-hoc storytelling.
Without a designated primary metric, teams face a temptation after results come in: pick whichever metric moved and call that the success. This is a form of multiple testing without correction. If you track ten roughly independent metrics at a conventional 5% significance level and declare victory on whichever one hits significance, you have roughly a 40% chance of at least one false positive even when the change had no real effect. The Confidence blog on multiple testing corrections covers this in detail.
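The 40% figure follows directly from the uncorrected familywise error rate. A quick back-of-the-envelope check (illustrative arithmetic only, not anything Confidence-specific):

```python
# Probability of at least one false positive when testing k
# independent metrics at significance level alpha, with no
# multiple-testing correction applied.
alpha = 0.05   # per-metric significance level
k = 10         # number of metrics tracked

fwer = 1 - (1 - alpha) ** k
print(f"Chance of >=1 false positive across {k} metrics: {fwer:.1%}")
# prints: Chance of >=1 false positive across 10 metrics: 40.1%
```

Pre-registering a single primary metric keeps this inflation out of the ship decision: only one test decides the outcome.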
Confidence's experiment setup requires teams to specify their metric roles before the experiment starts: which metrics are success metrics, which are guardrails, which are secondary. The primary metric is the one the power analysis is designed around. Sample size calculations, experiment duration, and the minimum detectable effect all flow from what you need to detect on the primary metric specifically.
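To make "sample size flows from the primary metric" concrete, here is a sketch of the standard textbook sample-size approximation for a conversion-rate primary metric (a two-proportion z-test). This is a generic formula for illustration, not Confidence's exact internal calculation; the function name and defaults are assumptions:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_arm(baseline, mde_abs, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect an absolute lift of
    mde_abs on a conversion-rate metric with the given baseline rate.
    Textbook two-proportion z-test approximation (hypothetical helper)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired power
    p = baseline + mde_abs / 2                     # midpoint rate under H1
    variance = 2 * p * (1 - p)                     # variance of the difference
    return ceil(variance * (z_alpha + z_beta) ** 2 / mde_abs ** 2)

# Detect a 1pp lift on a 20% checkout-completion rate:
n = sample_size_per_arm(0.20, 0.01)  # roughly 25,000-26,000 users per arm
```

Halving the minimum detectable effect roughly quadruples the required sample, which is why the MDE on the primary metric, not on any secondary metric, drives experiment duration.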
How do you choose the right primary metric?
The primary metric should be the one closest to the hypothesis. If you believe a change to the checkout flow will increase completed purchases, your primary metric is completed purchases, not page views, not button clicks, not time on page. Those can be secondary metrics that help you understand the mechanism, but the primary metric measures the outcome you predicted.
Three practical checks help.
Can you detect a meaningful change? If the primary metric has high variance or a low base rate, you may need more traffic than you have. Confidence's power analysis tells you whether your experiment is feasible for a given metric and minimum detectable effect before you start.
Does it move within your experiment window? A metric that takes 60 days to reflect a change isn't a practical primary metric for a two-week experiment. Use a validated proxy if the real outcome is too slow.
Is it aligned with what the business actually values? A primary metric that improves but doesn't connect to revenue, retention, or user satisfaction is measuring motion, not progress.
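The first check (detectability) can be turned around: given the traffic you actually have, what is the smallest lift you could detect? The sketch below inverts the usual two-proportion z-test approximation; it is a generic illustration with a hypothetical helper name, not a Confidence API:

```python
from math import sqrt
from statistics import NormalDist

def detectable_mde(baseline, n_per_arm, alpha=0.05, power=0.8):
    """Smallest absolute lift detectable on a conversion-rate metric
    with n_per_arm users per arm (textbook z-test approximation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z * sqrt(2 * baseline * (1 - baseline) / n_per_arm)

# With 50,000 users per arm:
detectable_mde(0.20, 50_000)  # ~0.7pp on a 20% base rate (~3.5% relative)
detectable_mde(0.01, 50_000)  # ~0.18pp on a 1% base rate (~18% relative)
```

The low base-rate case shows the problem: the same traffic that detects a 3.5% relative lift on a common event can only detect an 18% relative lift on a rare one, which is often an unrealistically large effect to hope for.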