A north star metric is the single metric that best captures the value a product delivers to its users. It represents the core outcome your product exists to create: the thing that, if it goes up, means users are genuinely getting more value. For Spotify, that might be time spent listening. For a marketplace, it might be completed transactions. For a collaboration tool, it might be active projects per team.
The north star metric aligns the entire organization around a shared definition of progress. When teams disagree about priorities, the north star provides a common reference point. When an experiment produces ambiguous results across multiple metrics, the north star helps resolve which direction is genuinely better for users.
How does a north star metric relate to experiment metrics?
A north star metric rarely works well as a direct success metric in individual experiments. The reasons are practical.
North star metrics tend to be high-level and slow-moving. A change to one feature in a large product may genuinely improve user value, but the effect on a product-wide north star metric will be tiny and drowned out by variance from everything else happening in the product. An experiment would need an impractically large sample size to detect a statistically significant change.
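To make the variance problem concrete, here is a minimal sketch of how the required sample size explodes as the detectable effect shrinks. The means, standard deviations, and lift sizes are invented for illustration, and the two-sample z-test approximation is a simplification, not any particular platform's method.

```python
from math import ceil

def required_sample_size(mean, sd, relative_lift, z_alpha=1.96, z_beta=0.84):
    """Approximate per-arm n for a two-sample z-test on means
    (alpha = 0.05 two-sided, power = 0.8, normal approximation)."""
    delta = mean * relative_lift  # absolute effect we want to detect
    return ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)

# Hypothetical numbers: a feature-level change moves a product-wide
# north star (mean 10 weekly hours, sd 12) by only 0.5%, but moves a
# targeted search metric (mean 3, sd 4) by 5%.
north_star_n = required_sample_size(10, 12, 0.005)  # hundreds of thousands per arm
proxy_n = required_sample_size(3, 4, 0.05)          # roughly ten thousand per arm
```

The same underlying improvement is dozens of times cheaper to detect on the focused metric, which is exactly why experiments measure proxies rather than the north star directly.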
Instead, teams use the north star as a compass for choosing what to optimize, and use proxy metrics or feature-level success metrics as the success criteria in the actual experiment. A team working on Spotify's search experience doesn't run experiments with "total listening hours" as the success metric. They use search-specific metrics (queries that lead to plays, time to first play) that they've validated as contributing to the broader outcome the north star represents.
The discipline is in the validation. Every proxy and feature-level metric should have a documented relationship to the north star. If a team can't articulate how improving their success metric contributes to the north star, the experiment might be optimizing something that doesn't matter.
What makes a good north star metric?
It measures value delivered, not activity captured. "Daily active users" counts who shows up. A good north star captures whether they got what they came for. "Sessions per user" measures frequency. "Successful task completions per user" measures whether the product is doing its job.
It's broad enough to reflect the whole product. If the metric only covers one feature or one user segment, it's a feature metric, not a north star. The north star should move when any part of the product gets meaningfully better or worse.
It's specific enough to be actionable. "User happiness" isn't a metric. "Weekly hours of content consumed" is. The north star needs to be something you can measure, decompose into contributing factors, and trace back to specific product decisions.
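As an illustration of that decomposability, a metric like "weekly hours of content consumed" factors into contributing terms that individual teams can own. The numbers here are invented.

```python
# Hypothetical decomposition: each factor is something a specific
# team can move (activation, engagement frequency, session quality).
weekly_active_users = 200_000
sessions_per_user   = 5.0
minutes_per_session = 24.0

weekly_hours = weekly_active_users * sessions_per_user * minutes_per_session / 60
```

Because the north star factors cleanly, a change in it can be traced back to which contributing term moved, and therefore to specific product decisions.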
It's resistant to gaming. Because the north star influences priorities across the organization, it will be optimized for. If it can be inflated by tactics that don't create real value (the same failure mode that plagues proxy metrics), it will be. The best north star metrics measure outcomes that are hard to fake.
Can a north star metric change?
Yes, and it should when the product's value proposition evolves. A startup that initially measures "users who complete onboarding" as its north star will outgrow that metric once onboarding is no longer the binding constraint on growth. The north star should reflect the current strategic question: what is the most important dimension of value we need to grow right now?
Changing the north star is a significant organizational decision. It resets what teams prioritize, which experiments they run, and how they measure success. Confidence supports this by keeping metric definitions in your data warehouse, where they can be versioned and updated without losing historical experiment results computed under the previous definition.
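A minimal sketch of that versioning principle follows. The class, fields, and SQL fragments are hypothetical illustrations of the idea, not Confidence's actual schema or API: each metric definition carries a version and a validity date, and historical results resolve against the definition that was active when they were computed.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class MetricVersion:
    name: str
    version: int
    definition_sql: str  # the definition lives alongside the data
    valid_from: date

# Two versions of the same (hypothetical) metric; experiments keep
# the version they were computed under.
registry = [
    MetricVersion("weekly_listening_hours", 1,
                  "SELECT SUM(ms_played) / 3.6e6 FROM plays",
                  date(2023, 1, 1)),
    MetricVersion("weekly_listening_hours", 2,
                  "SELECT SUM(ms_played) / 3.6e6 FROM plays WHERE NOT is_podcast",
                  date(2024, 3, 1)),
]

def version_for(name, as_of):
    """Return the metric version that was valid on a given date."""
    candidates = [m for m in registry if m.name == name and m.valid_from <= as_of]
    return max(candidates, key=lambda m: m.valid_from)
```

An experiment analyzed in mid-2023 stays tied to version 1 even after version 2 becomes the current definition, so old results remain interpretable.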