A proxy metric is a measurable stand-in for a harder-to-measure outcome. When the outcome you care about takes too long to observe, is too noisy to detect in an experiment, or can't be measured directly, you use a proxy: a metric that correlates with the real outcome and can be measured within your experiment window. Clicks as a proxy for satisfaction, session length as a proxy for retention, add-to-cart rate as a proxy for lifetime value.
The risk with proxy metrics is well-documented: when you optimize for the proxy directly, you can destroy the relationship between the proxy and the outcome it was supposed to predict. The Confidence blog calls this the central failure mode of proxy-driven experimentation. A team that makes "time on page" their optimization target will eventually ship changes that increase time on page without increasing the satisfaction or engagement that time on page was supposed to measure.
Why do teams use proxy metrics in experiments?
Three practical constraints push teams toward proxies.
Observation windows. Many outcomes that matter (retention, lifetime value, long-term engagement) take weeks or months to materialize. An experiment that needs 90 days of post-exposure observation to measure its real effect consumes experiment bandwidth that most teams can't afford. A validated proxy that moves within days lets the same experiment produce a trustworthy result in a fraction of the time.
Statistical power. Long-term outcomes are often noisier than short-term proxies. Monthly retention has higher variance per user than daily session count, which means detecting the same underlying effect requires a larger sample. A lower-variance proxy metric can make an otherwise underpowered experiment feasible.
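As a back-of-the-envelope illustration, here is a minimal Python sketch using the standard two-sample sample-size approximation. The variances and minimum detectable effect are made-up numbers, chosen only to show how per-user variance drives the required sample:

```python
from scipy.stats import norm

def required_n_per_group(sigma: float, mde: float,
                         alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group sample size for a two-sample z-test of means:
    n = 2 * (z_{1-alpha/2} + z_{power})^2 * (sigma / mde)^2
    """
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return int(2 * (z_alpha + z_beta) ** 2 * (sigma / mde) ** 2) + 1

# Same minimum detectable effect, different per-user variance.
print(required_n_per_group(sigma=0.50, mde=0.01))  # noisy outcome: ~39,000 per group
print(required_n_per_group(sigma=0.25, mde=0.01))  # lower-variance proxy: ~9,800 per group
```

Halving the standard deviation cuts the required sample by a factor of four, which is the entire appeal of a low-variance proxy.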
Measurability. Some outcomes are genuinely hard to measure. User satisfaction, perceived quality, long-term trust in a product. These are real things that matter, but they don't emit a clean signal you can track in an event log. Proxy metrics translate fuzzy outcomes into concrete numbers.
When does a proxy metric break?
The relationship between a proxy and its outcome is correlational, not causal: moving the proxy does not, by itself, cause the outcome to improve. Proxy-driven decisions are sound only as long as the correlation holds, and direct optimization is the fastest way to break it.
This is Goodhart's Law at work: when a measure becomes a target, it ceases to be a good measure. A product team that optimizes "number of shares" as a proxy for content quality will eventually find features that inflate shares (notification nudges, share-gate patterns, prominent share buttons) without improving the content. The proxy goes up. The outcome it was supposed to predict stays flat or declines.
The Spotify Search team's experimentation maturity arc illustrates the discipline required. The team doesn't track click-through rate alone as a success metric; it pairs clicks with downstream engagement metrics that verify whether users who clicked actually found what they were looking for. The proxy is validated against the outcome it represents, not treated as the outcome itself.
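Here is a sketch of what pairing a click proxy with a downstream check might look like. The event fields, the 30-second threshold, and the numbers are invented for illustration; they are not Spotify's actual definitions:

```python
import pandas as pd

# Invented search-session data: "clicked" flags a result click, "listen_secs"
# measures downstream engagement after the click.
sessions = pd.DataFrame({
    "user_id":     [1, 1, 2, 3, 3],
    "clicked":     [1, 1, 1, 0, 1],
    "listen_secs": [120, 4, 45, 0, 2],
})

ctr = sessions["clicked"].mean()  # raw proxy: click-through rate
# A click only counts as a success if it led to meaningful engagement.
successful = sessions["clicked"].astype(bool) & (sessions["listen_secs"] >= 30)

print(f"CTR: {ctr:.0%}, successful-click rate: {successful.mean():.0%}")
```

The gap between the two rates is exactly the signal the proxy alone would hide.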
How should you validate a proxy metric?
Validation means establishing that changes that move the proxy also move the outcome, and that this relationship is stable over time.
Historical correlation. Look at past experiments where you measured both the proxy and the long-term outcome. Did experiments that improved the proxy also improve the outcome? If the correlation is weak or inconsistent, the proxy isn't reliable.
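A minimal sketch of this check, assuming you have a record of past experiments with both lifts measured. The figures are invented:

```python
import pandas as pd

# Invented history of past experiments where both the proxy lift and the
# long-term outcome lift were measured.
history = pd.DataFrame({
    "experiment":   ["exp_01", "exp_02", "exp_03", "exp_04", "exp_05"],
    "proxy_lift":   [0.031, -0.012, 0.054, 0.008, -0.027],
    "outcome_lift": [0.012, -0.004, 0.021, -0.001, -0.009],
})

corr = history["proxy_lift"].corr(history["outcome_lift"])
# How often did the proxy at least get the direction right?
sign_agreement = ((history["proxy_lift"] > 0) == (history["outcome_lift"] > 0)).mean()

print(f"correlation: {corr:.2f}, sign agreement: {sign_agreement:.0%}")
```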
Holdout comparison. Run a long-term holdout that measures the real outcome over months, and compare what it shows against the proxy-based decisions you've been making. If the holdout shows that proxy-positive experiments are outcome-neutral, the proxy has drifted.
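A sketch of the comparison, assuming user-level data where holdout members were shielded from proxy-driven launches. Column names and values are illustrative:

```python
import pandas as pd

# Invented user-level data: holdout members were excluded from all
# proxy-driven launches; the long-term outcome is measured for everyone.
users = pd.DataFrame({
    "in_holdout":   [True, True, True, False, False, False, False, False],
    "retained_90d": [1, 0, 1, 1, 1, 0, 1, 0],
})

retention = users.groupby("in_holdout")["retained_90d"].mean()
lift = retention[False] - retention[True]

# A lift near zero (or negative) means the proxy-positive launches the
# holdout missed out on did not actually move the outcome.
print(f"90-day retention lift over holdout: {lift:+.1%}")
```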
Periodic re-validation. The proxy-outcome relationship can change as the product evolves. A proxy that was predictive six months ago may not be predictive today, especially if teams have been optimizing it directly. Confidence teams at Spotify re-validate proxy metrics as part of their regular experimentation practice, not as a one-time setup.
How does Confidence handle proxy metrics?
Confidence doesn't treat proxy metrics as a separate metric type in its analysis framework. A proxy can serve as a success metric, a guardrail metric, or a secondary metric depending on the experiment design. The statistical treatment follows from the metric's role in the decision, not from whether it's a proxy.
What Confidence does provide is the infrastructure to monitor both the proxy and the downstream outcome within the same experiment. Because analysis runs inside your data warehouse, you can define metrics on any data you have: short-term proxies, long-term outcomes, and the relationship between them. This makes validation practical rather than aspirational.
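As a rough sketch of what monitoring both within one experiment can look like, here is hypothetical Python over warehouse extracts. The file, table, and column names are invented, and in Confidence the metric definitions would run against your warehouse tables directly rather than through a script like this:

```python
import pandas as pd

# Hypothetical warehouse extracts, one row per user.
exposures = pd.read_parquet("warehouse/experiment_exposures.parquet")  # user_id, variant
proxy     = pd.read_parquet("warehouse/sessions_7d.parquet")           # user_id, sessions_7d
outcome   = pd.read_parquet("warehouse/retention_90d.parquet")         # user_id, retained_90d

# One row per exposed user, carrying both the short-term proxy and the
# long-term outcome, so a single experiment can report on both.
joined = (exposures
          .merge(proxy, on="user_id", how="left")
          .merge(outcome, on="user_id", how="left"))

print(joined.groupby("variant")[["sessions_7d", "retained_90d"]].mean())
```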