Lesson 7: Feasibility and sensitivity

Feasibility: Can we measure it?

Feasibility means you have the data, infrastructure, and resources to compute the metric reliably. Before committing to a metric, verify: data availability (events are logged at the right granularity with sufficient history), technical capability (you can compute it efficiently within required time frames), and sample size requirements (you can collect enough data at acceptable cost).

For example, "streams per user from playlist recommendations" is feasible with logged events and user-level data. In contrast, "user satisfaction with recommendation quality" requires expensive survey data that may not exist.

Sensitivity: Can we detect changes?

A metric is sensitive if it reliably detects meaningful changes. Sensitivity depends on two independent factors: variance (how much noise exists) and influenceability (whether your change can actually move the metric). Both must be favorable.

Variance: Raw versus effective

Higher variance requires more users or a longer runtime to detect a given effect size: for a fixed effect size, the required sample size grows in proportion to the variance. What matters, however, is not raw variance but effective variance, the variance that remains after applying variance reduction techniques. The next lesson covers variance reduction in detail. For now, understand that the right covariate can dramatically change which metric is actually the most sensitive choice.

This distinction is critical. Regression-adjustment techniques such as CUPED and CUPAC control for pre-experiment user behavior. They leverage the correlation between a pre-experiment covariate (for CUPED, often the same metric measured before the experiment starts) and the in-experiment metric, and that correlation is high when the metric is stable over time. A metric with high raw variance but strong temporal correlation can therefore end up with lower effective variance than a metric with moderate raw variance but weak correlation.
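To make the raw-versus-effective distinction concrete, here is a minimal Python sketch of CUPED-style adjustment with a single covariate X: the adjusted metric is Y - θ(X - mean(X)) with θ = Cov(X, Y) / Var(X), and its variance is Var(Y)(1 - ρ²), where ρ is the correlation between X and Y. It then turns each variance into users per arm with the usual two-sample formula n = 2(z_{1-α/2} + z_{1-β})² σ² / δ². The two metrics, their noise levels, and the 0.5-unit effect are hypothetical, and the sketch assumes both metrics are on the same scale with the same expected effect so that their variances can be compared directly.

```python
import numpy as np
from statistics import NormalDist


def cuped_adjust(y: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Return y adjusted by the pre-experiment covariate x (CUPED-style)."""
    cov = np.cov(x, y)              # 2x2 sample covariance matrix (ddof=1)
    theta = cov[0, 1] / cov[0, 0]   # OLS slope of y on x
    return y - theta * (x - x.mean())


def n_per_group(variance: float, effect: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Users per arm for a two-sided, two-sample z-test on a difference in means.

    Assumes equal arm sizes and equal variance in both arms.
    """
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return int(np.ceil(2 * z_total**2 * variance / effect**2))


rng = np.random.default_rng(7)
n_users = 50_000

# Hypothetical metric A: noisy, but each user's level is very stable over time.
habit_a = rng.normal(0, 10, n_users)
pre_a = habit_a + rng.normal(0, 3, n_users)
post_a = habit_a + rng.normal(0, 3, n_users)

# Hypothetical metric B: less noisy, but only weakly tied to past behavior.
habit_b = rng.normal(0, 2, n_users)
pre_b = habit_b + rng.normal(0, 6, n_users)
post_b = habit_b + rng.normal(0, 6, n_users)

for name, pre, post in [("A", pre_a, post_a), ("B", pre_b, post_b)]:
    raw_var = post.var(ddof=1)
    adj_var = cuped_adjust(post, pre).var(ddof=1)
    rho = np.corrcoef(pre, post)[0, 1]
    print(
        f"metric {name}: rho={rho:.2f}  raw var={raw_var:.1f}  adjusted var={adj_var:.1f}  "
        f"users/arm raw={n_per_group(raw_var, 0.5)}  adjusted={n_per_group(adj_var, 0.5)}"
    )
```

Metric A starts with the larger raw variance but ends with the smaller adjusted variance, so it needs fewer users once the covariate is used. The sketch has no treatment arm; it only shows how much variance the covariate removes.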

Binary versus continuous metrics

Converting a continuous metric to a binary one (for example, turning "streams per user" into "did the user stream more than 10 times?") involves critical trade-offs:

Binary metrics can have lower variance when users cluster far from the threshold, but they lose information about magnitude. A user who goes from 15 to 50 streams produces the same binary outcome (1→1) as a user whose behavior does not change at all, since both were already above the threshold. You only detect changes that cross the threshold, so binary metrics respond less to the change being tested, and substantial behavior improvements may not register. Threshold choice matters too: the per-user variance of a binary metric is p(1 - p), where p is the proportion of users above the threshold. It is highest when p is near 50%, so a poorly chosen threshold can increase variance.
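The magnitude problem is easy to see with a handful of users. This minimal Python sketch uses hypothetical stream counts (not real data) in which the treatment moves one user from 15 to 50 streams: the mean moves, but the share of users above 10 streams does not. It also evaluates the binary metric's per-user variance, p(1 - p), at a few proportions.

```python
import numpy as np

THRESHOLD = 10  # the "more than 10 streams" cutoff

# Hypothetical streams per user in each arm (illustrative numbers only).
control = np.array([2, 5, 15, 30, 40])
treatment = np.array([2, 5, 50, 30, 40])  # one user goes from 15 to 50 streams


def summarize(streams: np.ndarray) -> tuple[float, float]:
    """Return (mean streams, share of users above the threshold)."""
    return streams.mean(), (streams > THRESHOLD).mean()


def bernoulli_variance(p: float) -> float:
    """Per-user variance of a 0/1 metric whose success rate is p."""
    return p * (1 - p)


for name, arm in [("control", control), ("treatment", treatment)]:
    mean_streams, share_above = summarize(arm)
    print(f"{name}: mean streams={mean_streams:.1f}, share above {THRESHOLD}={share_above:.2f}")

# The variance peaks at p = 0.5 and shrinks toward 0 as p approaches 0 or 1.
for p in (0.1, 0.5, 0.9):
    print(f"p={p}: variance={bernoulli_variance(p):.2f}")
```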

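If the goal of binary conversion is lower variance, first consider techniques that reduce effective variance while keeping more of the magnitude information: regression adjustment (CUPED, CUPAC), capping, and related methods, all covered in depth in Lesson 8: Variance reduction.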

Influenceability: Can your change move the metric?

Even with perfect variance properties, a metric is useless if your product change cannot move it. Influenceability measures whether your experiment can actually affect the metric—this is completely independent of variance.

Influenceability depends on three factors: Exposure (what proportion of users encounter the change?), Mechanism (does the metric measure behavior the change can affect?), and Effect size (is the change substantial enough to impact the metric?).

Scoping your experiment population and choosing metrics aligned with your intervention are crucial—perfect variance properties mean nothing if your change can't move the metric.
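Exposure has a simple arithmetic consequence. If only a fraction e of the measured users encounter the change and unexposed users are unaffected, the average effect across everyone shrinks to e × Δ, where Δ is the effect among exposed users. Since required sample size scales with 1/effect², analyzing everyone needs about 1/e² times the users, whereas analyzing only exposed users needs the base sample of exposed users, which takes about 1/e times the traffic to collect. This minimal Python sketch assumes the metric's variance is the same among exposed and unexposed users, which real data will only approximate; the base sample size and the +0.5 stream lift are hypothetical.

```python
def diluted_effect(effect_on_exposed: float, exposure_rate: float) -> float:
    """Average effect over all measured users, assuming unexposed users are unaffected."""
    return effect_on_exposed * exposure_rate


def required_traffic(base_n: float, exposure_rate: float, analyze_exposed_only: bool) -> float:
    """Total users to enroll, where base_n is the sample needed if everyone were exposed.

    Assumes the metric's variance is the same for exposed and unexposed users,
    so the required sample size scales with 1 / effect**2.
    """
    if analyze_exposed_only:
        # Need base_n exposed users, but only a fraction exposure_rate of traffic is exposed.
        return base_n / exposure_rate
    # The effect is diluted by exposure_rate, so the sample grows by 1 / exposure_rate**2.
    return base_n / exposure_rate**2


BASE_N = 10_000  # hypothetical users needed if every user saw the change

for exposure in (1.0, 0.5, 0.2):
    print(
        f"exposure={exposure:.0%}: average effect={diluted_effect(0.5, exposure):.2f}, "
        f"traffic if analyzing everyone={required_traffic(BASE_N, exposure, False):,.0f}, "
        f"traffic if analyzing exposed only={required_traffic(BASE_N, exposure, True):,.0f}"
    )
```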

Trade-offs in practice

You've now seen that good metrics require feasibility, low effective variance, high influenceability, interpretability, and business alignment. No single metric optimizes all dimensions—you must make deliberate trade-offs:

Sensitivity versus business alignment: Metrics directly tied to business value (revenue, long-term retention) often have high variance or long measurement windows. Use sensitive proxies (engagement, short-term behavior) as primary metrics while monitoring business metrics secondarily. Ask: do improvements in my sensitive metric translate to business outcomes?

Granularity versus variance: Granular metrics (total revenue per user) capture effect magnitude but have high variance. Before simplifying through capping or binary conversion, check whether regression adjustment can give you both granularity and sensitivity.
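A quick way to see what capping and binary conversion buy is to apply both to a heavy-tailed metric. This minimal Python sketch simulates hypothetical revenue per user from a lognormal distribution (an assumption chosen for its long right tail), caps it at the 99th percentile, and binarizes it at the median; both cutoffs are arbitrary choices. Because metrics on different scales cannot be compared by raw variance, it reports relative variance (variance divided by squared mean), which is what sets the sample size needed to detect a given percentage lift.

```python
import numpy as np


def relative_variance(x: np.ndarray) -> float:
    """Variance divided by squared mean; sample size for a given % lift scales with this."""
    return x.var(ddof=1) / x.mean() ** 2


rng = np.random.default_rng(42)

# Hypothetical heavy-tailed revenue per user: most users spend little, a few spend a lot.
revenue = rng.lognormal(mean=1.0, sigma=1.5, size=100_000)

cap = np.quantile(revenue, 0.99)                        # arbitrary cap: the 99th percentile
capped = np.minimum(revenue, cap)                       # values above the cap are set to the cap
binary = (revenue > np.median(revenue)).astype(float)   # arbitrary threshold: the median

for name, metric in [("raw", revenue), ("capped", capped), ("binary", binary)]:
    print(f"{name:>6}: relative variance = {relative_variance(metric):.2f}")
```

Both simplifications lower relative variance, but the capped metric cannot see any change above the cap and the binary metric cannot see any change that does not cross the median. That lost granularity is what regression adjustment tries to keep.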

Scope versus influenceability: Broad metrics (platform retention) align with business goals but resist movement from single experiments. Narrow metrics (feature engagement) are movable by experiments but miss broader impacts. Solution: use narrow, movable primary metrics for decisions while monitoring broad metrics for unintended effects.

Notes for nerds

Ratio metrics and variance reduction. Ratio metrics require special care not just for variance estimation, but also for variance reduction. When applying regression adjustment to a ratio metric, you can't treat the ratio as a simple scalar: the numerator and denominator each have their own variance, and the covariance between them matters. This is covered in depth in Lesson 8: Variance reduction.
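On the variance-estimation side, a standard approach for a ratio such as total streams divided by total sessions, with users as the randomization unit, is the delta method: Var(R) ≈ [Var(Y) - 2R·Cov(X, Y) + R²·Var(X)] / (n·mean(X)²), where Y is the per-user numerator, X the per-user denominator, and R the ratio of their means. This minimal Python sketch uses hypothetical Poisson-simulated sessions and streams, checks the delta-method estimate against a user-level bootstrap, and shows how far off the estimate lands if the covariance term is dropped.

```python
import numpy as np


def delta_ratio_variance(num: np.ndarray, den: np.ndarray, include_cov: bool = True) -> float:
    """Delta-method variance of sum(num) / sum(den), with one entry per user."""
    n = len(num)
    ratio = num.mean() / den.mean()
    c = np.cov(num, den)  # user-level sample covariance matrix
    cov_term = 2 * ratio * c[0, 1] if include_cov else 0.0
    return (c[0, 0] - cov_term + ratio**2 * c[1, 1]) / (n * den.mean() ** 2)


def bootstrap_ratio_variance(num: np.ndarray, den: np.ndarray, n_boot: int, rng) -> float:
    """Resample users with replacement and take the variance of the resampled ratios."""
    n = len(num)
    ratios = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        ratios[b] = num[idx].sum() / den[idx].sum()
    return ratios.var(ddof=1)


rng = np.random.default_rng(0)
n_users = 5_000
sessions = rng.poisson(5, n_users) + 1   # hypothetical: at least one session per user
streams = rng.poisson(3 * sessions)      # more sessions, more streams: positive covariance

print("delta method:        ", delta_ratio_variance(streams, sessions))
print("bootstrap:           ", bootstrap_ratio_variance(streams, sessions, 1_000, rng))
print("delta, no covariance:", delta_ratio_variance(streams, sessions, include_cov=False))
```

Because streams and sessions move together, dropping the covariance term inflates the variance several-fold here, which would make real effects look insignificant.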

Heterogeneous treatment effects. Influenceability describes the average effect of your change across the measured population—but that average can mask enormous variation. A feature might be highly influenceable for power users who interact frequently with the affected surface, while being essentially unmovable for casual users who rarely encounter it. Conversely, a change can produce a neutral average because it helps some segments while harming others, with the two effects cancelling out. Whether this matters depends on your product strategy. If your goal is aggregate improvement, the average effect is what counts. If you want to understand who benefits and why, average influenceability is not enough—segment-level analysis is the right tool, and it is covered in Lesson 10: Segment-level analysis.
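Cancellation is easy to reproduce with a toy dataset. In this minimal Python sketch the per-user values are hypothetical and the two segments are the same size: the treatment mean is 4 higher than control for power users and 4 lower for casual users, so the pooled average effect is zero. It computes point estimates only, with no uncertainty.

```python
from statistics import mean

# Hypothetical per-user metric values by (segment, arm); illustrative numbers only.
data = {
    ("power", "control"): [20, 22, 24, 26],
    ("power", "treatment"): [24, 26, 28, 30],
    ("casual", "control"): [6, 8, 10, 12],
    ("casual", "treatment"): [2, 4, 6, 8],
}


def segment_effect(segment: str) -> float:
    """Difference in means between treatment and control within one segment."""
    return mean(data[(segment, "treatment")]) - mean(data[(segment, "control")])


def pooled_effect() -> float:
    """Difference in means after pooling all segments together."""
    treated = [v for (_, arm), values in data.items() if arm == "treatment" for v in values]
    control = [v for (_, arm), values in data.items() if arm == "control" for v in values]
    return mean(treated) - mean(control)


for segment in ("power", "casual"):
    print(f"{segment} users: effect = {segment_effect(segment):+}")
print(f"pooled: effect = {pooled_effect():+}")
```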