Lesson 1: What is a metric?

Why metrics matter

Every day, product teams make decisions that affect millions of users. Should you launch that new feature? Is the redesign working? Which version of the experience is better? These questions can't be answered with intuition alone. You need evidence, and that evidence comes from metrics.

Metrics transform raw user behavior into actionable insights. They tell you whether your product is healthy, whether your changes are working, and where to focus your efforts.

The challenge isn't just measuring things; it's measuring the right things in the right way. This course teaches you how to think about metrics: what they are, how to choose them, and how to use them effectively.

From events and measurements to metrics

At its core, a metric is a number that represents a specific aspect of a system you want to observe and understand. More formally, a metric is an aggregation across users or sessions that provides insight into user behavior, product performance, or business health.

To build metrics, you start with raw data about what users do. This data comes in two forms: events and measurements. An event captures that something happened: a song was streamed, a button was clicked, a purchase was completed. A measurement captures a quantity with a scale or unit: 3.5 minutes of audio consumed, $47.99 in order value, 1,250 bytes downloaded.

Imagine you want to understand how much users engage with Spotify. When someone plays a song, that's an event. The number of minutes they spent listening is a measurement. User A streamed a song at 9:00 AM (event) and listened for 3.2 minutes (measurement). User B streamed a song at 9:15 AM (event) and listened for 4.7 minutes (measurement). User A streamed another song at 9:30 AM (event) and listened for 2.8 minutes (measurement).

These individual events and measurements don't tell you much on their own. You can't look at millions of stream events and understand engagement patterns. You need to aggregate them across users into something more meaningful.

That's where metrics come in. When you aggregate events—counting streams per user—you create metrics about frequency. When you aggregate measurements—summing listening minutes per user—you create metrics about quantity. "Average streams per user" tells you about typical engagement levels. "Total minutes listened per user" tells you about consumption depth. "Share of users who streamed daily" tells you about habit formation.

The aggregation step across users or sessions is what transforms raw data into insight. Individual events and measurements are just data points. Metrics are the lens through which you understand those data points and make decisions based on them.
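
As a minimal sketch of that aggregation step, the snippet below rolls the three streams from the example up to one record per user and then averages across users. The tuple layout and the function names are assumptions made for illustration; they do not reflect any real logging schema.

```python
from collections import defaultdict

# Raw rows from the example: (user, time of the stream event, minutes listened).
streams = [
    ("A", "09:00", 3.2),
    ("B", "09:15", 4.7),
    ("A", "09:30", 2.8),
]

def per_user_aggregates(rows):
    """Collapse raw rows into per-user stream counts and per-user minutes."""
    stream_counts = defaultdict(int)  # aggregated events
    minutes = defaultdict(float)      # aggregated measurements
    for user, _time, mins in rows:
        stream_counts[user] += 1
        minutes[user] += mins
    return dict(stream_counts), dict(minutes)

def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)

stream_counts, minutes = per_user_aggregates(streams)

# Metrics: aggregate the per-user values across users.
avg_streams_per_user = mean(stream_counts.values())
avg_minutes_per_user = mean(minutes.values())

print("streams per user:", stream_counts)
print("average streams per user:", avg_streams_per_user)
print("average minutes per user:", round(avg_minutes_per_user, 2))
```

User A contributes two streams and User B one, so the average is 1.5 streams per user, even though there are three raw events.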

Metric types

Three metric types come up repeatedly in experimentation, and understanding the differences shapes how you measure, analyze, and interpret your data.

Continuous metrics measure quantities that can vary across a wide range. When you count streams per user, sum up total listening time, or calculate average session length, you're working with continuous metrics. These metrics capture "how much" or "how many"—they answer questions about quantity, frequency, and magnitude.

The power of continuous metrics lies in their granularity. They don't just tell you whether something happened—they tell you the degree to which it happened. A user who streams 100 songs per week is having a very different experience from a user who streams 5, and continuous metrics capture that difference.

Binary metrics take a different approach. Instead of measuring how much, they measure whether. Did the user stream this week or not? Did they complete a purchase or abandon their cart? Did they activate the new feature or ignore it? The answer is always yes or no, true or false, one or zero.

When you aggregate binary metrics across many users, you get proportions and rates. The share of users who streamed becomes your weekly active user rate. The share of free users who upgraded becomes your conversion rate. These proportions are powerful because they're easy to interpret and directly tied to user behavior milestones.
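
To make the step from yes/no outcomes to a rate concrete, here is a small sketch. The five users and their outcomes are made up for illustration.

```python
# Hypothetical per-user outcomes: 1 if the user streamed this week, 0 if not.
streamed_this_week = {"u1": 1, "u2": 0, "u3": 1, "u4": 1, "u5": 0}

def proportion(flags):
    """Share of users whose outcome is 1, which is the mean of a 0/1 variable."""
    flags = list(flags)
    return sum(flags) / len(flags)

weekly_active_rate = proportion(streamed_this_week.values())
print(f"weekly active rate: {weekly_active_rate:.0%}")
```

Because each outcome is coded as one or zero, the proportion is simply the average of the per-user values.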

Ratio metrics

Beyond continuous and binary metrics, there's a third important type: ratio metrics. These metrics express a relationship between two quantities: clicks per impression, streams per session, revenue per order, or conversions per visit.

Ratio metrics are powerful because they normalize for opportunity. "Total clicks" might increase simply because you showed more content, but "clicks per impression" reveals whether users actually engaged more with what they saw. This normalization makes ratios particularly useful for comparing experiences where exposure varies.
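
The sketch below illustrates that normalization with invented totals for two experiences: the one with more exposure collects more clicks in total, yet earns fewer clicks per impression. The group names and numbers are assumptions chosen only to show the effect.

```python
# Hypothetical totals for two experiences with different exposure.
control = {"impressions": 1000, "clicks": 50}
treatment = {"impressions": 2000, "clicks": 80}

def clicks_per_impression(group):
    """Ratio metric: total clicks divided by total impressions."""
    return group["clicks"] / group["impressions"]

ctr_control = clicks_per_impression(control)
ctr_treatment = clicks_per_impression(treatment)

print("total clicks:", control["clicks"], "vs", treatment["clicks"])
print("clicks per impression:", ctr_control, "vs", ctr_treatment)
```

Judged on total clicks, the second experience looks better. Judged on clicks per impression, it looks worse, because its extra clicks came from extra exposure.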

Analysis unit versus randomization unit

Ratio metrics introduce an important complexity: the analysis unit (the unit at which the metric is computed, such as an impression) may differ from the randomization unit (the unit you assigned to treatment groups, such as a user).

Consider an experiment randomized at the user level measuring "clicks per impression." You assigned users to treatment groups, but the metric is computed per impression. Each user contributes multiple impressions, creating a mismatch between randomization and analysis units.

This matters because impressions from the same user are correlated: they share that user's preferences, tendencies, and session context. Standard statistical methods assume observations are independent. Treating impressions as independent observations overstates your effective sample size, which produces confidence intervals that are too narrow and p-values that are too small. The apparent weighting imbalance between high-volume and low-volume users is a symptom of this; the underlying cause is the violated independence assumption.
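
The following sketch compares two standard errors for clicks per impression on a toy dataset. The first treats every impression as independent. The second works from per-user totals using the delta method named in the notes below. The eight users are invented, and with so few users the delta method (a large-sample approximation) is only indicative; the point is the direction of the gap, not the exact values.

```python
import math

# Hypothetical per-user totals: (impressions, clicks).
# Heavy users also click at a higher rate, so impressions are correlated within users.
per_user = [(100, 30), (80, 20), (10, 0), (5, 0), (8, 1), (120, 40), (6, 0), (12, 1)]

def naive_se(data):
    """Standard error that (wrongly) treats every impression as independent."""
    impressions = sum(y for y, _ in data)
    clicks = sum(x for _, x in data)
    p = clicks / impressions
    return math.sqrt(p * (1 - p) / impressions)

def delta_method_se(data):
    """Standard error of total clicks / total impressions from user-level totals."""
    n = len(data)
    ys = [y for y, _ in data]  # impressions per user (denominator)
    xs = [x for _, x in data]  # clicks per user (numerator)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    ratio = mean_x / mean_y
    var_ratio = (var_x - 2 * ratio * cov_xy + ratio ** 2 * var_y) / (n * mean_y ** 2)
    return math.sqrt(var_ratio)

print("naive SE:       ", round(naive_se(per_user), 4))
print("delta-method SE:", round(delta_method_se(per_user), 4))
```

On this data the user-level standard error is larger than the naive one, which is the overconfidence described above seen from the other side.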

Notes for nerds

The "continuous" label is a simplification. Strictly speaking, metrics like streams per user are discrete counts, not truly continuous values. Count metrics and genuinely continuous measurements are grouped together here because from a statistical perspective—for mean-difference estimators—they require the same treatment. The meaningful distinction for analysis purposes is whether the outcome is numeric (count or continuous) or binary.

Metric types and statistical methods. Continuous, binary, and ratio are often presented as three distinct metric types in the experimentation literature, but they are not mutually exclusive categories. A metric can be a ratio and still behave like a proportion (click-through rate, for instance, is clicks ÷ impressions and yields a value between 0 and 1). The reason these three are treated separately in practice is that they call for different statistical methods: continuous/count metrics use standard mean-difference estimators; binary metrics use proportion-difference estimators; and ratio metrics are typically analyzed with the delta method, because the numerator and the denominator are both random quantities and are usually correlated with each other.

The delta method and unit mismatch. The unit mismatch problem described in this lesson, where the analysis unit (impression) differs from the randomization unit (user), calls for a variance estimator that respects the randomization unit. The delta method is the standard choice. Treating each impression as an independent observation ignores the correlation between impressions from the same user, which overstates your effective sample size and produces overconfident results.

The delta method approximates the variance of a ratio metric X/Y using a first-order Taylor expansion around the means μ_X and μ_Y. This is the standard approach in large-scale experimentation platforms.
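
Written out, the approximation takes the following form. The symbols beyond μ_X and μ_Y are notation assumed here for illustration: n is the number of randomization units (users), X̄ and Ȳ are the per-user sample means of the numerator and denominator, σ_X² and σ_Y² are their per-user variances, and σ_XY is their per-user covariance.

```latex
\operatorname{Var}\!\left(\frac{\bar{X}}{\bar{Y}}\right)
\approx \frac{1}{n}\,\frac{1}{\mu_Y^{2}}
\left[
  \sigma_X^{2}
  - 2\,\frac{\mu_X}{\mu_Y}\,\sigma_{XY}
  + \frac{\mu_X^{2}}{\mu_Y^{2}}\,\sigma_Y^{2}
\right]
```

In practice, the unknown means, variances, and covariance are replaced by their sample estimates computed over users. The covariance term is what a calculation that treats the numerator and denominator as unrelated would miss.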

For a thorough treatment of the delta method applied to online experimentation metrics, see Deng, A., Knoblich, U., & Lu, J. (2018). "Applying the Delta Method in Metric Analytics: A Practical Guide with Novel Ideas." Proceedings of KDD 2018.

Ratio metrics and variance reduction. Ratio metrics don't just complicate variance estimation—they also make variance reduction more involved. When you apply regression adjustment (like CUPED) to a ratio metric, you can't simply adjust the ratio directly; you need to account for the joint behavior of the numerator and denominator. Lesson 8: Variance reduction covers this in depth.