Lesson 1: What is a metric?
In this lesson, you learn the fundamental definition of a metric and how metrics are created from individual events and measurements. You also explore the many ways metrics are used across organizations to drive product decisions, from performance tracking to experimentation.
Why metrics matter
Every day, product teams make decisions that affect millions of users. Should you launch that new feature? Is the redesign working? Which version of the experience is better? These questions can't be answered with intuition alone. You need evidence, and that evidence comes from metrics.
Metrics transform raw user behavior into actionable insights. They tell you whether your product is healthy, whether your changes are working, and where to focus your efforts.
The challenge isn't just measuring things; it's measuring the right things in the right way. This course teaches you how to think about metrics: what they are, how to choose them, and how to use them effectively.
From events and measurements to metrics
At its core, a metric is a number that represents a specific aspect of a system you want to observe and understand. More formally, a metric is an aggregation across users or sessions that provides insight into user behavior, product performance, or business health.
To build metrics, you start with raw data about what users do. This data comes in two forms: events and measurements. An event captures that something happened: a song was streamed, a button was clicked, a purchase was completed. A measurement captures a quantity with a scale or unit: 3.5 minutes of audio consumed, $47.99 in order value, 1,250 bytes downloaded.
Imagine you want to understand how much users engage with Spotify. When someone plays a song, that's an event. The minutes they spent listening is a measurement. User A streamed a song at 9:00 AM (event) and listened for 3.2 minutes (measurement). User B streamed a song at 9:15 AM (event) and listened for 4.7 minutes (measurement). User A streamed another song at 9:30 AM (event) and listened for 2.8 minutes (measurement).
These individual events and measurements don't tell you much on their own. You can't look at millions of stream events and understand engagement patterns. You need to aggregate them across users into something more meaningful.
That's where metrics come in. When you aggregate events—counting streams per user—you create metrics about frequency. When you aggregate measurements—summing listening minutes per user—you create metrics about quantity. "Average streams per user" tells you about typical engagement levels. "Total minutes listened per user" tells you about consumption depth. "Share of users who streamed daily" tells you about habit formation.
The aggregation step across users or sessions is what transforms raw data into insight. Individual events and measurements are just data points. Metrics are the lens through which you understand those data points and make decisions based on them.
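As a minimal sketch of that aggregation step, the three streams from the example above can be rolled up into per-user counts and sums, then averaged across users. The variable names and the tuple layout are illustrative assumptions, not a prescribed schema.

```python
from collections import defaultdict

# (user, time of stream event, minutes listened) for the three streams above
streams = [
    ("A", "09:00", 3.2),
    ("B", "09:15", 4.7),
    ("A", "09:30", 2.8),
]

stream_count = defaultdict(int)  # aggregated events: how many streams per user
minutes = defaultdict(float)     # aggregated measurements: minutes per user
for user, _time, mins in streams:
    stream_count[user] += 1
    minutes[user] += mins

# Aggregate across users to get the metrics
n_users = len(stream_count)
avg_streams_per_user = sum(stream_count.values()) / n_users
avg_minutes_per_user = sum(minutes.values()) / n_users

print(avg_streams_per_user)            # 1.5
print(round(avg_minutes_per_user, 2))  # 5.35
```

User A contributes two streams and User B one, so the average is 1.5 streams per user. No single row in the raw data carries that number; it only exists after aggregation.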
The key distinction: Events capture that something happened. Measurements capture quantities with scale. Metrics are aggregations of events or measurements across users that provide actionable insights.
Metric types
Three metric types come up repeatedly in experimentation, and understanding the differences shapes how you measure, analyze, and interpret your data.
Continuous metrics measure quantities that can vary across a wide range. When you count streams per user, sum up total listening time, or calculate average session length, you're working with continuous metrics. These metrics capture "how much" or "how many"—they answer questions about quantity, frequency, and magnitude.
The power of continuous metrics lies in their granularity. They don't just tell you whether something happened—they tell you the degree to which it happened. A user who streams 100 songs per week is having a very different experience from a user who streams 5, and continuous metrics capture that difference.
Continuous metrics in practice:
For a streaming platform like Spotify, continuous metrics might include total streams per user over the past week, or minutes of content played per day. Each user contributes a number that could range from zero to hundreds or thousands.
For an e-commerce site, you might measure average order value or items purchased per month. One customer might buy a single low-cost item while another makes large, multi-item purchases—the continuous metric captures that full range.
For a SaaS product, you might track features used per session or API calls per customer, revealing how deeply different users engage with your platform.
Binary metrics take a different approach. Instead of measuring how much, they measure whether. Did the user stream this week or not? Did they complete a purchase or abandon their cart? Did they activate the new feature or ignore it? The answer is always yes or no, true or false, one or zero.
When you aggregate binary metrics across many users, you get proportions and rates. The share of users who streamed becomes your weekly active user rate. The share of free users who upgraded becomes your conversion rate. These proportions are powerful because they're easy to interpret and directly tied to user behavior milestones.
Binary metrics in practice:
For Spotify, you might measure whether each user streamed at least once this week (creating a weekly active user metric), or whether they have a premium subscription (creating a premium subscriber rate).
For e-commerce, the classic binary metric is conversion: did the user complete a purchase during this session? Averaged across all sessions, this becomes your conversion rate.
For a SaaS product, you might track whether each customer activated a specific feature within their first week, or whether they're currently on a paid plan versus a free tier.
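A small sketch of how a binary metric becomes a rate: each user is reduced to a 0 or 1, and the mean of those values is the proportion. The user IDs and stream counts are made up for illustration.

```python
# Hypothetical stream counts for five users over one week
weekly_streams = {"u1": 12, "u2": 0, "u3": 3, "u4": 0, "u5": 41}

# Binary outcome per user: 1 if they streamed at least once, else 0
streamed = {user: int(count >= 1) for user, count in weekly_streams.items()}

# The mean of a 0/1 outcome is the share of users with a 1
weekly_active_rate = sum(streamed.values()) / len(streamed)

print(weekly_active_rate)  # 0.6
```

Note that the same raw data could also feed a continuous metric (streams per user). The binary version deliberately discards the difference between 3 streams and 41.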
Ratio metrics
Beyond continuous and binary metrics, there's a third important type: ratio metrics. These metrics express a relationship between two quantities: clicks per impression, streams per session, revenue per order, or conversions per visit.
Ratio metrics are powerful because they normalize for opportunity. "Total clicks" might increase simply because you showed more content, but "clicks per impression" reveals whether users actually engaged more with what they saw. This normalization makes ratios particularly useful for comparing experiences where exposure varies.
Ratio metrics in practice:
For Spotify, you might measure "streams per session" rather than just "total streams." This accounts for differences in how often users open the app and focuses on engagement depth within each visit.
For an e-commerce site, "add-to-cart rate" (add-to-carts per product view) is more informative than "total add-to-carts" because it controls for how much browsing happened.
For advertising, "click-through rate" (clicks per impression) is the standard metric because it normalizes for how many ads were shown.
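To see why normalizing for opportunity matters, here is a short sketch with made-up totals for two experiences. The numbers and the helper name `click_through_rate` are assumptions chosen to show totals and ratios pointing in opposite directions.

```python
def click_through_rate(clicks, impressions):
    """Clicks per impression."""
    return clicks / impressions

# Hypothetical totals: treatment showed 50% more content
control = {"clicks": 500, "impressions": 10_000}
treatment = {"clicks": 600, "impressions": 15_000}

control_ctr = click_through_rate(**control)
treatment_ctr = click_through_rate(**treatment)

print(treatment["clicks"] > control["clicks"])  # True: more total clicks
print(control_ctr, treatment_ctr)               # 0.05 0.04: lower click-through rate
```

Total clicks went up, but only because more content was shown. Per impression, users in the treatment clicked less often.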
Analysis unit versus randomization unit
Ratio metrics introduce an important complexity: the analysis unit (the unit the metric is computed over, such as an impression or a session) may differ from the randomization unit (the unit you assigned to treatment groups, typically a user).
Consider an experiment randomized at the user level measuring "clicks per impression." You assigned users to treatment groups, but the metric is computed per impression. Each user contributes multiple impressions, creating a mismatch between randomization and analysis units.
This matters because impressions from the same user are correlated: they share that user's preferences, tendencies, and session context. Standard statistical methods assume observations are independent. Naively treating impressions as independent observations inflates your effective sample size, producing confidence intervals that are too narrow and p-values that are too small. The apparent weighting imbalance between high-volume and low-volume users is a symptom; the underlying cause is the violated independence assumption.
When the analysis unit differs from the randomization unit, you need specialized statistical methods. Understanding the distinction helps you interpret results correctly and avoid common pitfalls in metric design.
Confidence handles the unit mismatch automatically, applying the delta method when your metric's analysis unit differs from the randomization unit.
The unit mismatch problem:
You're testing a new ad format, randomized by user. Your metric is "click-through rate" (clicks per impression).
User A sees 100 impressions and clicks 5 times (5% CTR). User B sees 10 impressions and clicks 1 time (10% CTR).
Naively treating each impression as an independent observation ignores that User A's 100 impressions are all correlated with each other — they're from the same person. The result is an inflated effective sample size, which makes the experiment look more powered than it is and produces overconfident confidence intervals. The right approach is to analyze at the randomization unit (user), not the analysis unit (impression).
For experimentation, you typically want user-level averages because that's your randomization unit. Understanding this distinction helps you interpret what your ratio metric actually measures.
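The example above can be made concrete in two steps. First, Users A and B give different answers depending on whether impressions or users are the unit. Second, a sketch of a standard error computed per impression versus per user (using the delta method) shows how the naive version can understate uncertainty. The six-user dataset and the function name `ratio_standard_errors` are hypothetical, and this is an illustration under those assumptions, not a description of how any particular platform implements it.

```python
from statistics import mean, variance

# Users A and B from the example above: (clicks, impressions)
example = {"A": (5, 100), "B": (1, 10)}

# Pooled CTR: impressions are the unit, so User A dominates
pooled_ctr = sum(c for c, _ in example.values()) / sum(i for _, i in example.values())
# User-level average: each user counts once
user_avg_ctr = mean(c / i for c, i in example.values())
print(round(pooled_ctr, 4), round(user_avg_ctr, 4))  # 0.0545 0.075


def ratio_standard_errors(clicks, impressions):
    """Return (ratio, naive_se, delta_se) from per-user click and impression counts."""
    n = len(clicks)
    total_impressions = sum(impressions)
    ratio = sum(clicks) / total_impressions

    # Naive: pretend every impression is an independent yes/no trial
    naive_se = (ratio * (1 - ratio) / total_impressions) ** 0.5

    # Delta method: one observation (x_i, y_i) per user, the randomization unit
    mu_x, mu_y = mean(clicks), mean(impressions)
    var_x, var_y = variance(clicks), variance(impressions)
    cov_xy = sum((x - mu_x) * (y - mu_y) for x, y in zip(clicks, impressions)) / (n - 1)
    delta_var = (var_x / mu_y**2
                 - 2 * mu_x * cov_xy / mu_y**3
                 + mu_x**2 * var_y / mu_y**4) / n
    return ratio, naive_se, delta_var ** 0.5


# Hypothetical per-user data for one treatment group
clicks = [5, 1, 0, 30, 2, 0]
impressions = [100, 10, 50, 200, 40, 60]
ratio, naive_se, delta_se = ratio_standard_errors(clicks, impressions)
print(delta_se > naive_se)  # True for this data
```

For this made-up data the user-level standard error is noticeably larger than the naive one, because most of the variation sits between users rather than between impressions.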
What distinguishes events and measurements from metrics?
Which of the following is a binary metric?
Why do ratio metrics require special statistical treatment compared to continuous metrics?
Notes for nerds
The "continuous" label is a simplification. Strictly speaking, metrics like streams per user are discrete counts, not truly continuous values. Count metrics and genuinely continuous measurements are grouped together here because from a statistical perspective—for mean-difference estimators—they require the same treatment. The meaningful distinction for analysis purposes is whether the outcome is numeric (count or continuous) or binary.
Metric types and statistical methods. Continuous, binary, and ratio are often presented as three distinct metric types in the experimentation literature, but they are not mutually exclusive categories. A metric can be both a ratio and produce a binary-style proportion (click-through rate, for instance, is clicks ÷ impressions and yields a value between 0 and 1). The reason these three are treated separately in practice is that they require different statistical methods: continuous/count metrics use standard mean-difference estimators; binary metrics use proportion-difference estimators; and ratio metrics require the delta method because both the numerator and the denominator are random quantities that are typically correlated, so the variance of the ratio depends on both variances and their covariance.
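As a rough sketch of the first two estimators, the functions below compute a treatment-minus-control difference and its standard error for a numeric outcome and for a 0/1 outcome. The function names and all data are hypothetical, and the unpooled standard errors shown are one common textbook choice, not the only one.

```python
from statistics import mean, variance


def mean_difference(control, treatment):
    """Difference in means with an unpooled standard error (numeric outcomes)."""
    diff = mean(treatment) - mean(control)
    se = (variance(control) / len(control) + variance(treatment) / len(treatment)) ** 0.5
    return diff, se


def proportion_difference(control, treatment):
    """Difference in proportions with a binomial standard error (0/1 outcomes)."""
    p_c, p_t = mean(control), mean(treatment)
    diff = p_t - p_c
    se = (p_c * (1 - p_c) / len(control) + p_t * (1 - p_t) / len(treatment)) ** 0.5
    return diff, se


# Hypothetical streams per user (count metric)
streams_diff, streams_se = mean_difference([5, 12, 0, 7, 3, 9], [8, 14, 2, 9, 6, 9])

# Hypothetical "streamed this week" flags (binary metric)
active_diff, active_se = proportion_difference(
    [1, 0, 0, 1, 0, 1, 0, 0], [1, 1, 0, 1, 0, 1, 1, 0]
)

print(streams_diff)  # 2
print(active_diff)   # 0.25
```

Both functions treat each user as one independent observation, which is why they fit metrics whose analysis unit matches the randomization unit.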
The delta method and unit mismatch. The unit mismatch problem described in this lesson, where the analysis unit (impression) differs from the randomization unit (user), requires the delta method to compute variance correctly. Treating each impression as an independent observation ignores the correlation between impressions from the same user, which inflates the effective sample size and produces overconfident results.
The delta method approximates the variance of a ratio metric X/Y using a first-order Taylor expansion around the means μ_X and μ_Y. This is the standard approach in large-scale experimentation platforms.
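Written out, the first-order approximation takes the following form. The notation beyond μ_X and μ_Y is an assumption introduced here: X_i and Y_i are the per-user numerator and denominator totals for n independent users, with sample means X̄ and Ȳ, variances σ_X² and σ_Y², and covariance σ_XY.

```latex
\operatorname{Var}\!\left(\frac{\bar{X}}{\bar{Y}}\right)
\approx \frac{1}{n}\left(
    \frac{\sigma_X^2}{\mu_Y^2}
    - \frac{2\,\mu_X\,\sigma_{XY}}{\mu_Y^3}
    + \frac{\mu_X^2\,\sigma_Y^2}{\mu_Y^4}
\right)
```

The covariance term is what distinguishes this from treating the numerator and denominator separately: when users who see more impressions also click more, σ_XY is positive and offsets part of the other two terms.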
For a thorough treatment of the delta method applied to online experimentation metrics, see Deng, A., Lu, J., & Wang, S. (2018). "Applying the Delta Method in Metric Analytics: A Practical Guide with Novel Ideas." Proceedings of KDD 2018.
Ratio metrics and variance reduction. Ratio metrics don't just complicate variance estimation—they also make variance reduction more involved. When you apply regression adjustment (like CUPED) to a ratio metric, you can't simply adjust the ratio directly; you need to account for the joint behavior of the numerator and denominator. Lesson 8: Variance reduction covers this in depth.