Lesson 6: Interpretability

Summary

In this lesson, you learn how to create metrics that stakeholders can understand and act on. You explore clear naming conventions, documentation best practices, and techniques for communicating metric meaning to both technical and non-technical audiences.

Why interpretability matters

A metric is only useful if people can understand what it means and what actions to take when it moves. The most sophisticated metric in the world won't drive good decisions if stakeholders can't interpret it.

When a metric changes, interpretability means everyone understands what user behavior changed, whether the change is good or bad, what might have caused it, and what actions to consider next. Without this shared understanding, you can't make confident decisions.

Compare "adjusted engagement index" to "average session duration per active user per week." The first leaves people guessing—what's being adjusted? What counts as engagement? How do I interpret a 5% increase? The second is clear: you're measuring how long active users spend in sessions each week. If it goes up, users are spending more time. If it goes down, they're spending less.

Metric name format

Metric names should be descriptive and self-explanatory. A good metric name answers three questions: What are you measuring, for whom, and over what time period?

The "what" is the behavior or outcome: purchases, clicks, conversions, completed tasks. The "who" is the unit of analysis: per user, per session, per account. The "when" is the time window: daily, in the first week, during the trial period. Put these together and you get names like "purchase completion rate per user in first 30 days" or "average pages viewed per session for returning visitors."
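The what/who/when pattern above can be sketched as a small helper. This is only an illustration: the `MetricName` class and its field names are hypothetical and are not part of Confidence or any other tool. It shows one way to make sure none of the three parts is left out of a name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricName:
    """Hypothetical helper holding the three parts of an interpretable name."""

    what: str  # behavior or outcome, e.g. "purchase completion rate"
    who: str   # unit of analysis, e.g. "per user"
    when: str  # time window, e.g. "in first 30 days"

    def render(self) -> str:
        """Join the parts into one name, refusing names with a missing part."""
        labelled = (("what", self.what), ("who", self.who), ("when", self.when))
        missing = [label for label, part in labelled if not part.strip()]
        if missing:
            raise ValueError(f"metric name is missing: {', '.join(missing)}")
        return " ".join(part.strip() for _, part in labelled)


name = MetricName(
    what="purchase completion rate",
    who="per user",
    when="in first 30 days",
)
print(name.render())  # purchase completion rate per user in first 30 days
```

Raising an error on an empty part is a design choice for this sketch; a team might prefer a warning, since some metrics have an implicit unit or window.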

Example

Building interpretable names:

Vague: "Engagement metric v2"

What's being measured? What does v2 mean? Impossible to interpret.

Clear: "Average time spent per active user per week"

Immediately clear: you're measuring how much time active users spend weekly.

Vague: "W1RR for MAU"

Abbreviations create barriers, especially for new team members.

Clear: "Week 1 retention rate for monthly active users"

Anyone can understand this without a glossary.

Write helpful descriptions

Every metric should have complete technical documentation—calculation logic, time windows, filters, and more. But your colleagues shouldn't need to look it up every time they see your metric. A well-written description saves everyone time.

Think of the description as your chance to help colleagues quickly understand what the metric measures and why it matters. When someone is choosing metrics for an experiment or reviewing results, they can read your description and immediately know if this metric is relevant, without diving into the full definition.

In Confidence

In Confidence, every metric has a dedicated definition page with all the technical details. The description field is what colleagues see without clicking through, so it's the first thing they use to assess relevance.

Example

Metric name: Purchase completion rate per user in first 30 days

Weak description: "Measures purchases"

This forces colleagues to click through to understand what's actually being measured.

Strong description: "The percentage of new users who complete at least one purchase within 30 days of signing up. Used to measure onboarding effectiveness and early conversion."

This gives colleagues enough context to decide if the metric is relevant without needing to dig deeper.
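To see how much the strong description pins down, here is a minimal sketch of the calculation it implies. Everything in it is an assumption made for illustration: the function name, the input shapes, treating every user in `signups` as a new user, and counting a purchase made exactly 30 days after signup as inside the window. The real calculation lives in the metric's definition page.

```python
from datetime import date, timedelta


def purchase_completion_rate(signups, purchases, window_days=30):
    """Percentage of new users with at least one purchase within the window.

    signups:   dict mapping user_id -> signup date (assumed: all are new users)
    purchases: iterable of (user_id, purchase date) pairs
    """
    if not signups:
        return 0.0
    window = timedelta(days=window_days)
    converted = set()  # a set, so repeat purchases count the user only once
    for user_id, purchased_on in purchases:
        signed_up = signups.get(user_id)
        # Assumption: both ends of the window are inclusive.
        if signed_up is not None and signed_up <= purchased_on <= signed_up + window:
            converted.add(user_id)
    return 100.0 * len(converted) / len(signups)


signups = {
    "u1": date(2024, 1, 1),
    "u2": date(2024, 1, 5),
    "u3": date(2024, 1, 10),
    "u4": date(2024, 1, 12),
}
purchases = [
    ("u1", date(2024, 1, 15)),  # inside the window
    ("u1", date(2024, 1, 20)),  # second purchase, same user
    ("u2", date(2024, 2, 20)),  # 46 days after signup, outside the window
    ("u3", date(2024, 1, 10)),  # same day as signup, inside the window
]
rate = purchase_completion_rate(signups, purchases)
print(f"{rate:.1f}%")  # 50.0%
```

The weak description, "Measures purchases", would not tell a reader whether the numerator is purchases or users, or which window applies.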

A good description answers three questions: What behavior are you measuring? Why does it matter? When would someone use this metric? The time you invest in writing a clear description pays off many times over as colleagues reuse your metric.

This matters even more as AI agents become part of experiment workflows. An agent can read the full metric definition, but when it has thousands of metrics to search through, a well-written description is far more useful than the definition: it lets the agent find the right metric by understanding its purpose rather than parsing its implementation.

Good descriptions are also the foundation of a centralized metric documentation system. When every metric has a clear definition, consistent calculation logic, and well-written context, the whole organization can share a single source of truth. Teams can discover existing metrics rather than recreating them, and when business needs change, the logic only needs to be updated in one place.

In Confidence

In Confidence, every metric you define is available to your entire organization—so the quality of your documentation directly affects how confidently your colleagues can reuse your work.

Audience communication

Different stakeholders need different levels of detail. Engineers need technical specifics: data sources, joins, edge cases, implementation notes. Product managers need to understand what the metric measures, why it matters for the product, and how to interpret changes. Executives need business impact, connections to strategic goals, and what actions to consider.

The same metric can be explained three different ways.

For an engineer: "We calculate monthly active users as distinct user IDs with at least one logged event in the trailing 30-day window, excluding test accounts and automated traffic."

For a product manager: "Monthly active users tells us how many unique people used the product in the last month. It's our primary measure of active user base size."

For an executive: "Monthly active users grew 3% this quarter, driven by strong growth in emerging markets and improved retention in the free tier."
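The engineer-facing definition above is precise enough to translate almost directly into code. The following is a minimal sketch under stated assumptions, not an actual implementation: the function name, the `(user_id, event_date)` input shape, the way excluded accounts are passed in, and the exact window boundaries are all hypothetical choices made for illustration.

```python
from datetime import date, timedelta


def monthly_active_users(events, as_of, excluded_user_ids=frozenset(), window_days=30):
    """Count distinct user IDs with at least one event in the trailing window.

    events:            iterable of (user_id, event_date) pairs
    excluded_user_ids: test accounts and automated traffic to leave out
    """
    # Assumption: the window is (as_of - window_days, as_of], i.e. exactly
    # window_days calendar days ending on as_of.
    window_start = as_of - timedelta(days=window_days)
    active = {
        user_id
        for user_id, event_date in events
        if window_start < event_date <= as_of
        and user_id not in excluded_user_ids
    }
    return len(active)


events = [
    ("u1", date(2024, 3, 30)),
    ("u1", date(2024, 3, 15)),      # same user again, counted once
    ("u2", date(2024, 3, 1)),       # falls on the excluded start boundary
    ("u3", date(2024, 3, 2)),       # first day inside the window
    ("test-1", date(2024, 3, 20)),  # test account
    ("bot-7", date(2024, 3, 29)),   # automated traffic
    ("u4", date(2024, 2, 10)),      # too old
]
mau = monthly_active_users(
    events,
    as_of=date(2024, 3, 31),
    excluded_user_ids={"test-1", "bot-7"},
)
print(mau)  # 2
```

Details such as the window boundaries and the exclusion list are what engineers need spelled out, while the product manager and executive versions can safely leave them to the documentation.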

Technical accuracy and comprehension

Sometimes the most technically accurate description is too complex for broad communication. When this happens, use simple language for general communication, provide technical details in documentation, and highlight any important caveats.

Example

What engineers see internally: The calculation applies a log transformation, reduces variance using CUPED, and handles edge cases for sessions under 10 seconds.

What the metric is named: "Average session length"

The principle: Metric names describe the behavior being measured. Statistical implementation details—transformations, variance reduction methods, edge case handling—belong in the technical documentation, not in the name that stakeholders use to understand results.

Which metric name best follows the interpretability principles taught in this lesson?

What is the primary purpose of writing a strong metric description in Confidence?

Why should metric names avoid including statistical implementation details like CUPED adjustments?

A metric called 'support ticket volume' increases by 20%. Why is this hard to interpret?