Lesson 3: How to measure impact with success and guardrail metrics
Use success metrics to capture what you want to improve. Use guardrail metrics to capture what you don't want to affect negatively.
When you run an experiment, such as an A/B test or a rollout, the ultimate goal is to learn about the impact of the change you made. To know what the impact is, you need to measure the outcome on a relevant set of metrics. The metrics you select can serve different purposes, and even be subject to different statistical tests. This page describes the two main types of metrics you can use to measure impact, and how to select them.
The two types of metrics are:
- Success metrics. Metrics that you aim to improve with your change.
- Guardrail metrics. Metrics that you don't expect to improve, but that you want to make sure you don't have a negative impact on.
Success and guardrail metrics
Success metrics are the metrics that you aim to improve with your change. They're what you use to prove that your change had a positive impact.
Alongside success metrics, you should also select guardrail metrics. Guardrail metrics help you make sure that your change doesn't have a negative impact on other aspects of your product. This means a hypothesis for an experiment includes two criteria: one for the success metric and one for the guardrail metric. Both criteria need evidence to support the decision to launch the change.
Let's look at some examples of success and guardrail metrics.
Example: Checkout flow
You run an A/B test with an improvement to the checkout flow of your e-commerce website. Your goal is to make the checkout flow more efficient so that your visitors spend less time in the checkout flow. You want to make sure that the improvement in the checkout flow doesn't come at the expense of the number of purchases.
- Success metric: Average time to completed checkout per visitor.
- Guardrail metric: Number of purchases per visitor.
Example: Search algorithm
With your new Spotify search algorithm, your hypothesis is that users get better podcast recommendations. You want to measure the impact of the new algorithm on the consumption of podcasts. You want to make sure that the new algorithm doesn't reduce how much users listen to music.
- Success metric: The average number of podcast minutes played per user.
- Guardrail metric: The average number of music minutes played per user.
Example: Dating app
You have a dating app that requires new users to complete their profile before they can interact with others. You run a test where your hypothesis is that showing a dialog with onboarding advice increases the number of users that complete their setup. The dialog uses new technology, and you want to make sure you don't introduce any bugs.
- Success metric: Share of users that complete their profile setup.
- Guardrail metric: Number of crashes per user.
When you start out with experimentation, it's a good idea to just select some guardrail metrics and start experimenting. After you get the hang of it, you can make guardrail metrics even more valuable to your decision making by specifying so-called non-inferiority margins (NIM). Learn more about the tests used for guardrail metrics in this Advance your experimentation course lesson.
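To make the idea of a margin concrete, here is a minimal sketch of a generic non-inferiority check for a metric where higher is better. It is not the test Confidence runs; the function name `is_non_inferior`, the absolute `margin` parameter, and the normal approximation are all assumptions chosen for illustration. The guardrail passes when the lower bound of a one-sided confidence interval for the difference in means stays above the negative margin.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def is_non_inferior(control, treatment, margin, alpha=0.05):
    """Hypothetical helper: True if treatment is non-inferior to control.

    Assumes higher values are better and that `margin` is the largest
    acceptable absolute decrease in the mean. Uses a normal approximation,
    which is an illustrative choice, not a prescribed method.
    """
    # Estimated change in the metric (treatment minus control).
    diff = mean(treatment) - mean(control)
    # Standard error of the difference between two independent sample means.
    se = sqrt(variance(treatment) / len(treatment)
              + variance(control) / len(control))
    # One-sided lower confidence bound for the difference.
    lower_bound = diff - NormalDist().inv_cdf(1 - alpha) * se
    # Non-inferior if we are confident the decrease is smaller than the margin.
    return lower_bound > -margin


# Made-up data: music minutes per user in each group.
control = [10, 12, 11, 13, 12, 11, 10, 12] * 10
same = list(control)
worse = [x - 2 for x in control]

print(is_non_inferior(control, same, margin=1.0))
print(is_non_inferior(control, worse, margin=1.0))
```

For a guardrail where lower is better, such as crashes per user, the comparison would be mirrored: the upper bound of the difference would need to stay below the margin.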
Suppose you're trying to increase engagement in your product with a new variant that aims to increase the engagement in a certain view, say A. To ensure that an increase in engagement in view A doesn't come at the expense of engagement in a competing view B, use the engagement in view A as the success metric and the engagement in view B as the guardrail metric. If the new variant increases the engagement in view A and doesn't decrease the engagement in view B, the variant is successful and should be shipped.
How success and guardrail metrics together define a successful variant
For a change to be worth shipping, at least one success metric must have improved significantly, while all guardrail metrics must not have regressed beyond acceptable limits. Both conditions must hold: a win on the success metric doesn't override a failure on a guardrail.
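The rule above can be sketched as a small function. The `MetricResult` type, its field names, and `should_ship` are hypothetical names introduced for illustration, not part of any Confidence API; the sketch assumes each metric's statistical test has already been run and reduced to a boolean.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    """Hypothetical summary of one metric's test outcome."""
    name: str
    role: str  # "success" or "guardrail"
    significantly_improved: bool = False    # used for success metrics
    within_acceptable_limits: bool = True   # used for guardrail metrics


def should_ship(results):
    """At least one success metric improved AND every guardrail held."""
    success = [r for r in results if r.role == "success"]
    guardrails = [r for r in results if r.role == "guardrail"]
    any_success_improved = any(r.significantly_improved for r in success)
    all_guardrails_ok = all(r.within_acceptable_limits for r in guardrails)
    return any_success_improved and all_guardrails_ok


# Checkout example: faster checkout, purchases per visitor unharmed.
results = [
    MetricResult("time_to_checkout", "success", significantly_improved=True),
    MetricResult("purchases_per_visitor", "guardrail"),
]
print(should_ship(results))
```

Because the two conditions are combined with `and`, a significant win on a success metric cannot compensate for a guardrail that regressed beyond its limit.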
Confidence uses both success and guardrail metrics to identify a successful variant in A/B tests. Read more about how success and guardrail metrics feed into the overall recommendation for a decision.