- can have multiple treatments, which is sometimes referred to as an A/B/n test
- use both success and guardrail metrics to identify experiences that improve some metrics without negatively impacting others
- let you learn and find promising ideas
- have a fixed allocation that doesn’t change
- can use either a fixed or sequential design, where you view results upon conclusion or continuously during the experiment
Most A/B tests evaluate product changes to understand whether you should roll out the changes or whether they need further development.
A learning experiment is another type of A/B test that aims to learn about user behavior or to
measure a strategic baseline for the product.
This learning is typically achieved by removing a product or feature from the experience or degrading the
experience in some other way. Such a test helps inform future product prioritization
by breaking down which parts of the existing product have the most impact on
user behavior or the business.
Learning experiments can also be exploratory, aiming only to determine whether a variant has a causal relationship to an outcome, regardless of direction.
The Anatomy of an Experiment
An A/B test has different parts. This section gives a high-level overview of these concepts.

The Hypothesis is the Product Foundation of the Test
A hypothesis is a specific assumption that can be conclusively tested when subjected to an experiment, and is the basis for a good experiment. It guides the experiment from a product perspective, and makes the anticipated impact and value of the experiment clear.

A/B Tests Distribute Different Experiences Through Variants
An A/B test evaluates how users react after exposure to a new experience. Variants describe the different user experiences you test. For example, there could be different variants of a button color. One variant sets the button color to red, another to blue. A variant in an experiment is often referred to as a treatment. These variants often introduce new features, innovations, or changes that should improve the experience for the user. Typically, an experiment has one variant representing the current default (in production) experience, usually called control or the control treatment.

Randomization Makes Differences Causal
Users in an experiment are randomly assigned a variant. The variant is the only difference in experience between the control and treatment groups, so any observed change in behavior can be attributed to the treatment. If the treatment group outperforms the control group on the target metric, the treatment variant improves the user experience. Randomization ensures that the groups are similar: external factors, such as seasonality, other feature launches, and competitor moves, affect control and treatment evenly and have no impact on the results of the experiment.

The treatment effect estimated in an A/B test is only valid for the time of the test.
The estimated effect doesn’t necessarily generalize to other future points in time.
The same treatment can have a widely different impact depending on when you run the test.
For example, recommending Christmas songs in July might not have the same effect as in December.
The randomization only ensures that the groups are similar during the experiment.
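In practice, random assignment is often implemented as a deterministic hash of the user ID and an experiment-specific salt, so the same user always sees the same variant. A minimal sketch of that idea — the salt, bucket count, and function name are illustrative assumptions, not Confidence's actual implementation:

```python
import hashlib

def assign_variant(user_id: str, experiment_salt: str,
                   variants=("control", "treatment")) -> str:
    """Deterministically map a user to a variant.

    Hashing user_id with an experiment-specific salt yields a stable
    pseudo-random bucket, so the same user always gets the same variant.
    """
    digest = hashlib.sha256(f"{experiment_salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10_000          # 10,000 buckets allow fine-grained splits
    return variants[bucket * len(variants) // 10_000]

# The same user always receives the same variant:
assert assign_variant("user-42", "button-color-v1") == assign_variant("user-42", "button-color-v1")
```

Because the hash is salted per experiment, the same user can land in different groups across different experiments, which keeps experiments independent of each other.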
Metrics Measure the Effect of the Treatments
Every A/B test needs at least one metric. Metrics help prove or disprove the hypothesis and support a business decision based on the outcome of the test. In other words, your metrics help answer whether the change is good enough to release widely. Confidence supports two types of metrics:

- Success metrics are metrics that should improve with the treatment
- Guardrail metrics are metrics that don’t need to improve, but shouldn’t deteriorate
It’s common and strongly recommended to use both success and guardrail metrics.
This guards against effects such as cannibalization: an experiment might aim to increase engagement with a new feature, but not by cannibalizing engagement with another feature.
In this case, the engagement in the new feature would be the success metric, while the engagement in the related feature is
the guardrail metric.
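To make this concrete, here is one way a success metric and a guardrail metric might feed into a ship decision for conversion-style metrics. This is a simplified sketch using a standard two-proportion z-test, not Confidence's actual analysis; the thresholds and function names are assumptions:

```python
from math import sqrt, erf

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided two-proportion z-test: returns (difference, p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # 2 * (1 - Phi(|z|))
    return p_b - p_a, p_value

def ship_decision(success, guardrail, alpha=0.05):
    """Ship only if the success metric improves significantly and the
    guardrail metric shows no significant regression."""
    s_diff, s_p = two_proportion_z(*success)
    g_diff, g_p = two_proportion_z(*guardrail)
    success_improved = s_p < alpha and s_diff > 0
    guardrail_regressed = g_p < alpha and g_diff < 0
    return success_improved and not guardrail_regressed

# success: new-feature engagement   (control_conv, control_n, treatment_conv, treatment_n)
# guardrail: related-feature engagement
print(ship_decision(success=(1000, 10000, 1200, 10000),
                    guardrail=(3000, 10000, 2950, 10000)))  # → True
```

Here the success metric improves significantly (10% → 12%) while the small dip in the guardrail metric (30% → 29.5%) is not statistically significant, so the sketch recommends shipping.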
Statistical Analysis Tells the Answer
Experimentation uses statistical analysis to reach a conclusion. A statistical test is a formal procedure that assesses whether the observed difference between two groups is large enough to conclude that there is an effect. The goal of the statistical test is to distinguish the actual effect of the treatment from noise due to random sampling. The statistical tests analyze each metric and ultimately summarize the results as a recommendation for the product decision.

Roll Out a Successful Experiment
Convert the A/B test to a rollout when the A/B test completes and you have a winning variant. The rollout targets the exact same users, with all the metrics and configuration from the A/B test. If the A/B test used less than 100% of the allocation, you can scale up to more users. To avoid reassigning users, the control and treatment groups must keep the same proportions. For example, suppose an A/B test ran at 10% of the population with a 50/50 split between control and treatment. When you increase the rollout percentage to 50%, all users are in either control or treatment. You can't continue to track metrics when you increase the rollout percentage beyond this point. Read more about rollouts.

Experiment Lifecycle
An A/B test moves through different states during its lifecycle. Each state has specific actions available.

| State | Available actions |
|---|---|
| Draft | Launch, Archive, Delete, Clone |
| Live | Roll out, End, Clone |
| Ended | Archive, Clone |
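The lifecycle table above can be sketched as a small state machine that validates actions against the current state. The state and action names come from the table; the helper function is illustrative, not part of any API:

```python
# Allowed actions per experiment state, from the lifecycle table above.
ACTIONS = {
    "Draft": {"Launch", "Archive", "Delete", "Clone"},
    "Live": {"Roll out", "End", "Clone"},
    "Ended": {"Archive", "Clone"},
}

# State changes caused by actions; actions like Clone or Archive
# leave the experiment's own state unchanged in this sketch.
TRANSITIONS = {
    ("Draft", "Launch"): "Live",
    ("Live", "End"): "Ended",
}

def apply_action(state: str, action: str) -> str:
    """Return the next state, rejecting actions the table doesn't allow."""
    if action not in ACTIONS[state]:
        raise ValueError(f"Cannot {action!r} an experiment in state {state!r}")
    return TRANSITIONS.get((state, action), state)

print(apply_action("Draft", "Launch"))  # → Live
```

Encoding the table this way makes the rule explicit that, for example, an ended experiment can be cloned or archived but never relaunched directly.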
Clone an Experiment
You can clone any A/B test to create a new draft with the same configuration. Clone an experiment to run a similar test without configuring it from scratch. To clone an experiment, open the A/B test detail page and select Clone from the top of the page. The cloned experiment starts as a new draft that you can change before launch.

Planned and Actual Runtime
The Planning section on the A/B test detail page shows information about the experiment’s runtime:

- Planned runtime: The expected duration of the experiment. Click Edit to set or update the planned runtime. For draft experiments, planned runtime shows “Not set” until you configure it.
- Actual runtime: The time the experiment has been running, calculated automatically from the launch and end dates.

