How to Set Up a Hypothesis
A well-formulated hypothesis is a specific assumption that can be conclusively tested through an experiment. Not all hypotheses are equally effective. An effective hypothesis should be:
- a statement, not a question
- clear about what experiment outcomes would support or weaken it
- clear about the key variables
- grounded in past research/learnings
- written with as few assumptions as possible
A hypothesis statement generally follows this template: doing this/building this feature/creating this experience for these people/personas should result in a change in their behavior, as measured by success metrics. The data supports the hypothesis if the success metrics change by at least the minimum detectable effect. You can read more about minimum detectable effects (MDE) on the effect sizes page.
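To make the MDE concrete, here is a minimal sketch of how an MDE translates into a required sample size, assuming a binary success metric. The 20% baseline rate, significance level, and power are hypothetical planning inputs, not fixed rules:

```python
# Sketch: turning an MDE into a per-group sample size (assumed inputs).
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.20  # assumed control-group rate (hypothetical)
mde = 0.025      # minimum detectable effect: 2.5 percentage points

# Standardized effect size (Cohen's h) for baseline vs. baseline + MDE
effect_size = proportion_effectsize(baseline + mde, baseline)

# Per-group sample size at 5% significance and 80% power
n_per_group = NormalIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.80, alternative="two-sided"
)
print(f"Users needed per group: {n_per_group:.0f}")
```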
Example Hypothesis
Imagine your team is building an autoplay feature for the Spotify mobile app. Your team’s goals are:
- Make it easier for people to continue listening when their content ends.
- Lead users to listen to more content curated by Spotify.
Based on these goals, the hypothesis could read: continuing to play music or podcasts when a play context ends for all users should result in users listening to more of Spotify’s curated content rather than searching for something else to play themselves, as measured by percent programmed content. The data supports the hypothesis if percent programmed content increases by 2.5pp (percentage points).
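Here is a sketch of how this hypothesis could be evaluated, assuming percent programmed content is available as a per-user value. The simulated data, group sizes, and test choice are illustrative assumptions:

```python
# Sketch: checking the Autoplay hypothesis against the 2.5pp MDE
# using simulated per-user percent programmed content.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(loc=20.0, scale=8.0, size=5000)    # assumed ~20% baseline
treatment = rng.normal(loc=22.5, scale=8.0, size=5000)  # assumed +2.5pp shift

lift = treatment.mean() - control.mean()
t_stat, p_value = stats.ttest_ind(treatment, control)

print(f"Estimated lift: {lift:.2f}pp, p-value: {p_value:.4f}")
# The data supports the hypothesis if the lift is statistically
# significant and at least as large as the 2.5pp MDE.
print("Supports hypothesis:", p_value < 0.05 and lift >= 2.5)
```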
Composite Hypothesis
Many experiments use one or two success metrics and a few guardrail metrics. In this scenario, write a hypothesis statement for each success metric; for the guardrails, it’s generally enough to state the hypothesis that the treatment does not deteriorate the guardrail metrics by more than the acceptable margins (known as non-inferiority margins).

Consider the earlier Autoplay example. The guardrail metrics are the skip rate of programmed content and the app crash rate. To also include the guardrail metrics, change the hypothesis statement as follows: continuing to play music or podcasts when a play context ends for all users should result in users listening to more of Spotify’s curated content rather than searching for something else to play themselves, as measured by percent programmed content. The data supports the hypothesis if percent programmed content increases by 2.5pp, while the app crash rate and the programmed content skip rate don’t increase by more than the acceptable margins.

The hypothesis statements for the success metrics intend to capture a change in user behavior that is measurable by some metric. For guardrail metrics, expect no change, or only a small one. In settings like this, you need to define the decision rule for a successful experiment upfront, as in the sketch below. For example, if the treatment significantly improves one success metric, but there is no evidence of non-inferiority on the guardrail metrics, should you ship this variant or not? Have you found enough evidence that this variant is better than the current default version? Read more about the decision rules.
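Such a decision rule can be encoded explicitly. The following sketch ships the treatment only if the success metric shows a significant improvement and every guardrail stays within its non-inferiority margin; the metric values, margins, and normal-approximation confidence intervals are illustrative assumptions:

```python
# Sketch: an upfront decision rule combining a superiority check on the
# success metric with non-inferiority checks on the guardrail metrics.
import numpy as np
from scipy import stats

def diff_ci(treat, ctrl, alpha=0.05):
    """Two-sided (1 - alpha) CI for the difference in means (normal approx.)."""
    diff = treat.mean() - ctrl.mean()
    se = np.sqrt(treat.var(ddof=1) / len(treat) + ctrl.var(ddof=1) / len(ctrl))
    z = stats.norm.ppf(1 - alpha / 2)
    return diff - z * se, diff + z * se

rng = np.random.default_rng(7)
# Hypothetical per-user observations: (treatment, control) for each metric
pct_programmed = (rng.normal(22.5, 8, 5000), rng.normal(20.0, 8, 5000))
skip_rate      = (rng.normal(10.1, 4, 5000), rng.normal(10.0, 4, 5000))
crash_rate     = (rng.normal(0.51, 0.3, 5000), rng.normal(0.50, 0.3, 5000))

# Success metric: the CI for the lift must sit entirely above zero.
lift_lo, _ = diff_ci(*pct_programmed)
success_ok = lift_lo > 0

# Guardrails: the upper CI bound of the change must stay below the margin
# (non-inferiority for metrics where an increase is harmful).
margins = {"skip_rate": 0.5, "crash_rate": 0.05}  # illustrative margins
_, skip_hi = diff_ci(*skip_rate)
_, crash_hi = diff_ci(*crash_rate)
guardrails_ok = skip_hi < margins["skip_rate"] and crash_hi < margins["crash_rate"]

print("Ship the treatment:", success_ok and guardrails_ok)
```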

