Lesson 5: Case study: Shuffle button in a shelf on Spotify Home

Write a hypothesis for an experiment that adds a shuffle button to a shelf on Spotify Home

This case study is based on an actual experiment at Spotify. The details have been modified for the purpose of this exercise. The experiment adds a shuffle button to the 'Try something else' shelf on the home screen of Spotify.

The text below gives some information on earlier research, theory, and motivation that led to an experiment which tested the impact of adding a shuffle button on the 'Try something else' shelf on the home screen.

Use the information below to formulate a testable hypothesis for this experiment.

Prior knowledge

  • In earlier user research, we found that 'Just play something' actions decreased the perceived friction to start listening.
  • A large part of the consumption on Spotify comes from surfaces that automatically select what audio to play.
  • Anecdotal evidence suggests that the stress of having to choose can explain why users continue to consume the same content.

Theory about user need

Providing users with low-effort paths to a decision can reduce the effort it takes to start listening, and thereby increase engagement and audio consumption.

Motivation

  • If users get stuck browsing for content and are unable to make a decision, they might leave the app.
  • Offering a low-effort path to start listening will decrease perceived stress and can increase engagement and audio consumption from the home screen.
  • Similar features have been successful in other places in the app before.

Exercise: Write a testable hypothesis for this experiment

  1. Which success metrics would you set for this experiment? Why would you choose these metrics?

  2. What unintended side effects could you see when you add a shuffle button to this shelf? What guardrail metrics would you set to test for this?

  3. Based on the information above, write a testable hypothesis for an experiment on a shuffle button on the 'Try something else' shelf. You can use the template below as a starting point.

You can find answers to these questions on the next page.

Hypothesis template

Based on [prior knowledge], we believe that [theory about user need].

We think that [doing this/building this feature/creating this experience] for [these people/personas] will achieve [these outcomes].

We will know this is true when we see [metric results]. This will be good for [customers/artists/our business] because [motivation].

Bonus questions

  1. How much of an increase in these metrics would you want to see to call the experiment a success?
  2. What is the largest decrease in the guardrail metrics that you would still consider acceptable?

You may not be able to give an exact number for either bonus question, but think about how you would come up with a number to answer them.
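One way to reason about the first bonus question is to look at how the smallest effect you want to detect drives the number of users the experiment needs. The sketch below is an illustration, not the method used in the original experiment: it assumes a two-sided z-test on a difference in means with equal group sizes and equal variance, and the function name, baseline mean, and standard deviation are hypothetical values chosen only for the example.

```python
import math
from statistics import NormalDist


def required_sample_size_per_group(baseline_mean, baseline_std, relative_mde,
                                   alpha=0.05, power=0.8):
    """Approximate users needed per group to detect a relative change in a mean.

    Assumes a two-sided z-test on the difference in means, equal group sizes,
    and the same standard deviation in control and treatment.
    """
    # Convert the relative effect (e.g. 0.02 for 2%) into metric units.
    absolute_mde = baseline_mean * relative_mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    n = 2 * ((z_alpha + z_power) * baseline_std / absolute_mde) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical baseline: 100 minutes played per user, standard deviation 150.
    for relative_mde in (0.01, 0.02, 0.05):
        n = required_sample_size_per_group(
            baseline_mean=100.0, baseline_std=150.0, relative_mde=relative_mde
        )
        print(f"relative MDE {relative_mde:.0%}: about {n} users per group")
```

Under these assumptions, halving the effect you want to detect roughly quadruples the required sample size, so the number you pick is a trade-off between what would matter for the product and what the available traffic can support.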

Reader exercise

What is the primary reason for selecting 'Minutes played on week 1 after exposure' as a success metric instead of 'Consumption from the shelf with the shuffle button'?


© Copyright 2026. All rights reserved.