Confidence

Lesson 3: Why you need randomized controlled experiments

Summary

In this lesson, you learn about the role of randomization in experiments. Randomized experiments are also known as randomized controlled trials.

Randomized treatment assignment:

  • Makes treatment groups similar, on average, in all aspects besides which treatment they receive.
  • Makes it possible to interpret the treatment effect as the causal effect of the treatment on the outcome.

Example: The effectiveness of a weight-loss program

Let's imagine that you own a gym, and you want to offer a weight-loss program to your customers. You want to know how effective your program is, so you design an experiment.

Experiment design 1: Uncontrolled trial

You stand at the entrance of your gym and look for volunteers to participate in your program so you can measure its effectiveness. You weigh the people who want to join your program before they enroll. After a 6-week program, consisting of exercise schedules and dietary advice, you weigh them again. You calculate the average difference in weight before and after the program, and find that people lost 3 kg on average. You celebrate a great success and start advertising your program!

The problem with an uncontrolled trial

You don't know what would have happened if people hadn't enrolled in your program. All participants wanted to lose weight, and maybe they would have done so without your program. People's weight fluctuates over time. People who had just gained some weight (for example, after the holidays) may be more motivated to sign up. They may also lose that weight again simply because they returned to their normal lifestyle. If you want to know the effectiveness of your program, you need to compare the outcome with your program to the outcome without it.
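
The following is a minimal simulation sketch of this problem, not a model of real gym data. All numbers are assumptions chosen for illustration: a stable weight around 80 kg, temporary fluctuations with a standard deviation of 2 kg, and a rule that only people who recently gained more than 1 kg volunteer. The program itself is set to have no effect at all.

```python
import random
from statistics import mean


def simulate_uncontrolled_trial(n_people=10_000, program_effect_kg=0.0, seed=1):
    """Average before/after weight change among volunteers (hypothetical numbers)."""
    rng = random.Random(seed)
    changes = []
    for _ in range(n_people):
        stable_weight = rng.gauss(80, 10)      # long-run weight in kg (assumed)
        fluctuation_before = rng.gauss(0, 2)   # temporary deviation at sign-up
        if fluctuation_before <= 1.0:
            continue                           # only recent gainers volunteer (assumed)
        before = stable_weight + fluctuation_before
        # Six weeks later the temporary deviation is drawn again, independently.
        after = stable_weight + rng.gauss(0, 2) + program_effect_kg
        changes.append(after - before)
    return mean(changes)


avg_change = simulate_uncontrolled_trial()
print(f"Average change with a program that does nothing: {avg_change:.2f} kg")
```

Under these assumptions the volunteers lose weight on average even though the program does nothing, because they signed up at a temporary high point. A before/after comparison alone can't separate that from a real program effect.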

Experiment design 2: You need a control group

You stand at the entrance of your gym and look for volunteers to participate in your program so you can measure its effectiveness. You enroll the people who want to join. You ask the people who don't want to participate to be part of a control group. You weigh both groups before and after the program. You then calculate the change in weight for each group, and the difference in that change between the two groups. You find that your treatment group lost more weight than the control group! You celebrate a great success and start advertising your program!

The problem with observational control groups

Your treatment and control groups are not comparable. The treatment group wanted to lose weight, and the control group didn't. The control group may contain people who joined the gym to become stronger, and who may even have gained weight from building muscle! This mechanism is called selection bias, and it happens when groups are selected in a way that biases the result. Selection bias makes the groups incomparable and invalidates the result. To avoid selection bias, you need a method of assigning people to the treatment and control groups that can't be correlated with the outcome that you plan to measure.
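
Selection bias can be illustrated with a small simulation. This is a sketch under made-up assumptions: half of the gym members want to lose weight and lose about 1 kg on their own, the other half want to build muscle and gain about 1 kg, and the program has no effect. The function name and all numbers are hypothetical.

```python
import random
from statistics import mean


def observational_comparison(n_people=10_000, program_effect_kg=0.0, seed=2):
    """Difference in weight change between self-selected groups (hypothetical numbers)."""
    rng = random.Random(seed)
    treated_changes, control_changes = [], []
    for _ in range(n_people):
        wants_to_lose_weight = rng.random() < 0.5
        if wants_to_lose_weight:
            # Volunteers: they would lose about 1 kg even without the program (assumed).
            natural_change = rng.gauss(-1.0, 2.0)
            treated_changes.append(natural_change + program_effect_kg)
        else:
            # Non-volunteers: they gain about 1 kg from building muscle (assumed).
            natural_change = rng.gauss(1.0, 2.0)
            control_changes.append(natural_change)
    return mean(treated_changes) - mean(control_changes)


estimate = observational_comparison()
print(f"Estimated 'effect' of a program that does nothing: {estimate:.2f} kg")
```

The estimate comes out clearly negative, so the program looks effective, yet the whole difference comes from who chose to join, not from the program.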

Experiment design 3: Randomized controlled trial

You want to get an unbiased estimate of the effectiveness of your weight-loss program. For this, you need to compare people who took the program to a control group that is comparable in all relevant aspects, except that they haven't taken the program. For this example, you could do the following instead. You again stand at the entrance of your gym and ask people if they are interested in participating in your weight-loss program. If they say "no", then they don't participate in the trial. If they say "yes", you weigh them and then flip a coin. Based on the coin flip, you either enroll them right away, or you tell them "The program starts in 6 weeks, come back then!". This way you remove any selection bias. The random assignment makes sure that, on average, the groups are similar across all characteristics except for the treatment that you give them. Of course, you still need to make sure that you can collect the data from everybody in both the treatment and the control group after 6 weeks!
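
The coin-flip design can be sketched in a few lines. As before, this is an illustration with assumed numbers: every volunteer would lose about 1 kg on their own, and the program has a true effect of -1.5 kg. The function `randomized_trial` and its parameters are hypothetical names.

```python
import random
from statistics import mean


def randomized_trial(n_volunteers=10_000, program_effect_kg=-1.5, seed=3):
    """Coin-flip assignment among volunteers only (hypothetical numbers)."""
    rng = random.Random(seed)
    treatment, control = [], []                 # lists of (start weight, change) pairs
    for _ in range(n_volunteers):
        start_weight = rng.gauss(85, 10)        # weighed before the coin flip
        natural_change = rng.gauss(-1.0, 2.0)   # what would happen without the program
        if rng.random() < 0.5:                  # the coin flip
            treatment.append((start_weight, natural_change + program_effect_kg))
        else:
            control.append((start_weight, natural_change))
    # Difference in average starting weight: a balance check on a pre-treatment trait.
    baseline_gap = mean(w for w, _ in treatment) - mean(w for w, _ in control)
    # Difference in average weight change: the estimated treatment effect.
    estimated_effect = mean(c for _, c in treatment) - mean(c for _, c in control)
    return baseline_gap, estimated_effect


baseline_gap, estimated_effect = randomized_trial()
print(f"Difference in starting weight: {baseline_gap:.2f} kg")
print(f"Estimated program effect: {estimated_effect:.2f} kg")
```

Because the coin flip is unrelated to anything about the person, the two groups start out with nearly the same average weight, and the difference in weight change lands close to the assumed true effect.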

Randomized controlled trials

Experiments, like A/B tests and rollouts, split users into two (or more) groups by random assignment. The random assignment makes sure that the groups are, on average, similar in all aspects except for the change you want to test. For example, if you randomly split all Spotify users into two groups, the two groups should be very similar in terms of dimensions like demographics, connection speed, and music taste. One group gets the status of a "treatment" group and receives the new feature. The other group, the control group, receives the default feature. You can then observe the users over time while they receive two different experiences, and measure some outcome of interest, for example churn, daily activity, or the number of minutes played. At the end of the experiment, you run a statistical test to check whether the differences between the groups are larger than what you would expect to see from chance alone if the treatment had no effect.
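
As a rough sketch of that last step, the example below randomly splits simulated users into two groups and runs a two-sample z-test on the difference in means. The choice of a z-test, the metric (minutes played), and all numbers are assumptions for illustration; this is not a description of how any particular experimentation platform computes its results.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sample_z_test(treatment, control):
    """Two-sided z-test for a difference in means between two independent groups."""
    diff = mean(treatment) - mean(control)
    std_error = sqrt(variance(treatment) / len(treatment)
                     + variance(control) / len(control))
    z = diff / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, z, p_value


rng = random.Random(4)
treatment, control = [], []
for _ in range(20_000):
    minutes_played = rng.gauss(30, 10)          # outcome without the feature (assumed)
    if rng.random() < 0.5:                      # random assignment
        treatment.append(minutes_played + 1.0)  # assumed true effect: +1 minute
    else:
        control.append(minutes_played)

diff, z, p_value = two_sample_z_test(treatment, control)
print(f"difference = {diff:.2f} minutes, z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value means that a difference this large would be unlikely if the feature had no effect, which is the comparison against chance that the statistical test provides.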

Figure: Randomized controlled trial

Reader exercise

What is the purpose of randomizing the treatment assignment in a controlled experiment?


© Copyright 2026. All rights reserved.
