Who is the Confidence Bootcamp for?

The bootcamp is designed for anyone who wants to improve their experimentation skills. Courses are tailored for data scientists, analysts, engineers, product managers, and leaders — whether you are running your first A/B test or scaling an experimentation program across your organization.

Is the bootcamp free?

Yes, the Confidence Bootcamp is completely free. All 11 courses, 90+ lessons, and resources are available at no cost. You can start learning immediately without creating an account, though signing in lets you track your progress across devices.

What does the bootcamp cover?

The bootcamp covers the full experimentation lifecycle: A/B testing fundamentals, hypothesis formulation, interpreting experiment results, metrics design, sample size calculation, feature flags, and building an experimentation culture. It includes 11 courses with over 90 lessons built by the Confidence team at Spotify.

How long does the bootcamp take to complete?

The full bootcamp takes approximately 20 hours to complete across all 11 courses. Individual courses range from 30 minutes to 3 hours. You can learn at your own pace and pick the courses most relevant to your role.

Do I need prior experience with A/B testing or statistics?

No prior experience is required. The bootcamp starts with foundational courses like Intro to Experimentation and progressively covers more advanced topics like sequential testing and variance reduction. Each course clearly indicates which roles it is designed for.

Who created the Confidence Bootcamp?

The Confidence Bootcamp was created by the Confidence team at Spotify, the same team that builds the experimentation and feature flagging platform used across Spotify. The content reflects real-world experimentation practices used at one of the world's largest digital products.

Lesson 2: Treatment group proportions

Summary

This lesson explains how the relative sizes of treatment groups affect the required sample size in experiments. If the total sample size is fixed, equal group sizes are optimal in most cases. However, a larger total sample size is always better.

Group size and power

When discussing required sample size, it is common to refer to a single number: "the sample size." However, this total sample size is actually a combination of the sample size in the control group and the sample size in the treatment group.

Interestingly, the total required sample size depends on the relative sizes of the treatment groups. This is intuitive: imagine you have a sample of 100 users. When do you learn more about the treatment effect: when the groups are split 50/50, or 99/1? If only one user is in the treatment group, that single user provides very little information about the treatment effect.
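To put a number on this intuition, here is a small Python sketch. It assumes a continuous metric whose variance $\sigma^2$ is the same in both groups, in which case the variance of the estimated difference in means is $\sigma^2 (1/N_a + 1/N_b)$. The helper name `variance_factor` is purely illustrative.

```python
def variance_factor(n_control: int, n_treatment: int) -> float:
    """Variance of the difference in means, in units of sigma^2 (assumes equal variances)."""
    return 1 / n_control + 1 / n_treatment


# 100 users in total, split in different ways between control and treatment.
for n_control, n_treatment in [(50, 50), (70, 30), (90, 10), (99, 1)]:
    print(f"{n_control}/{n_treatment}: {variance_factor(n_control, n_treatment):.4f}")
```

With the same 100 users, the 99/1 split gives a variance roughly 25 times larger than the 50/50 split (about 1.01 versus 0.04), so the estimate of the treatment effect is far less precise.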

Fixed sample size: Equal group sizes maximize power

For continuous metrics, equal group sizes are optimal from a power perspective. For binary metrics, the optimal group sizes depend on the Minimum Detectable Effect (MDE) and the baseline proportion, because the variance of a proportion depends on the proportion itself. For the optimal group sizes to deviate noticeably from equal, the baseline proportion and the proportion under the hypothetical treatment effect must differ a lot. In other words, unless the MDE is very large, it is a good general rule to aim for similar group sizes. If you are interested in the mathematical details, see the derivation of optimal treatment group sizes in the Note for nerds section below.
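The Python sketch below illustrates the continuous case for a fixed total of 2,000 users. It assumes a one-sided two-sample z-test with the same standard deviation in both groups; the effect size `delta`, the standard deviation `sigma`, and the function name `power_two_sample_z` are illustrative assumptions, not values from this lesson.

```python
from statistics import NormalDist


def power_two_sample_z(n_a: int, n_b: int, delta: float, sigma: float, alpha: float = 0.05) -> float:
    """Approximate power of a one-sided two-sample z-test for a difference in means.

    Assumes both groups share the standard deviation sigma.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    standard_error = sigma * (1 / n_a + 1 / n_b) ** 0.5
    return NormalDist().cdf(delta / standard_error - z_alpha)


total = 2000
for treatment_share in (0.5, 0.3, 0.1):
    n_b = round(total * treatment_share)
    n_a = total - n_b
    power = power_two_sample_z(n_a, n_b, delta=0.1, sigma=1.0)
    print(f"{n_a}/{n_b} split: power = {power:.3f}")

# Search every possible split of the fixed total for the one with the highest power.
best_n_b = max(range(1, total), key=lambda n_b: power_two_sample_z(total - n_b, n_b, 0.1, 1.0))
print("best treatment group size:", best_n_b)
```

Power drops as the split becomes more uneven, and the exhaustive search over all splits lands on 1,000 users per group, the equal split.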

Recommendation

If the total possible sample size is fixed, it is a good general rule to aim for similar group sizes.

Larger total sample size is always better

It's important to realize that:

For a fixed total sample size, it is optimal to have similar group sizes to maximize power.
It is always better to have a larger total sample size.

This also means that if you have a fixed number of users that can be exposed to the treatment (for example, due to legal or budget constraints), the larger the control group, the better. In other words, if the size of one group is fixed for some reason, increasing the size of the other group will always improve power.

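This is because the precision of the estimated treatment effect depends on the precision of the mean in both groups: the variance of the difference in means is the sum of the variances of the two group means, and each of those shrinks as its group grows.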
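A short Python sketch of this effect, assuming a continuous metric with standard deviation $\sigma = 1$ in both groups and a treatment group fixed at 30 users. The standard error of the difference in means, $\sigma\sqrt{1/N_a + 1/N_b}$, keeps shrinking as the control group grows, but it can never fall below the floor $\sigma/\sqrt{N_b}$ set by the fixed treatment group.

```python
import math


def standard_error(sigma: float, n_control: int, n_treatment: int) -> float:
    """Standard error of the difference in means, assuming a common standard deviation."""
    return sigma * math.sqrt(1 / n_control + 1 / n_treatment)


n_treatment = 30  # fixed, e.g. by a legal or budget constraint
for n_control in (30, 100, 300, 970):
    print(f"control={n_control}: SE = {standard_error(1.0, n_control, n_treatment):.4f}")

# Even an arbitrarily large control group cannot push the SE below this floor.
print(f"floor: {1.0 / math.sqrt(n_treatment):.4f}")
```

Every additional control user lowers the standard error, and therefore raises power, although with diminishing returns once the control group is much larger than the treatment group.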

Risky treatments

If a treatment is risky, you might want to limit the number of users exposed to it. In such cases, power can be improved by increasing the size of the control group. For risky treatments at Spotify, the treatment group is sometimes fixed to a small size while the control group is made as large as possible to maximize power. In practice, this requires some fiddling, since both the share of the population allocated to the experiment and the split between the groups are specified as proportions rather than as absolute numbers of users.

For example, suppose the population is 1,000 users and you want 30 of them to be exposed to the treatment. Then you could have any number up to 970 in the control group. You could run a 50/50 split on 6% of the population to get 30 users in each group, or a 97/3 split on 100% of the population to get 970 users in the control group and 30 in the treatment group.
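This bookkeeping can be sketched in Python. The helper `allocation_for` is hypothetical and not part of Confidence or any other platform's API; it simply converts target group sizes into the two relative settings, the share of the population allocated to the experiment and the split within it.

```python
from fractions import Fraction


def allocation_for(population: int, n_control: int, n_treatment: int) -> tuple[Fraction, Fraction, Fraction]:
    """Hypothetical helper: convert target group sizes into relative experiment settings.

    Returns (share of population in the experiment, control share, treatment share).
    """
    total = n_control + n_treatment
    if total > population:
        raise ValueError("the requested groups exceed the available population")
    return Fraction(total, population), Fraction(n_control, total), Fraction(n_treatment, total)


for n_control in (30, 970):
    allocation, control_share, treatment_share = allocation_for(1000, n_control, 30)
    print(
        f"control={n_control}: allocate {float(allocation):.0%} of the population, "
        f"split {float(control_share):.0%}/{float(treatment_share):.0%}"
    )
```

Using `Fraction` keeps the proportions exact, which makes it easy to check that the percentages you enter reproduce the intended group sizes. With random assignment, these are expected group sizes rather than exact counts.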

Note for nerds

It's in fact quite straightforward to derive the optimal group sizes for binary and continuous metrics. If calculus is not your thing, feel free to skip this section.

Binary metrics

Let's derive the optimal proportions for binary metrics step by step:

Initial setup

Let $N_a$ and $N_b$ be the sample sizes of the two treatment groups, and let $p_a$ and $p_b$ be the baseline proportion and the proportion under the hypothetical treatment effect, respectively. Define $\kappa = N_a / N_b$, where $\kappa > 0$. For simplicity, let $v_j = p_j(1 - p_j)$ for $j \in \{a, b\}$.

Step 1: Express total sample size

The variance of the estimated difference in proportions is $v_a / N_a + v_b / N_b = (v_a / \kappa + v_b) / N_b$. Requiring this variance to equal $\left(\frac{p_a-p_b}{Z_{\alpha}+Z_{\beta}}\right)^2$ gives $N_b = \left(\frac{Z_{\alpha}+Z_{\beta}}{p_a-p_b}\right)^2 (v_a / \kappa + v_b)$ and $N_a = \kappa N_b$. The allocation that minimizes the total required sample size $N = N_b + N_a$ for given type-I and type-II error rates is therefore found by solving:

$\argmin_{\kappa} N = \argmin_{\kappa} \left(\left(\frac{Z_{\alpha}+Z_{\beta}}{p_a-p_b}\right)^2 \times (v_a / \kappa + v_b) + \left(\frac{Z_{\alpha}+Z_{\beta}}{p_a-p_b}\right)^2 \times (v_a + v_b \kappa)\right)$

This expands to:

$\argmin_{\kappa} N = \argmin_{\kappa} \left(\left(\frac{Z_{\alpha}+Z_{\beta}}{p_a-p_b}\right)^2 v_a / \kappa + \left(\frac{Z_{\alpha}+Z_{\beta}}{p_a-p_b}\right)^2 v_b + \left(\frac{Z_{\alpha}+Z_{\beta}}{p_a-p_b}\right)^2 v_a + \left(\frac{Z_{\alpha}+Z_{\beta}}{p_a-p_b}\right)^2 v_b \kappa\right)$

Step 2: Take derivative

Taking the derivative with respect to $\kappa$:

$\frac{\partial N}{\partial \kappa} = -\left(\frac{Z_{\alpha}+Z_{\beta}}{p_a-p_b}\right)^2 v_a / \kappa^2 + \left(\frac{Z_{\alpha}+Z_{\beta}}{p_a-p_b}\right)^2 v_b$

Step 3: Set to zero and solve

Setting to zero:

$-\left(\frac{Z_{\alpha}+Z_{\beta}}{p_a-p_b}\right)^2 v_a / \kappa^2 + \left(\frac{Z_{\alpha}+Z_{\beta}}{p_a-p_b}\right)^2 v_b = 0$

$\left(\frac{Z_{\alpha}+Z_{\beta}}{p_a-p_b}\right)^2 v_a / \kappa^2 = \left(\frac{Z_{\alpha}+Z_{\beta}}{p_a-p_b}\right)^2 v_b$

$v_a / \kappa^2 = v_b$

$v_a / v_b = \kappa^2$

$\kappa = \sqrt{v_a / v_b} = \sqrt{\frac{p_a(1-p_a)}{p_b(1-p_b)}}$

Since the second derivative, $2\left(\frac{Z_{\alpha}+Z_{\beta}}{p_a-p_b}\right)^2 v_a / \kappa^3$, is positive for all $\kappa > 0$, this stationary point is indeed the minimum.

This implies that for a baseline proportion $p_a$ and a hypothetical treatment group proportion $p_b$, it is optimal to have:

$N_a = N_b \sqrt{\frac{p_a(1-p_a)}{p_b(1-p_b)}}$

In other words, the group whose proportion has the larger variance $p(1-p)$, that is, the proportion closer to 0.5, should receive more users.

Clearly, if $p_a \approx p_b$, then $N_a$ is close to $N_b$, which makes keeping the groups similar a good general guideline. For the nerds who paid attention in the previous lesson, this also implies that for binary guardrail metrics, the optimal group sizes are equal.
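As a numerical sanity check of the binary result, the Python sketch below minimizes the total sample size $N = N_a + N_b$ over a grid of ratios $N_a / N_b$ and compares the minimizer with $\sqrt{v_a / v_b}$. The proportions (10% baseline, 20% under the hypothetical effect), the one-sided 5% significance level, and the 80% power are illustrative assumptions; the optimal ratio itself does not depend on the significance level or power.

```python
import math
from statistics import NormalDist


def total_sample_size(ratio: float, p_a: float, p_b: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """Total N_a + N_b for a one-sided test when ratio = N_a / N_b.

    Uses v_j = p_j (1 - p_j) and requires v_a / N_a + v_b / N_b = ((p_a - p_b) / (z_alpha + z_beta))^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    c = ((z_alpha + z_beta) / (p_a - p_b)) ** 2
    v_a, v_b = p_a * (1 - p_a), p_b * (1 - p_b)
    n_b = c * (v_a / ratio + v_b)
    n_a = ratio * n_b
    return n_a + n_b


def optimal_ratio(p_a: float, p_b: float) -> float:
    """Closed-form optimum for N_a / N_b."""
    return math.sqrt(p_a * (1 - p_a) / (p_b * (1 - p_b)))


p_a, p_b = 0.10, 0.20
ratios = [r / 1000 for r in range(100, 3001)]  # candidate values of N_a / N_b
best = min(ratios, key=lambda r: total_sample_size(r, p_a, p_b))
print(f"grid search: {best:.3f}, closed form: {optimal_ratio(p_a, p_b):.3f}")
```

Both approaches agree on $N_a / N_b = 0.75$, which puts roughly 43% of the users in group a and 57% in group b. Doubling the baseline proportion is a very large MDE, though; with $p_b = 0.11$ instead, the optimal ratio is about 0.96, which is close to an even split.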

Continuous metrics

For continuous metrics, let's derive the optimal group sizes step by step:

Initial setup

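Let $m_a$ and $m_b$ be the means of the two groups on some continuous metric, and assume that both groups share the same variance $\sigma^2$, so the treatment shifts the mean but not the variance. Unlike the binary case, the variance then does not depend on the treatment effect, which simplifies the optimization.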

Step 1: Express total sample size

We want to minimize:

$\argmin_{\kappa} N = \argmin_{\kappa} \left(\left(\frac{Z_{\alpha}+Z_{\beta}}{m_a-m_b}\right)^2 \times \sigma^2 (1 + 1 / \kappa) + \left(\frac{Z_{\alpha}+Z_{\beta}}{m_a-m_b}\right)^2 \times \sigma^2 (\kappa + 1)\right)$

This expands to:

$\argmin_{\kappa} N = \argmin_{\kappa} \left(\left(\frac{Z_{\alpha}+Z_{\beta}}{m_a-m_b}\right)^2 \sigma^2 + \left(\frac{Z_{\alpha}+Z_{\beta}}{m_a-m_b}\right)^2 \sigma^2 / \kappa + \left(\frac{Z_{\alpha}+Z_{\beta}}{m_a-m_b}\right)^2 \sigma^2 \kappa + \left(\frac{Z_{\alpha}+Z_{\beta}}{m_a-m_b}\right)^2 \sigma^2\right)$

Step 2: Take derivative

Taking the derivative with respect to $\kappa$:

$\frac{\partial N}{\partial \kappa} = -\left(\frac{Z_{\alpha}+Z_{\beta}}{m_a-m_b}\right)^2 \sigma^2 / \kappa^2 + \left(\frac{Z_{\alpha}+Z_{\beta}}{m_a-m_b}\right)^2 \sigma^2$

Step 3: Set to zero and solve

Setting to zero:

$\left(\frac{Z_{\alpha}+Z_{\beta}}{m_a-m_b}\right)^2 \sigma^2 / \kappa^2 = \left(\frac{Z_{\alpha}+Z_{\beta}}{m_a-m_b}\right)^2 \sigma^2$

$\sigma^2 / \kappa^2 = \sigma^2$

$\kappa^2 = \frac{\sigma^2}{\sigma^2}$

$\kappa = 1$

Since $\kappa > 0$, we take the positive root. This implies that it is optimal to have equal group sizes ($N_a = N_b$).
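Because the total sample size in Step 1 is proportional to $2 + \kappa + 1/\kappa$, the cost of an unequal split can be expressed relative to the equal split as $(2 + \kappa + 1/\kappa)/4$. A minimal Python sketch, assuming a common variance in both groups as above (the function name is illustrative):

```python
def sample_size_inflation(kappa: float) -> float:
    """Total N at group-size ratio kappa relative to the total N at kappa = 1 (equal groups).

    Assumes a continuous metric with the same variance in both groups.
    """
    return (2 + kappa + 1 / kappa) / 4


for control_share in (0.5, 0.6, 0.7, 0.8, 0.9):
    kappa = control_share / (1 - control_share)
    print(f"{control_share:.0%}/{1 - control_share:.0%} split: x{sample_size_inflation(kappa):.2f} total users")
```

For example, an 80/20 split ($\kappa = 4$) needs about 56% more users in total than a 50/50 split to reach the same power, while a 60/40 split costs only about 4% more.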

Summary

For binary metrics, the treatment effect impacts the variance, so the optimal group sizes depend on the baseline proportion and the MDE.
For continuous metrics, assuming the treatment does not change the variance, equal group sizes are always optimal.
In all cases, a larger total sample size will improve power.
