Blog

Accurate Sample Size for Always-Valid Inference

Most A/B testing tools either overestimate the sample size needed for always-valid inference or do not adjust for the sequential test in the sample size calculation at all. We derived a closed-form correction that requires no simulation.

July 14, 2026 · Mårten Schultzberg, Staff Data Scientist

When A/B tests tell you what you want to hear

A/B testing maturity has a dangerous middle phase: teams test, dislike the answer, and explain it away. How to spot it and learn to trust your data.

July 8, 2026 · Johan Rydberg, General Manager

When AI writes the code, who decides what ships?

AI-accelerated code production increases the need for experimentation. The validation bottleneck grows with build speed. The fastest learners will win.

June 8, 2026 · Johan Rydberg, General Manager

What Makes a Good Sample Size Calculator?

A good sample size calculator must match your analysis: sequential testing, multiple metrics, and variance reduction all change the number it returns.

May 27, 2026 · Mårten Schultzberg, Staff Data Scientist

The Judgment Gap

AI made building cheap. It also made bad decisions cheaper to ship. The distance between execution speed and validation speed is the judgment gap.

May 20, 2026 · Johan Rydberg, General Manager

The Real ROI of Experimentation

The ROI of experimentation goes beyond counting winners: shipped wins, prevented regressions, and faster organizational learning all add measurable value.

May 19, 2026 · Johan Rydberg, General Manager

Spotify's Experimentation Bootcamp is now free: Introducing Confidence Bootcamp

The A/B testing curriculum Spotify built over ten years to train thousands of experimenters is now free and open to everyone.

April 27, 2026 · Mårten Schultzberg, Staff Data Scientist

Powered ≠ Trustworthy

Statistical power does not guarantee trustworthy results. Why powered experiments still inflate effects, and what to ask before trusting a significant win.

April 21, 2026 · Mårten Schultzberg, Staff Data Scientist

Are Optimal Multiple Testing Corrections Optimal for You?

Bonferroni's reputation for conservatism is mostly a denominator mistake. Here is why it holds up once you correct only the metrics that need it.

April 14, 2026 · Mårten Schultzberg, Staff Data Scientist

When Proxy Metrics Break: How Optimizing for Proxies Can Backfire

Learn how wrong things can go when proxy metrics start to influence product development, and how to use them safely.

January 26, 2026 · Mårten Schultzberg, Staff Data Scientist

All posts

Why We Use Separate Tech Stacks for Personalization and Experimentation

Two Questions Every Experiment Should Answer

The Feature Flag Toolbox: Cloud, Edge, and Local

How Experimental Evidence Travels Through Your Organization: Why Better May Be Worse

Beyond Winning: Spotify's Experiments with Learning Framework

A/B Test Bandwidth: The Currency of Innovation

Experiments with Smaller Samples

Reduce Dilution and Improve Sensitivity with Trigger Analysis

Fixed-Power Designs: It's Not IF You Peek, It's WHAT You Peek at

Better Product Decisions with Guardrail Metrics
