Confidence

Blog

When AI writes the code, who decides what ships?

AI-accelerated code production increases the need for experimentation. The validation bottleneck grows with build speed. The fastest learners will win.

June 8, 2026/Johan Rydberg, General Manager

What Makes a Good Sample Size Calculator?

Most sample size calculators assume a fixed-sample test with a single metric. If your experiment uses sequential testing, corrects for multiple metrics, or applies variance reduction, the number they return is inaccurate, and you waste resources.

May 27, 2026/Mårten Schultzberg, Staff Data Scientist

The Judgment Gap

AI made building cheap. It also made bad decisions cheaper to ship. The distance between execution speed and validation speed is the judgment gap.

May 20, 2026/Johan Rydberg, General Manager

The Real ROI of Experimentation

Most teams justify experimentation by counting winners. The real value shows up in three places: shipped wins, prevented harm, and how fast the organization learns from results.

May 19, 2026/Johan Rydberg, General Manager

Spotify's Experimentation Bootcamp is now free: Introducing Confidence Bootcamp

The A/B testing curriculum Spotify built over ten years to train thousands of experimenters is now free and open to everyone.

April 27, 2026/Mårten Schultzberg, Staff Data Scientist

Powered ≠ Trustworthy

Powering an experiment does not make its results trustworthy. Trustworthiness depends on whether the effect you powered for matches the true effect, and that is rarely something you can know in advance.

April 21, 2026/Mårten Schultzberg, Staff Data Scientist

Are Optimal Multiple Testing Corrections Optimal for You?

Bonferroni's reputation for conservatism is mostly a denominator mistake. Here is why the correction holds up once you correct only the metrics that need it.

April 14, 2026/Mårten Schultzberg, Staff Data Scientist

When Proxy Metrics Break: How Optimizing for Proxies Can Backfire

Learn how wrong things can go when proxy metrics start to influence product development, and how to use them safely.

January 26, 2026/Mårten Schultzberg, Staff Data Scientist

Why We Use Separate Tech Stacks for Personalization and Experimentation

Why we maintain distinct technology stacks for personalization and experimentation.

January 7, 2026/Yu Zhao, Staff Machine Learning Engineer

Two Questions Every Experiment Should Answer

Learn how to never run an experiment without learning something.

December 15, 2025/Mårten Schultzberg, Staff Data Scientist

All posts

The Feature Flag Toolbox: Cloud, Edge, and Local

How Experimental Evidence Travels Through Your Organization: Why Better May Be Worse

Beyond Winning: Spotify's Experiments with Learning Framework

A/B Test Bandwidth: The Currency of Innovation

Experiments with Smaller Samples

Reduce Dilution and Improve Sensitivity with Trigger Analysis

Fixed-Power Designs: It's Not IF You Peek, It's WHAT You Peek at

Better Product Decisions with Guardrail Metrics

Collaboration Fuels Efficient Experimentation

Risk-Aware Product Decisions in A/B Tests with Multiple Metrics

© 2026 Spotify

The Confidence name and logo are registered trademarks of Spotify.