What is a Rollout?

A rollout is the process of releasing a feature to users in controlled stages using feature flags. Instead of flipping a switch and exposing every user at once, you start with a small percentage, monitor metrics, and increase exposure gradually. The goal is safe release, not measurement. You're answering "can we ship this without breaking anything?" rather than "did this change make the product better?"

At Spotify, every production change that touches user experience goes through a rollout. The platform monitors guardrail metrics at each stage. If a metric regresses, the team rolls back before the damage reaches most users. Across 10,000+ experiments per year, 42% are rolled back after guardrails detect regressions. That catch rate depends on staged exposure giving the platform time to detect problems before they reach every user.

How does a rollout work?

A rollout in Confidence is technically a flag-controlled experiment with adjustable reach. The flag starts at a low percentage, say 1% or 5%. Users in that percentage see the new feature. Everyone else sees the existing experience.

At each stage, the platform compares guardrail metrics between the exposed and unexposed groups. If crash rates, latency, error rates, or other guardrails stay within bounds, the team increases the percentage. The stages might be 1% to 10% to 50% to 100%, or they might be more granular depending on the risk profile of the change.
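The stage-by-stage decision described above can be sketched in a few lines. This is an illustrative simplification, not how Confidence implements it: the `Guardrail` class, the `breached` and `next_percent` functions, the metric names, and the fixed relative thresholds are all assumptions made for the example, and a real platform would compare groups with a statistical test rather than a raw threshold. The sketch also assumes every guardrail is a "lower is better" metric.

```python
from dataclasses import dataclass

STAGES = [1, 10, 50, 100]  # percent exposed, taken from the example stages above


@dataclass
class Guardrail:
    name: str
    max_relative_increase: float  # 0.05 allows up to +5% versus the unexposed group


def breached(guardrails, exposed, unexposed):
    """Return the names of guardrails whose exposed value exceeds its bound."""
    failures = []
    for g in guardrails:
        limit = unexposed[g.name] * (1 + g.max_relative_increase)
        if exposed[g.name] > limit:
            failures.append(g.name)
    return failures


def next_percent(current, guardrails, exposed, unexposed):
    """Advance one stage if every guardrail holds, otherwise roll back to 0."""
    if breached(guardrails, exposed, unexposed):
        return 0
    later = [s for s in STAGES if s > current]
    return later[0] if later else current


# Hypothetical metric values for illustration only.
guardrails = [Guardrail("crash_rate", 0.05), Guardrail("p95_latency_ms", 0.10)]
unexposed = {"crash_rate": 0.010, "p95_latency_ms": 200.0}
healthy = {"crash_rate": 0.0101, "p95_latency_ms": 205.0}
regressed = {"crash_rate": 0.013, "p95_latency_ms": 205.0}

print(next_percent(1, guardrails, healthy, unexposed))     # advances to the next stage
print(next_percent(10, guardrails, regressed, unexposed))  # falls back to 0
```

Keeping the stage list and the bounds as data rather than code makes it easy to use more granular stages for riskier changes.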

The assignment is deterministic: a hash of the user ID ensures the same user consistently sees the same experience, without storing per-user state. When you increase from 10% to 50%, the original 10% stays in the treatment group. Additional users join the treatment group, but no one gets shuffled between groups mid-rollout.
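A minimal sketch of hash-based assignment shows why raising the percentage only adds users. The source does not describe the hashing scheme Confidence uses, so the choice of SHA-256, the flag-key salt, the 10,000-bucket resolution, and the names `bucket`, `is_exposed`, and `new-player-ui` are all assumptions for illustration.

```python
import hashlib

BUCKETS = 10_000  # assumed resolution: 0.01 percentage points per bucket


def bucket(user_id: str, flag_key: str) -> int:
    """Map a user to a stable bucket in [0, BUCKETS) for one flag."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def is_exposed(user_id: str, flag_key: str, percent: float) -> bool:
    """True if the user's bucket falls below the current rollout threshold."""
    return bucket(user_id, flag_key) < round(percent / 100 * BUCKETS)


users = [f"user-{i}" for i in range(20_000)]
at_10 = {u for u in users if is_exposed(u, "new-player-ui", 10)}
at_50 = {u for u in users if is_exposed(u, "new-player-ui", 50)}

# Everyone exposed at 10% is still exposed at 50%.
print(at_10 <= at_50)
```

Because each user's bucket never changes and only the threshold moves, increasing the percentage can add users to the treatment group but cannot remove any. Salting the hash with the flag key keeps the same users from always landing in the early stages of every rollout.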

When should you use a rollout vs. an A/B test?

Rollouts and A/B tests are complementary tools that answer different questions.

An A/B test validates whether a change improves a success metric. Traffic splits at a fixed ratio (usually 50/50), the experiment runs until it has collected the sample size its power calculation calls for, and the result tells you whether users are better off. A/B tests are designed for learning.

A rollout releases a validated change safely. Traffic starts small and grows. The platform watches for regressions. Rollouts are designed for shipping.

The typical workflow at mature experimentation organizations: A/B test first to validate the idea, then roll out to release it. Some low-risk changes (copy updates, configuration adjustments) skip straight to rollout. High-risk changes sometimes get both an A/B test and a slow rollout on top.

Confidence treats both as first-class concepts. An A/B test and a rollout use the same flag infrastructure, the same metric monitoring, and the same guardrail checks. The difference is in intent and traffic allocation, not in the underlying machinery.

What makes a rollout go wrong?

Rolling out too fast. If you jump from 1% to 100% because the first stage looked clean, you skip the stages where subtle regressions would have become visible. A metric that looks stable at 1% might show a clear regression at 50% because the sample size is now large enough to detect it. Patience at each stage is the cheapest insurance.
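To make the sample-size point concrete, the sketch below approximates the smallest change in a rate metric that a comparison between exposed and unexposed users can detect at different exposure levels. It uses a standard two-sided, two-sample z-test approximation; the function name `min_detectable_diff`, the 2% base rate, the one million total users, and the 5% significance and 80% power settings are assumptions chosen for illustration, not values from Confidence.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_diff(base_rate, total_users, exposed_fraction,
                        alpha=0.05, power=0.8):
    """Approximate smallest absolute rate change a two-sided z-test can detect.

    exposed_fraction must be strictly between 0 and 1.
    """
    n_exposed = total_users * exposed_fraction
    n_unexposed = total_users * (1 - exposed_fraction)
    std_err = sqrt(base_rate * (1 - base_rate) * (1 / n_exposed + 1 / n_unexposed))
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return z * std_err


for fraction in (0.01, 0.10, 0.50):
    mde = min_detectable_diff(base_rate=0.02, total_users=1_000_000,
                              exposed_fraction=fraction)
    print(f"{fraction:>4.0%} exposed: detects changes of about {mde:.5f}")
```

Under these assumptions the detectable change at 1% exposure is roughly five times larger than at 50%, so a regression that is invisible in the first stage can be unmistakable a few stages later.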

Not monitoring the right metrics. A rollout that only tracks crash rates will miss a degraded user experience that doesn't cause crashes. Guardrail metrics should cover reliability (crashes, errors, latency), user experience (engagement, retention proxies), and business outcomes (conversion, revenue per user). Confidence lets teams define required guardrail metrics per Surface, so the monitoring is consistent across rollouts.
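One way to catch this gap before a rollout starts is to check that the configured guardrails cover all three areas. The sketch below is hypothetical: the category names, the metric names, and the `missing_categories` helper are assumptions for illustration and do not reflect how Confidence configures guardrails on a Surface.

```python
REQUIRED_CATEGORIES = {"reliability", "user_experience", "business"}


def missing_categories(guardrails: dict[str, str]) -> set[str]:
    """Return the required categories that no configured guardrail covers.

    `guardrails` maps a metric name to the category it belongs to.
    """
    return REQUIRED_CATEGORIES - set(guardrails.values())


# A rollout that only watches crashes leaves two areas unmonitored.
crash_only = {"crash_rate": "reliability"}

full_coverage = {
    "crash_rate": "reliability",
    "p95_latency_ms": "reliability",
    "sessions_per_user": "user_experience",
    "conversion_rate": "business",
}

print(sorted(missing_categories(crash_only)))
print(sorted(missing_categories(full_coverage)))
```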

Skipping the rollback plan. Every rollout should have a clear answer to "what happens if this goes wrong?" In Confidence, rollback means setting the flag back to 0%. It takes seconds and requires no deploy. Teams that plan for rollback before they start the rollout make faster decisions when things go sideways.

Related terms

Feature Flag (category: Feature Flags)
A feature flag is a runtime switch that controls whether a feature is active for a given user, without deploying new code.

Progressive Rollout (category: Feature Flags)
A progressive rollout is the practice of gradually increasing the percentage of users exposed to a feature over time, rather than releasing it to everyone at once.

Phased Rollout (category: Feature Flags)
A phased rollout is a progressive rollout organized into discrete stages, each with a predefined percentage and explicit criteria that must be met before advancing to the next stage.

Canary Release (category: Feature Flags)
A canary release exposes a change to a tiny fraction of traffic first, typically 1% or less, to detect severe problems before wider rollout.

Rollback (category: Feature Flags)
A rollback reverts a feature to its previous state when problems are detected during a release.

Holdback (category: Feature Flags)
A holdback is a subset of users intentionally kept on the old experience after a feature has shipped to 100% of everyone else.

Guardrail Metric (category: Metrics)
A guardrail metric is a metric monitored during an experiment to ensure the change doesn't cause unintended harm, even when the success metric improves.

© 2026 Spotify

The Confidence name and logo are registered trademarks of Spotify.