A phased rollout is a progressive rollout organized into discrete stages, each with a predefined percentage and explicit criteria that must be met before advancing to the next stage. A typical sequence looks like 1% to 10% to 50% to 100%, with guardrail metric checks at every gate.
Where a progressive rollout describes the general practice of gradually increasing exposure, a phased rollout adds structure: named stages, minimum hold durations, and go/no-go decisions. The stages turn a rollout into a protocol rather than a judgment call.
Why use discrete stages instead of a continuous ramp?
Discrete stages create natural decision points. At each gate, the team reviews metrics and actively decides to proceed. A continuous ramp can drift upward without anyone making a conscious choice, which defeats the purpose of gradual release.
Stages also simplify communication. "We're at stage 2 of 4, holding at 10% for 48 hours" tells stakeholders exactly where the rollout stands. "We're at 17% and increasing steadily" is harder to reason about and harder to coordinate across teams.
At Spotify, phased rollouts are the standard release pattern. The stages vary by risk level. A low-risk copy change might go from 5% to 50% to 100% in three days. A major product surface redesign might go from 1% to 5% to 10% to 25% to 50% to 100% over two weeks, with longer holds at each stage. The structure scales with the risk.
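One way to encode "structure scales with risk" is to key the stage ladder on a risk level. The percentages and hold times below mirror the examples in the text; the dictionary itself is an illustrative sketch, not any real Spotify configuration:

```python
# Each entry is (percentage, minimum hold in hours).
STAGE_PLANS = {
    # Low risk (e.g. a copy change): 5% -> 50% -> 100% over a few days.
    "low": [(5, 24), (50, 24), (100, 0)],
    # High risk (e.g. a surface redesign): six stages with 48-hour holds.
    "high": [(1, 48), (5, 48), (10, 48), (25, 48), (50, 48), (100, 0)],
}

def min_duration_days(plan: list[tuple[int, int]]) -> float:
    """Lower bound on a plan's duration: the sum of its hold times."""
    return sum(hours for _, hours in plan) / 24
```

The high-risk plan's holds alone add up to ten days, which is why a redesign rollout stretches toward two weeks once gate reviews and working hours are factored in.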
What happens at each gate?
Each gate in a phased rollout answers one question: is it safe to increase exposure?
The answer comes from guardrail metrics. In Confidence, these metrics are computed between the exposed and unexposed groups at each stage, using the same statistical framework as A/B test analysis. The team checks for regressions in crash rates, error rates, latency, engagement, and any domain-specific metrics they've defined.
If all guardrails are clear, the team advances to the next stage. If a guardrail shows a regression, the team holds at the current stage to investigate, or rolls back entirely. The decision depends on the severity: a latency regression might warrant investigation at the current stage, while a crash rate spike means immediate rollback.
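The proceed/hold/rollback decision can be sketched as a small function. The severity labels and decision rules here are assumptions for illustration (the text gives crash-rate spikes as rollback-worthy and latency regressions as hold-worthy); they are not Confidence's actual gate logic:

```python
from enum import Enum

class Decision(Enum):
    PROCEED = "proceed"
    HOLD = "hold"        # stay at the current stage and investigate
    ROLLBACK = "rollback"

def gate_decision(guardrails: dict[str, tuple[bool, str]]) -> Decision:
    """Decide the gate outcome from guardrail results.

    `guardrails` maps a metric name to (regressed?, severity), where
    severity is a hypothetical label like "critical" or "warning".
    """
    regressed = {m: sev for m, (bad, sev) in guardrails.items() if bad}
    if not regressed:
        return Decision.PROCEED
    if "critical" in regressed.values():
        return Decision.ROLLBACK  # e.g. a crash rate spike
    return Decision.HOLD          # e.g. a latency regression

# Example: latency regressed but nothing critical -> hold and investigate.
outcome = gate_decision({
    "crash_rate": (False, "critical"),
    "latency_p95": (True, "warning"),
})
```

The important property is that every gate produces one of exactly three outcomes, so the review is an active decision rather than a default drift upward.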
Confidence supports configuring required guardrail metrics per Surface, so the gate criteria are consistent across teams working on the same part of the product. This prevents the failure mode where one team monitors latency and another monitors only engagement, and neither catches the issue the other would have.
How do phased rollouts interact with A/B tests?
The common workflow: run an A/B test first to validate the feature, then use a phased rollout to release it.
The A/B test answers "does this change improve our success metric?" with a fixed traffic split and enough statistical power to detect the effect. The phased rollout answers "can we ship this safely?" with gradually increasing exposure and guardrail monitoring.
Some teams combine them. The first phase of the rollout doubles as a final validation check: hold at 10% for a few days and confirm the metrics match what the A/B test predicted. If the numbers align, proceed. If they diverge, investigate before exposing more users.
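That first-phase sanity check can be expressed as a simple comparison of observed deltas against the effects the A/B test estimated. The metric names and the 25% tolerance are assumptions for illustration, not a recommended threshold:

```python
def matches_ab_test(observed: dict[str, float],
                    predicted: dict[str, float],
                    tolerance: float = 0.25) -> bool:
    """True if each observed metric delta lands within `tolerance`
    (as a fraction of the predicted effect) of the A/B test estimate.

    `observed` and `predicted` map metric names to relative deltas,
    e.g. {"retention": 0.02} for a +2% effect.
    """
    for metric, expected in predicted.items():
        seen = observed.get(metric, 0.0)
        if abs(seen - expected) > abs(expected) * tolerance:
            return False  # diverged: investigate before exposing more users
    return True
```

If this check passes at the 10% hold, the team proceeds with the remaining stages; if it fails, the divergence itself is the finding, and the rollout pauses until it is explained.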