A rollback reverts a feature to its previous state when problems are detected during a release. In a flag-based release system, rollback means setting the feature flag back to 0% exposure, instantly returning all users to the old experience. No code deploy required. No waiting for a CI/CD pipeline. The change takes effect in seconds.
This speed matters. The difference between a rollback that takes 30 seconds and one that takes 30 minutes is the difference between a minor hiccup and a user-facing incident. When Spotify's guardrail metrics detect a regression during a rollout, the team can roll back immediately because the feature flag is the only thing that needs to change. The problematic code stays deployed but inactive, ready to be investigated without time pressure.
How does a flag-based rollback work?
In a traditional deployment model, rolling back means redeploying the previous version of the application. That requires a build, a deploy pipeline, and often a review process. If the regression is in a database migration or a data format change, redeploying might not even fix the problem.
Flag-based rollback bypasses all of that. The feature flag controls which code path executes. Setting the flag to 0% means no user hits the new code path, even though the code is still present in the deployed binary. The rollback is a configuration change, not a deployment.
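The mechanism can be sketched in a few lines. This is a toy in-process flag client (a hypothetical `FlagClient`, not the Confidence SDK); a real client syncs rollout percentages from the flag backend, but the key point is the same: rollback changes the flag value, never the deployed code.

```python
class FlagClient:
    """Toy in-process flag store; a real client syncs with the flag backend."""
    def __init__(self):
        self._exposure = {}  # flag name -> rollout percentage (0-100)

    def set_exposure(self, flag, percent):
        self._exposure[flag] = percent

    def is_enabled(self, flag, user_id):
        # Deterministic bucketing: the same user always lands in the same
        # bucket, so a rollout percentage maps to a stable cohort of users.
        bucket = hash((flag, user_id)) % 100
        return bucket < self._exposure.get(flag, 0)

flags = FlagClient()
flags.set_exposure("new-checkout", 25)   # canary at 25% exposure

def checkout(user_id):
    # Both code paths ship in the same binary; the flag picks one.
    if flags.is_enabled("new-checkout", user_id):
        return "new checkout flow"
    return "old checkout flow"

# Rollback: one configuration change, no redeploy. Every user is back
# on the old path as soon as the new value propagates.
flags.set_exposure("new-checkout", 0)
```

After the last line, `checkout` returns the old flow for every user, even though the new code is still present and deployable for investigation.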
In Confidence, rolling back a flag is a single operation in the platform UI or API. The flag moves to 0%, flag evaluation picks up the change within seconds (since evaluation is in-process at 10 to 50 microseconds, there's no network delay in the flag check itself), and users see the old experience on their next request.
This is one of the reasons feature flags are central to release management, not just experimentation. Every flag-controlled release comes with an instant undo button.
When should you roll back?
The decision depends on what the monitoring shows and how confident you are in the diagnosis.
Roll back immediately when guardrail metrics show a clear regression in reliability: crash rate spikes, error rate increases, or severe latency degradation. These failures get worse with exposure, not better. Every minute at the current allocation is accumulating harm.
Roll back and investigate when guardrail metrics show a moderate but consistent regression across multiple metrics. A 2% drop in one engagement metric might be noise. A 2% drop across three related metrics is a pattern worth taking seriously.
Hold and investigate when a single metric shows a borderline result that could be noise or could be a real problem. Holding at the current rollout stage (rather than advancing) gives you more data without increasing risk. If the signal strengthens, roll back. If it disappears, advance cautiously.
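The three rules above can be written down as a triage function. The thresholds and metric structure here are illustrative assumptions, not Confidence's actual policy: each guardrail carries a category, an observed regression, and whether it cleared a significance test.

```python
from dataclasses import dataclass

@dataclass
class Guardrail:
    name: str
    category: str          # "reliability" or "engagement"
    regression_pct: float  # observed drop; 2.0 means a 2% regression
    significant: bool      # passed the statistical check for this stage

def triage(guardrails):
    # Rule 1: any clear reliability regression -> roll back immediately.
    if any(g.category == "reliability" and g.significant for g in guardrails):
        return "roll back immediately"
    significant = [g for g in guardrails if g.significant]
    # Rule 2: a consistent pattern across multiple metrics -> roll back.
    if len(significant) >= 2:
        return "roll back and investigate"
    # Rule 3: a single significant or borderline signal could be noise ->
    # hold at the current stage and gather more data.
    borderline = [g for g in guardrails
                  if not g.significant and g.regression_pct > 0]
    if significant or borderline:
        return "hold and investigate"
    return "advance"
```

For example, `triage([Guardrail("crash_rate", "reliability", 0.5, True)])` returns `"roll back immediately"`, while a lone 2% engagement dip returns `"hold and investigate"`.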
At Spotify, 42% of experiments are rolled back after guardrail metrics detect regressions. That number reflects a culture where rolling back is a normal engineering action, not a failure. The cost of a false positive rollback (rolling back a change that was actually fine) is much lower than the cost of a missed regression (shipping something that makes the product worse).
What happens after a rollback?
A rollback buys time. It removes the user-facing impact while the team investigates the root cause. The investigation can happen without urgency because no users are affected.
After identifying the cause, the team has three options: fix the issue and restart the rollout, redesign the feature, or abandon the change. The last option is more common than most teams expect. If the rollback was triggered by a guardrail regression that the fix doesn't fully address, the right call is sometimes to accept that the change makes the product worse and move on.
The flag infrastructure makes re-rollout straightforward. Fix the code, deploy it (the flag is still at 0%, so the deploy itself is risk-free), then restart the rollout from the canary phase. Treat it as a new rollout, not a continuation of the old one.
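"Treat it as a new rollout" has a concrete shape: after the fix deploys, the schedule restarts from the canary stage rather than resuming where the old rollout stopped. A minimal sketch, with assumed stage percentages:

```python
STAGES = [1, 5, 25, 50, 100]  # percent exposure per stage (illustrative)

class Rollout:
    def __init__(self):
        self.stage = 0               # index into STAGES
        self.exposure = STAGES[0]

    def advance(self):
        if self.stage < len(STAGES) - 1:
            self.stage += 1
            self.exposure = STAGES[self.stage]

    def rollback(self):
        self.exposure = 0            # flag to 0%; code stays deployed

    def restart(self):
        # A new rollout, not a continuation: back to the canary stage.
        self.stage = 0
        self.exposure = STAGES[0]

r = Rollout()
r.advance(); r.advance()             # regression detected at 25%
r.rollback()                         # instant: exposure drops to 0
# ...fix lands; deploying it is risk-free while the flag stays at 0%...
r.restart()                          # back to 1%, not 25%
```

Resuming at 25% would expose a quarter of users to code whose fix has never been validated on real traffic; restarting from the canary keeps the blast radius small again.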