What is a Canary Release?

A canary release exposes a change to a tiny fraction of traffic first, typically 1% or less, to detect severe problems before wider rollout. The name comes from the coal mining practice of bringing a canary into the mine: if the canary dies, the miners know the air is toxic before it kills them. In software, the canary group encounters bugs, crashes, and performance regressions before they can affect the full user base.

A canary release is the first stage of a phased rollout. It answers the narrowest possible question: does this change cause obvious harm? You're not trying to measure a subtle metric improvement at 1%. You're looking for crashes, error spikes, and latency regressions that would be visible even in a small sample.

How does a canary release differ from a full rollout?

A canary release is a specific phase within a broader rollout, not an alternative to one. The distinction matters because the goals are different at each stage.

During the canary phase, you're watching for catastrophic failures. The sample is too small for subtle metric analysis, but it's large enough to detect a 5x increase in crash rate or a 200ms latency spike. The hold duration is short, usually a few hours to a day, because you're screening for acute problems.

After the canary phase passes, the rollout continues through larger stages (10%, 50%, 100%) where the sample sizes support more nuanced metric comparisons. The monitoring becomes more sophisticated: instead of "did anything break badly?", you're asking "are guardrail metrics within acceptable bounds?"
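
This staged schedule can be expressed directly as data. Below is a minimal sketch of a rollout loop under stated assumptions: the stage percentages and hold durations are the ones from this section, while Stage, run_rollout, and the injected set_percentage, hold, and guardrails_ok callbacks are hypothetical names, not Confidence's API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    percent: int       # share of traffic exposed to the change
    hold_hours: float  # how long to hold before deciding to advance

# The canary is simply the first, smallest stage of the same schedule.
ROLLOUT = [
    Stage(1, 12),     # canary: screen for acute, severe failures
    Stage(10, 24),    # first stage with real power for metric comparison
    Stage(50, 48),    # near-final guardrail comparison
    Stage(100, 0),    # full rollout
]

def run_rollout(
    set_percentage: Callable[[int], None],  # e.g. a flag-API update
    hold: Callable[[float], None],          # e.g. a scheduler wait
    guardrails_ok: Callable[[], bool],      # crash/error/latency checks
) -> bool:
    """Advance through the schedule, rolling back on the first failure."""
    for stage in ROLLOUT:
        set_percentage(stage.percent)
        hold(stage.hold_hours)
        if not guardrails_ok():
            set_percentage(0)  # roll back to 0% exposure
            return False
    return True
```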

In Confidence, a canary release uses the same flag infrastructure as any other rollout. You set the flag to 1%, let it hold, check guardrail metrics, and then decide whether to advance. There's no separate canary system to maintain.
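
Percentage rollouts like this are typically implemented with deterministic hash bucketing, so a user's assignment is stable and the 1% canary group stays inside the 10% group as the rollout advances. The sketch below is a generic illustration of that idea under assumed details (the hashing scheme, flag name, and user IDs are made up), not Confidence's implementation.

```python
import hashlib

def in_rollout(user_id: str, flag_name: str, percent: float) -> bool:
    """Deterministically bucket a user into the first `percent` of traffic.

    Hashing (flag_name, user_id) gives each user a stable position in
    [0, 100). Because the threshold only ever grows during a rollout,
    a user who was in the 1% canary is also in the 10% and 50% stages.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64 * 100
    return bucket < percent

# The canary is the same mechanism with the threshold set to 1%.
canary = [u for u in (f"user-{i}" for i in range(100_000))
          if in_rollout(u, "hypothetical-flag", 1.0)]
print(len(canary))  # roughly 1,000 of 100,000 users
```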

What should you look for during a canary?

Focus on signals that are both severe and fast-moving; a minimal monitoring sketch follows the list below.

Crash rates. A code path that causes crashes will show up quickly even at 1% traffic. If your canary group of 10,000 users has a crash rate 3x higher than the control, you know within hours.

Error rates. Server-side errors (5xx responses), client-side exceptions, and failed API calls all count here. These surface broken functionality that degrades the experience without causing a full crash.

Latency. P95 and P99 latency are particularly useful during canary phases because tail latency problems that are invisible in averages show up clearly in high percentiles.

Sample ratio mismatch. If the number of users in the canary group doesn't match the expected allocation, something is wrong with the assignment mechanism itself. Confidence automatically checks for sample ratio mismatches, catching this class of problem before it corrupts the rollout.
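
Taken together, these four signals make a canary guardrail check little more than a handful of threshold comparisons. The sketch below shows one way to wire them up; the input dictionaries, the 3x and 200 ms thresholds, and the chi-square cutoff are illustrative assumptions, not Confidence's built-in checks. A function like this could serve as the guardrails_ok callback in the rollout loop sketched earlier.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile; good enough for a monitoring sketch."""
    ordered = sorted(values)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

def srm_ok(canary_n, control_n, expected_canary_share):
    """Chi-square test (1 degree of freedom) for sample ratio mismatch.

    3.84 is the 95th-percentile critical value for one degree of
    freedom; a statistic above it flags a mismatch between observed
    counts and the configured allocation.
    """
    total = canary_n + control_n
    exp_canary = total * expected_canary_share
    exp_control = total - exp_canary
    chi2 = ((canary_n - exp_canary) ** 2 / exp_canary
            + (control_n - exp_control) ** 2 / exp_control)
    return chi2 < 3.84

def guardrails_ok(canary: dict, control: dict) -> bool:
    """canary/control hold hypothetical aggregates for one hold period."""
    checks = [
        # Crash rate: flag anything over 3x the control rate.
        canary["crashes"] / canary["sessions"]
            <= 3 * control["crashes"] / control["sessions"],
        # Error rate (5xx responses, exceptions, failed API calls).
        canary["errors"] / canary["requests"]
            <= 3 * control["errors"] / control["requests"],
        # Tail latency: P95 and P99 within 200 ms of control.
        percentile(canary["latency_ms"], 95)
            <= percentile(control["latency_ms"], 95) + 200,
        percentile(canary["latency_ms"], 99)
            <= percentile(control["latency_ms"], 99) + 200,
        # Sample ratio mismatch: a 1% canary against the other 99%.
        srm_ok(canary["sessions"], control["sessions"], 0.01),
    ]
    return all(checks)
```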

Engagement metrics, retention, and business outcomes aren't useful during a canary phase. You don't have the sample size to detect meaningful differences, and trying to interpret noisy data at 1% leads to bad decisions.
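
A quick power calculation shows why. Using the standard two-proportion sample-size approximation, with a made-up 20% baseline engagement rate and a 1% relative lift to detect:

```python
import math

Z_ALPHA = 1.96  # two-sided significance level of 0.05
Z_BETA = 0.84   # 80% statistical power

def users_per_group(baseline_rate: float, absolute_lift: float) -> int:
    """Approximate users needed in each group to detect the lift."""
    variance = baseline_rate * (1 - baseline_rate)
    n = 2 * (Z_ALPHA + Z_BETA) ** 2 * variance / absolute_lift ** 2
    return math.ceil(n)

# Detecting a 1% relative lift (0.2% absolute) on a 20% baseline:
print(users_per_group(0.20, 0.002))  # ~627,200 users per group
```

A canary group of 10,000 users is two orders of magnitude short of that, which is why the canary phase sticks to severe, fast-moving signals and leaves metric movement to the later stages.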

When should you skip the canary phase?

Rarely. Even low-risk changes benefit from a brief canary hold because the cost is low (a few hours at 1%) and the protection is real. The cases where teams skip it are usually changes that affect no user-facing behavior: backend configuration updates, internal tooling changes, or flag cleanup.

For anything that touches the user experience, a canary phase is the cheapest insurance available. At Spotify, canary phases are standard practice across teams, and the regressions caught at 1% regularly prevent incidents that would have been much more expensive at full rollout.