What is a Feature Flag?

A feature flag is a runtime switch that controls whether a feature is active for a given user, without deploying new code. Instead of shipping a change and hoping it works, you wrap the change in a flag. The flag decides who sees it, when, and under what conditions. Deploying code and releasing a feature become two separate acts.

This separation matters because it changes the risk profile of every release. A team can merge code to production on Monday and release the feature to 1% of users on Wednesday. If something goes wrong, they flip the flag off in seconds. No rollback deploy, no hotfix, no incident. The blast radius stays small because the flag kept it small.

At Spotify, feature flags are the foundation of both experimentation and release management. Every A/B test runs through a flag. Every rollout is a flag with an adjustable reach percentage. Across 300+ teams running 10,000+ experiments per year, flags are the mechanism that connects "we want to test this idea" to "users are seeing it."

How do feature flags work?

At the simplest level, a feature flag is a conditional: if the flag is on for this user, show the new experience; otherwise, show the old one. But the implementation details determine whether flags are a useful tool or a liability.
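
As a minimal sketch of that conditional, assume a hypothetical in-memory flag store; the flag name, the user IDs, and the is_enabled helper are invented for illustration and are not part of any SDK.

```python
# Hypothetical flag store: flag name -> set of user IDs for whom the flag is on.
ENABLED_USERS = {"new-checkout": {"user-1", "user-7"}}


def is_enabled(flag: str, user_id: str) -> bool:
    """Return True if the flag is on for this user; unknown flags are off."""
    return user_id in ENABLED_USERS.get(flag, set())


def render_checkout(user_id: str) -> str:
    # The flag check is the only place that decides which experience is shown.
    if is_enabled("new-checkout", user_id):
        return "new checkout"
    return "old checkout"
```

Both code paths are deployed; the flag only decides which one runs for a given user.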

In Confidence, flags are structured configurations with typed schemas. A single flag can control multiple properties at once: a color value, a copy string, and a numeric threshold, all bundled into one flag with defined types for each property. This structure prevents the common failure mode where teams create dozens of boolean flags that interact in unpredictable ways.
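
One way to picture a typed flag is a schema that every flag value must satisfy before it is served. The sketch below is an assumption for illustration only: the property names, the schema shape, and validate_flag_value are hypothetical and do not reflect Confidence's actual schema format.

```python
# Hypothetical schema: one flag bundles three typed properties.
SCHEMA = {"color": str, "copy": str, "threshold": float}


def validate_flag_value(schema: dict, value: dict) -> dict:
    """Check that a flag value has exactly the schema's properties and types."""
    if value.keys() != schema.keys():
        raise ValueError(f"expected properties {sorted(schema)}, got {sorted(value)}")
    for name, expected_type in schema.items():
        if not isinstance(value[name], expected_type):
            raise TypeError(f"{name!r} must be {expected_type.__name__}")
    return dict(value)


# A single flag value that configures three aspects of one feature.
treatment = validate_flag_value(
    SCHEMA, {"color": "#1DB954", "copy": "Try it now", "threshold": 0.8}
)
```

Because the three properties travel together, the application reads one flag instead of coordinating three separate booleans.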

Flag evaluation in Confidence happens in-process and takes 10 to 50 microseconds, with no network call at evaluation time. The SDK loads the flag configuration on initialization and evaluates locally. If Confidence goes down, your flags keep evaluating with the last known configuration. This architecture means flags don't add meaningful latency to your application, and a platform outage doesn't become your outage.
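
The pattern of local evaluation with a last-known-configuration fallback can be sketched as follows. This is not the Confidence SDK: LocalFlagClient, its methods, and the fetch_config callable are hypothetical names chosen to illustrate the idea.

```python
from typing import Any, Callable


class LocalFlagClient:
    """Keeps flag configuration in memory so evaluation never touches the network."""

    def __init__(self, fetch_config: Callable[[], dict]):
        self._fetch_config = fetch_config
        self._config: dict = {}
        self.refresh()  # load the configuration once, on initialization

    def refresh(self) -> bool:
        """Try to load a fresh configuration; keep the old one if the fetch fails."""
        try:
            self._config = dict(self._fetch_config())
            return True
        except Exception:
            # Platform unreachable: keep serving the last known configuration.
            return False

    def evaluate(self, flag: str, default: Any = None) -> Any:
        # A plain dictionary lookup: no I/O on the evaluation path.
        return self._config.get(flag, default)
```

In this sketch, refresh would run in the background on a timer, so a failed fetch only delays configuration updates and never blocks or breaks evaluation.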

What types of feature flags exist?

Feature flags serve different purposes depending on their intended lifespan and function.

Release flags control the rollout of new features. They're temporary by design: once the feature is fully rolled out and stable, you remove the flag. A release flag might start at 0%, move to 1% for a canary check, then ramp to 10%, 50%, and 100% over a week. Most flags in a healthy codebase are release flags, and most should be cleaned up within weeks of reaching full rollout.
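
A common way to implement a percentage ramp is to hash each user into a stable bucket and compare the bucket against the current rollout percentage. The sketch below assumes that approach; the function names and the SHA-256 bucketing scheme are illustrative choices, not a description of how Confidence assigns users.

```python
import hashlib

BUCKETS = 10_000  # bucket resolution: 0.01% of users per bucket


def bucket_for(flag: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..9999."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def in_rollout(flag: str, user_id: str, percent: float) -> bool:
    """True if the user falls inside the first `percent` percent of buckets."""
    return bucket_for(flag, user_id) < percent * BUCKETS / 100


# Ramp schedule from the text: the same user is checked at each stage.
stages = [0, 1, 10, 50, 100]
exposure = [in_rollout("new-checkout", "user-42", p) for p in stages]
```

Because a user's bucket never changes, raising the percentage only adds users: anyone who saw the feature at 1% still sees it at 10%, 50%, and 100%.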

Experiment flags split traffic between control and treatment groups for an A/B test. They look similar to release flags, but their purpose is measurement, not release. The allocation stays fixed (usually 50/50) until the experiment has collected the sample size it needs for adequate statistical power.
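
A fixed allocation can be sketched as a deterministic, weighted assignment: the same user always lands in the same group for the lifetime of the experiment. The function name, the experiment key, and the hashing scheme below are assumptions made for illustration.

```python
import hashlib


def assign_variant(experiment: str, user_id: str, weights: dict) -> str:
    """Deterministically assign a user to a variant in proportion to its weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    point = int.from_bytes(digest[:8], "big") % total
    cumulative = 0
    for variant, weight in weights.items():
        cumulative += weight
        if point < cumulative:
            return variant
    raise RuntimeError("unreachable: point is always below the total weight")


# A 50/50 split between control and treatment.
variant = assign_variant("checkout-test", "user-42", {"control": 50, "treatment": 50})
```

Keeping the weights unchanged for the duration of the test is what keeps the two groups comparable.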

Kill switches are permanent flags designed for emergency deactivation. They stay in the codebase indefinitely because their job is to provide a safety valve you can pull at any time, even years after the feature shipped.

Permission flags grant access to specific users or segments: internal employees, beta testers, users in a specific market. They tend to be longer-lived than release flags but shorter-lived than kill switches.
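
A permission flag can be thought of as a list of targeting rules, where matching any rule grants access. In this sketch the User fields, the example.com domain, and the market codes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    email: str = ""
    market: str = ""
    beta_tester: bool = False


# Hypothetical targeting rules, one per segment named in the text.
RULES = [
    lambda user: user.email.endswith("@example.com"),  # internal employees
    lambda user: user.beta_tester,                     # beta testers
    lambda user: user.market in {"SE", "NO"},          # users in specific markets
]


def has_access(user: User) -> bool:
    """Grant access if the user matches at least one targeting rule."""
    return any(rule(user) for rule in RULES)
```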

What goes wrong with feature flags?

The most common problem is flag debt. Teams create flags for releases, finish the rollout, and never clean up the flag. Over time, the codebase accumulates stale conditionals whose purpose no one remembers. Removing a flag that might still matter feels risky, so it stays. Multiply that across a hundred features and the codebase becomes harder to reason about.

The fix is organizational, not technical: treat flag cleanup as part of the release process, not a separate backlog item. When the rollout reaches 100% and metrics look stable, removing the flag is the last step of the release, not a future chore.
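
A small report can support that habit by listing release flags that reached 100% and were never removed. The sketch below assumes a hypothetical FlagRecord inventory and a 14-day grace period; both are illustrative choices, not a recommendation from any specific tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class FlagRecord:
    name: str
    kind: str  # "release", "experiment", "kill_switch", or "permission"
    rollout_percent: int
    fully_rolled_out_on: Optional[date] = None


def stale_release_flags(
    flags: List[FlagRecord], today: date, max_age_days: int = 14
) -> List[str]:
    """Names of release flags that have sat at 100% for longer than max_age_days."""
    return [
        flag.name
        for flag in flags
        if flag.kind == "release"
        and flag.rollout_percent == 100
        and flag.fully_rolled_out_on is not None
        and (today - flag.fully_rolled_out_on).days > max_age_days
    ]
```

Kill switches and permission flags are excluded on purpose, since they are meant to be long-lived.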

A second problem is flag interactions. When two flags modify the same part of the product, the combined experience might be something no one designed or tested. Confidence addresses this through its Surface concept, which groups experiments on the same product area and coordinates them so teams don't step on each other's changes.

How are feature flags different from feature toggles?

They aren't. "Feature toggle" and "feature flag" refer to the same concept. "Feature flag" has become the more common term in modern engineering practice, and it's the term Confidence uses. Some older documentation and the widely cited taxonomy published on Martin Fowler's site use "feature toggle," but the mechanism is identical.
