LaunchDarkly is the dominant enterprise feature flag platform. Founded in 2014 in Oakland by Edith Harbaugh and John Kodumal, the company (legal entity Catamorphic Co.) ships feature flag management as the core product, with experimentation, AI Configs, Guarded Releases, and an Observability surface (acquired with Highlight.io in April 2025) layered on top. As of early 2026, LaunchDarkly serves 5,500+ customers and processes 45 trillion flag evaluations per day.
The sections below cover how the platform works, what it is good at, and where it sits relative to Confidence, the experimentation platform Spotify has run for 15 years.
How does LaunchDarkly work?
LaunchDarkly works as a managed service that sits between your application code and your release decisions. Application code calls LaunchDarkly SDKs to evaluate feature flags, with the SDK maintaining a cached configuration that updates over a streaming or polling connection so flag evaluation is local and does not block on a network call at request time. Flag rules and rollout percentages are managed in LaunchDarkly's UI, where teams configure targeting by user attributes, segments, scheduled releases, and approval workflows.
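The local-evaluation pattern can be sketched in miniature. The example below is a hypothetical in-memory store and bucketing scheme to show the shape of the idea, not LaunchDarkly's actual SDK API; flag names, rule fields, and the hashing choice are all invented for illustration.

```python
import hashlib

# Hypothetical in-memory flag store, standing in for the SDK's cached
# configuration that a streaming/polling connection keeps up to date.
FLAG_CONFIG = {
    "new-checkout": {"rollout_pct": 25, "rule": {"attr": "plan", "value": "pro"}},
}

def bucket(flag_key: str, user_key: str) -> int:
    """Deterministically map a (flag, user) pair into 0-99."""
    digest = hashlib.sha1(f"{flag_key}:{user_key}".encode()).hexdigest()
    return int(digest, 16) % 100

def evaluate(flag_key: str, user: dict, default: bool = False) -> bool:
    """Evaluate a flag against the local cache -- no network call at request time."""
    flag = FLAG_CONFIG.get(flag_key)
    if flag is None:
        return default
    rule = flag["rule"]
    if user.get(rule["attr"]) == rule["value"]:  # targeting rule wins first
        return True
    # Otherwise fall through to the percentage rollout.
    return bucket(flag_key, user["key"]) < flag["rollout_pct"]

# A "pro" user matches the targeting rule regardless of rollout bucket.
print(evaluate("new-checkout", {"key": "user-42", "plan": "pro"}))  # True
```

Because the bucket is a deterministic hash of the flag and user keys, the same user lands in the same rollout cohort on every evaluation, which is what makes a percentage rollout stable across requests.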
For experimentation, LaunchDarkly's stats engine ingests exposure events from the SDKs and metric data from configured sources, then runs analysis using CUPED variance reduction (CUPED uses pre-experiment data to tighten confidence intervals), frequentist sequential testing, sample ratio mismatch detection (with a >99% posterior odds threshold), and guardrail metrics. Experiment results are surfaced in the same UI as flag management, so the flag rollout and the experiment that motivates it live in one workflow.
The Observability surface (Highlight.io, acquired April 2025) adds error monitoring, session replay, and trace observability, sharing the same SDK and configuration layer as flags. AI Configs (GA May 2025) extends the same flagging primitive to AI prompts, models, and parameters, with token-level metrics; this is how LaunchDarkly positions for the AI-coding workflow alongside three MCP servers (hosted, local, observability) that integrate with Cursor, Claude Code, VS Code Copilot, and Windsurf.
What LaunchDarkly is good at
LaunchDarkly's strength is enterprise feature flag governance at scale. The capabilities that matter most for the typical buyer:
- Approval workflows and configurable change-management policies. Required reviewers, scheduled releases, change windows, and policy enforcement across many engineering teams.
- Per-change audit trail. Every flag change is recorded with the actor, timestamp, and rule diff. This is the surface enterprise compliance buyers evaluate first.
- Role-based access control, SSO/SCIM, and enterprise IAM integration. Standard enterprise IAM patterns shipped as primary features rather than tier-locked add-ons.
- FedRAMP Moderate authorization (LaunchDarkly Federal, CMS-sponsored since January 2023). US federal customers and regulated-industry buyers who need this surface have it.
- Guarded Releases (Guardian tier). Automated guardrails on rollouts, with rollback triggered by metric regressions during a progressive release.
- Observability bundle (Highlight.io, since April 2025). Error monitoring, session replay, and trace observability inside the same product as flags.
- AI Configs (GA May 2025). A/B testing prompts, models, and parameters with token-level metrics for AI-product teams.
- Three MCP servers and broad AI-coding integration. Hosted, local, and observability MCP servers across Cursor, Claude Code, VS Code Copilot, and Windsurf.
- Self-serve pricing tiers. Developer (free, unlimited seats and flags), Foundation (10 per 1k client-side MAU). Enterprise and Guardian are sales-gated.
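The guardrail-and-rollback loop behind Guarded Releases reduces to a simple decision rule per rollout step. The sketch below is hypothetical (the steps, metric, and thresholds are invented for illustration), not LaunchDarkly's implementation:

```python
# Hypothetical guarded-release logic: step up a progressive rollout,
# watch a guardrail metric, and roll back on regression.
ROLLOUT_STEPS = [1, 5, 25, 50, 100]   # percent of traffic at each step
BASELINE_ERROR_RATE = 0.010           # control arm's error rate
REGRESSION_THRESHOLD = 1.5            # roll back if 50% worse than baseline

def next_action(current_step: int, observed_error_rate: float) -> str:
    """Decide what to do after observing the guardrail at one rollout step."""
    if observed_error_rate > BASELINE_ERROR_RATE * REGRESSION_THRESHOLD:
        return "rollback"             # guardrail regressed: revert the flag
    if current_step + 1 < len(ROLLOUT_STEPS):
        return f"advance to {ROLLOUT_STEPS[current_step + 1]}%"
    return "complete"                 # full rollout reached, release is done

print(next_action(1, 0.009))  # healthy at 5% -> "advance to 25%"
print(next_action(1, 0.021))  # regressed at 5% -> "rollback"
```

In a real guarded release the comparison would be a statistical test on the metric delta rather than a fixed multiplier, but the control flow is the same: evaluate the guardrail, then advance, hold, or revert.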
Enterprise platform teams whose primary problem is governing flag changes across many engineering teams under regulatory or audit pressure are LaunchDarkly's natural buyer. Experimentation, observability, and AI-driven release coordination round out the platform for those teams.
Confidence is the platform Spotify uses to decide what its product becomes. The defaults reflect 15 years of running experiments at scale, including the failure modes that only show up there. It is now available to teams outside Spotify.
Where Confidence and LaunchDarkly diverge
LaunchDarkly is feature-flag-management-first; Confidence is experimentation-first. Both products ship CUPED, frequentist sequential testing, sample ratio mismatch detection, and guardrail metrics today. The wedge is which capability each vendor is built around, and where engineering investment goes quarter to quarter.
Confidence is built and operated by the team that runs Spotify's experimentation platform. The same platform serves 300+ Spotify teams running 10,000+ experiments per year across 750M users in 186 markets. 42% of those experiments are rolled back after guardrail metrics flag a regression. LaunchDarkly has 12 years as a commercial vendor, 5,500+ customers, and 45 trillion flag evaluations per day.
Confidence's CUPED implementation uses the Negi–Wooldridge full regression estimator. Group Sequential Tests with always-valid inference allow safe peeking at interim results. Trigger analysis (analyzing only the users actually exposed to the variant rather than all assigned users) ships as a default, alongside Surfaces, a multi-team coordination primitive with shared required metrics enforced across a product area, which prevents teams from stepping on each other's experiments at scale.
LaunchDarkly's strengths sit on the governance and platform side. FedRAMP Moderate authorization. Highlight.io observability. AI Configs for prompt and model A/B testing. The MCP servers and AI-coding integrations. Guarded Releases. For teams whose primary problem is governing flag changes safely at scale, LaunchDarkly's surface is built for that problem and Confidence's is not.
LaunchDarkly is built for enterprise flag governance. Confidence is not.
LaunchDarkly fits enterprise platform teams that need flag management at scale, FedRAMP Moderate compliance, audit trails, approval workflows, and an observability and AI-coding surface inside one vendor. The breadth of the platform is the selling point.
Confidence fits teams that have decided experimentation is a discipline worth investing in as a single concern, that want opinionated defaults built on 15 years of Spotify-scale operation, and that want OpenFeature portability at the SDK layer. The depth of methodology is the selling point.
The two products do not really compete for the same buyer. They compete over the buyer's choice: centralize on a flag-management platform with experimentation capabilities, or on an experimentation platform with feature flags as a first-class primitive. That choice shapes which problems get easier over the next five years and which stay hard.
A free self-serve trial of Confidence is available at confidence.spotify.com without going through procurement. The Confidence vs LaunchDarkly head-to-head covers methodology, governance, and pricing in detail. For teams already on LaunchDarkly who want to know what other options exist, see Top 7 alternatives to LaunchDarkly.