
What is Statsig?

Statsig is a feature flagging, experimentation, and product analytics platform founded in 2021 by Vijaye Raji and other ex-Facebook engineers. It bundles flags, A/B testing, funnels, retention analysis, and session replay into one product. In September 2025, Statsig was acquired by OpenAI; Vijaye Raji became OpenAI's CTO of Applications.

This page covers how Statsig works, what it does well, and why teams that need more rigorous experimentation methodology end up choosing Confidence, the experimentation platform Spotify has run for 15 years and still depends on.


How does Statsig work?

Statsig works by sitting between your product code and your decision data. Application code calls Statsig SDKs to check feature flags and log experiment exposures. Statsig records which users saw which variant and runs the statistical analysis that tells you whether a variant moved a metric. In its original mode, Statsig also stores and joins the underlying metric data; in Warehouse Native mode, the metric data and joins stay in your warehouse.
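
To make the call pattern concrete, here is a minimal sketch using Statsig's Python server SDK. The gate, experiment, and parameter names are hypothetical, and exact imports and signatures can vary by SDK version; treat it as an illustration rather than a reference.

    # Hypothetical gate/experiment names; illustrative use of the Statsig
    # Python server SDK. Exposure logging happens implicitly on evaluation.
    from statsig import statsig, StatsigUser

    statsig.initialize("server-secret-key")   # one blocking config fetch at startup

    user = StatsigUser("user-123")

    # Flag check: evaluated locally, and the SDK queues an exposure event.
    if statsig.check_gate(user, "new_checkout_flow"):
        experiment = statsig.get_experiment(user, "checkout_cta")
        cta_text = experiment.get("cta_text", "Buy now")  # parameter with a fallback

    statsig.shutdown()   # flush queued exposure and event logs before exit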

Statsig has two operating modes. In the original mode, assignment data, exposure logs, and product events flow through Statsig's own infrastructure, where analysis runs and results are returned. In Warehouse Native mode, data stays in your data warehouse (BigQuery, Snowflake, Databricks, or Redshift) and Statsig runs analysis there instead.
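
At its core, the analysis in either mode is a join of exposure records to metric events followed by a per-variant aggregation. The sketch below shows that shape with pandas and hypothetical table and column names; in Warehouse Native mode, the equivalent runs as SQL inside your warehouse.

    import pandas as pd

    # Hypothetical exposure and metric tables; in Warehouse Native mode this
    # join/aggregation runs as SQL in BigQuery, Snowflake, Databricks, or Redshift.
    exposures = pd.DataFrame({
        "user_id": [1, 2, 3, 4],
        "variant": ["control", "treatment", "control", "treatment"],
    })
    purchases = pd.DataFrame({
        "user_id": [1, 2, 2, 4],
        "revenue": [5.0, 12.0, 3.0, 9.0],
    })

    # Per-user metric, keeping exposed users with no purchases at zero.
    per_user = (
        exposures.merge(purchases, on="user_id", how="left")
        .fillna({"revenue": 0.0})
        .groupby(["variant", "user_id"], as_index=False)["revenue"].sum()
    )
    print(per_user.groupby("variant")["revenue"].agg(["mean", "count"]))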

Feature flags evaluate locally inside the SDK after a config refresh, so flag checks do not block on a network call to Statsig at request time. Experimentation analysis is pre-aggregated so dashboards load without re-running large queries on every view.
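
A simplified model of that local-evaluation pattern, not Statsig's actual internals: keep a ruleset in memory, refresh it on a background thread, and answer flag checks from the in-memory copy.

    import threading
    import time

    class FlagStore:
        """Illustrative local-evaluation pattern (not Statsig's internals)."""

        def __init__(self, fetch_ruleset, refresh_seconds=10):
            self._fetch = fetch_ruleset        # callable that hits the vendor's config API
            self._rules = fetch_ruleset()      # one blocking fetch at startup
            self._refresh_seconds = refresh_seconds
            threading.Thread(target=self._refresh_loop, daemon=True).start()

        def _refresh_loop(self):
            while True:
                time.sleep(self._refresh_seconds)
                try:
                    self._rules = self._fetch()  # swap in the fresh ruleset
                except Exception:
                    pass                          # on failure, keep the last good config

        def check_gate(self, user, gate_name):
            # Pure in-memory lookup: no network call on the request path.
            predicate = self._rules.get(gate_name)
            return predicate(user) if predicate else False

    # Usage: rules map gate names to predicates over user attributes.
    store = FlagStore(lambda: {"new_checkout_flow": lambda u: u["country"] == "SE"})
    print(store.check_gate({"country": "SE"}, "new_checkout_flow"))  # True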


What Statsig is good at

Statsig's product is broad. The main capabilities:

  • Feature flags. Targeting rules, gradual rollouts, and environment-scoped configuration (see the bucketing sketch after this list).
  • A/B and multivariate testing. Variant assignment, exposure logging, and statistical analysis. Supports CUPED variance reduction and sequential testing.
  • Product analytics. Funnels, retention curves, user segmentation, and metric exploration.
  • Session replay. User-session video for debugging UX issues.
  • Warehouse Native mode. Analysis runs inside your warehouse; data does not flow through Statsig's storage in this mode.
  • SDKs across major languages. Server-side SDKs for Python, Go, Node, Java, Ruby, Rust; client SDKs for web and mobile.
  • Free tier. Usage-based pricing with a free tier large enough for many early-stage teams to run a real program.
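
Gradual rollouts and deterministic variant assignment are commonly implemented by hashing the user ID with a per-flag salt into a fixed number of buckets. The sketch below shows that standard pattern; it is not necessarily Statsig's exact scheme.

    import hashlib

    BUCKETS = 10_000  # basis points: 1 bucket = 0.01% of users

    def bucket(user_id: str, salt: str) -> int:
        # Deterministic: the same user always lands in the same bucket per salt.
        digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % BUCKETS

    def in_rollout(user_id: str, flag: str, rollout_pct: float) -> bool:
        # A 25% rollout passes buckets 0..2499; raising the percentage only
        # adds users, so nobody flips out of an enabled flag mid-rollout.
        return bucket(user_id, flag) < rollout_pct * (BUCKETS / 100)

    def assign_variant(user_id: str, experiment: str, variants: list[str]) -> str:
        # Even split across variants, stable for the experiment's lifetime.
        return variants[bucket(user_id, experiment) % len(variants)]

    print(in_rollout("user-123", "new_checkout_flow", 25.0))
    print(assign_variant("user-123", "checkout_cta", ["control", "treatment"]))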

A bundled flagging-plus-experimentation-plus-analytics platform removes integration overhead. Your team gets one SDK, one dashboard, and one set of credentials. For small teams, the free tier means running real A/B tests has no procurement cost. SDKs are well-documented, and the iteration loop from "create flag" to "see exposure data" is short.

For product-led startups that have not yet invested in a data warehouse or a dedicated analytics stack, this kind of bundled product significantly compresses the time from "we want to run experiments" to "we are running experiments".


Confidence is the experimentation platform Spotify has run for 15 years and still depends on, available now to teams outside Spotify. 300+ Spotify teams, 10,000+ experiments per year, 750 million users. The roadmap is set by the team that built it. See how Confidence compares to Statsig →


Confidence vs Statsig, head to head

Both Confidence and Statsig can run warehouse-native today. The differences are in evidence at scale, statistical opinions, product scope, and where the roadmap is set.

Confidence is built and operated by the team that runs Spotify's experimentation platform. That same platform serves 300+ Spotify teams running 10,000+ experiments per year across 750M users. 42% of those experiments are rolled back after guardrail metrics flag a regression. Statsig, for its part, has been an OpenAI subsidiary since September 2025; its roadmap is set inside OpenAI.

CUPED is a variance-reduction technique that uses pre-experiment data to tighten confidence intervals; Group Sequential Tests are a peeking-safe statistical method that lets you stop experiments early without inflating false positives. Confidence's CUPED implementation uses the Negi–Wooldridge 2021 full regression estimator, a refinement that produces tighter confidence intervals than the original CUPED estimator. Group Sequential Tests, always-valid inference, sample ratio mismatch checks, and guardrail metrics ship as defaults rather than opt-in features.
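
As a sketch of the basic CUPED idea (the classic form, not the full Negi–Wooldridge regression Confidence uses): subtract what the pre-experiment covariate already explains, and the adjusted metric's variance, and therefore the confidence interval width, drops. Function and variable names here are illustrative.

    import numpy as np

    def cuped_adjust(y: np.ndarray, x: np.ndarray) -> np.ndarray:
        # Classic CUPED: y_adj = y - theta * (x - mean(x)),
        # with theta = cov(x, y) / var(x), x measured pre-experiment.
        theta = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
        return y - theta * (x - x.mean())

    rng = np.random.default_rng(0)
    x = rng.normal(10.0, 2.0, 5_000)        # pre-period metric per user
    y = x + rng.normal(0.5, 1.0, 5_000)     # in-period metric, correlated with x

    print(np.var(y), np.var(cuped_adjust(y, x)))  # adjusted variance is far smaller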

Confidence is narrower in product scope than Statsig. Where Statsig includes funnels, retention, and session replay, Confidence focuses on flags and experiment analysis, routing teams to dedicated analytics tools for the rest.


Each platform fits a different buyer

Statsig fits product-led startups that want one bundled product covering flags, experiments, and product analytics, that find the free tier compelling, and that have not yet invested in dedicated data warehouse infrastructure. The breadth of features is the selling point.

Confidence fits teams that treat experimentation as a discipline. Teams that want warehouse-native by default rather than as a mode. Teams that want opinionated statistical defaults built on 15 years of operating experience. Teams that want a vendor whose roadmap is set by the same company that built it. If experimentation rigor and vendor ownership are the priorities, Confidence is the right choice.

Confidence is available today at confidence.spotify.com. The same managed service that gets a two-person team running in a day is the platform 300+ Spotify teams use to run 10,000+ experiments per year. You do not outgrow Confidence, and you can trial it without procurement overhead.

If you are evaluating Statsig and want a side-by-side, the Confidence vs Statsig head-to-head goes deep on the methodology, architecture, and pricing differences. For teams already on Statsig who want to know what other options exist post-acquisition, see Top 7 alternatives to Statsig.