Eppo is a warehouse-native experimentation platform founded in 2020 by Che Sharma. It runs experiment analysis inside your data warehouse and is built around metric definitions managed in code. Its product scope is focused: experimentation analysis with a feature-flagging layer, rather than a bundled analytics suite with session replay and product analytics.
This page covers how Eppo works, its product scope, and where it sits relative to other warehouse-native experimentation platforms, including Confidence, the experimentation platform Spotify has run for 15 years and still depends on.
How does Eppo work?
Eppo runs as a layer between your application and your warehouse. Application code calls Eppo's assignment SDKs to check feature flags and log experiment exposures. Assignment data flows through Eppo's infrastructure for low-latency flag evaluation, while metric calculation runs inside your data warehouse using metric definitions you maintain in code, typically YAML files version-controlled alongside your dbt models or other data-infrastructure code.
When you run an experiment in Eppo, the platform writes assignment records to your warehouse and runs SQL against your warehouse to compute treatment effects. Metric definitions live in code, so changes to a metric show up as a pull request rather than a UI edit. The result is an experimentation workflow that uses the same review process as the rest of your data infrastructure.
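The workflow above centers on metric definitions living in version control. The fragment below is an illustrative sketch of what such a file might look like; the field names (`fact_sources`, `metrics`, `numerator`, and so on) are assumptions for illustration, not Eppo's documented schema:

```yaml
# Hypothetical metric definition, version-controlled next to dbt models.
# Field names are illustrative, not Eppo's actual schema.
fact_sources:
  - name: purchases
    sql: SELECT user_id, purchased_at, revenue FROM analytics.purchases

metrics:
  - name: revenue_per_user
    entity: user
    numerator:
      fact: purchases
      operation: sum
      column: revenue
```

Because a change to `revenue_per_user` is a diff to a file like this, it goes through the same pull-request review as any other change to the data stack.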
Eppo's documentation describes support for combining observational analysis with randomized experiments, allowing teams to generate candidate hypotheses on observational data and confirm them in randomized tests using the same metric definitions.
What Eppo is good at
Eppo's product is focused. The main capabilities:
- Warehouse-native architecture. Analysis runs inside BigQuery, Snowflake, Databricks, or Redshift. Metric data and joins stay in your warehouse.
- Metric definitions in code. YAML files version-controlled alongside the rest of your data infrastructure. Pull requests for metric changes; review history; no UI drift.
- CUPED variance reduction. CUPED is a variance-reduction technique that uses pre-experiment data to tighten confidence intervals.
- Group Sequential Tests (a peeking-safe statistical method that lets you stop experiments early without inflating false positives) and other sequential testing approaches.
- Sample ratio mismatch checks (which flag when traffic splits do not match the configured allocation, usually a sign of a bucketing bug) and guardrail metrics as defaults.
- Feature flagging with assignment SDKs across major server and client languages.
- Slack-first notification surfaces for experiment lifecycle events.
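To make the CUPED bullet concrete, here is a minimal sketch of the basic single-covariate technique (not necessarily the estimator either vendor ships): the pre-experiment value of a metric is used as a covariate, and subtracting `theta * (x - mean(x))` removes the variance that pre-experiment behavior explains, without shifting the mean.

```python
import random
import statistics

# Simulated data: pre-experiment metric x predicts in-experiment metric y.
random.seed(7)
n = 2000
x = [random.gauss(50, 10) for _ in range(n)]
y = [0.8 * xi + random.gauss(10, 5) for xi in x]

mean_x = statistics.fmean(x)
mean_y = statistics.fmean(y)

# theta is the OLS slope of y on x: Cov(x, y) / Var(x).
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
theta = cov_xy / statistics.variance(x)

# CUPED adjustment: same mean as y, but the variance explained by
# pre-experiment behavior is removed, tightening confidence intervals.
y_cuped = [yi - theta * (xi - mean_x) for xi, yi in zip(x, y)]

print(f"raw variance:   {statistics.variance(y):.1f}")
print(f"cuped variance: {statistics.variance(y_cuped):.1f}")
```

The stronger the correlation between pre-experiment and in-experiment behavior, the larger the variance reduction, which is why CUPED is most effective for metrics with stable per-user baselines.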
A focused warehouse-native experimentation platform with metric-as-code is a legitimate operational improvement that you cannot get from a UI-defined metrics product. If your team already version-controls dbt models, BI configs, and data pipeline code, having metric definitions in the same review process is a real workflow win.
For organizations that have already invested in a data warehouse and have a culture of treating experimentation as a discipline, Eppo is a short path from "we want rigorous experimentation" to "we have metric definitions and statistical defaults we can defend."
Confidence is what Spotify uses to decide what its product becomes. 10,000+ experiments per year, run by 300+ teams, on a platform that has been operated continuously for 15 years. The defaults are what survived 15 years of being used in anger. It is now available to teams outside Spotify.
Where Confidence and Eppo diverge
Confidence and Eppo are both warehouse-native experimentation platforms with rigorous statistical defaults. The differences are narrow but consequential.
Confidence serves 300+ Spotify teams running 10,000+ experiments per year across 750M users. 42% of those experiments are rolled back after guardrail metrics flag a regression. Eppo, founded in 2020, is a younger commercial product with a customer base that pushes the platform on methodology.
Confidence's CUPED implementation uses the Negi–Wooldridge 2021 full regression estimator, which produces tighter confidence intervals than the original CUPED estimator. Group Sequential Tests, sample ratio mismatch checks, and guardrail metrics ship as defaults rather than configurable choices. Eppo documents CUPED support, but its public documentation does not name the specific estimator, so practitioners evaluating on this dimension should ask Eppo directly.
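A sample ratio mismatch check of the kind both platforms run as a default is, at its core, a chi-square goodness-of-fit test on observed versus expected bucket counts. A minimal sketch for a 50/50 split (a simplified stand-in, not either vendor's implementation):

```python
# Minimal SRM check: chi-square goodness-of-fit against the configured split.
# 3.841 is the 95th percentile of chi-square with 1 degree of freedom.
def srm_detected(control: int, treatment: int,
                 expected_ratio: float = 0.5,
                 critical: float = 3.841) -> bool:
    total = control + treatment
    expected_control = total * expected_ratio
    expected_treatment = total * (1 - expected_ratio)
    chi2 = ((control - expected_control) ** 2 / expected_control
            + (treatment - expected_treatment) ** 2 / expected_treatment)
    return chi2 > critical

print(srm_detected(10_000, 10_100))  # small imbalance: plausibly chance
print(srm_detected(10_000, 10_700))  # large imbalance: likely a bucketing bug
```

The point of shipping this as a default rather than an opt-in is that an SRM usually indicates a bucketing bug, and any treatment-effect estimate computed on top of one is suspect.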
Eppo wins on metric workflow: its metric-as-code pattern in YAML is more mature than Confidence's equivalent today. Confidence wins on SDK posture: its SDKs implement OpenFeature, a standard donated to the CNCF, so flag-evaluation code is portable across any OpenFeature provider, a portability Eppo's proprietary SDKs do not offer.
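The portability argument can be sketched in a few lines. Under the provider pattern OpenFeature standardizes, application code depends on a neutral client interface, and the vendor is a swappable plug-in behind it. The toy Python below is not the real OpenFeature SDK, just the shape of the idea:

```python
from typing import Protocol

# Toy sketch of the provider pattern OpenFeature standardizes.
# NOT the actual OpenFeature SDK: application code talks to a neutral
# client, and the vendor backend is a swappable provider behind it.
class FlagProvider(Protocol):
    def resolve_boolean(self, flag_key: str, default: bool) -> bool: ...

class InMemoryProvider:
    """Stand-in for a vendor backend (could be any flag service)."""
    def __init__(self, flags: dict[str, bool]):
        self.flags = flags

    def resolve_boolean(self, flag_key: str, default: bool) -> bool:
        return self.flags.get(flag_key, default)

class FlagClient:
    """Vendor-neutral interface that call sites depend on."""
    def __init__(self, provider: FlagProvider):
        self.provider = provider

    def get_boolean_value(self, flag_key: str, default: bool) -> bool:
        return self.provider.resolve_boolean(flag_key, default)

# Swapping vendors means swapping the provider, not rewriting call sites.
client = FlagClient(InMemoryProvider({"new-checkout": True}))
print(client.get_boolean_value("new-checkout", False))
```

With a vendor-specific SDK, by contrast, every call site imports the vendor's client directly, and a migration means touching each of them.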
Each platform fits a different buyer
Eppo fits data-science-led organizations that already version-control their data infrastructure, that want metric definitions reviewed in code rather than a UI, and that prefer methodological optionality (Bayesian alongside frequentist). The metric-as-code workflow is the selling point.
Confidence fits teams that want opinionated defaults built on 15 years of Spotify-scale operation, that want OpenFeature portability at the SDK layer, that want a multi-team coordination primitive (Surfaces) for experiments at scale, and that prefer a vendor whose roadmap is set by the team that built the platform. Both products are good. The right answer depends on which axis matters most.
Confidence is available at confidence.spotify.com, with a free trial that does not require a procurement conversation. The managed service that gets a two-person team running in a day is the same platform 300+ Spotify teams use to run 10,000+ experiments per year; the architecture does not change as you grow into it.
If you are evaluating Eppo and want a side-by-side, the Confidence vs Eppo head-to-head covers methodology, metric workflow, and architecture in detail. For teams already on Eppo who want to know what other options exist, see Top 7 alternatives to Eppo.