Confidence centers the adjustment for multiple comparisons around the idea of a decision rule. In an experiment, the decision whose risks the design should control is whether or not to release a new feature. The adjustments vary among metrics because different types of metrics contribute differently to the decision rule. They ensure that the false positive rate of the binary ship-or-not decision is at most the original alpha. Similarly, the power of that decision is at least the original power level across repeated experiments.
The Overall Shipping Decision
An important feature of the statistical analysis in Confidence is that the errors that can happen, false positives and false negatives, matter at the experiment level, not at the individual metric level. In other words, the rates at which these errors happen are defined over repeated experiments. From a product perspective, false positives and false negatives apply to the decision to ship a feature or not. A false positive is when you ship a feature that truly doesn’t have an effect, and a false negative is when you don’t ship a feature that truly has an effect. Confidence uses a composite decision rule to produce an overall recommendation for a shipping decision. The results must pass the following for a recommendation to ship:
- at least one success metric has evidence of improvement
- all guardrail metrics show evidence of being within acceptable margins
To control the error rates of this decision rule, Confidence adjusts alpha and power as follows:
- Alpha is adjusted using a Bonferroni correction, where the original alpha is divided by the number of success metrics.
- The power level is adjusted using 1 - (1 - power)/(number of guardrails).
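The decision rule and the two adjustments listed above can be sketched in Python. This is a minimal illustration, not Confidence's implementation: the names `adjust_levels`, `recommend_ship`, and `AdjustedLevels` are hypothetical, and the handling of experiments with no success metrics or no guardrails is an assumption made only so the sketch runs.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AdjustedLevels:
    alpha: float  # significance level after the Bonferroni correction
    power: float  # power level after the guardrail adjustment


def adjust_levels(alpha: float, power: float,
                  n_success: int, n_guardrails: int) -> AdjustedLevels:
    """Apply the two adjustments described in the text.

    Assumption (not from the documentation): when there are no metrics
    of a given type, the corresponding level is left unchanged.
    """
    # Bonferroni: divide the original alpha by the number of success metrics.
    adj_alpha = alpha / n_success if n_success > 0 else alpha
    # Power: 1 - (1 - power) / (number of guardrails).
    adj_power = 1 - (1 - power) / n_guardrails if n_guardrails > 0 else power
    return AdjustedLevels(alpha=adj_alpha, power=adj_power)


def recommend_ship(success_improved: Sequence[bool],
                   guardrails_ok: Sequence[bool]) -> bool:
    """Composite rule: at least one success metric improved AND
    every guardrail is within its acceptable margin.

    Assumption: with no guardrails, the guardrail condition is
    treated as satisfied (all() of an empty sequence is True).
    """
    return any(success_improved) and all(guardrails_ok)
```

For example, with an original alpha of 0.05, a power of 0.8, two success metrics, and four guardrails, the sketch yields an adjusted alpha of 0.025 and an adjusted power of 0.95.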
References
- A. Dmitrienko, A.C. Tamhane, and F. Bretz (Eds.) (2009) “Multiple Testing Problems in Pharmaceutical Statistics” (First ed.), Chapman and Hall/CRC.
Related Resources
- Analyze Results: Understand decision rules
- Statistical Settings: Configure alpha and power
- Metrics in Experiments: Configure success and guardrails
- Statistical Tests: Understand test types

