The platform provides tests for differences between means of the treatment groups and the control group. The success metrics and guardrail metrics tests are slightly different in their interpretations.
Superiority Tests
Confidence uses superiority tests for success metrics and for deterioration tests.
The interpretation depends on the direction of change you test for:
- Any change
- Increase
- Decrease
When testing for any change:
- Significant result: The data shows evidence that the treatment caused a change in the metric.
- Insignificant result: The data shows no evidence that the treatment caused a change in the metric.
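As an illustration of the logic above, here is a minimal sketch of a superiority test as a fixed-sample two-sample z-test. This is an assumption made for illustration: the text does not say which test statistic Confidence computes, and the function name, parameters, summary statistics, and the alpha of 0.05 are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def superiority_p_value(mean_c, var_c, n_c, mean_t, var_t, n_t, direction="any"):
    """Fixed-sample z-test p-value against H0: zero difference in means.

    direction: "any" (two-sided), "increase" or "decrease" (one-sided).
    This is an illustrative sketch, not the platform's implementation.
    """
    # Standard error of the difference between two independent sample means.
    se = sqrt(var_c / n_c + var_t / n_t)
    z = (mean_t - mean_c) / se
    cdf = NormalDist().cdf
    if direction == "any":
        return 2 * (1 - cdf(abs(z)))
    if direction == "increase":
        return 1 - cdf(z)
    if direction == "decrease":
        return cdf(z)
    raise ValueError(f"unknown direction: {direction!r}")


# Hypothetical summary statistics, for illustration only.
p = superiority_p_value(mean_c=10.0, var_c=4.0, n_c=5000,
                        mean_t=10.2, var_t=4.0, n_t=5000, direction="any")
significant = p < 0.05  # alpha = 0.05 is an assumed setting
```

A significant result here corresponds to evidence that the treatment changed the metric. A result that is not significant means only that such evidence is absent, not that the treatment had no effect.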
Non-Inferiority Tests
Confidence uses non-inferiority tests for guardrail metrics.
The interpretation depends on the metric's desired direction:
- Increase
- Decrease
For a metric that should increase:
- Significant result: The data shows evidence that the metric hasn’t decreased by more than the non-inferiority margin (NIM) in the treatment group.
- Insignificant result: The data shows no evidence that the metric hasn’t decreased by more than the NIM in the treatment group.
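The following sketch shows one way a non-inferiority test with an absolute NIM could be computed, again as a fixed-sample z-test. The z-test, the function name, its parameters, and the example numbers are assumptions for illustration and are not taken from Confidence itself.

```python
from math import sqrt
from statistics import NormalDist


def non_inferiority_p_value(mean_c, var_c, n_c, mean_t, var_t, n_t, nim,
                            desired="increase"):
    """One-sided z-test p-value for non-inferiority with absolute margin `nim`.

    Illustrative sketch only; `nim` is in the metric's own units.
    """
    if nim <= 0:
        raise ValueError("nim must be positive")
    se = sqrt(var_c / n_c + var_t / n_t)
    diff = mean_t - mean_c
    cdf = NormalDist().cdf
    if desired == "increase":
        # H0: diff <= -nim (the metric dropped by at least the margin).
        return 1 - cdf((diff + nim) / se)
    if desired == "decrease":
        # H0: diff >= nim (the metric rose by at least the margin).
        return cdf((diff - nim) / se)
    raise ValueError(f"unknown desired direction: {desired!r}")


# Hypothetical guardrail: a small observed drop, well inside the margin.
p = non_inferiority_p_value(mean_c=10.0, var_c=4.0, n_c=5000,
                            mean_t=9.98, var_t=4.0, n_t=5000, nim=0.2)
non_inferior = p < 0.05  # alpha = 0.05 is an assumed setting
```

The only difference from a test against zero is that the null hypothesis is shifted by the margin, so a small observed drop can still yield a significant result.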
Inferiority Tests
Confidence uses inferiority tests to detect unintended negative effects in success and guardrail metrics. The inferiority test checks for a move in the direction opposite to the intended one.
The interpretation depends on the metric's desired direction:
- Increase
- Decrease
For a metric that should increase:
- Significant result: The data shows evidence that the treatment caused a decrease in the metric.
- Insignificant result: The data shows no evidence that the treatment caused a decrease in the metric.
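The three tests can be compared side by side as hypotheses. The block below is one common formalization for a metric that should increase. It is not stated in the text itself, and the symbols are introduced here: $\delta = \mu_T - \mu_C$ is the difference between the treatment and control means.

```latex
\begin{aligned}
\text{Superiority:}     \quad & H_0: \delta \le 0             & \text{vs.} \quad & H_1: \delta > 0 \\
\text{Non-inferiority:} \quad & H_0: \delta \le -\mathrm{NIM} & \text{vs.} \quad & H_1: \delta > -\mathrm{NIM} \\
\text{Inferiority:}     \quad & H_0: \delta \ge 0             & \text{vs.} \quad & H_1: \delta < 0
\end{aligned}
```

In each case a significant result means rejecting $H_0$ in favor of $H_1$. For a superiority test on any change, the pair would instead be $H_0: \delta = 0$ against $H_1: \delta \ne 0$. For a metric that should decrease, the inequalities flip.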
Relative Values
Confidence performs tests on the absolute values, but lets you give NIMs on a relative scale. Confidence converts the relative values into absolute values using the mean of the baseline group, typically the control group.
Tests for Success Metrics
Success metrics always use a superiority test. The test is against the null hypothesis of zero mean difference between the groups.
Tests for Guardrail Metrics
You can test guardrail metrics in two different ways:
- Use an inferiority test. This test evaluates whether there is evidence that the guardrail metric does worse in the treatment group compared to the control group.
- Use a non-inferiority test. This test instead evaluates whether there is evidence that the guardrail metric does better than a pre-defined threshold in the treatment group compared to the control group.
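For a non-inferiority guardrail whose NIM is given on a relative scale, the conversion described under Relative Values might look like the sketch below. It assumes the absolute margin is the relative NIM multiplied by a positive baseline mean, which is one natural reading of the text and not a confirmed formula. The function name and numbers are hypothetical.

```python
def absolute_nim(relative_nim, baseline_mean):
    """Convert a relative NIM (e.g. 0.02 for 2%) into the metric's own units.

    Assumes a positive baseline mean and a simple multiplicative conversion.
    """
    if relative_nim <= 0 or baseline_mean <= 0:
        raise ValueError("relative_nim and baseline_mean must be positive")
    return relative_nim * baseline_mean


# Hypothetical guardrail: control mean of 50.0 and a relative NIM of 2%.
control_mean = 50.0
nim = absolute_nim(0.02, control_mean)

# For a metric that should increase, the treatment mean must be shown to
# exceed this threshold for the non-inferiority test to be significant.
threshold = control_mean - nim
```

Because the baseline mean is itself estimated from data, the absolute margin changes as the control group's mean changes.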
Tests for Deterioration
Confidence tests all success and guardrail metrics for deterioration. For success metrics, this means testing for inferiority and superiority separately. For guardrail metrics, this means testing for inferiority and non-inferiority if the guardrail metric uses a non-inferiority test.
Related Resources
- Statistical Settings: Configure alpha and power
- Sequential Tests: Continuous analysis methods
- Variance Reduction: Improve metric precision

