Statistical Tests

The platform provides tests for differences in means between each treatment group and the control group. Tests for success metrics and tests for guardrail metrics differ slightly in how you interpret them.

Superiority Tests

Confidence uses superiority tests for success metrics and for deterioration tests.

A success metric test can be significant or non-significant. Significant means that the observed difference in means between the groups would be unlikely if the treatment had no effect. All success metric tests are against the null hypothesis of zero difference. Three types of tests are available for success metrics:

Any change
Increase
Decrease

Significant result: The data shows evidence that the treatment caused a change in the metric.
Non-significant result: The data shows no evidence that the treatment caused a change in the metric.

For the "Any change" test, the statistical hypotheses are:

$H_0: \delta = 0$
$H_1: \delta \neq 0$

where $\delta$ is the treatment effect.
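
The three success metric tests above can be illustrated with a two-sample z-test on the difference in means. This is a minimal sketch, not Confidence's implementation: the function name `superiority_test`, its parameters, the normal approximation, and the sample data are all assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def superiority_test(control, treatment, alternative="any", alpha=0.05):
    """Two-sample z-test of H0: delta = 0 (hypothetical helper, not a Confidence API).

    delta is estimated as mean(treatment) - mean(control).
    """
    delta = mean(treatment) - mean(control)
    # Standard error of the difference in means, unpooled sample variances.
    se = sqrt(variance(control) / len(control) + variance(treatment) / len(treatment))
    cdf = NormalDist().cdf(delta / se)

    if alternative == "any":         # H1: delta != 0
        p_value = 2 * min(cdf, 1 - cdf)
    elif alternative == "increase":  # H1: delta > 0
        p_value = 1 - cdf
    elif alternative == "decrease":  # H1: delta < 0
        p_value = cdf
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")

    return delta, p_value, p_value < alpha


# Made-up data, for illustration only.
control = [10.0, 11.0, 9.0, 10.5, 9.5, 10.2, 9.8, 10.1]
treatment = [12.0, 12.5, 11.5, 12.2, 11.8, 12.1, 11.9, 12.3]

delta, p_value, significant = superiority_test(control, treatment, alternative="any")
print(f"delta={delta:.3f}, p={p_value:.4g}, significant={significant}")
```

The `alternative` argument mirrors the three test types: "any" is two-sided, while "increase" and "decrease" are one-sided.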

Non-Inferiority Tests

Confidence uses non-inferiority tests for guardrail metrics.

A non-inferiority test is against the null hypothesis that the metric has deteriorated by at least the non-inferiority margin (NIM). You must select a direction for a non-inferiority test.

Increase
Decrease

Significant result: The data shows evidence that the metric hasn’t decreased by more than NIM in the treatment group.
Non-significant result: The data shows no evidence that the metric hasn’t decreased by more than NIM in the treatment group.

The statistical hypotheses used in the test are:

$H_0: \delta \leq -NIM$
$H_1: \delta > -NIM$

where $\delta$ is the treatment effect.
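
As a rough illustration of the hypotheses above, a non-inferiority test shifts the null value from zero to $-NIM$ and tests in one direction. The sketch below assumes a z-test, an absolute NIM, and a metric where an increase is the desired direction. The name `non_inferiority_test` and the data are hypothetical and do not reflect how Confidence computes its results.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def non_inferiority_test(control, treatment, nim, alpha=0.05):
    """One-sided z-test with H1: delta > -nim (hypothetical helper).

    `nim` is an absolute, positive non-inferiority margin.
    """
    delta = mean(treatment) - mean(control)
    se = sqrt(variance(control) / len(control) + variance(treatment) / len(treatment))
    # Distance of the estimate from the null boundary -nim, in standard errors.
    z = (delta + nim) / se
    p_value = 1 - NormalDist().cdf(z)
    return delta, p_value, p_value < alpha


# Made-up data: the treatment mean is about 0.1 lower than the control mean.
control = [10.0, 11.0, 9.0, 10.5, 9.5, 10.2, 9.8, 10.1]
treatment = [9.9, 10.9, 8.9, 10.4, 9.4, 10.1, 9.7, 10.0]

for nim in (2.0, 0.05):
    delta, p_value, significant = non_inferiority_test(control, treatment, nim)
    print(f"nim={nim}: delta={delta:.3f}, p={p_value:.4g}, significant={significant}")
```

With a generous margin the small observed drop is well inside the tolerated range, so the test can be significant. With a margin smaller than the observed drop it cannot.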

Inferiority Tests

Confidence uses inferiority tests to detect unintended negative effects in success and guardrail metrics. An inferiority test checks for a move in the direction opposite to the intended one.

For inferiority tests, the test is against the null hypothesis of zero. You must select a direction for an inferiority test.

Increase
Decrease

Significant result: The data shows evidence that the treatment caused a decrease in the metric.
Non-significant result: The data shows no evidence that the treatment caused a decrease in the metric.

The statistical hypotheses used in the test are:

$H_0: \delta = 0$
$H_1: \delta < 0$

where $\delta$ is the treatment effect.

Relative Values

Confidence performs tests on absolute values, but lets you specify NIMs on a relative scale. Relative values are converted into absolute values using the mean of the baseline group, which is typically the control group.
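
One plausible reading of this conversion is a simple scaling of the relative NIM by the baseline mean. The helper `relative_to_absolute_nim` below is a hypothetical name, and the multiplication is an assumption about the transformation rather than a documented formula.

```python
from statistics import mean


def relative_to_absolute_nim(relative_nim, baseline_values):
    """Scale a relative NIM (e.g. 0.02 for 2%) by the baseline group mean.

    Assumption: absolute NIM = relative NIM * mean of the baseline group.
    """
    return relative_nim * mean(baseline_values)


# Made-up control-group observations with a mean of 100.
control = [98.0, 102.0, 100.0, 101.0, 99.0]
absolute_nim = relative_to_absolute_nim(0.02, control)
print(f"absolute NIM: {absolute_nim:.2f}")
```

Under this assumption, a 2% NIM against a baseline mean of 100 corresponds to an absolute margin of 2 metric units.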

Tests for Success Metrics

Success metrics always use a superiority test. The test is against the null hypothesis of zero mean difference between the groups.

Tests for Guardrail Metrics

You can test guardrail metrics in two different ways:

Use an inferiority test. This test evaluates whether there is evidence that the guardrail metric does worse in the treatment group compared to the control group.
Use a non-inferiority test. This test instead evaluates whether there is evidence that the guardrail metric does better than a pre-defined threshold in the treatment group compared to the control group.

Tests for Deterioration

Confidence tests all success and guardrail metrics for deterioration. For success metrics, this means testing for inferiority and superiority separately. For guardrail metrics, this means testing for inferiority and, if the guardrail metric uses a non-inferiority test, also for non-inferiority.
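
To make the guardrail case concrete, the sketch below runs an inferiority test and a non-inferiority test on the same estimated effect. It assumes a normal approximation, an estimate `delta_hat` with standard error `se` that has already been computed, and the same `alpha` for both tests with no multiplicity adjustment. All names are hypothetical and this is not Confidence's actual procedure.

```python
from statistics import NormalDist


def one_sided_p(delta_hat, se, null_value, alternative):
    """One-sided p-value for an effect estimate under a normal approximation."""
    cdf = NormalDist().cdf((delta_hat - null_value) / se)
    if alternative == "less":
        return cdf
    if alternative == "greater":
        return 1 - cdf
    raise ValueError(f"unknown alternative: {alternative!r}")


def evaluate_guardrail(delta_hat, se, nim, alpha=0.05):
    """Run both tests for a guardrail metric where an increase is desired."""
    # Inferiority (deterioration): H0: delta = 0 vs H1: delta < 0.
    p_inferiority = one_sided_p(delta_hat, se, 0.0, "less")
    # Non-inferiority: null boundary at -nim, H1: delta > -nim.
    p_non_inferiority = one_sided_p(delta_hat, se, -nim, "greater")
    return {
        "deteriorated": p_inferiority < alpha,
        "non_inferior": p_non_inferiority < alpha,
    }


# Small drop, wide margin: no evidence of deterioration, non-inferiority shown.
print(evaluate_guardrail(delta_hat=-0.1, se=0.3, nim=2.0))
# Large drop, narrow margin: evidence of deterioration, non-inferiority not shown.
print(evaluate_guardrail(delta_hat=-1.5, se=0.3, nim=0.5))
```

The two tests answer different questions, so both results can be non-significant at the same time, for example when the estimate is too noisy to support either conclusion.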

Related Resources

Statistical Settings: Configure alpha and power
Sequential Tests: Continuous analysis methods
Variance Reduction: Improve metric precision
