Confidence summarizes all the checks for an experiment on the results page in the spotlight section.
Monitor your experiment to make sure that it is set up correctly and
that data collection and variant delivery to users work as intended.
Three important questions to ask when verifying an experiment are:
- Is exposure working as intended?
- Are the control or treatment groups biased?
- Are users receiving the intended experience?
Sample Ratio Mismatch Check
The setup of the experiment defines exposure.
The treatment must not impact exposure for results to be trustworthy.
To verify that exposure works as intended, check that the observed proportions of traffic in all treatment groups
follow the expected variant allocations.
The sample ratio mismatch check tests if the observed proportions of traffic in each variant match
the expected proportions.
If the test indicates a problem, you have a clear signal that there is a systematic difference across
treatment groups in the probability that users log assignments.
A systematic traffic imbalance invalidates the results, as the groups are often no longer
comparable.
The analysis of the experiment relies on there being no systematic difference between the treatment groups.
Correct randomization makes it possible to attribute any movements in metrics to the treatment.
If there is a sign of a sample ratio mismatch, you should stop the experiment and investigate the issue.
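As an illustration, such a check can be run as a chi-squared goodness-of-fit test comparing observed counts against the intended allocation. The sketch below uses SciPy; the counts, the 50/50 allocation, and the 0.001 threshold are hypothetical assumptions, not Confidence's exact implementation.

```python
# A minimal sketch of a sample ratio mismatch check as a chi-squared
# goodness-of-fit test. Counts, allocation, and threshold are hypothetical.
from scipy.stats import chisquare

observed_counts = [50_412, 49_161]   # users with logged assignments per variant
expected_ratios = [0.5, 0.5]         # intended variant allocation

total = sum(observed_counts)
expected_counts = [ratio * total for ratio in expected_ratios]

statistic, p_value = chisquare(f_obs=observed_counts, f_exp=expected_counts)

# A very small p-value means the observed split is unlikely under the
# intended allocation, which signals a sample ratio mismatch.
if p_value < 0.001:
    print(f"Possible sample ratio mismatch (p = {p_value:.2e})")
```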
Deterioration Checks
Confidence always checks the metrics you’ve selected for deterioration. Regardless
of the test evaluation frequency you use, Confidence tests your metrics for
movements in the wrong direction as often as the metric data supports. If there
is evidence that metrics are moving in the wrong direction, Confidence alerts
you and recommends aborting the experiment.
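To make the idea concrete, here is a minimal sketch of a single deterioration check as a one-sided z-test on the difference in means. All inputs are hypothetical, and Confidence's actual always-valid, sequential evaluation is more involved than this fixed-horizon test.

```python
# A minimal sketch of one deterioration check: a one-sided z-test for whether
# a metric (where lower is worse) has moved in the harmful direction.
# All inputs are hypothetical; this is not Confidence's sequential procedure.
from math import sqrt
from scipy.stats import norm

def deterioration_p_value(mean_treatment, mean_control, se_treatment, se_control):
    """One-sided p-value for the hypothesis that treatment is worse than control."""
    difference = mean_treatment - mean_control
    standard_error = sqrt(se_treatment**2 + se_control**2)
    return norm.cdf(difference / standard_error)

# Hypothetical metric values where the treatment mean looks lower (worse).
p = deterioration_p_value(0.92, 1.00, 0.02, 0.02)
if p < 0.05:
    print(f"Evidence of deterioration (one-sided p = {p:.4f})")
```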
Stop the Experiment
You should stop the experiment when it reaches its required sample size, that
is, when all metrics have met their required sample size and achieved their
intended power. Stop the experiment at that point regardless of whether the
results show significant improvements.
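For intuition, the sketch below shows a standard per-group sample size calculation for a two-sample comparison of means. The minimum detectable effect, standard deviation, significance level, and power are hypothetical inputs, not values Confidence prescribes.

```python
# A minimal sketch of a standard per-group sample size calculation for a
# two-sample z-test on means. All parameter values are hypothetical.
from scipy.stats import norm

def required_n_per_group(mde, std_dev, alpha=0.05, power=0.8):
    """Users per group needed to detect an absolute effect of `mde`
    with a two-sided test at level `alpha` and the given power."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    return 2 * ((z_alpha + z_power) * std_dev / mde) ** 2

# Hypothetical: detecting a 0.1 absolute lift on a metric with std dev 1.0
# requires roughly 1570 users per group.
print(round(required_n_per_group(mde=0.1, std_dev=1.0)))
```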
Current Powered Effect
If you have to stop your experiment before you reach the required sample size, make sure to present
the current powered effect together with the results to reflect the increased
uncertainty. Failing to achieve the sample size needed to power all metrics
means that the risk of overestimating effects is higher.
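As an illustration, the sample size formula above can be inverted to compute a current powered effect: the smallest effect the data collected so far can detect with the desired power. All numbers below are hypothetical.

```python
# A minimal sketch of a "current powered effect" calculation: the smallest
# absolute effect detectable with the current per-group sample size.
# Inverts the standard two-sample formula; all inputs are hypothetical.
from math import sqrt
from scipy.stats import norm

def current_powered_effect(n_per_group, std_dev, alpha=0.05, power=0.8):
    """Minimum detectable absolute effect at the current sample size."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_power = norm.ppf(power)
    return (z_alpha + z_power) * std_dev * sqrt(2 / n_per_group)

# Hypothetical: stopping at 800 users per group instead of the planned ~1570
# means only effects of about 0.14 or larger are reliably detectable,
# so an observed smaller lift carries more uncertainty than planned.
print(round(current_powered_effect(n_per_group=800, std_dev=1.0), 3))
```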