Sequential Tests

Sequential tests make it possible to analyze results while an experiment is running without jeopardizing its statistical integrity. If you need to analyze results during the experiment, use a sequential test. The trade-off is that sequential tests typically have lower power than non-sequential tests. In addition, if you end an experiment as soon as a metric is significant in a sequential test, the resulting effect estimates are often biased. Confidence always runs deterioration checks for your metrics with sequential tests, even if you choose to view the results only after the experiment ends.
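The reason a correction is needed at all can be illustrated with a small simulation. The sketch below is a hypothetical Python example and not part of Confidence. It simulates A/A experiments with no true effect, re-runs an ordinary fixed-horizon two-sided z-test after every batch of data, and stops at the first significant result. The function name, the batch sizes, and the assumption of unit-variance normal outcomes are all illustrative choices.

```python
import math
import random
from statistics import NormalDist


def naive_peeking_false_positive_rate(n_experiments=2000, n_looks=10,
                                      n_per_look=100, alpha=0.05, seed=1):
    """Share of simulated A/A experiments (true effect = 0) that an uncorrected
    two-sided z-test declares significant at least once when it is re-run
    after every batch of n_per_look paired observations."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    false_positives = 0
    for _ in range(n_experiments):
        total, n = 0.0, 0
        for _ in range(n_looks):
            # Assumption: each observation is a treatment-minus-control
            # difference of two independent N(0, 1) outcomes, so it has mean 0
            # and variance 2. A batch sum is therefore N(0, 2 * n_per_look).
            total += rng.gauss(0.0, math.sqrt(2 * n_per_look))
            n += n_per_look
            z = (total / n) / math.sqrt(2 / n)  # naive z-statistic at this look
            if abs(z) > z_crit:
                false_positives += 1
                break  # stop the experiment at the first "significant" look
    return false_positives / n_experiments


print(naive_peeking_false_positive_rate(n_looks=1))   # close to 0.05
print(naive_peeking_false_positive_rate(n_looks=10))  # well above 0.05
```

With a single analysis, the false positive rate stays near the nominal 5%; with ten uncorrected analyses, it is several times higher. Sequential tests are designed to keep the overall false positive rate at the intended level across repeated analyses.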

Sequential tests let you analyze the experiment repeatedly, but typically require more samples than a non-sequential test to reach the same power. To maximize power, use non-sequential tests for the primary metrics of an experiment and sequential tests to detect degradations early. Use a sequential test for the main results only if you need to check progress during the experiment.

Confidence offers two types of sequential tests: group sequential tests and always-valid sequential tests. Group sequential tests are the classical approach to sequential analysis; they adjust the standard z-test to account for multiple analyses. Always-valid tests are a more recent development that requires fewer assumptions. Unlike group sequential tests, always-valid tests don't require you to specify an expected sample size before the experiment starts. Because they implicitly correct for an infinite number of analyses, they typically have lower power than group sequential tests.

Group sequential tests therefore tend to have higher power, at the cost of requiring you to specify an expected sample size up front. The estimate doesn't need to be exact: if you can give a reasonable estimate, use a group sequential test.

To read more about sequential testing and the trade-offs between the methods, see the blog post by Schultzberg and Ankargren (2023) listed in the References.

Group Sequential Tests

If you give an expected sample size when setting up the experiment, Confidence uses a group sequential test to calculate valid statistical results while the experiment is running. The group sequential test optimally exploits the dependence between the tests at different points in time. It allocates the overall false positive rate of the experiment across the tests performed over time. How much of the false positive rate each analysis spends depends on the amount of information available at that point, relative to the expected amount of information at the end of the experiment.
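This kind of allocation is commonly expressed as an alpha spending function. The documentation doesn't say which spending function Confidence uses, so the sketch below assumes the Lan-DeMets O'Brien-Fleming-type function, one standard choice covered by Jennison and Turnbull (2000). It also assumes that the information fraction can be approximated by the current sample size divided by the expected sample size. All function names are hypothetical.

```python
from statistics import NormalDist


def information_fraction(current_n, expected_n):
    """Information observed so far relative to the expected final amount,
    approximated by sample sizes and capped at 1 if the experiment overshoots."""
    return min(current_n / expected_n, 1.0)


def obf_cumulative_alpha(t, alpha=0.05):
    """Lan-DeMets O'Brien-Fleming-type spending function (two-sided): the total
    false positive rate that may be spent by information fraction t."""
    if t <= 0:
        return 0.0
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / t ** 0.5))


def alpha_per_analysis(sample_sizes, expected_n, alpha=0.05):
    """Incremental false positive rate spent at each analysis."""
    spent = 0.0
    increments = []
    for n in sample_sizes:
        cumulative = obf_cumulative_alpha(information_fraction(n, expected_n), alpha)
        increments.append(cumulative - spent)
        spent = cumulative
    return increments


looks = [2_500, 5_000, 7_500, 10_000]
for n, spent in zip(looks, alpha_per_analysis(looks, expected_n=10_000)):
    print(f"n={n:>6}: spends {spent:.5f}")
```

Under this assumed spending function, early analyses, when little information has accumulated, spend only a tiny share of the 5% false positive rate, and the increments add up to the full rate at the expected sample size. Turning these increments into a critical value for each analysis additionally requires the joint distribution of the test statistics across analyses, which this sketch omits.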

Always-Valid Inference

If you don’t give an expected sample size, Confidence can’t use the group sequential test and falls back to an always-valid approach. These tests guarantee that the false positive rate doesn’t exceed the intended level, but usually have lower power than group sequential tests, which makes it harder to detect effects.
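One always-valid construction is the asymptotic confidence sequence of Waudby-Smith et al. (2023), listed in the References. The sketch below uses it as an illustration of the always-valid approach; it is not necessarily what Confidence implements. It computes the half-width of a two-sided confidence sequence for a mean after t observations and compares it with an ordinary fixed-horizon z-interval. The tuning parameter rho, the value chosen for it, and the function names are illustrative assumptions.

```python
import math
from statistics import NormalDist


def asymptotic_cs_half_width(t, sigma_hat, rho, alpha=0.05):
    """Half-width of a two-sided asymptotic confidence sequence for a mean after
    t observations (normal-mixture form in Waudby-Smith et al., 2023).
    rho > 0 tunes the sample size at which the sequence is relatively tightest."""
    r2 = rho ** 2
    return sigma_hat * math.sqrt(
        2 * (t * r2 + 1) / (t ** 2 * r2) * math.log(math.sqrt(t * r2 + 1) / alpha)
    )


def fixed_horizon_half_width(t, sigma_hat, alpha=0.05):
    """Half-width of a standard two-sided z-interval, valid for one analysis only."""
    return NormalDist().inv_cdf(1 - alpha / 2) * sigma_hat / math.sqrt(t)


for t in (1_000, 10_000, 100_000):
    cs = asymptotic_cs_half_width(t, sigma_hat=1.0, rho=0.09)
    z = fixed_horizon_half_width(t, sigma_hat=1.0)
    print(f"t={t:>7}: always-valid ±{cs:.4f}, fixed-horizon ±{z:.4f}")
```

In practice sigma_hat would be re-estimated from the data at each analysis; it is fixed at 1 here to isolate the effect of sample size. The always-valid interval is wider than the fixed-horizon interval at every sample size. That extra width is the price of being able to analyze at any time, and it is why always-valid tests have lower power.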

Analyze Results Sequentially

Sequential tests are always enabled for rollouts. To use a sequential testing strategy for an A/B test or an analysis workflow:

1. Go to Confidence and select A/B Tests or Analysis Workflows on the left sidebar.
2. Select the experiment that you want to analyze sequentially.
3. On the right sidebar, click Results > Results Settings.
4. Select Continuously.

To use the group sequential test, provide an expected sample size. This step is optional; without it, Confidence uses an always-valid test. On the right sidebar, click Results > Configure Statistics and enter the Expected sample size.

If you provide an expected sample size, deterioration checks also use the group sequential test, even if you choose to view the results only after the experiment ends.

To configure sequential testing via the API, see Configure Sequential Testing.

References

C. Jennison and B. W. Turnbull (2000) “Group Sequential Methods with Applications to Clinical Trials,” Chapman & Hall/CRC.
M. Schultzberg and S. Ankargren (2023) “Choosing a Sequential Testing Framework—Comparisons and Discussions,” Spotify Engineering Blog, https://engineering.atspotify.com/2023/03/choosing-sequential-testing-framework-comparisons-and-discussions/.
G. Y. Zou, A. Donner, and N. Klar (2005) “Group sequential methods for cluster randomization trials with binary outcomes,” Clinical Trials.
I. Waudby-Smith, D. Arbour, R. Sinha, E. H. Kennedy, and A. Ramdas (2023) “Time-uniform central limit theory and asymptotic confidence sequences,” https://arxiv.org/pdf/2103.06476v8.pdf.

Related Resources

Statistical Settings: Configure sequential testing
Statistical Tests: Understand test types
Monitoring: Monitor live experiments
Analyze Results: Understand experiment results
