
Evidence For Your Ideas

An A/B test is an experiment that lets you compare user reactions to different variants of your product. You get trustworthy evidence for or against new variants so that you can learn from your users and iterate quickly.
This guide requires a feature flag and at least one metric. In what follows, the guide uses the flag tutorial-feature with a treatment variant, and the metric Page views per visitor for the Visitor entity. If yours have different names, substitute your own flags, variants, metrics, and entities wherever the guide refers to these. If necessary, follow the flags quickstart to create the tutorial-feature flag, and the metrics quickstart to create the Page views per visitor metric.
Some sections link to associated videos and documentation. Make use of these resources, as they contain important information that the guide doesn’t cover.
This guide helps you set up a test that doesn’t change anything in your code. Such a test is called an A/A test: it either uses the same variant for treatment and control, or uses different variants that don’t change the user experience. A/A tests let you practice setting up and running A/B tests before you start testing real changes, and provide an opportunity to validate your full flow.

Step 1: Create an A/B test

Time to get started and create your A/B test. Open Confidence and select A/B Tests on the left sidebar. The overview page shows all draft, live, and ended A/B tests that you have permission to view. Click + Create in the upper right corner to create a new A/B test.

Step 2: Name, Entity, and Owner

You first need to give your A/B test a name and assign an owner. Use a descriptive name that others understand. You also need to decide which entity you want to A/B test on. The A/B test uses the entity to randomly assign treatment and to aggregate metrics (a minimal sketch of how such assignment works follows the list below). For this exercise, use:
  • Name: aa-ab-test-<your-name-and-date>
  • Entity: Visitor
  • Owner: Select yourself
Click Create. You’re now on the A/B test configuration page.
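Confidence handles treatment randomization for you, but for intuition, here is a minimal TypeScript sketch of deterministic, hash-based assignment keyed on an entity. This is an illustration of the general technique, not Confidence’s actual algorithm: the point is that the same visitor always maps to the same group.

```typescript
// Minimal sketch of deterministic treatment assignment keyed on an entity.
// Not Confidence's actual algorithm; it only illustrates that the same
// visitor_id always lands in the same group for a given test.

// FNV-1a: a simple, deterministic 32-bit string hash.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Salting with a per-test identifier keeps assignments independent
// across different tests that share the same visitors.
function assignTreatment(visitorId: string, testSalt: string): 'control' | 'treatment' {
  return fnv1a(`${testSalt}:${visitorId}`) % 2 === 0 ? 'control' : 'treatment';
}

console.log(assignTreatment('visitor-123', 'aa-ab-test')); // same output on every call
```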

Step 3: Treatments

The treatments you choose for your A/B test decide what experiences you want to compare. The treatments you want to test must exist as variants on a feature flag. Feature flags define a configuration for an aspect of your app, website, or backend service. This step is where you select precisely what you want to vary in your A/B test.
The first treatment group you add is the control group; the experiment compares all other treatments against it. You can change which group is the control by dragging a group to the left-most position. Use the current default experience as the control variant.
In this guide, you don’t want to test a real feature or change for your users. Instead, use the tutorial-feature flag. This flag changes nothing in your code; it exists only to let you try out the A/B test functionality (a code sketch of how treatments are served follows the list below). Click + Add control and select:
  • Flag: tutorial-feature
  • Control: control
  • Treatment: treatment
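In application code, serving these treatments boils down to resolving the flag and branching on the result. A minimal sketch, assuming you consume Confidence flags through an OpenFeature client; the property name visual-style and its values are hypothetical placeholders, so use whatever your tutorial-feature flag actually defines:

```typescript
// Sketch of serving control vs. treatment in application code, assuming an
// OpenFeature client backed by a Confidence provider. The property name
// 'visual-style' and its values are hypothetical placeholders.
import { OpenFeature } from '@openfeature/web-sdk';

const client = OpenFeature.getClient();

// The default value is served to visitors outside the A/B test or when
// the flag cannot be resolved.
const style = client.getStringValue('tutorial-feature.visual-style', 'control-style');

if (style === 'treatment-style') {
  console.log('Rendering the treatment experience');
} else {
  console.log('Rendering the control (default) experience');
}
```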

Step 4: Audience

The Audience section is where you define what your A/B test should target. The treatment randomization unit field determines which field in the flag’s evaluation context your rollout randomizes treatment assignment on. Confidence pre-populates this field with the entity you selected in the creation step. The context schema maps the entity Visitor to the field visitor_id (shown in parentheses), which means the rollout randomizes on the value passed in the feature flag’s evaluation context field visitor_id.
You decide the target audience for your A/B test. For example, add country is Sweden as an inclusion criterion if you want to target users in Sweden. The information available in the evaluation context of the flag determines what you can target on. Read more on the Audiences page. In this case, target everyone by leaving the inclusion criteria empty.
The evaluation context is the information you pass in when making the resolve call to Confidence to ask which variant to serve. This means that what’s available in the evaluation context depends on what you pass in. When you type an attribute name, Confidence suggests recently used attributes from the evaluation context.
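For example, here is a sketch of populating the evaluation context before resolving flags, again assuming the OpenFeature web SDK; the attribute values, and any attribute beyond visitor_id, are illustrative:

```typescript
// Sketch: everything you put in the evaluation context becomes available
// for randomization and targeting. Assumes the OpenFeature web SDK.
import { OpenFeature } from '@openfeature/web-sdk';

await OpenFeature.setContext({
  targetingKey: 'visitor-123', // OpenFeature's generic targeting key
  visitor_id: 'visitor-123',   // the field this A/B test randomizes on
  country: 'SE',               // enables criteria such as "country is Sweden"
});
```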
This video (2 minutes and 10 seconds) gives a quick overview of how providing more information when you resolve feature flags increases the flexibility of who you can include in or exclude from your experiments.

Set the Allocation

The allocation sets what proportion of your target audience is eligible for your A/B test. Adjust the slider to 5% to allocate 5% of the traffic to the A/B test. You can also set the allocation by entering 5% in the input field.
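Conceptually, allocation is another deterministic hash check layered on top of treatment assignment: only visitors whose bucket falls within the allocated share enter the test. A minimal sketch under the same assumptions as the earlier one, not Confidence’s actual implementation:

```typescript
// Sketch: a visitor is eligible for the test only if their hash bucket
// falls inside the allocated share of traffic. Not Confidence's actual
// implementation. fnv1a is the same helper as in the assignment sketch.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

function isAllocated(visitorId: string, testSalt: string, allocation: number): boolean {
  const bucket = fnv1a(`${testSalt}:allocation:${visitorId}`) % 10_000;
  return bucket < allocation * 10_000; // allocation = 0.05 for a 5% slice
}

console.log(isAllocated('visitor-123', 'aa-ab-test', 0.05)); // true for roughly 5% of visitors
```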

Step 5: Metrics

In this step you select the metrics you want to track for your A/B test. Two types of metrics are available for A/B tests. Success metrics are metrics you intend to improve with your treatment. Guardrail metrics are metrics you don’t expect to improve, but you want to make sure they don’t deteriorate. Use both types to decide if the treatment variant is better than control for your product.
Confidence uses the entity you selected in the creation step to decide which metrics you can select. Any metric based on the Visitor entity is available. You need to use a fact table that includes the Visitor entity if you want to create a new metric for it.

Add Success Metric

In the success metric section, click Add metric and select:
  • Metric: Page views per visitor.
  • Desired direction: Increase. If you use a metric other than Page views per visitor, set this to the direction you want your metric to move in.
  • Minimum detectable effect (MDE): 5%.

Add Guardrail Metric

In the guardrail metric section, click Add metric and select:
  • Metric: Page views per visitor.
  • Non-desired direction: Decrease. If you use a metric other than Page views per visitor, set this to the direction you don’t want your metric to move in.
  • Non-inferiority margin (NIM): 3%.
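For intuition, a non-inferiority guardrail with a 3% margin corresponds to a one-sided test of roughly the following hypotheses. This is a standard textbook formulation; Confidence’s exact parametrization may differ.

```latex
% Non-inferiority test for a guardrail where a decrease is the non-desired
% direction and the margin (NIM) is 3% relative to control:
% H0: treatment is worse than control by at least the margin.
% H1: treatment is not worse than control by more than the margin.
H_0:\ \mu_{\text{treatment}} \le (1 - 0.03)\,\mu_{\text{control}}
\qquad\text{vs.}\qquad
H_1:\ \mu_{\text{treatment}} > (1 - 0.03)\,\mu_{\text{control}}
```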

Step 6: Hypothesis

In this section, you should clearly describe your hypothesis so that everyone understands the purpose of the A/B test. See ideas for good hypotheses and learn more on the hypothesis page. Click Add a hypothesis on the right sidebar.
Hypothesis: Changing the tutorial-feature from the variant control to treatment for everyone should result in a change in their behavior, as measured by the success metric Page views per visitor. The data supports the hypothesis if the success metric improves by 5% and the guardrail metric Page views per visitor doesn’t deteriorate by more than 3%.

Step 7: Sample Size Calculation

In this step you run the sample size calculator to find out how many users your A/B test needs. This number represents how many users you need to have a reasonable chance of finding significant results if the treatment truly improves the user experience as much as you hope for. Read more about power analysis and the required sample size. Click Calculate in the Required sample size section on the right sidebar.
The sample size calculation queries historical data in your data warehouse to estimate the required sample size. This might take a few minutes.
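For intuition, the standard two-sample formula behind such power calculations looks as follows. This is a textbook approximation; Confidence’s calculator may use a different method.

```latex
% Required sample size per group for a two-sided test at significance
% level \alpha with power 1-\beta, detecting an absolute effect \delta on
% a metric with variance \sigma^2 (estimated from historical data):
n \;=\; \frac{2\,\sigma^{2}\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}}{\delta^{2}}
% With \alpha = 0.05 and power 0.80, n \approx 15.7\,\sigma^{2}/\delta^{2},
% where \delta is the MDE in absolute terms (here, 5% of the baseline mean).
```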
If the required sample size is smaller than the number of users you can receive with the current allocation, your A/B test can reach power. To learn more about how to work with the design and statistical settings of an A/B test, see the Alpha and Power page in the documentation. You have successfully configured your A/B test, great job!

Step 8: Launch

Now it’s time to launch your A/B test! Make sure that you have selected the tutorial-feature flag, to avoid changing a real experience.
Click Launch in the top right corner.
If there are other live rollouts or A/B tests that use the tutorial-feature flag, you may not receive 5% of the traffic. Click Flags on the left sidebar and go to the tutorial-feature flag. The Rules section shows the rules that exist on the flag. If other rules have higher priority than yours and consume all the traffic, your test receives none; to receive traffic, move your rule up in the list. Read more about the order of rules.
Congratulations, you have launched your first A/B test!

Step 9: Monitoring and Results

When you launch the A/B test, Confidence calculates exposure at repeated short intervals to make sure that the A/B test is working as expected and that you are seeing some traffic. Hover over the Live status on the right sidebar to see the current status of the checks run for the A/B test. You can end the A/B test by clicking End in the upper right corner, but keep it running and check back tomorrow to see your first results.
Remember to end your A/B test within a couple of days so you don’t waste resources.