Evidence For Your Ideas
An A/B test is an experiment that lets you compare user reactions to different variants of your product. You get trustworthy evidence for or against new variants so that you can learn from your users and iterate quickly.

This guide requires a feature flag and at least one metric. In what follows, the guide uses the flag tutorial-feature with a treatment variant, and the metric Page views per visitor for the Visitor entity. If your flags, variants, metrics, or entities have different names, substitute yours wherever the guide refers to these. If necessary, follow the flags quickstart to create the tutorial-feature flag, and the metrics quickstart to create the Page views per visitor metric.

Step 1: Create an A/B test
Time to get started and create your A/B test. Open Confidence and select A/B Tests on the left sidebar. The overview page shows all draft, live, and ended A/B tests that you have permission to view. Click + Create in the upper right corner to create a new A/B test.

Step 2: Name, Entity, and Owner
You first need to give your A/B test a name and assign an owner. Use a descriptive name that others understand. You also need to decide the entity that you want to A/B test on. The A/B test uses the entity to randomly assign treatment and to aggregate metrics. For this exercise, use:

- Name: aa-ab-test-<your-name-and-date>
- Entity: Visitor
- Owner: Select yourself
Step 3: Treatments
The treatments you choose for your A/B test decide what experiences you want to compare. The treatments you want to test must exist as variants on a feature flag. Feature flags define a configuration for an aspect of your app, website, or backend service. This step is where you select precisely what you want to vary in your A/B test.

The first treatment group you add is the control group. The experiment compares all other treatments to the first treatment. You can change which group is the control by dragging a group to the left-most position. Use the current default experience as the control variant. For this exercise, use the tutorial-feature flag. This flag changes nothing in your code; it only serves the purpose of testing the A/B test functionality.
Click + Add control and select:

- Flag: tutorial-feature
- Control: control
- Treatment: treatment
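To make the comparison concrete, here is a minimal sketch of how application code might branch on the variant served for tutorial-feature. The renderControl and renderTreatment helpers are hypothetical placeholders for your default and new experiences, not part of any Confidence API.

```ts
// Hypothetical sketch: branch on the variant that Confidence serves.
// `variant` would come from a flag resolve call (see the Audience step
// for the evaluation context that drives the randomization).
function renderExperience(variant: string): string {
  switch (variant) {
    case "treatment":
      return renderTreatment(); // the new experience under test
    case "control":
    default:
      return renderControl(); // the current default experience
  }
}

// Placeholder implementations so the sketch is self-contained.
function renderControl(): string {
  return "current default experience";
}

function renderTreatment(): string {
  return "new experience under test";
}
```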
Step 4: Audience
The Audience section is where you define what your A/B test should target. The treatment randomization unit field determines which field in the flag's evaluation context your rollout randomizes treatment assignment on. Confidence pre-populates this field with the entity you selected in the creation step. The context schema maps the entity Visitor to the field visitor_id (shown in parentheses). This means that the rollout randomizes on the value passed in the feature flag's evaluation context field visitor_id.
You decide the target audience for your A/B test. For example, add country is Sweden as an inclusion criterion if you want to target users in Sweden. The information available in the evaluation context of the flag determines what you can target on. Read more on the Audiences page. In this case, target everyone by leaving the inclusion criteria empty.
The evaluation context is information that you pass in when making the resolve call to Confidence to ask what variant to serve. This means that what's available in the evaluation context depends on what you pass in. When you enter an attribute name, Confidence lists recently used attributes that are available in the evaluation context.
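As a sketch of what a resolve call with an evaluation context can look like, here is an example using the OpenFeature TypeScript server SDK. It assumes a Confidence OpenFeature provider has already been registered; the provider package and setup depend on your environment and are omitted. The visitor_id and country attributes mirror the examples above.

```ts
import { OpenFeature } from "@openfeature/server-sdk";

// Assumes a Confidence OpenFeature provider was registered beforehand,
// e.g. via OpenFeature.setProviderAndWait(...); that setup is omitted
// here because the package name depends on your environment.
async function resolveTutorialFeature(visitorId: string): Promise<string> {
  const client = OpenFeature.getClient();

  // Everything passed in this evaluation context becomes targetable in
  // the Audience section. visitor_id is the field the rollout randomizes
  // on; country enables inclusion criteria such as "country is Sweden".
  return client.getStringValue("tutorial-feature", "control", {
    targetingKey: visitorId,
    visitor_id: visitorId,
    country: "SE",
  });
}
```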
Set the Allocation
The allocation sets what proportion of your target audience is eligible for your A/B test. Adjust the slider to 5% to allocate 5% of the traffic to the A/B test. You can also set the allocation by entering 5% in the input field. For example, with 100,000 visitors in your target audience, a 5% allocation makes roughly 5,000 visitors eligible for the test.

Step 5: Metrics
In this step you select the metrics you want to track for your A/B test. Two types of metrics are available for A/B tests. Success metrics are metrics you intend to improve with your treatment. Guardrail metrics are metrics you don’t expect to improve, but you want to make sure they don’t deteriorate. Use both types to decide if the treatment variant is better than control for your product.

Confidence uses the entity you selected in the creation step to decide which metrics you can select. Any metric based on the Visitor entity is available.
You need to use a fact table that includes the Visitor entity if you want to create a new metric for it.
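For intuition, here is a hypothetical shape for a row in such a fact table; the field names are illustrative assumptions, not Confidence's schema. The key point is that each row carries the identifier for the Visitor entity, so a metric like Page views per visitor can be aggregated per visitor.

```ts
// Hypothetical shape of a page-view fact table row; field names are
// illustrative only. A "Page views per visitor" metric can be computed
// from rows like this because each row carries the Visitor entity key.
interface PageViewFact {
  visitor_id: string; // the Visitor entity key the metric aggregates on
  page: string;       // which page was viewed
  viewed_at: string;  // ISO 8601 timestamp of the view
}

const exampleRow: PageViewFact = {
  visitor_id: "visitor-123",
  page: "/home",
  viewed_at: "2024-05-01T12:00:00Z",
};
```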
Add Success Metric
In the success metric section, click Add metric and select:

- Metric: Page views per visitor
- Desired direction: Increase. If you use a metric other than Page views per visitor, set this to the direction you want your metric to move in.
- Minimum detectable effect (MDE): 5%
Add Guardrail Metric
In the guardrail metric section, click Add metric and select:

- Metric: Page views per visitor
- Non-desired direction: Decrease. If you use a metric other than Page views per visitor, set this to the direction you don’t want your metric to move in.
- Non-inferiority margin (NIM): 3%
Step 6: Hypothesis
In this section, you should clearly describe your hypothesis so that everyone understands the purpose of the A/B test. See ideas for good hypotheses and learn more on the hypothesis page. Click Add a hypothesis on the right sidebar.

Hypothesis: Changing the tutorial-feature from the variant control to treatment for everyone should result in a change in their behavior, as measured by the success metric Page views per visitor. The data supports the hypothesis if the success metric improves by 5% and the guardrail metric Page views per visitor doesn’t deteriorate by more than 3%.
Step 7: Sample Size Calculation
In this step you run the sample size calculator to find out how many users your A/B test needs. This number represents how many users you need to have a reasonable chance of finding significant results if the treatment truly improves the user experience as much as you hope for. Read more about power analysis and the required sample size. Click Calculate in the Required sample size section on the right sidebar.

The sample size calculation queries historical data in your data warehouse to estimate the required sample size. This might take a few minutes.
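For intuition about what the calculator is estimating, here is a back-of-the-envelope version of the standard two-sample power calculation. This is a simplified normal-approximation sketch, not Confidence's actual method; the baseline mean and standard deviation in the example are made-up numbers.

```ts
// Rough two-sample sample-size sketch (per group), assuming a normal
// approximation, 5% two-sided significance, and 80% power. This is a
// simplified illustration, not Confidence's actual calculation.
function sampleSizePerGroup(
  mean: number,        // baseline metric mean, e.g. page views per visitor
  stdDev: number,      // baseline metric standard deviation
  relativeMde: number  // e.g. 0.05 for a 5% MDE
): number {
  const zAlpha = 1.96; // z quantile for alpha = 0.05, two-sided
  const zBeta = 0.84;  // z quantile for 80% power
  const delta = mean * relativeMde; // absolute effect size to detect
  return Math.ceil((2 * (zAlpha + zBeta) ** 2 * stdDev ** 2) / delta ** 2);
}

// Example: a baseline of 4 page views per visitor with standard
// deviation 3 and a 5% MDE needs roughly 3,500 visitors per group.
console.log(sampleSizePerGroup(4, 3, 0.05));
```

The required sample size grows quickly as the MDE shrinks: halving the MDE roughly quadruples the number of users you need.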
Step 8: Launch
Now it’s time to launch your A/B test! Click Launch in the top right corner.

If there are other live rollouts or A/B tests that use the tutorial-feature flag, you may not receive 5% of the traffic. Click Flags on the left sidebar and go to the tutorial-feature flag. In the Rules section, you see what rules exist on the flag. If other rules have higher priority than yours and use all traffic, you receive no traffic. If you want to receive traffic, you need to adjust the priority of your rule and move it up in the list. Read more about the order of rules.

Step 9: Monitoring and Results
When you launch the A/B test, Confidence calculates exposure for the A/B test at repeated short intervals to make sure that the A/B test is working as expected, and that you are seeing some traffic. Hover over the Live status on the right sidebar to see the current status of the checks run for the A/B test.
You can end the A/B test by clicking End in the upper right corner. For now, keep it running and check back tomorrow to see your first results. Remember to end your A/B test within a couple of days so you don't waste resources.

