Lesson 9: Quality Assurance

Summary

To test whether your code works as intended, use override rules to assign specific users to your new feature. You can also run experiments on employees only, and run A/A tests to test your setup end-to-end before launching your main experiment.

Overrides

You can assign a specific user to a particular treatment by overriding the randomization. This means that you can add yourself or other members of the experimenting team to a specific variant at any time to try it out, and verify that your implementation works as intended before releasing the experiment to actual users.

Overriding users into specific treatments doesn't affect the results, as the exposure data doesn't include the overrides.
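The idea can be sketched in a few lines of Python. This is a minimal illustration, not how Confidence implements override rules: the `OVERRIDES` table, the `assign` and `exposure_log` functions, and the hash-based randomization are all assumptions made for the example. It shows an override taking precedence over the randomized assignment, and overridden users being left out of the exposure data.

```python
import hashlib
from typing import List, Tuple

# Hypothetical override table: user ID -> forced variant (names are illustrative).
OVERRIDES = {"qa-alice": "treatment", "qa-bob": "control"}
VARIANTS = ["control", "treatment"]


def assign(user_id: str, salt: str = "my-experiment") -> Tuple[str, bool]:
    """Return (variant, is_override) for a user."""
    # Override rules are checked first and bypass the randomization.
    if user_id in OVERRIDES:
        return OVERRIDES[user_id], True
    # Otherwise, hash the user into a variant so the assignment is stable.
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)], False


def exposure_log(user_ids: List[str]) -> List[Tuple[str, str]]:
    """Collect exposure records, leaving out overridden users."""
    log = []
    for uid in user_ids:
        variant, is_override = assign(uid)
        if not is_override:
            log.append((uid, variant))
    return log
```

Calling `assign("qa-alice")` always returns the forced variant, while `exposure_log(["qa-alice", "user-1"])` contains only `user-1`, which is why the overrides leave the results untouched in this sketch.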

In Confidence

In Confidence, you create override rules to assign specific users to a particular treatment.

Employee only

Depending on the nature of your product, a powerful next step in the QA process is to run your experiment on employees only. Include an attribute in the evaluation context that identifies the incoming request as belonging to an employee, and then use that attribute in your inclusion criteria. This way, you can test your change and its different values on users who are a bit more forgiving, and you get a chance to detect errors that went unnoticed during the earlier stages of QA. The drawback is that the sample size is typically too small to find any meaningful effects, but you might hear from your colleagues if something isn't working as it should.

Note

For employee experiments to be possible, you must include employee status in the evaluation context of your feature flag.

You can also give your new feature to employees only by directly creating a rule on your flag that uses employee status as an inclusion criterion.
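As a rough sketch of what an employee-only inclusion criterion does, consider the Python below. The attribute names `is_employee` and `user_id`, the function `resolve_for_employees`, and the hash-based assignment are hypothetical and chosen for illustration; they are not taken from the Confidence SDK.

```python
import hashlib
from typing import Any, Dict, Optional

VARIANTS = ["control", "treatment"]


def resolve_for_employees(context: Dict[str, Any]) -> Optional[str]:
    """Return a variant for employees, or None when the request is not included."""
    # Inclusion criterion: only requests marked as coming from an employee match.
    # A missing attribute counts as "not an employee".
    if context.get("is_employee") is not True:
        return None  # not included: the caller falls back to the default experience
    # Included requests are randomized into a variant by hashing the user ID.
    digest = hashlib.sha256(str(context["user_id"]).encode()).hexdigest()
    return VARIANTS[int(digest, 16) % len(VARIANTS)]
```

A request whose context lacks the employee attribute is treated as not included here, which mirrors the note above: without employee status in the evaluation context, there is nothing for the rule to match on.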

A/A tests

Sometimes you may want to run an A/A test to test your overall setup before launching the actual experiment. An A/A test is just like an A/B test, except that the experiences given to the control and treatment groups are the same. Either the two variants you use are the same, or you resolve the flag in your code but don't use the received variant values. A/A tests are particularly helpful if you want to test your whole setup end-to-end on real users and get real exposure data.
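The second approach, resolving the flag without using the variant, might look like the following sketch. Everything here is an assumption for illustration: `resolve_variant` is a hypothetical stand-in for a real flag resolve, and `render_home` is an invented call site. The point is that exposure is still recorded while every user gets the same experience.

```python
import hashlib
from collections import Counter
from typing import List, Tuple


def resolve_variant(user_id: str, salt: str = "aa-test") -> str:
    """Hypothetical stand-in for a flag resolve: hash the user into a variant."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return "control" if int(digest, 16) % 2 == 0 else "treatment"


def render_home(user_id: str, exposures: List[Tuple[str, str]]) -> str:
    variant = resolve_variant(user_id)
    exposures.append((user_id, variant))  # exposure is still recorded
    # A/A test: the variant is deliberately not used, so everyone gets the same page.
    return "existing home page"


exposures: List[Tuple[str, str]] = []
pages = {render_home(f"user-{i}", exposures) for i in range(10_000)}
counts = Counter(variant for _, variant in exposures)
```

After running this, `pages` contains a single experience, and `counts` shows how the users were split between the two groups. A split far from even would be one sign that something in the setup needs a closer look before the real experiment starts.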

In Confidence

A/A tests are useful when you have just integrated your service with Confidence. The A/B test quickstart describes such a test.

Reader exercise

Whose responsibility is it to ensure that an experiment or rollout doesn't break the end-user experience?


© Copyright 2026. All rights reserved.
