Confidence

Lesson 10: Exploratory analysis

Summary

In this lesson, you learn how to interpret segmented results in Confidence: what a dimension result means, how to read it, and why segment findings call for follow-up experiments rather than direct decisions.

Splitting experiment results by user subgroups is powerful, but it requires a specific way of reading the results. This lesson focuses on one question: after you have split results by a subgroup, how do you interpret what you see?

In Confidence

In Confidence, you split results by user subgroups by adding dimensions to an exploratory analysis. For the broader context on exploratory analysis (what explorations are, how to create them, and why the false positive risk matters), see the Intro to metrics course.

What a dimension result shows

When you add a dimension, Confidence shows you the metric result for each segment separately. For example, if you split by platform, you see one result for iOS users and one for Android users. Each segment result has its own point estimate and confidence interval.

Note

Confidence always uses the dimension value from immediately before a user was exposed to their assigned variant. This means the segment a user is in cannot have been affected by the treatment variant itself. Static attributes like country or device type are inherently safe. Dynamic attributes like subscription tier or engagement level are also safe, because Confidence uses the pre-exposure snapshot.
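The pre-exposure rule can be pictured as an "as-of" lookup: for each user, take the last recorded attribute value strictly before the exposure time. The Python sketch below illustrates that idea only; it is not Confidence's implementation, and the helper name `pre_exposure_value` and the `(timestamp, value)` history format are hypothetical assumptions.

```python
from bisect import bisect_left
from datetime import datetime


def pre_exposure_value(history, exposed_at):
    """Hypothetical helper (not Confidence's API): return the attribute value
    in effect immediately before exposure.

    history: list of (timestamp, value) pairs sorted by timestamp.
    Returns None if the user had no recorded value before exposure.
    """
    timestamps = [ts for ts, _ in history]
    # Index of the first entry at or after the exposure time; every entry
    # before that index was recorded strictly before exposure.
    i = bisect_left(timestamps, exposed_at)
    return history[i - 1][1] if i > 0 else None


history = [
    (datetime(2024, 1, 1), "free"),
    (datetime(2024, 3, 1), "premium"),  # upgrade recorded after exposure
]
exposed_at = datetime(2024, 2, 1)
print(pre_exposure_value(history, exposed_at))
```

Because the upgrade happens after exposure, the user stays in the "free" segment, so a treatment that drives upgrades cannot move users between segments.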

How to read a segment result

A segment result is a metric result like any other. The point estimate is the relative effect within that segment, and the confidence interval tells you how precisely that effect has been measured.

Read each segment result the same way you would read any CI:

  • Is the effect in the expected direction?
  • Does the CI cross zero? If it does, the effect for this segment is not statistically significant.
  • How wide is the CI? Segments typically have fewer users than the full experiment, so the CI will be wider. A wide CI in a segment means you have less precision, not that the effect is different.
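To see why smaller segments produce wider intervals, the sketch below computes the half-width of a normal-approximation 95% CI as the number of users shrinks. For simplicity it assumes an absolute difference in means with equal group sizes and a common standard deviation, rather than the relative effect Confidence reports; the function name and the numbers are illustrative assumptions.

```python
import math

Z_95 = 1.959964  # two-sided 95% normal critical value


def ci_half_width(sd, n_per_group, z=Z_95):
    """Half-width of a normal-approximation CI for a difference in means,
    assuming equal group sizes and a common standard deviation `sd`."""
    se = sd * math.sqrt(2.0 / n_per_group)
    return z * se


full = ci_half_width(sd=1.0, n_per_group=40_000)
for share in (1.0, 0.5, 0.25, 0.1):
    seg = ci_half_width(sd=1.0, n_per_group=int(40_000 * share))
    print(f"segment share {share:>4}: CI is {seg / full:.2f}x as wide")
```

Width scales with one over the square root of the sample size: a segment with half the users has a CI about 1.41 times as wide, and a segment with a quarter of the users has a CI twice as wide.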

What it means when segments look different

If one segment shows a positive effect and another shows no effect (or a negative effect), this is called a heterogeneous treatment effect: the treatment variant appears to work differently for different types of users.

Before drawing conclusions from a pattern like this, consider two things.

First, you are running one test per segment. With many segments, some will look significant by chance. A result that stands out across two or three segments is more credible than one that barely clears the threshold in a single segment.
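A quick calculation shows how fast the chance of a spurious segment result grows. This sketch assumes independent tests at alpha = 0.05 and no true effect in any segment; the function name is illustrative.

```python
def prob_any_false_positive(num_segments, alpha=0.05):
    """Chance that at least one of `num_segments` independent tests is
    significant when the treatment has no effect in any segment."""
    return 1.0 - (1.0 - alpha) ** num_segments


for k in (1, 5, 10, 20):
    print(k, round(prob_any_false_positive(k), 3))
```

With 10 segments there is roughly a 40% chance that at least one looks significant purely by chance, and with 20 segments it is close to two in three.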

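Second, sample sizes per segment are smaller than for the full experiment. The difference you see between segments might be noise rather than a genuine interaction. Heavily overlapping confidence intervals between segments are a warning sign that the apparent difference is not reliable. Overlap alone does not prove there is no difference, though; the direct check is whether the difference between the two segment effects is itself significant.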

The right response to an interesting segment pattern is not to ship only to that segment based on the exploratory result, but to run a new experiment with that segment as the target population and the metric in question as the pre-registered success metric.

Example

An experiment shows no significant improvement overall. When split by platform, iOS shows +4.2% (95% CI: [−0.5%, +8.9%]) and Android shows −1.1% (95% CI: [−6.0%, +3.8%]). Both CIs cross zero, so neither result is statistically significant, and the two CIs overlap substantially.

The pattern is suggestive, but neither result is significant and the CIs heavily overlap. This is not strong evidence of a genuine platform difference. It is worth noting as a hypothesis to test in a follow-up experiment, but not a reason to ship selectively to iOS users.
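One way to check whether two segments really differ is to test the difference between their effects directly. The sketch below backs out standard errors from the reported CIs and runs a z-test on the difference, using the iOS and Android numbers from the example. It assumes the CIs are symmetric normal-approximation 95% intervals and that the segments contain disjoint users, so their estimates are independent; the function names are illustrative, not part of Confidence.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96


def se_from_ci(lower, upper, z=Z_95):
    """Back out a standard error from a symmetric 95% CI."""
    return (upper - lower) / (2 * z)


def compare_segments(est_a, ci_a, est_b, ci_b, z=Z_95):
    """z-test for the difference between two independent segment effects."""
    # Variances of independent estimates add, so the SEs combine in quadrature.
    se_diff = math.hypot(se_from_ci(*ci_a, z), se_from_ci(*ci_b, z))
    diff = est_a - est_b
    z_stat = diff / se_diff
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    ci_diff = (diff - z * se_diff, diff + z * se_diff)
    return diff, ci_diff, p_value


# iOS vs Android, in percentage points of relative effect.
diff, ci_diff, p = compare_segments(4.2, (-0.5, 8.9), -1.1, (-6.0, 3.8))
print(f"difference {diff:.1f}%, 95% CI [{ci_diff[0]:.1f}%, {ci_diff[1]:.1f}%], p = {p:.2f}")
```

The estimated iOS minus Android difference is 5.3 percentage points, but its CI runs from roughly −1.5 to +12.1 and p is about 0.13, which supports treating the platform pattern as a hypothesis rather than a finding.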

Signs of a credible segment result

A segment result is more credible when:

  • It was pre-specified before the experiment ran (you predicted this segment would respond differently).
  • The effect size is large and the CI does not cross zero.
  • It replicates the direction of the overall result, just amplified (rather than going in the opposite direction).
  • It makes intuitive sense given what you know about the product and the treatment variant.

A segment finding that is surprising, post-hoc, and narrowly significant should be treated as a hypothesis, not a conclusion.

Reader exercise

You split experiment results by country and notice that one country shows a significant positive effect while all others do not. What is the most appropriate response?

Reader exercise

A segment result shows a wide confidence interval that crosses zero. What does this tell you?


© Copyright 2026. All rights reserved.
