Lesson 10: Exploratory analysis
In this lesson, you learn how to interpret segmented results in Confidence. By the end, you will know what a dimension result means, how to read it, and why segment findings call for follow-up experiments rather than direct decisions.
Splitting experiment results by user subgroups is powerful, but it requires a specific way of reading the results. This lesson focuses on one question: after you have split results by a subgroup, how do you interpret what you see?
In Confidence, you split results by user subgroups by adding dimensions to an exploratory analysis. For the broader context on exploratory analysis (what explorations are, how to create them, and why the false positive risk matters), see the intro to metrics course.
What a dimension result shows
When you add a dimension, Confidence shows you the metric result for each segment separately. For example, if you split by platform, you see one result for iOS users and one for Android users. Each segment result has its own point estimate and confidence interval.
Confidence always uses the dimension value from right before a user was exposed to the treatment variant. This means the segment a user is in cannot have been affected by the treatment variant itself. Static attributes like country or device type are inherently safe. Dynamic attributes like subscription tier or engagement level are also safe, because Confidence uses the pre-exposure snapshot.
How to read a segment result
A segment result is a metric result like any other. The point estimate is the relative effect within that segment, and the confidence interval tells you how precisely that effect has been measured.
Read each segment result the same way you would read any CI:
- Is the effect in the expected direction?
- Does the CI cross zero? If it does, the effect for this segment is not statistically significant.
- How wide is the CI? Segments typically have fewer users than the full experiment, so the CI will be wider. A wide CI in a segment means you have less precision, not that the effect is different.
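The three checks above can be sketched in a few lines of Python. This is only an illustration, not how Confidence computes or exposes results: the `SegmentResult` class, the `describe` function, and the `expected_positive` parameter are hypothetical names, and the numbers are the iOS figures from the worked example later in this lesson.

```python
from dataclasses import dataclass


@dataclass
class SegmentResult:
    """One segment's metric result (hypothetical structure for illustration)."""
    name: str
    estimate: float   # relative effect, in percent
    ci_lower: float   # lower bound of the confidence interval, in percent
    ci_upper: float   # upper bound of the confidence interval, in percent


def describe(result: SegmentResult, expected_positive: bool = True) -> dict:
    """Apply the three reading checks to a single segment result."""
    crosses_zero = result.ci_lower <= 0.0 <= result.ci_upper
    return {
        # Check 1: is the point estimate in the direction you expected?
        "expected_direction": (result.estimate > 0) == expected_positive,
        # Check 2: a CI that crosses zero means the result is not significant.
        "significant": not crosses_zero,
        # Check 3: a wider CI means less precision, not a different effect.
        "ci_width": result.ci_upper - result.ci_lower,
    }


ios = SegmentResult(name="iOS", estimate=4.2, ci_lower=-0.5, ci_upper=8.9)
print(describe(ios))
```

For the iOS figures, the estimate is in the expected direction, but the CI crosses zero and is more than nine percentage points wide, so the result is imprecise and not statistically significant.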
What it means when segments look different
If one segment shows a positive effect and another shows no effect (or a negative effect), this is called a heterogeneous treatment effect: the treatment variant appears to work differently for different types of users.
Before drawing conclusions from a pattern like this, consider two things.
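First, you are running one test per segment. With many segments, some will look significant by chance: at a 5% significance level, about one in twenty segments with no real effect will still appear significant. A pattern that shows up consistently across two or three segments is more credible than one that barely clears the threshold in a single segment.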
Second, sample sizes per segment are smaller than for the full experiment. The difference you see between segments might be noise rather than a genuine interaction. Heavily overlapping confidence intervals between segments are a sign that the data cannot reliably tell the segment effects apart.
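To make the first point concrete, the sketch below computes the chance that at least one segment looks significant when the treatment has no real effect in any of them. It assumes the segment tests are independent and each uses the same significance level `alpha`; both are simplifying assumptions, and the function name is hypothetical.

```python
def prob_any_false_positive(num_segments: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_segments` independent tests
    is significant by chance when no segment has a real effect."""
    # Each test avoids a false positive with probability (1 - alpha);
    # all of them avoid one with probability (1 - alpha) ** num_segments.
    return 1.0 - (1.0 - alpha) ** num_segments


for k in (1, 2, 5, 10, 20):
    print(f"{k:>2} segments: {prob_any_false_positive(k):.2f}")
```

Under these assumptions, splitting by a dimension with ten values gives roughly a 40% chance of at least one spurious "significant" segment, which is why a single standout segment is weak evidence on its own.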
The right response to an interesting segment pattern is not to ship only to that segment based on the exploratory result. It is to run a new experiment with that segment as the target population and the segmented metric as the pre-registered success metric.
Example: an experiment shows no significant improvement overall. When split by platform, iOS shows +4.2% (95% CI: [−0.5%, +8.9%]) and Android shows −1.1% (95% CI: [−6.0%, +3.8%]). Both CIs cross zero, so neither result is statistically significant, and the overlap between the two CIs is large.
The pattern is suggestive, but neither result is significant and the CIs heavily overlap. This is not strong evidence of a genuine platform difference. It is worth noting as a hypothesis to test in a follow-up experiment, but not a reason to ship selectively to iOS users.
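One way to check this reading is to test the difference between the two segments directly. The sketch below is a rough approximation rather than Confidence's own method: it assumes each CI is a symmetric normal-approximation interval, recovers a standard error from the CI width, and treats the two platforms as independent groups. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def se_from_ci(lower: float, upper: float, level: float = 0.95) -> float:
    """Recover a standard error from a symmetric normal-approximation CI."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return (upper - lower) / (2 * z_crit)


def difference_z_test(est_a, ci_a, est_b, ci_b, level: float = 0.95):
    """Two-sided z-test for the difference between two independent estimates."""
    se_a = se_from_ci(ci_a[0], ci_a[1], level)
    se_b = se_from_ci(ci_b[0], ci_b[1], level)
    # Variances of independent estimates add.
    se_diff = sqrt(se_a ** 2 + se_b ** 2)
    z = (est_a - est_b) / se_diff
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# iOS vs. Android figures from the example above (values in percent).
z, p = difference_z_test(4.2, (-0.5, 8.9), -1.1, (-6.0, 3.8))
print(f"z = {z:.2f}, p = {p:.2f}")
```

With these numbers, z is roughly 1.5 and the p-value is above 0.1, so under the stated assumptions the gap between iOS and Android is itself not statistically significant. That matches the conclusion above: a hypothesis for a follow-up experiment, not a basis for shipping selectively.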
Signs of a credible segment result
A segment result is more credible when:
- It was pre-specified before the experiment ran (you predicted this segment would respond differently).
- The effect size is large and the CI does not cross zero.
- It replicates the direction of the overall result, just amplified (rather than going in the opposite direction).
- It makes intuitive sense given what you know about the product and the treatment variant.
A segment finding that is surprising, post-hoc, and narrowly significant should be treated as a hypothesis, not a conclusion.