Lesson 10: Exploratory analysis

Splitting experiment results by user subgroups is powerful, but it requires a specific way of reading the results. This lesson focuses on one question: after you have split results by a subgroup, how do you interpret what you see?

What a dimension result shows

When you add a dimension, Confidence shows you the metric result for each segment separately. For example, if you split by platform, you see one result for iOS users and one for Android users. Each segment result has its own point estimate and confidence interval.

How to read a segment result

A segment result is a metric result like any other. The point estimate is the relative effect within that segment, and the confidence interval tells you how precisely that effect has been measured.

Read each segment result the same way you would read any confidence interval (CI):

  • Is the effect in the expected direction?
  • Does the CI cross zero? If it does, the effect for this segment is not statistically significant.
  • How wide is the CI? Segments typically have fewer users than the full experiment, so the CI will be wider. A wide CI in a segment means you have less precision, not that the effect is different.
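The three questions above can be sketched as a small helper. This is only an illustration: the SegmentResult type, the read_segment function, and the numbers are hypothetical and are not part of Confidence. It assumes effects are expressed as fractions (0.03 means +3%).

```python
from dataclasses import dataclass


@dataclass
class SegmentResult:
    """Hypothetical container for one segment's metric result."""
    name: str
    estimate: float  # relative effect, e.g. 0.03 for +3%
    ci_low: float    # lower bound of the confidence interval
    ci_high: float   # upper bound of the confidence interval


def read_segment(result: SegmentResult, expected_sign: int = 1) -> dict:
    """Answer the three reading questions for a single segment."""
    return {
        # Is the effect in the expected direction?
        "expected_direction": result.estimate * expected_sign > 0,
        # Does the CI exclude zero? If not, the effect is not significant.
        "significant": not (result.ci_low <= 0.0 <= result.ci_high),
        # How wide is the CI? Wider means less precision.
        "ci_width": result.ci_high - result.ci_low,
    }


# Made-up example values, for illustration only.
segments = [
    SegmentResult("iOS", 0.030, 0.005, 0.055),
    SegmentResult("Android", 0.010, -0.020, 0.040),
]
for segment in segments:
    print(segment.name, read_segment(segment))
```

In this made-up example the Android interval is wider and crosses zero. That says the Android effect has been measured less precisely, not that it differs from the iOS effect.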

What it means when segments look different

If one segment shows a positive effect and another shows no effect (or a negative effect), this is called a heterogeneous treatment effect: the treatment variant appears to work differently for different types of users.

Before drawing conclusions from a pattern like this, consider two things.

First, you are running one test per segment. With many segments, some will look significant by chance. A result that stands out across two or three segments is more credible than one that barely clears the threshold in a single segment.
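To get a feel for how quickly chance findings accumulate, here is a rough sketch. It assumes the treatment has no real effect in any segment, that the segment tests are independent, and that no multiple-testing correction is applied; the function name and the 5% threshold are illustrative assumptions.

```python
def chance_of_any_false_positive(num_segments: int, alpha: float = 0.05) -> float:
    """Probability that at least one of num_segments independent tests
    looks significant at level alpha when there is no true effect."""
    return 1.0 - (1.0 - alpha) ** num_segments


for k in (1, 2, 5, 10, 20):
    print(f"{k:>2} segments: {chance_of_any_false_positive(k):.2f}")
```

Under these assumptions, ten segments already give roughly a 40% chance that at least one of them looks significant purely by chance.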

Second, sample sizes per segment are smaller than for the full experiment. The difference you see between segments might be noise rather than a genuine interaction. Confidence intervals that overlap heavily between segments are a sign that the apparent difference may not be reliable.
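Rather than comparing intervals by eye, one option is to test the difference between two segment effects directly. The sketch below is a simplification, not how Confidence computes anything: it assumes the reported intervals are symmetric and normal-based, that the two segments contain different users (so their estimates are independent), and it recovers each standard error from the interval width. The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def segment_difference(est_a, ci_a, est_b, ci_b, confidence=0.95):
    """Approximate z-test for the difference between two segment effects.

    est_a, est_b: point estimates for the two segments.
    ci_a, ci_b:   (low, high) confidence intervals at the given level.
    Returns (difference, CI of the difference, two-sided p-value).
    """
    normal = NormalDist()
    z_crit = normal.inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%

    # Recover each standard error from the width of its interval.
    se_a = (ci_a[1] - ci_a[0]) / (2 * z_crit)
    se_b = (ci_b[1] - ci_b[0]) / (2 * z_crit)

    diff = est_a - est_b
    se_diff = sqrt(se_a ** 2 + se_b ** 2)  # independent segments assumed

    p_value = 2 * (1 - normal.cdf(abs(diff / se_diff)))
    ci_diff = (diff - z_crit * se_diff, diff + z_crit * se_diff)
    return diff, ci_diff, p_value


# Made-up example: iOS +3.0% (0.5% to 5.5%), Android +1.0% (-2.0% to 4.0%).
diff, ci_diff, p_value = segment_difference(
    0.030, (0.005, 0.055), 0.010, (-0.020, 0.040)
)
print(f"difference: {diff:.3f}")
print(f"CI of difference: ({ci_diff[0]:.3f}, {ci_diff[1]:.3f})")
print(f"p-value: {p_value:.2f}")
```

In this made-up example iOS is significant on its own and Android is not, yet the interval for the difference between them comfortably includes zero. "Significant in one segment, not in the other" is therefore not evidence that the segments respond differently.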

The right response to an interesting segment pattern is not to ship only to that segment based on the exploratory result. It is to run a new experiment with that segment as the target population and the segmented metric as the pre-registered success metric.

Signs of a credible segment result

A segment result is more credible when:

  • It was pre-specified before the experiment ran (you predicted this segment would respond differently).
  • The effect size is large and the CI does not cross zero.
  • It replicates the direction of the overall result, just amplified (rather than going in the opposite direction).
  • It makes intuitive sense given what you know about the product and the treatment variant.

A segment finding that is surprising, post-hoc, and narrowly significant should be treated as a hypothesis, not a conclusion.