A target audience is the subset of users eligible for an experiment, defined by targeting rules that filter on user attributes like country, platform, subscription tier, account age, or behavioral characteristics. Not every experiment should include every user. Targeting the right population improves statistical power, makes results more interpretable, and ensures you're measuring the effect on the users who'll actually encounter the change.
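To make the idea concrete, here is a minimal sketch of a targeting rule as a predicate over user attributes. The attribute names and values are illustrative, not any platform's real schema:

```python
# A targeting rule is, at its core, a predicate over user attributes.
# Attribute names and values below are illustrative, not a real schema.

def in_target_audience(user: dict) -> bool:
    """Eligible: premium subscribers on iOS in the US."""
    return (
        user.get("country") == "US"
        and user.get("platform") == "ios"
        and user.get("subscription_tier") == "premium"
    )

users = [
    {"id": "a", "country": "US", "platform": "ios", "subscription_tier": "premium"},
    {"id": "b", "country": "SE", "platform": "android", "subscription_tier": "free"},
]
print([u["id"] for u in users if in_target_audience(u)])  # ['a']
```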
Targeting decisions shape the entire experiment. At Spotify, whose 750 million users span 186 markets, a change designed for premium subscribers on iOS doesn't need to include free-tier Android users. Including them would dilute the treatment effect with users who never encounter the change, inflate the required sample size, and produce an average treatment effect that describes no one's actual experience.
How does targeting affect experiment results?
Targeting determines the population your results generalize to. If you target an experiment at new users in the US, the treatment effect you measure applies to new users in the US. It doesn't necessarily apply to tenured users, users in other markets, or users on different platforms.
This sounds obvious, but it's a common source of mistakes. A team tests a change on power users, sees a positive result, and ships it to everyone. The effect on the broader population may be smaller, zero, or even negative. The experiment measured the conditional average treatment effect (CATE) for the targeted subgroup, not the population-wide average treatment effect (ATE).
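A back-of-envelope calculation shows how large the gap can be. All numbers here are invented for illustration: suppose the change helps power users but slightly hurts everyone else.

```python
# Hypothetical subgroup effects; all numbers are illustrative.
POWER_USER_CATE = 0.10    # effect measured in the power-user experiment
CASUAL_USER_CATE = -0.02  # effect on everyone else, never measured
POWER_USER_SHARE = 0.20   # power users are 20% of the population

# The population-wide ATE is the share-weighted average of subgroup CATEs.
ate = POWER_USER_SHARE * POWER_USER_CATE + (1 - POWER_USER_SHARE) * CASUAL_USER_CATE

print(f"CATE from the targeted experiment: {POWER_USER_CATE:+.3f}")
print(f"ATE after shipping to everyone:    {ate:+.3f}")
# +0.100 vs +0.004: the targeted result overstates the full-rollout
# impact by a factor of 25.
```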
Confidence lets teams define targeting rules at the feature flag level, controlling exactly which users are assigned to the experiment. The same flag that controls experiment assignment also controls the eventual rollout, so the targeting criteria carry through from test to release.
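The snippet below is not Confidence's actual API; it's a sketch of the general pattern, where one targeting predicate gates both experiment assignment and the later rollout. The flag name and fields are hypothetical:

```python
import hashlib

def bucket(user_id: str, salt: str, buckets: int = 100) -> int:
    """Deterministic hash bucket, so a user always resolves the same way."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % buckets

def resolve(user: dict, flag: dict) -> str:
    # The same targeting rule gates both the experiment and the rollout.
    if not flag["targeting"](user):
        return "default"  # out of audience: never sees the change
    if bucket(user["id"], flag["salt"]) < flag["treatment_pct"]:
        return "treatment"
    return "control"

# Illustrative flag: to ship after a successful experiment, raise
# treatment_pct toward 100 without touching the targeting criteria.
podcast_player_flag = {
    "salt": "new-podcast-player",
    "targeting": lambda u: u.get("platform") == "ios" and u.get("tier") == "premium",
    "treatment_pct": 50,
}

print(resolve({"id": "u1", "platform": "ios", "tier": "premium"}, podcast_player_flag))
```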
Why is targeting important for statistical power?
Including users who never encounter a change adds noise without adding signal. This is the dilution problem. If you change the podcast player but include users who only listen to music, those music-only users contribute zero signal to the treatment effect and a full share of metric variance. The effect gets diluted.
Proper targeting removes this dilution at the assignment level. Trigger analysis addresses it at the analysis level (by restricting to users who actually experienced the change). Both approaches improve sensitivity, but targeting is cleaner because it solves the problem before data collection rather than after.
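The cost of dilution is easy to quantify with the standard two-sample size formula. If only a fraction p of assigned users are exposed, the intent-to-treat effect shrinks to p times the true effect, so the required sample size grows by roughly 1/p² (treating the metric variance as unchanged, which is a simplification). All numbers below are illustrative:

```python
from math import ceil

def n_per_arm(delta: float, sigma: float, z_alpha: float = 1.96, z_power: float = 0.84) -> int:
    """Two-sample size formula: n = 2 * ((z_alpha + z_power) * sigma / delta)^2,
    for 5% two-sided significance and 80% power with the defaults."""
    return ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)

sigma = 1.0   # metric standard deviation (illustrative)
delta = 0.05  # true effect among users who see the change (illustrative)

# Targeted: every assigned user is exposed, so the full effect is measurable.
print(n_per_arm(delta, sigma))         # 6272 per arm

# Untargeted: only 25% of assigned users ever see the change, so the
# measured effect dilutes to 0.25 * delta and the required sample size
# grows by roughly 1 / 0.25^2 = 16x.
print(n_per_arm(0.25 * delta, sigma))  # 100352 per arm
```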
A concrete example from Spotify's published experimentation guidance: when the Spotify Home team runs experiments on the mobile home screen, they target users who actually visit the home screen during the experiment window. That sounds like a small detail, but it can make the difference between an experiment that reaches adequate power in one week versus three.
What makes a good targeting rule?
Good targeting rules satisfy three criteria.
Relevance. The targeted population should be the users who'll actually experience the change. If the change is to the desktop app, target desktop users. If it's a feature for new users, target accounts created in the last 30 days.
Stability. Targeting attributes should be stable during the experiment. If users can move in and out of the target population mid-experiment (for example, targeting "users who listened to a podcast this week"), you introduce selection bias. Users who enter the target group after assignment may differ systematically from those who were eligible at the start. Attributes like country, platform, and subscription tier are stable. Behavioral attributes need careful handling.
Pre-treatment definition. The targeting criteria must be determined before the experiment starts, not based on behavior during the experiment. Targeting "users who clicked on the new feature" is circular: it conditions on the outcome of the treatment itself.
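One way to enforce the stability and pre-treatment criteria mechanically is to evaluate targeting against a snapshot of attributes captured at assignment time, and to make the assignment sticky. This is a sketch, not any platform's actual behavior:

```python
import hashlib
from typing import Callable, Optional

def assign(
    user_id: str,
    attributes: dict,
    targeting: Callable[[dict], bool],
    assignments: dict,
) -> Optional[str]:
    """Evaluate targeting once, against pre-treatment attributes."""
    if user_id in assignments:
        return assignments[user_id]  # sticky: never re-evaluated
    if not targeting(dict(attributes)):  # snapshot at assignment time
        assignments[user_id] = None  # remember ineligibility too
        return None
    digest = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    assignments[user_id] = "treatment" if digest % 2 else "control"
    return assignments[user_id]

# Targeting on a stable, pre-treatment attribute: account age at assignment.
def is_new_user(u: dict) -> bool:
    return u.get("account_age_days", 0) <= 30

assignments: dict = {}
first = assign("u1", {"account_age_days": 10}, is_new_user, assignments)
# Mid-experiment, u1's attributes change, but the assignment does not:
later = assign("u1", {"account_age_days": 45}, is_new_user, assignments)
print(first == later)  # True
```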
How does targeting relate to mutually exclusive experiments?
When multiple experiments target overlapping user populations, coordination matters. Two experiments targeting the same audience on the same product surface should be mutually exclusive (no user in both simultaneously) to prevent interaction effects. Two experiments targeting different audiences can safely overlap, even on the same surface.
Confidence's Surface concept handles this coordination. Within a Surface, the platform enforces mutual exclusion among experiments that share targeting criteria, while allowing experiments with distinct target audiences to run independently.
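Confidence's internal mechanics aren't documented here, but the common way platforms implement this kind of coordination is a shared hash space (often called a layer) per surface. A sketch under that assumption, with illustrative experiment names and bucket ranges:

```python
import hashlib
from typing import Optional

def layer_bucket(user_id: str, surface: str, buckets: int = 100) -> int:
    """One shared hash space per surface: every experiment on the surface
    sees the same bucket for a given user."""
    digest = hashlib.sha256(f"{surface}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % buckets

# Two experiments with overlapping audiences on the same surface take
# disjoint bucket ranges, so no user lands in both. Ranges are illustrative.
EXPERIMENTS = {
    "podcast_player_redesign": range(0, 50),    # buckets 0-49
    "podcast_recommendations": range(50, 100),  # buckets 50-99
}

def experiment_for(user_id: str, surface: str = "mobile-home") -> Optional[str]:
    b = layer_bucket(user_id, surface)
    for name, bucket_range in EXPERIMENTS.items():
        if b in bucket_range:
            return name
    return None

print(experiment_for("user-123"))
```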