Lesson 3: Sampling distribution of the difference-in-means estimator
Using probability theory, we know how the difference-in-means estimator varies across all possible samples and treatment assignments, without going through every combination.
Using probability theory, we can calculate how the difference-in-means estimator will vary across all possible samples and treatment assignments without actually going through them. In fact, we even know, to a good approximation, the distribution the estimator will have across all possible samples and treatment assignments. For means and differences in means, the result that lets us do this is the Central Limit Theorem. It states that if the sample size is large enough, the difference-in-means estimator is approximately normally distributed around the true average treatment effect across random samples and treatment assignments.
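In symbols, the statement can be sketched as follows for the standard setting of independent random sampling and random assignment. The notation is introduced here for illustration and is not used elsewhere in the lesson: $n_T$ and $n_C$ are the numbers of treated and control units, $\bar{Y}_T$ and $\bar{Y}_C$ their sample means, $\sigma_T^2$ and $\sigma_C^2$ the outcome variances in the treatment and control populations, and $\tau$ the true average treatment effect.

```latex
\hat{\tau} = \bar{Y}_T - \bar{Y}_C
\;\overset{\text{approx.}}{\sim}\;
\mathcal{N}\!\left(\tau,\; \frac{\sigma_T^2}{n_T} + \frac{\sigma_C^2}{n_C}\right)
```

The variance term shows why "large enough" matters: the spread of the estimator shrinks at the rate $1/\sqrt{n}$ as the groups grow.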
The difference-in-means estimator is approximately normally distributed around the true treatment effect, regardless of the distribution of the data. In other words, even if the data are not remotely close to normally distributed, the difference-in-means estimator will be approximately normally distributed around the true treatment effect if the sample size is large enough.
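To make "regardless of the distribution of the data" concrete, here is a small simulation sketch. It assumes exponentially distributed outcomes (mean 1, standard deviation 1, strongly right-skewed), 200 units per group and no treatment effect; all of these numbers are made up for illustration. It compares the 2.5% and 97.5% quantiles of the standardized raw data with those of the standardized difference-in-means estimates.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n_per_group, n_reps = 200, 10_000

# Raw outcomes: exponential with mean 1 and standard deviation 1 (long right tail).
raw = rng.exponential(scale=1.0, size=n_reps)
raw_z = (raw - 1.0) / 1.0  # standardized with the known mean and standard deviation

# No treatment effect: both groups come from the same exponential distribution.
# Each row is one simulated experiment.
treat = rng.exponential(scale=1.0, size=(n_reps, n_per_group))
control = rng.exponential(scale=1.0, size=(n_reps, n_per_group))
diff = treat.mean(axis=1) - control.mean(axis=1)

# Standardize with the theoretical standard error sqrt(1/n_T + 1/n_C),
# since the variance is 1 in each group.
se = np.sqrt(1.0 / n_per_group + 1.0 / n_per_group)
diff_z = diff / se

for name, z in [("raw data", raw_z), ("diff in means", diff_z)]:
    q_lo, q_hi = np.quantile(z, [0.025, 0.975])
    print(f"{name:>13}: 2.5% quantile {q_lo:+.2f}, 97.5% quantile {q_hi:+.2f}")
```

A standard normal distribution has these quantiles at about -1.96 and +1.96. For standardized exponential data they are about -0.97 and +2.69, so the raw data are clearly not normal, while the difference-in-means estimates should land close to ±1.96.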
You only observe a point estimate
Importantly, the observed difference in means in a given sample is not normally distributed, since it is just one fixed number. It is the difference-in-means estimator, viewed across random samples and treatment assignments, that is normally distributed. This means that if you were to run the experiment many times, the differences in means you observed would be approximately normally distributed.
Simulation
In this simulation, we draw a random sample, split it randomly into treatment and control, and calculate the difference in means. We do this many times to see how the difference in means varies across random samples and treatment assignments. Note that there is no treatment effect in this simulation. The variation in the difference in means is only due to random variation in the sample and treatment assignment. The observed distribution is called the sampling distribution of the difference-in-means estimator, as it is the distribution this estimator has across random samples and treatment assignments.
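The simulation described above can be sketched in a few lines. This is a minimal version under stated assumptions: the population is hypothetical (100,000 skewed, lognormal outcomes, loosely like revenue per user), each experiment samples 1,000 units and splits them 50/50, and the function name `one_experiment` is made up for illustration. No treatment is ever applied, so the true effect is zero.

```python
import numpy as np

def one_experiment(population, sample_size, rng):
    """Draw a random sample, randomly assign half of it to treatment,
    and return the difference in means. No treatment is applied."""
    sample = rng.choice(population, size=sample_size, replace=False)
    is_treated = rng.permutation(sample_size) < sample_size // 2  # exactly half True
    return sample[is_treated].mean() - sample[~is_treated].mean()

rng = np.random.default_rng(seed=7)
# Hypothetical population: 100,000 right-skewed outcomes.
population = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)

# 500 repetitions, matching the simulation in the lesson.
estimates = np.array([one_experiment(population, 1_000, rng) for _ in range(500)])
print(f"mean of estimates: {estimates.mean():+.3f}")
print(f"std of estimates:  {estimates.std(ddof=1):.3f}")
```

A histogram of `estimates` is the sampling distribution of the difference-in-means estimator for this setup; its mean should sit close to zero because there is no treatment effect.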
[Interactive simulation: a random sample is split into treatment and control, the difference in means is computed, and a histogram of the difference in means builds up over 500 samples]
The magic that probability theory and statistics bring us is that, before running the simulation, we already know which distribution will be a good approximation of the 500 simulated difference-in-means estimates under the null hypothesis of no treatment effect: a normal distribution centered at zero. It works because of math!
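The claim that we know the distribution before simulating can be checked directly. In this sketch, which assumes a hypothetical population of 100,000 users with a binary outcome (10% conversion) and 500 users per group, the standard error is computed from the population variance first, and only then are 500 experiments simulated. Under the normal approximation, roughly 95% of the estimates should fall within ±1.96 predicted standard errors of zero.

```python
import numpy as np

rng = np.random.default_rng(seed=11)
# Hypothetical population: 100,000 users with a binary outcome (10% convert).
population = rng.binomial(n=1, p=0.1, size=100_000).astype(float)
n_treat = n_control = 500

# Prediction made before simulating anything: under the null of no effect,
# the estimator is approximately Normal(0, se^2).
sigma2 = population.var()
predicted_se = np.sqrt(sigma2 / n_treat + sigma2 / n_control)

def one_experiment():
    """Random sample, random split into treatment/control, difference in means."""
    sample = rng.choice(population, size=n_treat + n_control, replace=False)
    is_treated = rng.permutation(sample.size) < n_treat
    return sample[is_treated].mean() - sample[~is_treated].mean()

estimates = np.array([one_experiment() for _ in range(500)])

inside = np.mean(np.abs(estimates) <= 1.96 * predicted_se)
print(f"predicted SE: {predicted_se:.4f}, simulated SD: {estimates.std(ddof=1):.4f}")
print(f"share of the 500 estimates within ±1.96 predicted SE: {inside:.1%}")
```

Even though a single outcome can only be 0 or 1, the predicted standard error and the simulated spread should agree closely.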
The value of knowing the distribution of the difference-in-means estimator can't be overstated. It lets us observe a single sample and still draw conclusions (make inferences) about the full population. More on that in the next lesson.
According to the Central Limit Theorem, what is the shape of the sampling distribution for the difference-in-means estimator when the sample size is large enough?
What exactly is normally distributed according to the lesson?
In the simulation described in the lesson, what causes the variation in the difference-in-means estimates when there is no treatment effect?
Notes for nerds
There are some technicalities in the Central Limit Theorem that we have glossed over. The exact conditions under which it holds are more nuanced than "the sample size is large enough", but for the purposes of this course we can assume that it holds for large samples. The classical version of the theorem assumes independent, identically distributed observations with a finite variance. In practice, this means that as long as the underlying data don't have too fat tails, the Central Limit Theorem will hold.
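To see what "too fat tails" means, this sketch contrasts normal data (finite variance) with standard Cauchy data, a textbook example with infinite variance where the Central Limit Theorem does not apply. The function and helper names are made up for illustration. It measures the interquartile range (IQR) of simulated difference-in-means estimates at two sample sizes.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

def draw_normal(n):
    return rng.normal(size=n)  # finite variance: the CLT applies

def draw_cauchy(n):
    return rng.standard_cauchy(size=n)  # infinite variance: the CLT does not apply

def iqr_of_diff_in_means(draw, n_per_group, n_reps=1_000):
    """Interquartile range of simulated difference-in-means estimates (no true effect)."""
    diffs = np.array([
        draw(n_per_group).mean() - draw(n_per_group).mean() for _ in range(n_reps)
    ])
    q25, q75 = np.quantile(diffs, [0.25, 0.75])
    return q75 - q25

for n in (100, 10_000):
    print(f"n = {n:>6}: IQR normal = {iqr_of_diff_in_means(draw_normal, n):.3f}, "
          f"IQR Cauchy = {iqr_of_diff_in_means(draw_cauchy, n):.3f}")
```

For normal data, the IQR should shrink about tenfold when the group size grows a hundredfold (the $1/\sqrt{n}$ rate). For Cauchy data it should stay near 4 at both sizes, because the mean of n standard Cauchy draws is again standard Cauchy, so the difference of two such means is Cauchy with scale 2 no matter how large n is.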
There are also ways of making inference that are not based on the Central Limit Theorem. One example is the bootstrap, a resampling method that estimates the distribution of an estimator by repeatedly resampling the observed data, without assuming a particular distribution for the data. It is a powerful tool in many situations where a normal approximation is hard to derive or unreliable. However, the bootstrap is more computationally intensive, although there are tricks to make it faster. See for example our blog post on bootstrap for quantiles.
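As an illustration, here is a minimal percentile-bootstrap sketch for the difference in means. It assumes two independent groups of hypothetical, exponentially distributed outcomes; the function name `bootstrap_diff_in_means`, the group sizes and the number of bootstrap replicates are all made up for illustration, and none of the speed-up tricks mentioned above are included. Each group is resampled with replacement separately, so the group sizes stay fixed.

```python
import numpy as np

def bootstrap_diff_in_means(treat, control, n_boot=5_000, seed=0):
    """Bootstrap distribution of the difference in means, resampling each group
    with replacement and keeping the group sizes fixed."""
    rng = np.random.default_rng(seed)
    treat = np.asarray(treat, dtype=float)
    control = np.asarray(control, dtype=float)
    # Resampling indices for all replicates at once: shape (n_boot, group size).
    t_idx = rng.integers(0, treat.size, size=(n_boot, treat.size))
    c_idx = rng.integers(0, control.size, size=(n_boot, control.size))
    return treat[t_idx].mean(axis=1) - control[c_idx].mean(axis=1)

# Hypothetical experiment data with skewed outcomes.
rng = np.random.default_rng(1)
treat = rng.exponential(scale=1.2, size=300)
control = rng.exponential(scale=1.0, size=300)

boot = bootstrap_diff_in_means(treat, control)
observed = treat.mean() - control.mean()
ci_low, ci_high = np.quantile(boot, [0.025, 0.975])
print(f"observed difference: {observed:.3f}")
print(f"95% percentile bootstrap interval: [{ci_low:.3f}, {ci_high:.3f}]")

# For comparison: the standard error suggested by the Central Limit Theorem.
clt_se = np.sqrt(treat.var(ddof=1) / treat.size + control.var(ddof=1) / control.size)
print(f"bootstrap SE: {boot.std(ddof=1):.3f}, CLT-based SE: {clt_se:.3f}")
```

For a plain difference in means with a few hundred units per group, the two standard errors should be very close. The bootstrap earns its extra computation for estimators like quantiles, where no simple standard-error formula is at hand. Note that it still relies on the observations being independent.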