You’re interested in calculating the average weight of engineers and seeing if it’s different from the average weight of everyone. You know the average weight of everyone is 150 lbs.
So, you go out, find 100 engineers (a “sample”), and record their weights. You find that their average weight is 185 lbs. Can you conclude that engineers are usually plumper than the general population? How do you know that when you picked your 100 engineers to weigh, you didn’t just happen, by chance, to pick juicier ones?
Central Limit Theorem
If you re-sample, meaning that you take another sample of 100 engineers, you will get a slightly different mean weight based on this new sample. If you continue to re-sample, and you keep track of the means of all the samples, you will notice that they (the means) form an approximately normal distribution.
Furthermore, the mean of this normal distribution will be the same as the mean weight of the underlying population. The standard deviation (measure of spread) of this distribution will depend on the sample size and the population standard deviation. The bigger the sample size, the smaller the standard deviation, which makes sense: averaging over more people smooths out individual extremes. To be precise, the standard deviation is $\sigma/\sqrt{n}$, where $\sigma$ is the standard deviation of weights in the entire population and $n$ is the size of your samples.
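To see the re-sampling idea in action, here is a minimal simulation sketch. Everything about the population is an assumption made up for illustration: weights are drawn from a skewed gamma distribution chosen to have a mean of 150 lbs and a standard deviation of 30 lbs. The names `sample_mean`, `SHAPE`, and `SCALE` are hypothetical helpers, not anything from a real dataset.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical skewed population (assumed values): a gamma distribution
# with mean 25 * 6 = 150 lbs and standard deviation sqrt(25) * 6 = 30 lbs.
SHAPE, SCALE = 25, 6
POP_MEAN = SHAPE * SCALE
POP_SD = (SHAPE ** 0.5) * SCALE


def sample_mean(n):
    """Weigh n randomly chosen people and return their mean weight."""
    return statistics.fmean(random.gammavariate(SHAPE, SCALE) for _ in range(n))


n = 100
# Re-sample 2000 times and keep track of every sample's mean.
means = [sample_mean(n) for _ in range(2000)]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
predicted_sd = POP_SD / n ** 0.5  # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.2f} (population mean: {POP_MEAN})")
print(f"sd of sample means:   {sd_of_means:.2f} (sigma / sqrt(n): {predicted_sd:.2f})")
```

If you run it, the mean of the sample means should land close to 150 and their standard deviation close to 3, even though the population itself is not normal.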
The above facts are known as the central limit theorem:
If you keep resampling, and you plot all the means of your samples, they will form a normal distribution, with the mean being the same as the population mean, and the standard deviation being $\sigma/\sqrt{n}$.
Are Our Results Due to Chance?
We know that there is some chance that we just picked heavier engineers when we sampled, but we can actually use the central limit theorem to find out what that chance is! If that chance is really, really small, then we can conclude that we didn’t just pick heavier engineers by chance, and that engineers are generally a little tubbier than the general population!
We know that regardless of the distribution of weights in the general population, the plot of the means of many samples will be approximately normal when the sample size is reasonably large, courtesy of the central limit theorem. We also know that this plot will have the same mean as the population mean. Furthermore, we know the standard deviation of this plot: $\sigma/\sqrt{n}$. We don’t know $\sigma$, the standard deviation of the entire population, but if your sample is reasonably large (~100 people), you can use your sample standard deviation as an estimate of it.
So, in other words, we know the exact shape of what I’ll call the “central limit theorem plot” for our sample size (100). If you think about it, this “central limit theorem plot” answers the following question: if you sample 100 people from the general population and get a certain mean weight for your sample, what is the chance that you would get a mean like this simply due to sampling?
If this chance is really small, then the mean weight that we got for our sample is “significantly” different from the mean weight of the general population, and this difference is “unlikely” to be due to sampling chance. Notice that you pick the threshold for “significantly”/“unlikely”.
Let’s say that our “central limit theorem plot” looks such that the probability of getting a mean of 185 lbs or more from a sample of 100 people is 0.01. Then we can say “pretty” confidently that we didn’t just happen to pick heavier people, and that engineers really are heavier than the general population.
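Here is a rough sketch of how you could compute that probability yourself. The population mean (150), sample mean (185), and sample size (100) come from the example above; the sample standard deviation of 40 lbs and the 0.01 threshold are assumptions picked purely for illustration, and `upper_tail_probability` is a hypothetical helper name.

```python
import math


def upper_tail_probability(sample_mean, pop_mean, sd, n):
    """Chance that the mean of n draws is at least sample_mean, using a
    normal curve centered at pop_mean with standard deviation sd / sqrt(n)."""
    standard_error = sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Upper-tail area of the standard normal curve.
    # erfc stays accurate even when z is large.
    return 0.5 * math.erfc(z / math.sqrt(2))


POP_MEAN = 150.0     # from the example
SAMPLE_MEAN = 185.0  # from the example
N = 100              # from the example
SAMPLE_SD = 40.0     # assumed: the example never states a standard deviation
THRESHOLD = 0.01     # assumed: you choose what counts as "unlikely"

p = upper_tail_probability(SAMPLE_MEAN, POP_MEAN, SAMPLE_SD, N)
print(f"chance of a sample mean of at least {SAMPLE_MEAN} lbs: {p:.3g}")
if p < THRESHOLD:
    print("unlikely to be just sampling chance")
else:
    print("could plausibly be sampling chance")
```

With a spread of 40 lbs, a sample mean of 185 sits many standard errors above 150, so the probability comes out far below the 0.01 used in the example. A much larger assumed spread would bring it closer to that figure.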
You can use this plot to answer a lot of sample-chance related questions. For example, what are the chances that we’d pick a sample that has a mean weight between 100 and 150 lbs? Well, we can just integrate this curve from 100 to 150, which gives the area under it!
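As a small sketch of that integration, the snippet below finds the area two ways: with the normal curve’s cumulative distribution function, and with a plain midpoint sum over the curve. The population mean of 150 and sample size of 100 come from the example; the standard deviation of 40 lbs is again an assumed value.

```python
from statistics import NormalDist

POP_MEAN = 150.0  # from the example
N = 100           # from the example
SD = 40.0         # assumed standard deviation of individual weights

# The "central limit theorem plot": normal, centered on the population
# mean, with standard deviation sigma / sqrt(n).
clt_plot = NormalDist(mu=POP_MEAN, sigma=SD / N ** 0.5)

low, high = 100.0, 150.0

# Area under the curve between low and high, via the CDF.
area = clt_plot.cdf(high) - clt_plot.cdf(low)

# The same area, integrated numerically with a midpoint sum.
steps = 10_000
width = (high - low) / steps
numeric_area = sum(
    clt_plot.pdf(low + (i + 0.5) * width) * width for i in range(steps)
)

print(f"area via CDF:          {area:.4f}")
print(f"area via midpoint sum: {numeric_area:.4f}")
```

Because 150 is the center of the curve and 100 is far out in the left tail, both numbers come out at about one half.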
- the central limit theorem can tell us how likely it is that our sample’s mean is just due to sampling chance
- if this chance is really small, then we can conclude that something about our sample is different from the general population, because there is such a small chance that we’d get this kind of mean otherwise