One Sample T-Test and t-Distribution
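The one sample t-test is used when you want to know whether a sample’s mean differs from the population mean by more than chance alone would explain (that is, whether the difference is statistically significant). The population mean is assumed to be known from some theory or previous research.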
Formula
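The formula for the one sample t-test is:

$$t = \frac{\bar{x} - \mu}{SE} = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

where
- $\bar{x}$ is the mean of your sample
- $\mu$ is the population mean
- $SE = s / \sqrt{n}$ is the standard error of the mean
- $s$ is the standard deviation of your sample
- $n$ is the number of observations in your sample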
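To make the formula concrete, here is a minimal sketch that computes the t-score step by step using only Python's standard library. The sample values and the population mean `mu` are made up for illustration; they do not come from any real study.

```python
import math
import statistics

# Hypothetical sample of 8 measurements (illustrative values only).
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4, 4.7, 5.3]
mu = 5.0  # assumed known population mean

n = len(sample)
x_bar = statistics.mean(sample)   # sample mean
s = statistics.stdev(sample)      # sample standard deviation (divides by n - 1)
se = s / math.sqrt(n)             # standard error of the mean
t = (x_bar - mu) / se             # one sample t-score

print(f"mean={x_bar:.4f}, s={s:.4f}, SE={se:.4f}, t={t:.4f}")
```

The t-score on its own is just a standardized distance between the sample mean and the population mean; turning it into a p-value requires the t-distribution.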
Recall that the SE in the Central Limit Theorem is defined as:

$$SE = \frac{\sigma}{\sqrt{n}}$$

where:
- $\sigma$ is the population standard deviation
- $n$ is still the number of observations in your sample
In the one sample t-test, we are using the sample standard deviation instead of the population standard deviation because we generally don’t know the population standard deviation.
It is acceptable to use the sample standard deviation in place of the population standard deviation. However, a standard deviation computed from a sample by dividing by $n$ tends to underestimate the population standard deviation, so the calculation of the sample standard deviation is adjusted by dividing by $n - 1$ instead of $n$:

$$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$
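The effect of the adjustment can be seen with a small sketch. It computes the standard deviation both ways on a hypothetical sample (the numbers are invented for illustration) and checks the results against the standard library's `statistics.pstdev` (divides by n) and `statistics.stdev` (divides by n - 1).

```python
import math
import statistics

# Hypothetical sample (illustrative values only).
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4, 4.7, 5.3]

n = len(sample)
x_bar = statistics.mean(sample)
sum_sq = sum((x - x_bar) ** 2 for x in sample)  # sum of squared deviations

s_divide_n = math.sqrt(sum_sq / n)              # unadjusted: divides by n
s_divide_n_minus_1 = math.sqrt(sum_sq / (n - 1))  # adjusted: divides by n - 1

# The standard library exposes both conventions directly.
assert math.isclose(s_divide_n, statistics.pstdev(sample))
assert math.isclose(s_divide_n_minus_1, statistics.stdev(sample))

print(f"divide by n:     {s_divide_n:.4f}")
print(f"divide by n - 1: {s_divide_n_minus_1:.4f}")
```

The adjusted value is always the larger of the two, by a factor of sqrt(n / (n - 1)), so the gap matters most for small samples and fades as n grows.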
t-Distribution
If you look again at the main t-test formula:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$
You’ll see $s$ in the denominator. $s$ is the sample standard deviation, which varies from sample to sample (especially for small samples). This extra variability makes the distribution of t-scores wider, with heavier tails, than the standard normal distribution. So we use the t-distribution, rather than the normal distribution, to calculate the CDF and find the p-value.
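As a rough illustration of why the choice of distribution matters, the sketch below computes a two-sided p-value for a hypothetical t-score of 1.972 with 7 degrees of freedom (sample size minus 1, so 8 observations), once under the t-distribution and once under the standard normal. It uses only the standard library: the helper names `t_pdf` and `two_sided_p_value` are invented here, and the p-value comes from a simple Simpson's rule integration of the t density, not from a statistics package. In practice a library routine such as `scipy.stats.ttest_1samp` would normally do this work.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|), via Simpson's rule on [0, |t_stat|]; steps must be even."""
    a = abs(t_stat)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # P(-a < T < a), using symmetry
    return 1 - central_area

t_score = 1.972  # hypothetical t-score
df = 7           # hypothetical degrees of freedom

p_t = two_sided_p_value(t_score, df)
p_normal = math.erfc(abs(t_score) / math.sqrt(2))  # two-sided p under N(0, 1)

print(f"p-value using t-distribution (df={df}): {p_t:.4f}")
print(f"p-value using standard normal:        {p_normal:.4f}")
```

Because the t-distribution has heavier tails, the same score yields a larger p-value than the normal distribution would give, which is the more cautious answer for a small sample.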
The t-distribution is a family of curves, parameterized by the degrees of freedom (df). For the one sample t-test, the degrees of freedom are the sample size minus 1 ($df = n - 1$). The larger the sample size, the higher the df, and the closer the t-distribution is to the standard normal distribution.
This makes sense because as your sample size increases, the sample standard deviation has less variability (uncertainty), so the distribution of t-scores looks more and more like the standard normal distribution.
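The convergence toward the normal distribution can be checked numerically. This sketch compares the t density with the standard normal density at a point in the tail (x = 3, an arbitrary choice) for a few degrees of freedom. The helper names `t_pdf` and `normal_pdf` are made up for this example, and only the standard library is used.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

x = 3.0  # arbitrary point in the tail
ratios = []
for df in (2, 5, 30, 100):
    ratio = t_pdf(x, df) / normal_pdf(x)
    ratios.append(ratio)
    print(f"df={df:>3}: t density is {ratio:.2f} times the normal density at x={x}")
```

A ratio well above 1 means the t-distribution puts much more probability in the tail than the normal does. The ratio shrinks toward 1 as df grows, which is the "heavier tails that fade with sample size" behavior described above.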
Key Points
- The one sample t-test is used to determine if a sample mean is statistically different from a known population mean.
- You use the sample standard deviation to estimate the population standard deviation, which means two things:
  - When computed by dividing by $n$, the sample standard deviation tends to underestimate the population standard deviation, so its calculation is adjusted by dividing by $n - 1$ instead of $n$.
  - Since the sample standard deviation is used instead of the population standard deviation, the t-scores vary more (the sample standard deviation itself varies from sample to sample, especially for small samples), so we use the t-distribution to calculate the CDF and p-value.