You have a bunch of values (e.g. a bunch of ages or weights). Finding the average (aka the mean, aka the expected value) is straightforward: add them up, then divide by the number of values. The average tells you where the center of your distribution is. But you’re also interested in the dispersion of your values.
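In code, with some hypothetical ages, that looks like:

```python
ages = [23, 30, 27, 45, 31]  # hypothetical data

# Add them up, then divide by the number of values.
mean = sum(ages) / len(ages)
print(mean)  # 31.2
```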

Variance tells you about the dispersion. Intuitively, variance tells you on average how far a value is from the average. Strictly, it is the average of the squared distances from the average. The reason for the square is to give extra weight to larger distances.
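In symbols, for $N$ values $x_1, \dots, x_N$ with mean $\mu$, that is:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$

Each term $(x_i - \mu)^2$ is a squared distance from the average, and we average those.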

We usually represent variance with $\sigma^2$ (notice it’s sigma squared, not just sigma).

Standard deviation is another way to represent the dispersion. Standard deviation is simply the square root of variance!

Notice that since standard deviation is simply the square root of variance, larger distances are still weighed more! But standard deviation works in the same units as the data, so it is often more convenient.
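A quick sketch of both, on the same hypothetical ages (this computes the population variance, dividing by $N$; we can sanity-check it against Python’s `statistics` module):

```python
import statistics

ages = [23, 30, 27, 45, 31]  # hypothetical data

mu = statistics.mean(ages)

# Variance: average of the squared distances from the mean.
variance = sum((x - mu) ** 2 for x in ages) / len(ages)

# Standard deviation: square root of variance, in the same units as the data.
std = variance ** 0.5

print(variance)  # same as statistics.pvariance(ages)
print(std)       # same as statistics.pstdev(ages)
```

Note that `statistics.pvariance`/`pstdev` are the population versions; `statistics.variance`/`stdev` divide by $N - 1$ instead (the sample versions).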

Why don’t we just find, on average, how far our values are from the average, and use that to represent the dispersion? That measure exists (it’s called the mean absolute deviation), but it doesn’t give extra weight to larger distances (e.g. outliers), and it’s less convenient to work with in inferential statistics, so variance/standard deviation is usually preferred.
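A small sketch of the difference, using hypothetical data with one outlier: the squaring makes the standard deviation react to the outlier much more strongly than the plain average distance does.

```python
data = [10, 11, 9, 10, 100]  # hypothetical data; 100 is the outlier
mu = sum(data) / len(data)

# Mean absolute deviation: average of plain distances from the mean.
mad = sum(abs(x - mu) for x in data) / len(data)

# Standard deviation: square root of the average squared distance.
std = (sum((x - mu) ** 2 for x in data) / len(data)) ** 0.5

print(mad)  # 28.8
print(std)  # ~36.0 -- the outlier is punished more by the squaring
```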

• variance and standard deviation both tell you about the dispersion of your values.
• variance tells you on average how far your values are from the average (it uses distance squared, in order to punish outliers)
• standard deviation is simply the square root of variance