The ability to calculate the average or mean value of a group of numbers is important in every aspect of life. If you are a professor assigning letter grades to exam scores and traditionally give a grade of B- to a middle-of-the-pack score, then you clearly need to know what the middle of the pack looks like numerically. You also need a way to identify scores as outliers so that you can determine when someone deserves an A or A+ (outside of perfect scores, obviously) as well as what merits a failing grade.
For this and related reasons, complete data about averages includes information about how tightly the individual scores cluster around the average. This information is conveyed by the standard deviation and, relatedly, the variance of a statistical sample.
Measures of Variability
You've almost certainly heard or seen the term "average" used in reference to a set of numbers or data points, and you probably have an idea of what it translates to in everyday language. For example, if you read that the average height of an American woman is about 5' 4", you immediately conclude that "average" means "typical," and that about half of the women in the United States are taller than this while about half are shorter.
Mathematically, average and mean are exactly the same thing: You add all of the values in a set and divide by the number of items in the set. For example, if a group of 25 scores on a 10-question test ranges from 3 to 10 and adds up to 196, the average (mean) score is 196/25, or 7.84.
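As a quick sketch of this arithmetic in Python (the 25 individual scores aren't listed in the example, so the stated total and count stand in for them):

```python
# Mean = sum of all the values divided by the number of values.
# The individual 25 scores aren't given, so the stated total is used directly.
total_of_scores = 196
number_of_scores = 25
mean = total_of_scores / number_of_scores
print(mean)  # 7.84
```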
The median is the midpoint value in a set, the number that half of the values lie above and half of the values lie below. It is usually close to the average (mean) but is not the same thing.
If you eyeball a set of 25 scores like the ones above and see almost nothing but values of 7, 8 and 9, it makes intuitive sense that the average should be around 8. But what if you see almost nothing but scores of 6 and 10? Or five scores of 0 and 20 scores of 9 or 10? All of these can produce the same average.
Variance is a measure of how widely the points in a data set are spread about the mean. To calculate variance by hand, you take the arithmetic difference between each data point and the average, square each of these differences, add the squares together and divide the result by one less than the number of data points in the sample. An example of this is provided later. You can also use programs such as Excel or websites like Rapid Tables (see Resources for additional sites).
The variance is denoted by σ², a Greek letter "sigma" with an exponent of 2.
The standard deviation of a sample is simply the square root of the variance. The reason squares are used when computing variance is that if you simply add together the individual differences between the average and each individual data point, the sum is always zero because some of these differences are positive and some are negative, and they cancel each other out. Squaring each term eliminates this pitfall.
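A short Python sketch illustrates why the squaring step matters; the sample here is the 10-point data set from the problem below, and the variance uses the (n - 1) denominator described above:

```python
data = [4, 7, 10, 5, 7, 6, 9, 8, 5, 9]
mean = sum(data) / len(data)

# The raw differences from the mean cancel out to zero...
raw_sum = sum(x - mean for x in data)
print(raw_sum)  # 0.0

# ...so each difference is squared before summing.
# Dividing by (n - 1) gives the sample variance.
variance = sum((x - mean) ** 2 for x in data) / (len(data) - 1)
std_dev = variance ** 0.5
print(variance, std_dev)  # 4.0 2.0
```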
Sample Variance and Standard Deviation Problem
Assume you are given the 10 data points:
4, 7, 10, 5, 7, 6, 9, 8, 5, 9
Find the average, the variance and the standard deviation.
First, add the 10 values together and divide by 10 to get the average (mean):
70/10 = 7.0
To get the variance, square the difference between each data point and the average, add these together and divide the result by (10 - 1), or 9:
- 7 - 4 = 3; 3² = 9
- 7 - 7 = 0; 0² = 0
- 7 - 10 = -3; (-3)² = 9 . . .
9 + 0 + 9 + . . . + 4 = 36
σ² = 36/9 = 4.0
The standard deviation σ is just the square root of 4.0, or 2.0.
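The same figures can be checked with Python's built-in statistics module, whose variance and stdev functions use the same sample formulas with the (n - 1) denominator:

```python
import statistics

data = [4, 7, 10, 5, 7, 6, 9, 8, 5, 9]
print(statistics.mean(data))      # 7
print(statistics.variance(data))  # 4  (sample variance, n - 1 denominator)
print(statistics.stdev(data))     # 2.0
```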