Simplify comparisons of sets of numbers, especially large sets, by calculating their center values using the mean, median and mode. Use the ranges and standard deviations of the sets to examine how variable the data are.
The mean identifies the average value of the set of numbers. For example, consider the data set containing the values 20, 24, 25, 36, 25, 22, 23.
Adding Data Set
To find the mean, use the formula: Mean equals the sum of the numbers in the data set divided by the number of values in the data set. In mathematical terms: Mean=(sum of all terms)÷(how many terms or values in the set).
Add the numbers in the example data set: 20+24+25+36+25+22+23=175.
Divide the sum by the number of data points in the set. This set has seven values, so divide by 7.
Insert the values into the formula to calculate the mean. The mean equals the sum of the values (175) divided by the number of data points (7). Since 175÷7=25, the mean of this data set equals 25. Not all mean values will equal a whole number.
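The mean formula can be checked with a short Python sketch using the example data set. The helper name `mean_of` is hypothetical and only mirrors the formula above; Python's built-in `statistics.mean` is used as a cross-check.

```python
import statistics

# Example data set from the text
data = [20, 24, 25, 36, 25, 22, 23]

def mean_of(values):
    """Mean = (sum of all values) / (how many values are in the set)."""
    return sum(values) / len(values)

total = sum(data)   # 175
count = len(data)   # 7
print(mean_of(data))  # 25.0

# Cross-check against the standard library's mean
print(mean_of(data) == statistics.mean(data))  # True
```

Because Python's `/` always performs true division, a set whose mean is not a whole number (for example `[1, 2]`) returns a fractional result such as `1.5`.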
The median identifies the midpoint or middle value of a set of numbers.
Finding Center Value
Put the numbers in order from smallest to largest. Use the example set of values: 20, 24, 25, 36, 25, 22, 23. Placed in order, the set becomes: 20, 22, 23, 24, 25, 25, 36.
Since this set has seven values, the median is the fourth value, which has three values on either side of it. The median is 24.
If the set of numbers has an even number of values, calculate the average of the two center values. For example, suppose the set of numbers contains the values 22, 23, 25, 26. The middle lies between 23 and 25. Adding 23 and 25 yields 48. Dividing 48 by two gives a median value of 24.
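Both median cases (an odd and an even number of values) fit in one small function. This is a minimal sketch; the name `median_of` is hypothetical, and Python's `statistics.median` follows the same rule.

```python
def median_of(values):
    """Middle value of the sorted data; for an even count, the average of the two middle values."""
    ordered = sorted(values)      # put the numbers in order, smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]       # odd count: the single center value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two center values

print(median_of([20, 24, 25, 36, 25, 22, 23]))  # 24
print(median_of([22, 23, 25, 26]))              # 24.0
```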
The mode identifies the most common value or values in the data set. Depending on the data, there might be one or more modes, or no mode at all.
As when finding the median, order the data set from smallest to largest; this places repeated values next to each other. In the example set, the ordered values become: 20, 22, 23, 24, 25, 25, 36.
A mode occurs when values repeat. In the example set, the value 25 occurs twice. No other numbers repeat. Therefore, the mode is the value 25.
In some data sets, more than one mode occurs. The data set 22, 23, 23, 24, 27, 27, 29 contains two modes, 23 and 27, because each appears twice. Other data sets may have more than two modes, may have a mode that appears more than twice (in 23, 23, 24, 24, 24, 28, 29, the value 24 appears three times while 23 appears only twice, so the mode is 24), or may have no mode at all because no value repeats (as in 21, 23, 24, 25, 26, 27, 29). The mode may occur anywhere in the data set, not just in the middle.
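Counting how often each value occurs covers all of the mode cases above: one mode, several modes, or none. The sketch below assumes a non-empty list and uses the article's convention that a data set with no repeated values has no mode; `modes_of` is a hypothetical helper name.

```python
from collections import Counter

def modes_of(values):
    """Return every value that occurs most often; return [] if no value repeats."""
    counts = Counter(values)          # value -> number of occurrences
    highest = max(counts.values())
    if highest < 2:
        return []                     # nothing repeats, so the data set has no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes_of([20, 24, 25, 36, 25, 22, 23]))  # [25]
print(modes_of([22, 23, 23, 24, 27, 27, 29]))  # [23, 27]
print(modes_of([23, 23, 24, 24, 24, 28, 29]))  # [24]
print(modes_of([21, 23, 24, 25, 26, 27, 29]))  # []
```

Python's `statistics.multimode` is similar, but when no value repeats it returns every value rather than an empty list, which is why the "no mode" case is handled explicitly here.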
Range shows the mathematical distance between the lowest and highest values in the data set. Range measures the variability of the data set. A wide range indicates greater variability in the data, or perhaps a single outlier far from the rest of the data. Outliers may skew, or shift, the mean value enough to impact data analysis.
Identifying Low and High Values
In the sample group, the lowest value is 20 and the highest value is 36.
To calculate the range, subtract the lowest value from the highest value. Since 36-20=16, the range equals 16.
Evaluating the Range
In the sample set, the highest value, 36, exceeds the next-highest value, 25, by 11. This gap seems extreme, given that the other ordered values differ from their neighbors by at most 2. The value of 36 might be an outlier data point.
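The range calculation, together with the gap between the two highest values that hints at a possible outlier, can be reproduced in a few lines of Python. This is an illustrative sketch on the example data only; the variable names are hypothetical.

```python
data = [20, 24, 25, 36, 25, 22, 23]

# Range: highest value minus lowest value
lowest, highest = min(data), max(data)
data_range = highest - lowest
print(lowest, highest, data_range)  # 20 36 16

# Gap between the highest value and the next-highest value
ordered = sorted(data)
gap = ordered[-1] - ordered[-2]
print(gap)  # 11
```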
Calculating Standard Deviation
Standard deviation measures the variability of the data set. As with the range, a smaller standard deviation indicates less variability.
Finding the standard deviation requires subtracting the mean from each data point and squaring each difference, adding all the squared differences, $\sum(x-\mu)^2$, dividing that sum by one less than the number of values (N-1), and finally taking the square root of the resulting quotient. In symbols:
$$\text{standard deviation} = \sqrt{\frac{\sum (x-\mu)^2}{N-1}}$$
Start by calculating the mean.
Calculate the mean by adding all the data point values, then dividing by the number of data points. In the sample data set, 20+24+25+36+25+22+23=175. Divide the sum, 175, by the number of data points, 7, or 175÷7=25. The mean equals 25.
Next, subtract the mean from each data point, then square each difference, $(x-\mu)^2$, where $x$ represents each data set value and $\mu$ represents the mean value. Continuing with the example set, the values become: $20-25=-5$ and $(-5)^2=25$; $24-25=-1$ and $(-1)^2=1$; $25-25=0$ and $0^2=0$; $36-25=11$ and $11^2=121$; $25-25=0$ and $0^2=0$; $22-25=-3$ and $(-3)^2=9$; and $23-25=-2$ and $(-2)^2=4$.
Adding the squared differences gives $\sum(x-\mu)^2$, where $\sum$ means sum: 25+1+0+121+0+9+4=160.
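The subtract, square and add steps map directly onto a list comprehension. A minimal sketch using the example data; `mu` stands for the mean µ.

```python
data = [20, 24, 25, 36, 25, 22, 23]
mu = sum(data) / len(data)  # the mean, 25.0

# Subtract the mean from each value, then square the difference
squared_diffs = [(x - mu) ** 2 for x in data]
print(squared_diffs)       # [25.0, 1.0, 0.0, 121.0, 0.0, 9.0, 4.0]

# Sum of the squared differences
print(sum(squared_diffs))  # 160.0
```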
Divide the sum of the squared differences by one less than the number of data points. The example data set has 7 values, so N-1 equals 7-1=6. The sum of the squared differences, 160, divided by 6 equals approximately 26.6667.
Calculate the standard deviation by taking the square root of the quotient from the previous step. In the example, the square root of 26.6667 equals approximately 5.164. Therefore, the standard deviation equals approximately 5.164.
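Putting every step together gives a short function for the standard deviation with the N-1 divisor used above. The name `sample_std_dev` is hypothetical; the result is compared against Python's `statistics.stdev`, which also divides by N-1 (whereas `statistics.pstdev` divides by N). The sketch assumes at least two values.

```python
import math
import statistics

def sample_std_dev(values):
    """Square root of the sum of squared differences from the mean, divided by N - 1."""
    n = len(values)
    mu = sum(values) / n                               # step 1: the mean
    squared_sum = sum((x - mu) ** 2 for x in values)   # steps 2-3: square and add
    return math.sqrt(squared_sum / (n - 1))            # steps 4-5: divide by N - 1, take the root

data = [20, 24, 25, 36, 25, 22, 23]
s = sample_std_dev(data)
print(round(s, 3))  # 5.164

# Cross-check against the standard library's sample standard deviation
print(math.isclose(s, statistics.stdev(data)))  # True
```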
Standard deviation helps evaluate data. Values that fall within one standard deviation of the mean lie close to the center of the data set. As a rule of thumb, values that fall more than two standard deviations from the mean are treated as extreme values or outliers. In the example set, the value 36 lies 11 above the mean, which is more than two standard deviations (2 × 5.164 ≈ 10.33), so 36 is an outlier. Outliers may represent erroneous data or may suggest unforeseen circumstances, and they should be carefully considered when interpreting data.
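The two-standard-deviation rule of thumb can be applied mechanically. This sketch flags values more than `k` sample standard deviations from the mean; the helper name `beyond_k_std_devs` and the default `k=2.0` are assumptions chosen to match the rule described above, not a fixed standard.

```python
import statistics

def beyond_k_std_devs(values, k=2.0):
    """Return the values lying more than k sample standard deviations from the mean."""
    mu = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (divides by N - 1)
    return [x for x in values if abs(x - mu) > k * s]

data = [20, 24, 25, 36, 25, 22, 23]
print(beyond_k_std_devs(data))  # [36]
```

A flagged value is only a candidate outlier; deciding whether it is an error or a genuine observation still requires looking at how the data were collected.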