An outlier is a value in a data set that lies far from the other values. Outliers can be caused by experimental or measurement errors, or by a long-tailed population. In the case of errors, it can be desirable to identify outliers and remove them from the data before performing a statistical analysis, because they can distort the results so that they no longer accurately represent the sample population. A simple way to identify outliers is the quartile method.
Sort the data in ascending order. For example, take the data set {4, 5, 2, 3, 15, 3, 3, 5}. Sorted, the example data set is {2, 3, 3, 3, 4, 5, 5, 15}.
Find the median. This is the value for which half the data points are larger and half are smaller. If there is an even number of data points, average the middle two. For the example data set, the middle points are 3 and 4, so the median is (3 + 4) / 2 = 3.5.
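The sorting and median steps can be sketched in Python as follows. This is a minimal illustration, not part of the original method description; the function name median is chosen here for the example.

```python
def median(values):
    """Return the median; average the two middle values when the count is even."""
    ordered = sorted(values)          # sort ascending without changing the input
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: a single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the middle two


data = [4, 5, 2, 3, 15, 3, 3, 5]
print(sorted(data))   # [2, 3, 3, 3, 4, 5, 5, 15]
print(median(data))   # 3.5
```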
Find the upper quartile, Q3; this is the value above which 25 percent of the data lie. It is the median of the upper half of the sorted data, so if that half contains an even number of points, average the 2 middle points. For the example data set, the upper half is {4, 5, 5, 15}, so Q3 = (5 + 5) / 2 = 5.
Find the lower quartile, Q1; this is the value below which 25 percent of the data lie. It is the median of the lower half of the sorted data, so if that half contains an even number of points, average the 2 middle points. For the example data set, the lower half is {2, 3, 3, 3}, so Q1 = (3 + 3) / 2 = 3.
Subtract the lower quartile from the upper quartile to get the interquartile range, IQR. For the example data set, Q3 – Q1 = 5 – 3 = 2.
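As a rough sketch, the lower and upper quartiles can be computed as the medians of the lower and upper halves of the sorted data. The function name quartiles is hypothetical, and the handling of an odd number of points (leaving the middle value out of both halves) is an assumption, since the steps above only cover the even case. Other quartile conventions exist and can give slightly different values.

```python
from statistics import median


def quartiles(values):
    """Return (lower, upper) quartiles as medians of the lower and upper halves.

    Assumption: with an odd number of points, the middle value is
    excluded from both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two data points")
    half = n // 2
    lower_half = ordered[:half]        # smallest half of the data
    upper_half = ordered[n - half:]    # largest half of the data
    return median(lower_half), median(upper_half)


data = [4, 5, 2, 3, 15, 3, 3, 5]
lower, upper = quartiles(data)
print(lower, upper)     # 3.0 5.0
print(upper - lower)    # interquartile range: 2.0
```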
Multiply the interquartile range by 1.5. Add this to the upper quartile and subtract it from the lower quartile. Any data point outside these values is at least a mild outlier. For the example set, 1.5 × 2 = 3; thus 3 – 3 = 0 and 5 + 3 = 8. So any value less than 0 or greater than 8 would be at least a mild outlier. This means that 15 qualifies as at least a mild outlier.
Multiply the interquartile range by 3. Add this to the upper quartile and subtract it from the lower quartile. Any data point outside these values is an extreme outlier. For the example set, 3 × 2 = 6; thus 3 – 6 = –3 and 5 + 6 = 11. So any value less than –3 or greater than 11 would be an extreme outlier. This means that 15 qualifies as an extreme outlier.
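The two cutoff calculations can be combined in a short Python sketch. The function name classify_outliers is hypothetical, and the quartiles are passed in directly (3 and 5 for the example data). One assumption made here: each value is reported only under its strongest label, so a value beyond the 3 times cutoff appears in the extreme list and not in the mild list.

```python
def classify_outliers(values, lower_quartile, upper_quartile):
    """Split values into (mild, extreme) outliers using the 1.5x and 3x cutoffs."""
    spread = upper_quartile - lower_quartile          # interquartile range
    mild_low = lower_quartile - 1.5 * spread
    mild_high = upper_quartile + 1.5 * spread
    extreme_low = lower_quartile - 3 * spread
    extreme_high = upper_quartile + 3 * spread

    mild, extreme = [], []
    for v in values:
        if v < extreme_low or v > extreme_high:
            extreme.append(v)      # beyond the 3x cutoff
        elif v < mild_low or v > mild_high:
            mild.append(v)         # beyond the 1.5x cutoff only
    return mild, extreme


data = [4, 5, 2, 3, 15, 3, 3, 5]
mild, extreme = classify_outliers(data, 3, 5)
print(mild)      # []
print(extreme)   # [15]
```

For the example data the cutoffs are 0 and 8 (mild) and –3 and 11 (extreme), so 15 is the only value flagged.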
Tips
- An extreme outlier is more likely than a mild outlier to indicate a bad data point.
About the Author
Kaylee Finn began writing professionally for various websites in 2009, primarily contributing articles covering topics in business and personal finance. She brings expertise in the areas of taxes, student loans and debt management to her writing. She received her Bachelor of Science in system dynamics from Worcester Polytechnic Institute.