What Are Gaps, Clusters & Outliers in Math?

Business, government and academic activities almost always require collecting and analyzing data. Graphs, histograms and charts are common ways to represent numerical data, and these visualizations help people gain insight into problems and devise solutions. Gaps, clusters and outliers are characteristics of data sets that influence mathematical analysis and are readily visible in such visual representations.

Holes in the Data

Gaps are intervals within the range of a data set that contain no values. For example, if a scientific experiment collects temperature data ranging from 50 to 100 degrees Fahrenheit but records nothing between 70 and 80 degrees, that empty stretch is a gap in the data set. A line plot of this data would have "x" marks for temperatures between 50 and 70 and again between 80 and 100, but none between 70 and 80. Researchers can then dig deeper to explore why certain values do not show up in the collected sample.
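One simple way to spot gaps programmatically is to sort the data and flag neighboring values that are unusually far apart. The sketch below assumes a hypothetical list of temperature readings and a hypothetical `min_gap` threshold of 5 degrees; both are illustrative choices, and what counts as a "gap" in real data depends on the measurement scale.

```python
def find_gaps(values, min_gap):
    """Return (low, high) pairs of neighboring values that are more than
    min_gap apart once the data are sorted; each pair brackets a gap."""
    ordered = sorted(values)
    return [
        (low, high)
        for low, high in zip(ordered, ordered[1:])
        if high - low > min_gap
    ]


# Hypothetical temperature readings (degrees Fahrenheit) with nothing
# recorded between 70 and 80.
readings = [52, 55, 58, 61, 64, 67, 69, 81, 84, 88, 93, 97, 100]

# min_gap = 5 is an arbitrary threshold chosen for this illustration.
print(find_gaps(readings, min_gap=5))  # [(69, 81)]
```

The only pair flagged is (69, 81), which brackets the empty 70-to-80 stretch.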

Isolated Groups

Clusters are isolated groups of data points bunched closely together. A line plot, one common way to represent a data set, is a number line with "x" marks placed above specific values; each mark records one occurrence, so stacked marks show how often a value appears. A cluster shows up as a concentration of these marks within a small interval. For example, if the exam scores for a class of 10 students are 74, 75, 80, 72, 74, 75, 76, 86, 88 and 73, most of the "x" marks on a line plot would fall in the 72-to-76 interval, which holds seven of the 10 scores. This would represent a data cluster. Note that the frequency for 74 and 75 is two, while every other score appears once.
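The exam-score example can be reproduced with a frequency count and a text version of the line plot. To locate the cluster, the sketch below slides a fixed-width window over the data and keeps the window holding the most scores. The window width of 4 points and the helper name `densest_window` are assumptions made for this illustration; this is one simple heuristic, not a standard clustering algorithm.

```python
from collections import Counter

# Exam scores for the class of 10 students from the example above.
scores = [74, 75, 80, 72, 74, 75, 76, 86, 88, 73]

# Frequency of each score: the number of "x" marks above it on a line plot.
frequency = Counter(scores)

# Print a text version of the line plot, one row per observed score.
for score in sorted(frequency):
    print(f"{score:>3} | {'x' * frequency[score]}")


def densest_window(values, width):
    """Return (start, count) for the interval [start, start + width] that
    holds the most values; candidate intervals start at each observed value."""
    ordered = sorted(values)
    best_start, best_count = ordered[0], 0
    for start in ordered:
        count = sum(1 for v in ordered if start <= v <= start + width)
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count


# A window 4 points wide starting at 72 covers the 72-to-76 interval.
print(densest_window(scores, width=4))  # (72, 7)
```

The rows for 74 and 75 each show two marks, and the densest 4-point window starts at 72 and contains seven scores, matching the cluster described above.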

At the Extremes

Outliers are extreme values: data points that lie significantly below or above most of the other values in a data set. What counts as "extreme" depends on the context and on agreement among the analysts involved in the research. Outliers might be bad data points, also known as noise, or they might carry valuable information about the phenomenon under investigation or about the data collection method itself. For example, if class scores are mostly in the 70-to-80 range but a couple are in the low 50s, those low scores might be outliers.
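Because the definition of "extreme" is a judgment call, analysts often adopt a written rule. One widely used convention, though not the only one, is Tukey's fences: a value is flagged if it falls more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. The sketch below applies that rule to a hypothetical set of class scores; the multiplier `k=1.5` is the conventional default and can be adjusted.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Flag values outside Tukey's fences: below Q1 - k*IQR or above Q3 + k*IQR."""
    # quantiles() sorts the data itself and returns the three quartile cut points.
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Hypothetical class scores: mostly in the 70s, plus two in the low 50s.
scores = [71, 72, 73, 74, 74, 75, 76, 77, 78, 79, 80, 52, 54]
print(iqr_outliers(scores))  # [52, 54]
```

Here Q1 is 71.5 and Q3 is 77.5, so the lower fence sits at 62.5 and both low-50s scores are flagged. With a different rule or a different `k`, the verdict could change, which is why the choice should be agreed on before analysis.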

Putting it All Together

Gaps, outliers and clusters in data sets can affect the results of mathematical analysis. Gaps and clusters might signal problems with how the data were collected. For example, if a telephone survey polls only certain areas, such as low-income housing complexes or high-end suburban neighborhoods, rather than a broad cross-section of the population, the data will likely contain gaps and clusters. Outliers can also skew the mean, or average, of a data set. For example, the mean of the four numbers 50, 55, 65 and 90 is 65 (260 divided by 4). Without the outlier 90, however, the mean drops to about 57 (170 divided by 3).
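The effect of a single extreme value on the mean can be checked directly with Python's standard `statistics` module, using the four numbers from the example above.

```python
from statistics import mean

data = [50, 55, 65, 90]

with_outlier = mean(data)                             # (50 + 55 + 65 + 90) / 4
without_outlier = mean([v for v in data if v != 90])  # (50 + 55 + 65) / 3

print(with_outlier)               # 65
print(round(without_outlier, 2))  # 56.67
```

Removing one value out of four shifts the mean by more than 8 points, which illustrates why outliers deserve scrutiny before averages are reported.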
