Data, especially numerical data, is a powerful tool if you know what to do with it. Graphs are one way to present data in an organized manner, provided the kind of data you're working with lends itself to the kind of analysis you need.
Often, statisticians, instructors and others are curious about the distribution of data. For example, if the data is a set of chemistry test results, you might be curious about the difference between the lowest and the highest scores or about the fraction of test-takers occupying the various "slots" between these extremes.
Frequency distributions are a powerful tool for scientists, especially (but not only) when the data tends to cluster around a mean or average smack-dab between the right and left sides of the graph. This is the familiar "bell-shaped curve" of normally distributed data.
What Is a Frequency Distribution?
A frequency distribution is a table that includes intervals of data points, called classes, and the total number of entries in each class. The frequency f of each class is just the number of data points it contains. The limiting points of each class are called the lower class limit and the upper class limit, and the class width is the distance between the lower limits (or, equivalently, the upper limits) of successive classes. It is not the difference between the upper and lower limits of the same class.
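To make these definitions concrete, here is a minimal Python sketch that tallies a frequency table and reads off the class width. The scores, the class limits and the function names are all invented for illustration; they are not taken from any real data set.

```python
from collections import Counter

# Hypothetical test scores, invented purely for illustration.
scores = [52, 67, 71, 58, 90, 84, 66, 73, 79, 61]

# Classes as (lower limit, upper limit) pairs; both limits are inclusive.
classes = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]

def frequency_table(values, class_limits):
    """Return (lower, upper, frequency) for each class."""
    counts = Counter()
    for v in values:
        for lower, upper in class_limits:
            if lower <= v <= upper:
                counts[(lower, upper)] += 1
                break  # a value belongs to one class only
    return [(lower, upper, counts[(lower, upper)])
            for lower, upper in class_limits]

def class_width(class_limits):
    """Distance between the lower limits of the first two classes."""
    return class_limits[1][0] - class_limits[0][0]

table = frequency_table(scores, classes)
width = class_width(classes)

for lower, upper, f in table:
    print(f"{lower}-{upper}: {f}")
print("class width:", width)
```

Note that the width here is 10 (60 minus 50), while the upper limit minus the lower limit of any single class is only 9.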
The range is the difference between the lowest and highest values in the table or on its corresponding graph.
When creating a grouped frequency distribution, you start with the principle that you will use between five and 20 classes. These classes must all have the same width (that is, each must span the same range of values) for the distribution to be valid. Once you determine the class width (detailed below), you choose a starting point equal to or less than the lowest value in the whole set.
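One way to sketch this step in Python is to divide the range by the number of classes, move up to the next whole number, and then lay out the lower limits from the starting point. The rounding rule, the function names and the sample values are assumptions for illustration (and the rule assumes whole-number data); the procedure above does not prescribe them.

```python
def class_width_for(values, num_classes):
    """Assumed rule of thumb: range // classes, plus one.

    Adding one guarantees that the classes reach past the
    largest value even when the division is exact.
    """
    data_range = max(values) - min(values)
    return data_range // num_classes + 1

def lower_limits(start, width, num_classes):
    """Lower class limits, spaced one class width apart."""
    return [start + i * width for i in range(num_classes)]

# Hypothetical whole-number observations.
values = [1, 12, 35, 20, 8]

width = class_width_for(values, num_classes=5)
limits = lower_limits(start=min(values), width=width, num_classes=5)

print("class width:", width)
print("lower limits:", limits)
```

Here the starting point is simply the smallest value; any lower starting point would also satisfy the rule stated above.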
General Guidelines for Determining Classes
As noted, choose between five and 20 classes; you would usually use more classes for a larger number of data points, a wider range or both. In addition, follow these guidelines:
- The class width should be an odd number. When the class limits are whole numbers, this ensures that the class midpoints are also whole numbers rather than decimals.
- Every data value must fall into exactly one class. None are ignored, and none can be included in more than one class.
- The classes must be continuous, meaning that you have to include even those classes that have no entries. (Exceptions are made at the extremes; if you are left with an empty first or last class, exclude it.)
- As stated, the classes must be equal in width. The first and last classes are again exceptions, as these can be open-ended: for example, any value below a certain number at the low end or any value above a certain number at the high end.
In a properly constructed frequency distribution, the starting point plus the number of classes times the class width must always be greater than the maximum value.
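The coverage rule and the "exactly one class" guideline can be checked with a few lines of Python. This is only a sketch under stated assumptions: equal-width classes with no open-ended first or last class, made-up sample values, and hypothetical function names.

```python
def covers_data(start, width, num_classes, values):
    """True when the classes span every value.

    Checks that the starting point is not above the smallest value
    and that start + num_classes * width exceeds the largest value.
    """
    return (start <= min(values)
            and start + num_classes * width > max(values))

def class_index(value, start, width):
    """Zero-based index of the single class that contains value."""
    return (value - start) // width

# Hypothetical whole-number observations.
values = [3, 9, 14, 27, 35]

ok = covers_data(start=1, width=7, num_classes=5, values=values)
too_narrow = covers_data(start=1, width=6, num_classes=5, values=values)
indices = [class_index(v, start=1, width=7) for v in values]

print(ok, too_narrow, indices)
```

Because each value maps to one index through integer division, no value can land in two classes, and the coverage check confirms that none is left out.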
Class Width Examples
A professor had students keep track of their social interactions for a week. The number of social interactions over the week is shown in the following grouped frequency distribution. What is the class midpoint for each class?
Class: Frequency (f)
- 0–7: 7
- 8–14: 37
- 15–21: 32
- 22–28: 21
- 29–35: 3
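The class width was chosen in this instance to be seven. Given a range of 35 and the need for an odd number for class width, you get five classes with a width of seven. Each midpoint is the average of a class's lower and upper limits, so the midpoints are 4, 11, 18, 25 and 32 (the first value treats the lowest class as running from 1 to 7).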
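As a rough check on these figures, a short Python sketch can average each class's limits and total the frequencies. It uses the classes exactly as printed in the table, so the first class (0–7) comes out at 3.5; the quoted midpoint of 4 matches a first class of 1–7, which is an inference and not something the table states.

```python
# Class limits and frequencies as printed in the table above.
classes = [(0, 7), (8, 14), (15, 21), (22, 28), (29, 35)]
frequencies = [7, 37, 32, 21, 3]

def midpoint(lower, upper):
    """Average of the lower and upper class limits."""
    return (lower + upper) / 2

midpoints = [midpoint(lower, upper) for lower, upper in classes]
total = sum(frequencies)

for (lower, upper), f, m in zip(classes, frequencies, midpoints):
    print(f"{lower}-{upper}: f={f}, midpoint={m}, share={f / total:.2f}")

# Assumed reading of the first class as 1-7, which gives the quoted 4.
first_midpoint_if_1_to_7 = midpoint(1, 7)
```

The frequencies sum to 100, so each frequency can also be read directly as a percentage of the students.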