Any data set accumulated for statistical purposes, such as U.S. census data, contains information that must be summarized and aggregated. Listing every individual value of an attribute, such as each person's income or each family's size, is impractical. Statisticians therefore use frequency distribution graphs to present the data compactly. For example, a histogram divides the data into class intervals and counts how many data points fall within each interval. Although there are no strict rules for choosing the size and number of class intervals, some conventional criteria are useful.
Calculate Range of Data
Calculate the range of the data, i.e., the difference between the highest and lowest data points. For example, assume the highest-paid individual in the U.S. earns $30 billion a year and the lowest earns zero. The range is $30 billion - $0 = $30 billion.
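The range calculation can be sketched in a few lines of Python. The function name data_range and the sample incomes are illustrative assumptions, not figures from any real data set.

```python
def data_range(values):
    """Return the difference between the highest and lowest data points."""
    if not values:
        raise ValueError("values must not be empty")
    return max(values) - min(values)


# Hypothetical annual incomes in dollars, made up for illustration.
incomes = [0, 18_000, 42_500, 67_000, 120_000]
print(data_range(incomes))  # 120000
```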
Determine Number of Classes
Determine the number of classes from the sample size. As a rule of thumb, use five to seven classes for sample sizes up to 50, eight to 10 classes for sample sizes between 50 and 100, 10 to 15 classes for sample sizes between 100 and 250, and 15 to 20 classes for sample sizes greater than 250.
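One way to encode this rule of thumb is a small lookup function. This is a minimal sketch: the name suggested_class_count is hypothetical, and because the rule does not say which band a sample of exactly 50, 100 or 250 belongs to, the sketch assumes each boundary falls in the lower band.

```python
def suggested_class_count(sample_size):
    """Return the (minimum, maximum) number of classes the rule of thumb suggests."""
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    # Assumption: boundary sizes (50, 100, 250) are assigned to the lower band.
    if sample_size <= 50:
        return (5, 7)
    if sample_size <= 100:
        return (8, 10)
    if sample_size <= 250:
        return (10, 15)
    return (15, 20)


print(suggested_class_count(180))  # (10, 15)
```

The function returns a span rather than a single number because the rule itself only narrows the choice; the final count is still a judgment call.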
Apply Class Interval Formula
Calculate the class interval using the following formula: Class interval = range ÷ number of classes. If you use 15 classes in the income distribution example, the class interval is $30 billion ÷ 15 = $2 billion. Statisticians often ignore extremely high and low figures and focus on the midrange frequencies. For this reason, income distribution in the U.S. is typically presented in smaller intervals of $10,000, with incomes above a certain figure, usually $1 million, lumped together in a single class interval.
Use your discretion when calculating the class interval. The goal of a graph such as a histogram is to convey relevant information in a meaningful and simple way. Choose your class intervals to convey the information you consider worthy of readers' attention.
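The formula and the resulting frequency count can be sketched as follows. The function names class_interval and frequency_table and the sample values are assumptions made for illustration; the sketch uses equal-width classes only and places the highest data point in the last class rather than opening a new one.

```python
def class_interval(values, num_classes):
    """Class interval = range / number of classes."""
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    return (max(values) - min(values)) / num_classes


def frequency_table(values, num_classes):
    """Return a list of (lower bound, upper bound, frequency) for each class."""
    low = min(values)
    width = class_interval(values, num_classes)
    if width == 0:
        raise ValueError("all values are identical, so the range is zero")
    counts = [0] * num_classes
    for value in values:
        # Clamp the index so the maximum value lands in the last class.
        index = min(int((value - low) / width), num_classes - 1)
        counts[index] += 1
    return [
        (low + i * width, low + (i + 1) * width, counts[i])
        for i in range(num_classes)
    ]


# The example from the text, in billions of dollars: 30 / 15 = 2.
print(class_interval([0, 30], 15))  # 2.0

# Hypothetical data points split into three classes.
for lower, upper, count in frequency_table([0, 5, 10, 15, 20, 25, 30], 3):
    print(lower, upper, count)
```

Each class here includes its lower bound and excludes its upper bound, except the last class, which also includes the maximum. Other conventions are possible.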