Any data set accumulated for statistical purposes, such as the U.S. census, contains information that needs to be summarized and aggregated. It would be impractical to list attributes such as income and family size for each individual. Statisticians therefore use frequency distribution graphs to depict the data compactly. One such graph is the histogram, which divides the data into class intervals and counts how many data points fall into each interval. Although there are no strict rules on how to calculate the size and number of class intervals, some conventional criteria are useful.
Calculate the range of the data. The range is the difference between the highest and lowest data points. For example, assume the highest-paid individual in the U.S. earns $30 billion a year and the lowest earns nothing. The range is $30 billion - $0, which equals $30 billion.
Determine the number of classes from the sample size. As a rule of thumb, use five to seven classes for a sample size up to 50, eight to 10 classes for a sample size between 50 and 100, 10 to 15 classes for a sample size between 100 and 250, and 15 to 20 classes for a sample size greater than 250.
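The rule of thumb above can be sketched as a small lookup. This is only an illustration: the function name suggested_class_range is hypothetical, and because the rule lists 50, 100 and 250 as the edge of two brackets each, the sketch assumes each boundary value belongs to the smaller bracket.

```python
def suggested_class_range(sample_size):
    """Return (fewest, most) classes suggested by the rule of thumb.

    Assumption: a boundary value (50, 100, 250) is placed in the
    smaller bracket, since the rule itself leaves this open.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if sample_size <= 50:
        return (5, 7)
    if sample_size <= 100:
        return (8, 10)
    if sample_size <= 250:
        return (10, 15)
    return (15, 20)


if __name__ == "__main__":
    for n in (30, 80, 200, 1000):
        fewest, most = suggested_class_range(n)
        print(f"sample size {n}: {fewest} to {most} classes")
```

The function returns a range rather than a single number because the rule itself leaves the final choice to the analyst.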
Calculate the class interval using the following formula: Class interval = range / number of classes. To calculate the class interval for the distribution of income in the example, assuming 15 classes, divide the $30 billion range by 15, which equals $2 billion. Often, statisticians set aside extremely high and low figures and focus on the midrange frequencies. For this reason, income distribution in the U.S. is presented in smaller intervals of $10,000, with incomes greater than a certain figure, usually $1 million, lumped together in a single class interval.
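As a rough sketch, the formula and the counting step behind a histogram might look like the following. The function names class_interval and frequency_table are hypothetical, and the sketch makes two assumptions not stated in the text: all classes have equal width, and the highest data point is counted in the last class.

```python
def class_interval(data, num_classes):
    """Class interval = range / number of classes."""
    if not data:
        raise ValueError("data must not be empty")
    if num_classes <= 0:
        raise ValueError("num_classes must be positive")
    return (max(data) - min(data)) / num_classes


def frequency_table(data, num_classes):
    """Return a list of (lower_bound, upper_bound, count) per class."""
    low = min(data)
    width = class_interval(data, num_classes)
    if width == 0:
        # All values are identical, so a single class holds everything.
        return [(low, low, len(data))]
    counts = [0] * num_classes
    for value in data:
        # Assumption: the maximum value falls into the last class
        # instead of opening a new one.
        index = min(int((value - low) / width), num_classes - 1)
        counts[index] += 1
    return [
        (low + i * width, low + (i + 1) * width, counts[i])
        for i in range(num_classes)
    ]


if __name__ == "__main__":
    # Incomes in billions of dollars: lowest 0, highest 30, 15 classes.
    print(class_interval([0, 30], 15))  # 2.0
    for lower, upper, count in frequency_table([0, 1, 2, 3, 4, 10], 5):
        print(f"{lower} to {upper}: {count}")
```

Lumping all incomes above a cutoff into one open-ended class, as described above, would need an extra step that this sketch leaves out.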
Use your discretion when choosing the class interval. The goal of a graph such as a histogram is to convey relevant information in a meaningful and simple way. Choose class intervals that highlight the information you consider worth the reader's attention.