Cluster analysis and factor analysis are two statistical methods of data analysis, both heavily used in the natural and behavioral sciences. Both allow the user to group parts of the data, either into "clusters" or onto "factors," depending on the type of analysis. Researchers new to these methods may feel that the two are similar overall. While cluster analysis and factor analysis seem similar on the surface, they differ in many ways, including in their overall objectives and applications.
Cluster analysis and factor analysis have different objectives. The usual objective of factor analysis is to explain the correlations among the variables in a data set and relate those variables to each other, while the objective of cluster analysis is to address heterogeneity among the observations in a data set by sorting them into groups. In spirit, cluster analysis is a form of categorization, whereas factor analysis is a form of simplification.
Complexity is another point on which factor analysis and cluster analysis differ: data size affects each analysis differently. As the data set grows, an exhaustive search for the best cluster solution becomes computationally intractable, because the number of possible cluster solutions grows explosively with the number of data points. For example, the number of ways to divide 20 objects into 4 clusters of equal size is over 488 million. This makes direct computational methods, the category of methods to which factor analysis belongs, impractical for cluster analysis.
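The figure of 488 million can be checked with a short calculation. The sketch below is only an illustration: the function name `equal_size_partitions` is made up for this example, and it assumes the clusters are unlabeled, so that swapping the names of two clusters does not count as a new solution.

```python
from math import factorial


def equal_size_partitions(n_objects, n_clusters):
    """Count the ways to split n_objects into n_clusters unlabeled, equal-sized groups."""
    if n_objects % n_clusters != 0:
        raise ValueError("n_objects must be divisible by n_clusters")
    size = n_objects // n_clusters
    # n! / (size!)^k counts the splits when the clusters carry labels;
    # dividing by k! removes the orderings of interchangeable clusters.
    return factorial(n_objects) // (factorial(size) ** n_clusters * factorial(n_clusters))


count = equal_size_partitions(20, 4)
print(count)  # 488864376
```

Under the same assumption, 40 objects in 4 equal clusters already give a count with more than twenty digits, which is why checking every possible solution quickly stops being an option.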
Even though the solutions to both factor analysis and cluster analysis problems are subjective to some degree, factor analysis allows a researcher to obtain a “best” solution, in the sense that the researcher can optimize a certain aspect of the solution (orthogonality, ease of interpretation and so on). This is not so for cluster analysis, since the algorithms that could guarantee a best cluster solution are computationally infeasible for all but small data sets. Hence, researchers employing cluster analysis typically rely on heuristic methods and cannot guarantee an optimal solution.
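To see what the lack of a guarantee means in practice, consider k-means (Lloyd's algorithm), one common clustering heuristic. It is not named above and is used here only as an illustration. The sketch below is a toy version on made-up one-dimensional data; the function `kmeans_1d` and the two sets of starting centers are assumptions chosen for the example. Both runs converge, but they reach solutions of very different quality.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Toy one-dimensional k-means; returns (final centers, within-cluster sum of squares)."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its old center).
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break  # converged: another pass would change nothing
        centers = new_centers
    sse = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, sse


points = [0, 1, 9, 10, 20, 21]  # hypothetical data with three obvious pairs

good_centers, good_sse = kmeans_1d(points, [0, 10, 20])
poor_centers, poor_sse = kmeans_1d(points, [0, 1, 15])

print(good_centers, good_sse)  # [0.5, 9.5, 20.5] 1.5
print(poor_centers, poor_sse)  # [0, 1, 15] 122
```

The second run stops at a solution whose within-cluster sum of squares is far larger than the first, and nothing in the algorithm signals that a better grouping exists. Running the heuristic from several starting points and keeping the best result is a common precaution, but it still does not prove optimality.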
Factor analysis and cluster analysis also differ in how they are applied to real data. Because factor analysis can reduce an unwieldy set of variables to a much smaller set of factors, it is suitable for simplifying complex models. Factor analysis also has a confirmatory use, in which the researcher develops a set of hypotheses about how the variables in the data are related and then runs factor analysis on the data set to confirm or reject those hypotheses. Cluster analysis, on the other hand, is suitable for classifying objects according to certain criteria. For example, a researcher can measure certain aspects of a group of newly discovered plants and place these plants into species categories by employing cluster analysis.