A scatter plot is an important diagnostic tool in a statistician’s arsenal, obtained by graphing two variables against each other. It allows the statistician to eyeball the variables and form a working hypothesis about their relationship. For this reason, it is usually drawn before a regression analysis is carried out. The statistician subsequently tests the hypothesis using a regression analysis and determines the sign and precise magnitude of the relationship. Furthermore, a scatter plot helps identify outliers, values that are abnormally distant from most of the data in the sample. Eliminating outliers can help improve the regression model.
Check for a negative relationship between the two variables in the scatter plot. If low values of the first variable correspond with high values of the second variable, and high values of the first correspond with low values of the second, there is a negative correlation. In this case, a line drawn through the data points has a negative slope.
Examine the scatter plot for a positive relationship between the variables. If low values of the first variable correspond with low values of the second, and high values of the first similarly correspond with high values of the second, the variables have a positive correlation. In this case, a line drawn through the data points has a positive slope.
Inspect the scatter plot for the absence of a relationship between the variables. If the data points are scattered randomly with no apparent pattern, the variables have either no correlation or a small, statistically insignificant correlation. In this case, a line drawn through the data points is roughly horizontal, with a slope close to zero.
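The three checks above (negative, positive, or no relationship) can be backed up with a number. The following is a minimal sketch, not a substitute for a proper regression analysis: it computes the Pearson correlation coefficient by hand and labels the relationship. The function names, the sample data, and the cutoff of 0.3 are all assumptions chosen for illustration; the cutoff in particular is arbitrary and is not a test of statistical significance.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def describe_relationship(xs, ys, threshold=0.3):
    """Label the relationship by the sign of r.

    The threshold is an illustrative assumption, not a significance test.
    """
    r = pearson_r(xs, ys)
    if r >= threshold:
        return "positive"
    if r <= -threshold:
        return "negative"
    return "none or weak"

# Made-up sample data for illustration only.
xs = [1, 2, 3, 4, 5]
ys_negative = [10, 8, 7, 4, 2]   # high x goes with low y
ys_positive = [2, 4, 5, 8, 10]   # high x goes with high y
ys_flat = [5, 3, 6, 3, 5]        # no visible trend

print(describe_relationship(xs, ys_negative))
print(describe_relationship(xs, ys_positive))
print(describe_relationship(xs, ys_flat))
```

The sign of the coefficient always matches the sign of the slope of a least-squares line through the same points, which is why the visual check and the numeric check agree.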
Fit a line or curve through the data points and examine its shape to gauge the nature of the relationship between the two variables. A straight line is interpreted as a linear relationship, a curve with a single bend suggests a quadratic relationship, and a curve that lies relatively flat before suddenly shooting up or down is interpreted as an exponential relationship.
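One rough way to check the shape numerically is to fit a straight line twice, once to the raw values and once to the logarithm of the second variable, and compare how well each line fits. If the line fits the log-transformed data much better, an exponential relationship is plausible. This is a sketch under stated assumptions: the helper names and the sample data are hypothetical, the log transform requires strictly positive values, and R-squared values computed on different scales are only a heuristic comparison.

```python
from math import log

def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys):
    """Share of the variance in ys explained by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Made-up data that doubles at every step (flat at first, then shooting up).
xs = list(range(8))
ys = [2 ** x for x in xs]

r2_raw = r_squared(xs, ys)                      # straight line on raw values
r2_log = r_squared(xs, [log(y) for y in ys])    # straight line on log(y)

print(f"R^2 on raw y: {r2_raw:.3f}")
print(f"R^2 on log y: {r2_log:.3f}")
```

For a quadratic shape the same idea applies with a different transform or a curve fit, but that requires solving a larger least-squares problem and is left out of this sketch.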
Examine the scatter plot for outliers, values that lie abnormally far from the main cluster of data points. Outliers can distort the estimated relationship between the variables. Eliminate them, but only if their absence does not affect the analysis of the relationship between the two variables.
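The outlier check can also be sketched in code. The approach below is one possible heuristic, not the only accepted method: it fits a least-squares line, flags points whose residual lies more than three scaled median absolute deviations from the median residual, and then compares the slope with and without the flagged points so the effect of removing them is visible. The function names, the sample data, and the cutoff of 3 are assumptions made for illustration.

```python
from statistics import median

def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def flag_outliers(xs, ys, cutoff=3.0):
    """Indices of points whose residual is far from the median residual.

    Uses the median absolute deviation (MAD), scaled by 1.4826 so it is
    comparable to a standard deviation for normally distributed residuals.
    The cutoff of 3 is an illustrative assumption.
    """
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    center = median(residuals)
    mad = median(abs(r - center) for r in residuals)
    limit = cutoff * 1.4826 * mad
    return [i for i, r in enumerate(residuals) if abs(r - center) > limit]

# Made-up data: roughly y = 2x, with one point far above the trend.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 8.0, 9.9, 25.0, 14.1, 16.0]

flagged = flag_outliers(xs, ys)
kept = [i for i in range(len(xs)) if i not in flagged]

slope_all, _ = fit_line(xs, ys)
slope_clean, _ = fit_line([xs[i] for i in kept], [ys[i] for i in kept])

print("flagged indices:", flagged)
print(f"slope with all points: {slope_all:.2f}")
print(f"slope without flagged points: {slope_clean:.2f}")
```

Comparing the two slopes shows how much the flagged points pull the fitted line, which is the information needed before deciding whether to drop them.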
About the Author
Kiran Gaunle is a freelancer based in New York. He started writing professionally in 2006. He has written research reports for the UN Development Programme and the "Kathmandu Post." Gaunle is working on a book of short stories and a novel. He holds a Master of Arts in international political economy and development from Fordham University.