When scientists and mathematicians plot x,y data on a graph, they often need to know how strongly the x and y values are related. That is, as the values of x increase, do the values of y rise or fall along a straight line? To aid in this determination, the scientist will frequently calculate a correlation coefficient, abbreviated R. (See References 1). The correlation coefficient ranges from -1.00 to 1.00, where 1.00 represents a perfect positive linear correlation, -1.00 represents a perfect negative linear correlation and a value of zero indicates no linear correlation. In more practical terms, the correlation coefficient indicates how closely the data points follow the “best fit” line drawn through the points. (See References 2).
Calculate the average values of x and y by summing all the x values, summing all the y values, and dividing each sum by the number of data points. As an example, consider a scatter plot with three (x,y) data points: (0,1), (2,3) and (5,7). The x values 0, 2 and 5 average to (0 + 2 + 5) / 3 = 2.33. The y values 1, 3 and 7 average to (1 + 3 + 7) / 3 = 3.67.
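The averaging step can be sketched in a few lines of Python. The names xs, ys and mean are chosen here for illustration and do not come from the article; the y values are the ones used in the averaging above (1, 3 and 7).

```python
# Sample data: x values 0, 2, 5 and y values 1, 3, 7 (the values averaged in the text).
xs = [0, 2, 5]
ys = [1, 3, 7]

def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)

x_mean = mean(xs)
y_mean = mean(ys)
print(round(x_mean, 2), round(y_mean, 2))  # 2.33 3.67
```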
Calculate the standard deviation of the x and y data points, Sx and Sy, by first calculating the absolute value of the difference between each data point and the average, then squaring these values, averaging the squared values, and finally taking the square root. (See References 3). Because the squared values are divided by the number of data points, this is the population standard deviation. Continuing the example from step 1, the x values of 0, 2 and 5 give deviations of |0 - 2.33|, |2 - 2.33| and |5 - 2.33|, or 2.33, 0.33 and 2.67. Squaring each of these values gives 5.43, 0.11 and 7.13. The squared values average to 4.22, and taking the square root of this number gives 2.055. Repeating the calculation for the y values 1, 3 and 7 gives 2.494. Hence, the standard deviation for x, or Sx, is 2.055 and Sy is 2.494.
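As a rough check on the arithmetic, the standard deviation step might be written as follows. The function name population_std is hypothetical; the cross-check uses statistics.pstdev from the Python standard library, which also divides by the number of data points.

```python
import math
import statistics

xs = [0, 2, 5]
ys = [1, 3, 7]

def population_std(values):
    """Square root of the average squared deviation from the mean (divides by n)."""
    avg = sum(values) / len(values)
    squared_devs = [(v - avg) ** 2 for v in values]
    return math.sqrt(sum(squared_devs) / len(values))

sx = population_std(xs)
sy = population_std(ys)
print(round(sx, 3), round(sy, 3))  # 2.055 2.494

# The standard library's population standard deviation should agree.
assert math.isclose(sx, statistics.pstdev(xs))
assert math.isclose(sy, statistics.pstdev(ys))
```

Squaring removes the sign of each deviation, so the absolute-value step does not change the result and is omitted in the sketch.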
Find the slope of the linear regression or “best fit” line drawn through the data. Some software graphing programs will perform the linear regression and display the equation on the graph in the form y = mx + b, where m represents the slope and b represents the y-intercept. If the equation of the best-fit line is not available, choose any two points that lie on the line itself (not necessarily data points) and label them x1,y1 and x2,y2. Then calculate the slope, m, by m = (y2 - y1) / (x2 - x1). In the case of the sample data from step 1, the slope is 1.2105.
Calculate the correlation coefficient by multiplying the slope by the ratio of the standard deviations: R = m * Sx / Sy. For the sample data, R = 1.2105 * 2.055 / 2.494, or approximately 0.997.
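When no graphing program is at hand, the slope can also be computed directly from the data with the ordinary least-squares formula, which the text does not spell out. The sketch below does that and then combines the slope with the two standard deviations through the standard identity R = m * Sx / Sy. All variable names are illustrative.

```python
import math

xs = [0, 2, 5]
ys = [1, 3, 7]
n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n

# Sums of cross-deviations and squared deviations from the means.
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
sxx = sum((x - x_mean) ** 2 for x in xs)
syy = sum((y - y_mean) ** 2 for y in ys)

# Least-squares slope and intercept of the best-fit line y = m*x + b.
m = sxy / sxx
b = y_mean - m * x_mean

# Population standard deviations, then R = m * Sx / Sy.
sx = math.sqrt(sxx / n)
sy = math.sqrt(syy / n)
r = m * sx / sy
print(round(m, 4), round(b, 4), round(r, 4))  # 1.2105 0.8421 0.9972
```

The result is the same value given by the direct formula sxy / sqrt(sxx * syy), so either route can be used to check the other.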
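Most graphing programs report the square of the correlation coefficient, or R^2, instead of R. To obtain this value, simply multiply R by itself. For example, an R of 0.997 corresponds to an R^2 of about 0.994. To go the other way, take the square root of the reported R^2 and give it the same sign as the slope of the best-fit line.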
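The conversion between R and R^2 can be illustrated as below. The values of r and m are the rounded figures from the worked example, and the sign rule assumes a simple straight-line fit, where R always has the same sign as the slope.

```python
import math

r = 0.9972   # correlation coefficient from the worked example, rounded
m = 1.2105   # slope of the best-fit line from the worked example

# R^2 is R multiplied by itself.
r_squared = r * r
print(round(r_squared, 4))  # 0.9944

# Recover R from a reported R^2: square root, signed like the slope.
r_recovered = math.copysign(math.sqrt(r_squared), m)
print(round(r_recovered, 4))  # 0.9972
```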