How To Find The Correlation Coefficient For 'R' In A Scatter Plot

By Lee Johnson Updated: Feb. 28, 2025 3:15 pm EST


Finding the strength of the association between two variables is an important skill for scientists of all types. If two variables are correlated, there is a link between them. A positive correlation means that when one variable increases, the other tends to increase too, and a negative correlation means that when one variable increases, the other tends to decrease. Correlation doesn't prove causation, although further tests may establish a causal relationship between the variables. The correlation coefficient R shows the strength of the linear relationship between the two variables, and whether the correlation is positive or negative.

1. Make a table of your data

Make a table of your data. This should include one column for the participant number, one column for the first variable (labeled x), and one column for the second variable (labeled y). For example, if you're looking to see whether there is a correlation between height and shoe size, one column would identify each person you measure, one column would show each person's height, and another would show their shoe size. Make three additional columns, one for xy, one for x², and one for y².

2. Calculate the values for the empty columns

Use your data to fill out the three additional columns. For example, imagine your first person measures 75 inches tall and has size-12 feet. The x (height) column would show 75, and the y (shoe size) column would show 12. You need to find xy, x², and y². So using this example: xy = 75 × 12 = 900, x² = 75² = 5,625, and y² = 12² = 144.

Complete these calculations for every person for whom you have data.
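If you'd rather let a computer fill in the columns, the short Python sketch below shows one way to do it. Only the first person (75 inches, size 12) comes from the example above; the other four rows, and the column names x2 and y2, are made up for illustration.

```python
# Hypothetical measurements: (participant, height in inches, shoe size).
# Only the first row matches the worked example; the rest are invented.
data = [(1, 75, 12), (2, 70, 10), (3, 66, 8), (4, 72, 11), (5, 68, 9)]

table = []
for participant, x, y in data:
    table.append({
        "participant": participant,
        "x": x,          # first variable (height)
        "y": y,          # second variable (shoe size)
        "xy": x * y,     # product of the two values
        "x2": x ** 2,    # x squared
        "y2": y ** 2,    # y squared
    })

for row in table:
    print(row)
```

The first row printed should match the hand calculation: xy = 900, x² = 5,625, and y² = 144.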

3. Find the sum of each column

Create a new row at the bottom of your table for the sums of each column. Add together all of the x values, all of the y values, all of the xy values, all of the x² values, and all of the y² values, and then put the results at the bottom of the corresponding column in your new row. You can label your new row "sum" or use a sigma (Σ) symbol.
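As a rough sketch, the "sum" row could be worked out in Python like this. The five heights and shoe sizes are hypothetical values chosen for illustration, not real data.

```python
# Hypothetical data: heights in inches (x) and shoe sizes (y) for five people.
heights = [75, 70, 66, 72, 68]
shoe_sizes = [12, 10, 8, 11, 9]

# One total per column, mirroring the "sum" row at the bottom of the table.
sums = {
    "x": sum(heights),
    "y": sum(shoe_sizes),
    "xy": sum(x * y for x, y in zip(heights, shoe_sizes)),
    "x2": sum(x ** 2 for x in heights),
    "y2": sum(y ** 2 for y in shoe_sizes),
}

print(sums)
# {'x': 351, 'y': 50, 'xy': 3532, 'x2': 24689, 'y2': 510}
```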

4. Calculate R using the formula

You find R from your data using the formula: R = [n(Σxy) − (Σx)(Σy)] ÷ √{[nΣx² − (Σx)²][nΣy² − (Σy)²]}. This looks a bit daunting, so you can split it into two parts, which can be called s and t: s = n(Σxy) − (Σx)(Σy) and t = √{[nΣx² − (Σx)²][nΣy² − (Σy)²]}.

In these equations, n is the number of participants you have (your sample size). The rest of the parts of the equation are the sums you calculated in the last step. So for s, multiply the size of your sample by the sum of the xy column, and then subtract the sum of the x column multiplied by the sum of the y column from this.

For t, there are four main steps. First, calculate n multiplied by the sum of your x² column, and then subtract the square of the sum of your x column from this value (i.e., n × Σx² − [Σx × Σx]). Second, do exactly the same thing but with the sum of the y² column and the square of the sum of the y column in place of the x parts (i.e., n × Σy² − [Σy × Σy]). Third, multiply these two results (for the x's and y's) together. Fourth, take the square root of this answer.
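The two parts can be sketched in a few lines of Python. The function name s_and_t and its parameter names are made up for this example, and the column sums passed in belong to a hypothetical five-person sample.

```python
import math

def s_and_t(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Return the two parts of the formula: s (top) and t (bottom)."""
    s = n * sum_xy - sum_x * sum_y
    x_part = n * sum_x2 - sum_x ** 2   # first step
    y_part = n * sum_y2 - sum_y ** 2   # second step
    t = math.sqrt(x_part * y_part)     # third and fourth steps
    return s, t

# Hypothetical column sums for five people (illustrative values only).
s, t = s_and_t(n=5, sum_x=351, sum_y=50, sum_xy=3532, sum_x2=24689, sum_y2=510)
print(s, round(t, 2))  # 110 110.45
```

Here the x part is 244 and the y part is 50, so t is the square root of 12,200.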

If you've worked in parts, you can calculate R as simply R = s ÷ t. You will get an answer between −1 and 1. A positive answer shows a positive correlation, with anything over 0.7 generally being considered a strong relationship. A negative answer shows a negative correlation, with anything below −0.7 considered a strong negative relationship. Similarly, values around ±0.5 are considered a moderate relationship and values around ±0.3 are considered a weak relationship. Anything close to 0 shows a lack of correlation.
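To tie the steps together, here is a minimal Python sketch that goes from raw data to R and a plain-English label. The function names correlation_r and describe are invented for this example, the data are hypothetical, and treating 0.7, 0.5, and 0.3 as hard cut-offs is an assumption made for illustration, since the guidelines above are only rules of thumb.

```python
import math

def correlation_r(xs, ys):
    """Compute R from two equal-length lists using the sums formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    s = n * sum_xy - sum_x * sum_y
    t = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if t == 0:
        raise ValueError("R is undefined when one variable never changes")
    return s / t

def describe(r):
    """Label R using assumed cut-offs of 0.7, 0.5, and 0.3."""
    strength = abs(r)
    if strength >= 0.7:
        label = "strong"
    elif strength >= 0.5:
        label = "moderate"
    elif strength >= 0.3:
        label = "weak"
    else:
        return "little or no correlation"
    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction}"

# Hypothetical heights (inches) and shoe sizes for five people.
r = correlation_r([75, 70, 66, 72, 68], [12, 10, 8, 11, 9])
print(round(r, 3), describe(r))  # 0.996 strong positive
```

The check for t = 0 is there because the formula would otherwise divide by zero when every x value (or every y value) is the same.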
