# How to Find the Correlation Coefficient for 'R' in a Scatter Plot


Finding the strength of the association between two variables is an important skill for scientists of all types. If two variables are correlated with each other, it shows that there is a link between them. A positive correlation means that when one variable increases, the other one does too, and a negative correlation means that when one variable increases, the other one decreases. Correlations don’t prove causation, although it is possible that further tests will prove a causal relationship between the variables. The correlation coefficient R shows the strength of the relationship between the two variables, and whether it’s a positive or a negative correlation.

#### TL;DR (Too Long; Didn't Read)

Call one variable x and one variable y. Calculate the value of R using the formula:

R = [n(Σxy) − (Σx)(Σy)] ÷ √{[nΣx² − (Σx)²] [nΣy² − (Σy)²]}

Where n is your sample size.

## Make a Table of Your Data

Make a table of your data. This should include one column for the participant number, one column for the first variable (labeled x) and one column for the second variable (labeled y). For example, if you’re looking to see whether there is a correlation between height and shoe size, one column would identify each person you measure, one column would show each person’s height and another would show their shoe size. Make three additional columns, one for xy, one for x² and one for y².

## Calculate the Values for the Empty Columns

Use your data to fill out the three additional columns. For example, imagine your first person measures 75 inches tall and wears a size 12 shoe. The x (height) column would show 75, and the y (shoe size) column would show 12. You need to find xy, x² and y². So using this example:

xy = 75 × 12 = 900

x² = 75² = 5,625

y² = 12² = 144

Complete these calculations for every person for whom you have data.
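The column-filling step can be sketched in Python as follows. This is only an illustration: the first pair (75, 12) is the example from the text, while the other four pairs and the names `data` and `add_product_columns` are made up for the sketch.

```python
# Hypothetical measurements as (height in inches, shoe size) pairs.
# Only the first pair (75, 12) comes from the worked example; the rest are invented.
data = [(75, 12), (70, 10), (66, 8), (72, 11), (64, 7)]


def add_product_columns(pairs):
    """Return one table row per participant with the xy, x^2 and y^2 columns filled in."""
    rows = []
    for participant, (x, y) in enumerate(pairs, start=1):
        rows.append({
            "participant": participant,
            "x": x,
            "y": y,
            "xy": x * y,   # product of the two variables
            "x2": x * x,   # x squared
            "y2": y * y,   # y squared
        })
    return rows


rows = add_product_columns(data)
for row in rows:
    print(row)
```

The first row reproduces the hand calculation above (xy = 900, x² = 5,625, y² = 144).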

## Find the Sum of Each Column

Create a new row at the bottom of your table for the sums of each column. Add together all of the x values, all of the y values, all of the xy values, all of the x² values and all of the y² values, and then put the results at the bottom of the corresponding column in your new row. You can label your new row “sum” or use a sigma (Σ) symbol.
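As a rough sketch, the sum row could be computed like this. The dataset is hypothetical (only the pair (75, 12) appears in the text), and `column_sums` is a helper name invented for the illustration.

```python
# Hypothetical (height in inches, shoe size) pairs; only (75, 12) is from the text.
data = [(75, 12), (70, 10), (66, 8), (72, 11), (64, 7)]


def column_sums(pairs):
    """Return the sample size n and the five column totals (the sigma row)."""
    return {
        "n": len(pairs),
        "sum_x": sum(x for x, _ in pairs),
        "sum_y": sum(y for _, y in pairs),
        "sum_xy": sum(x * y for x, y in pairs),
        "sum_x2": sum(x * x for x, _ in pairs),
        "sum_y2": sum(y * y for _, y in pairs),
    }


sums = column_sums(data)
for name, total in sums.items():
    print(f"{name}: {total}")
```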

## Calculate R Using the Formula

You find R from your data using the formula:

R = [n(Σxy) − (Σx)(Σy)] ÷ √{[nΣx² − (Σx)²] [nΣy² − (Σy)²]}

This looks a bit daunting, so you can split it into two parts, which we’ll call s and t.

s = n(Σxy) – (Σx) (Σy)

t = √{[nΣx² − (Σx)²] [nΣy² − (Σy)²]}

In these equations, n is the number of participants you have (your sample size). The rest of the parts of the equation are the sums you calculated in the last step. So for s, multiply the size of your sample by the sum of the xy column, and then subtract the sum of the x column multiplied by the sum of the y column from this.

For t, there are four main steps. First, calculate n multiplied by the sum of your x² column, and then subtract the square of the sum of your x column (that sum multiplied by itself) from this value. Second, do exactly the same thing but with the sum of the y² column and the square of the sum of the y column in place of the x parts (i.e., n × Σy² − [Σy × Σy]). Third, multiply these two results (for the xs and ys) together. Fourth, take the square root of this answer.
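The two parts can be sketched in Python as below. The column totals are illustrative values for a made-up five-person sample, and the function names `numerator_s` and `denominator_t` are assumptions chosen to match the s and t labels used here.

```python
import math

# Illustrative column totals for a made-up sample of n = 5 participants.
n = 5
sum_x, sum_y = 347, 48
sum_xy, sum_x2, sum_y2 = 3368, 24161, 478


def numerator_s(n, sum_x, sum_y, sum_xy):
    """s = n * sum(xy) - sum(x) * sum(y)."""
    return n * sum_xy - sum_x * sum_y


def denominator_t(n, sum_x, sum_y, sum_x2, sum_y2):
    """t = sqrt of the product of the x part and the y part."""
    x_part = n * sum_x2 - sum_x ** 2   # step 1: n * sum(x^2) minus (sum x) squared
    y_part = n * sum_y2 - sum_y ** 2   # step 2: the same for y
    return math.sqrt(x_part * y_part)  # steps 3 and 4: multiply, then square root


s = numerator_s(n, sum_x, sum_y, sum_xy)
t = denominator_t(n, sum_x, sum_y, sum_x2, sum_y2)
print("s =", s)
print("t =", t)
```

Keeping s and t as separate functions mirrors the hand calculation, which makes it easier to check each intermediate value against your table.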

If you’ve worked in parts, you can calculate R as simply R = s ÷ t. You will get an answer between −1 and 1. A positive answer shows a positive correlation, with anything above 0.7 generally considered a strong relationship. A negative answer shows a negative correlation, with anything below −0.7 considered a strong negative relationship. Similarly, values around ±0.5 are considered a moderate relationship and values around ±0.3 a weak relationship. Anything close to 0 shows a lack of correlation.
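Putting the pieces together, a minimal end-to-end sketch might look like the following. The dataset is hypothetical apart from the pair (75, 12), and the function names are invented for illustration. Treating 0.7, 0.5 and 0.3 as hard cutoffs is one reading of the rough guidelines above, not a fixed rule.

```python
import math


def pearson_r(pairs):
    """Compute R = s / t from (x, y) pairs using the sums-based formula."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    sum_y2 = sum(y * y for _, y in pairs)

    s = n * sum_xy - sum_x * sum_y
    t = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if t == 0:
        # Happens when every x (or every y) is identical, so R is undefined.
        raise ValueError("R is undefined when x or y never varies")
    return s / t


def describe_strength(r):
    """Label R using the rough 0.7 / 0.5 / 0.3 guidelines (assumed as cutoffs)."""
    size = abs(r)
    if size >= 0.7:
        label = "strong"
    elif size >= 0.5:
        label = "moderate"
    elif size >= 0.3:
        label = "weak"
    else:
        return "little or no correlation"
    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction} correlation"


# Hypothetical (height in inches, shoe size) pairs; only (75, 12) is from the text.
data = [(75, 12), (70, 10), (66, 8), (72, 11), (64, 7)]
r = pearson_r(data)
print(round(r, 3), describe_strength(r))
```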