Finding the strength of the association between two variables is an important skill for scientists of all types. If two variables are correlated, there is a link between them. A positive correlation means that when one variable increases, the other tends to increase too, and a negative correlation means that when one variable increases, the other tends to decrease. Correlation doesn't prove causation, although further tests may reveal a causal relationship between the variables. The correlation coefficient R (Pearson's correlation coefficient) shows the strength of the linear relationship between the two variables and whether the correlation is positive or negative.
TL;DR (Too Long; Didn't Read)
Call one variable x and one variable y. Calculate the value of R using the formula:
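$$R = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
where n is your sample size.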
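As a minimal sketch, the TL;DR formula translates directly into Python. The function name `pearson_r` and the sample lists are hypothetical, chosen only for illustration; the code assumes two equal-length lists of numbers with at least two values and some variation in each.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Correlation coefficient R from the sum-based formula (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        # If every x (or every y) is identical, the formula divides by zero.
        raise ValueError("R is undefined when x or y has no variation")
    return numerator / denominator


# Hypothetical data: y rises exactly in step with x, so R should be 1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

The explicit check on the denominator is a design choice: without it, a column with no variation would raise a bare `ZeroDivisionError` with no hint about the cause.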
Make a table of your data. This should include one column for the participant number, one column for the first variable (labeled x) and one column for the second variable (labeled y). For example, if you're looking to see whether there is a correlation between height and shoe size, one column would identify each person you measure, one column would show each person's height and another would show their shoe size. Make three additional columns: one for xy, one for x² and one for y².
Use your data to fill out the three additional columns. For example, imagine your first person is 75 inches tall and wears size 12 shoes. The x (height) column would show 75, and the y (shoe size) column would show 12. You need to find xy, x² and y². So using this example:
xy = 75 × 12 = 900
x² = 75² = 5,625
y² = 12² = 144
Complete these calculations for every person for whom you have data.
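The table-filling step can be sketched in Python as a list of rows, one per participant. The first row is the 75-inch, size-12 example from the text; the other two rows are invented purely for illustration, and the function name and dictionary keys are hypothetical column labels.

```python
def add_product_columns(rows):
    """Return copies of each row with the xy, x2 and y2 columns filled in."""
    table = []
    for row in rows:
        x, y = row["x"], row["y"]
        table.append({**row, "xy": x * y, "x2": x * x, "y2": y * y})
    return table


# Participant 1 is the example from the text; participants 2 and 3 are hypothetical.
data = [
    {"participant": 1, "x": 75, "y": 12},
    {"participant": 2, "x": 68, "y": 10},
    {"participant": 3, "x": 64, "y": 8},
]

for row in add_product_columns(data):
    print(row)
```

The first printed row should show xy = 900, x2 = 5625 and y2 = 144, matching the hand calculation.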
Create a new row at the bottom of your table for the sums of each column. Add together all of the x values, all of the y values, all of the xy values, all of the x² values and all of the y² values, and then put the results at the bottom of the corresponding column in your new row. You can label your new row "sum" or use a sigma (Σ) symbol.
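A short Python sketch of the sum (Σ) row, assuming the table is stored as a list of dictionaries whose hypothetical keys `x`, `y`, `xy`, `x2` and `y2` match the column labels. The two rows below are illustrative only: the first is the example person from the text, the second is made up.

```python
COLUMNS = ("x", "y", "xy", "x2", "y2")


def column_sums(table):
    """Add up each numeric column to form the 'sum' (Σ) row."""
    return {col: sum(row[col] for row in table) for col in COLUMNS}


# Hypothetical table that already has the xy, x2 and y2 columns filled in.
table = [
    {"x": 75, "y": 12, "xy": 900, "x2": 5625, "y2": 144},
    {"x": 68, "y": 10, "xy": 680, "x2": 4624, "y2": 100},
]
print(column_sums(table))
# {'x': 143, 'y': 22, 'xy': 1580, 'x2': 10249, 'y2': 244}
```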
You find R from your data using the formula:
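$$R = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$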
This looks a bit daunting, so you can split it into two parts, which we’ll call s and t.
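$$s = n\sum xy - \left(\sum x\right)\left(\sum y\right)$$
$$t = \sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}$$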
In these equations, n is the number of participants you have (your sample size). The rest of the parts of the equation are the sums you calculated in the last step. So for s, multiply the size of your sample by the sum of the xy column, and then subtract the sum of the x column multiplied by the sum of the y column from this.
For t, there are four main steps. First, calculate n multiplied by the sum of your x² column, and then subtract the square of the sum of your x column (the sum multiplied by itself) from this value. Second, do exactly the same thing but with the sum of the y² column and the sum of the y column in place of the x parts (i.e., nΣy² − [Σy × Σy]). Third, multiply these two results (for the xs and ys) together. Fourth, take the square root of this answer.
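The s and t calculations can be sketched in Python, taking the column sums as inputs. The function name `s_and_t` is hypothetical, and the example sums come from a made-up three-point data set (x = 1, 2, 3 and y = 1, 3, 2), not from real measurements.

```python
import math


def s_and_t(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Split the R formula into its numerator s and denominator t."""
    s = n * sum_xy - sum_x * sum_y
    x_part = n * sum_x2 - sum_x ** 2  # step 1: the x terms
    y_part = n * sum_y2 - sum_y ** 2  # step 2: the y terms
    t = math.sqrt(x_part * y_part)    # steps 3 and 4: multiply, then take the square root
    return s, t


# Sums for the hypothetical data x = 1, 2, 3 and y = 1, 3, 2.
print(s_and_t(n=3, sum_x=6, sum_y=6, sum_xy=13, sum_x2=14, sum_y2=14))  # (3, 6.0)
```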
If you've worked in parts, you can calculate R simply as R = s ÷ t. You will get an answer between −1 and 1. A positive answer shows a positive correlation, with anything above 0.7 generally considered a strong relationship. A negative answer shows a negative correlation, with anything below −0.7 considered a strong negative relationship. Similarly, a value of about ±0.5 is considered a moderate relationship and about ±0.3 a weak relationship, though these cutoffs are rules of thumb rather than fixed boundaries. Anything close to 0 shows a lack of linear correlation.
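A minimal Python sketch of the final step: divide s by t and attach a rough label using the rule-of-thumb cutoffs above. The function name is hypothetical, and treating every value with a magnitude below 0.3 as "little or no correlation" is an assumption, since the text only says that values close to 0 show a lack of correlation.

```python
def describe_r(s, t):
    """Compute R = s / t and label it with rule-of-thumb strength bands."""
    r = s / t
    strength = abs(r)
    if strength >= 0.7:
        label = "strong"
    elif strength >= 0.5:
        label = "moderate"
    elif strength >= 0.3:
        label = "weak"
    else:
        # Assumed band: the cutoffs below 0.3 are not fixed in the text.
        return r, "little or no correlation"
    direction = "positive" if r > 0 else "negative"
    return r, f"{label} {direction}"


print(describe_r(3, 6.0))  # (0.5, 'moderate positive')
```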
Tips
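- Many graphing programs report the square of the correlation coefficient, R², instead of R. To compare your result with theirs, multiply R by itself. To go the other way, take the square root of R² and give it the sign of the trend line's slope (positive for an upward trend, negative for a downward one).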
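Both conversions can be sketched in a few lines of Python. The function names are hypothetical; `math.copysign` gives the square root of R² the same sign as the slope you pass in, which assumes you can read the slope (or at least its direction) from the graphing program.

```python
import math


def r_squared(r):
    """Square R by multiplying it by itself."""
    return r * r


def r_from_r_squared(r2, slope):
    """Recover R from R², taking its sign from the trend line's slope."""
    return math.copysign(math.sqrt(r2), slope)


print(r_squared(0.5))                # 0.25
print(r_from_r_squared(0.25, -2.0))  # -0.5
```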
About the Author
Lee Johnson is a freelance writer and science enthusiast, with a passion for distilling complex concepts into simple, digestible language. He's written about science for several websites including eHow UK and WiseGeek, mainly covering physics and astronomy. He was also a science blogger for Elements Behavioral Health's blog network for five years. He studied physics at the Open University and graduated in 2018.