Finding the strength of the association between two variables is an important skill for scientists of all types. If two variables are correlated with each other, it shows that there is a link between them. A positive correlation means that when one variable increases, the other one does too, and a negative correlation means that when one variable increases, the other one decreases. Correlations don’t prove causation, although it is possible that further tests will prove a causal relationship between the variables. The correlation coefficient **R** shows the strength of the relationship between the two variables, and whether it’s a positive or a negative correlation.

#### TL;DR (Too Long; Didn't Read)

Call one variable **x** and one variable **y**. Calculate the value of **R** using the formula:

**R = [n(Σxy) – (Σx) (Σy)] ÷ √{[nΣx^{2} − (Σx)^{2}] [nΣy^{2} − (Σy)^{2}]}**

Where **n** is your sample size.

Make a table of your data. This should include one column for the participant number, one column for the first variable (labeled **x**) and one column for the second variable (labeled **y**). For example, if you’re looking to see whether there is a correlation between height and shoe size, one column would identify each person you measure, one column would show each person’s height and another would show their shoe size. Make three additional columns, one for **xy**, one for **x^{2}** and one for **y^{2}**.

Use your data to fill out the three additional columns. For example, imagine your first person measures 75 inches tall and has size 12 feet. The **x** (height) column would show 75, and the **y** (shoe size) column would show 12. You need to find **xy**, **x^{2}** and **y^{2}**. So using this example:

**xy = 75 × 12 = 900**

**x^{2} = 75^{2} = 5,625**

**y^{2} = 12^{2} = 144**

Complete these calculations for every person for whom you have data.
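As a rough sketch of this step, the table's three extra columns could be filled in with a few lines of Python. Only the first pair, (75, 12), comes from the example above; the other four pairs and the names `data` and `rows` are made up for illustration.

```python
# Hypothetical (height in inches, shoe size) pairs. Only the first pair,
# (75, 12), comes from the worked example; the rest are invented.
data = [(75, 12), (70, 10), (66, 8), (68, 9), (72, 11)]

# Build one table row per person: x, y, xy, x^2, y^2.
rows = [(x, y, x * y, x ** 2, y ** 2) for x, y in data]

# Print the table with a participant number in the first column.
print(f"{'person':>6} {'x':>4} {'y':>4} {'xy':>6} {'x^2':>6} {'y^2':>6}")
for person, (x, y, xy, x_sq, y_sq) in enumerate(rows, start=1):
    print(f"{person:>6} {x:>4} {y:>4} {xy:>6} {x_sq:>6} {y_sq:>6}")
```

The first printed row should match the hand calculation: 900, 5,625 and 144.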

Create a new row at the bottom of your table for the sums of each column. Add together all of the **x** values, all of the **y** values, all of the **xy** values, all of the **x^{2}** values and all of the **y^{2}** values, and then put the results at the bottom of the corresponding column in your new row. You can label your new row “sum” or use a sigma (Σ) symbol.

You find **R** from your data using the formula:

**R = [n(Σxy) – (Σx) (Σy)] ÷ √{[nΣx^{2} − (Σx)^{2}] [nΣy^{2} − (Σy)^{2}]}**

This looks a bit daunting, so you can split it into two parts, which we’ll call **s** and **t**.

**s = n(Σxy) – (Σx) (Σy)**

**t = √{[nΣx^{2} − (Σx)^{2}] [nΣy^{2} − (Σy)^{2}]}**

In these equations, **n** is the number of participants you have (your sample size). The rest of the parts of the equation are the sums you calculated in the last step. So for **s**, multiply the size of your sample by the sum of the **xy** column, and then subtract the sum of the **x** column multiplied by the sum of the **y** column from this.
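A minimal sketch of the "sum" row and the calculation of **s** might look like the following. The data set is hypothetical apart from the first pair, (75, 12), and the variable names are illustrative only.

```python
# Hypothetical (height, shoe size) pairs; only (75, 12) is from the article.
data = [(75, 12), (70, 10), (66, 8), (68, 9), (72, 11)]

n = len(data)  # sample size

# The "sum" row of the table: one total per column.
sum_x = sum(x for x, _ in data)
sum_y = sum(y for _, y in data)
sum_xy = sum(x * y for x, y in data)
sum_x_sq = sum(x ** 2 for x, _ in data)
sum_y_sq = sum(y ** 2 for _, y in data)

# s = n(Σxy) - (Σx)(Σy)
s = n * sum_xy - sum_x * sum_y

print("sums:", sum_x, sum_y, sum_xy, sum_x_sq, sum_y_sq)
print("s =", s)
```

With these made-up numbers, **s** works out to 5 × 3,532 − 351 × 50 = 110.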

For **t**, there are four main steps. First, calculate **n** multiplied by the sum of your **x^{2}** column, and then subtract the sum of your **x** column squared (multiplied by itself) from this value. Second, do exactly the same thing but with the sum of the **y^{2}** column and the sum of the **y** column squared in place of the **x** parts (i.e., n × Σy^{2} – [Σy × Σy]). Third, multiply these two results (for the **x**s and **y**s) together. Fourth, take the square root of this answer.
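The four steps for **t** can be sketched in Python as shown here. The sample data (apart from the first pair) and the intermediate names `x_part`, `y_part` and `product` are assumptions made for illustration.

```python
import math

# Hypothetical (height, shoe size) pairs; only (75, 12) is from the article.
data = [(75, 12), (70, 10), (66, 8), (68, 9), (72, 11)]

n = len(data)
sum_x = sum(x for x, _ in data)
sum_y = sum(y for _, y in data)
sum_x_sq = sum(x ** 2 for x, _ in data)
sum_y_sq = sum(y ** 2 for _, y in data)

x_part = n * sum_x_sq - sum_x ** 2   # step 1: nΣx^2 - (Σx)^2
y_part = n * sum_y_sq - sum_y ** 2   # step 2: nΣy^2 - (Σy)^2
product = x_part * y_part            # step 3: multiply the two results
t = math.sqrt(product)               # step 4: take the square root

print("x part:", x_part, "y part:", y_part, "t:", round(t, 4))
```

For this made-up sample the two bracketed parts come to 244 and 50, so **t** is the square root of 12,200, or roughly 110.45.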

If you’ve worked in parts, you can calculate **R** as simply **R = s ÷ t**. You will get an answer between −1 and 1. A positive answer shows a positive correlation, with anything over 0.7 generally being considered a strong relationship. A negative answer shows a negative correlation, with anything below −0.7 considered a strong negative relationship. Similarly, ±0.5 is considered a moderate relationship and ±0.3 is considered a weak relationship. Anything close to 0 shows a lack of correlation.
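Putting the pieces together, one possible sketch of the whole calculation is given below. The function names `correlation_r` and `describe` are hypothetical, the sample data is made up apart from the first pair, and the cut-offs in `describe` treat the 0.7, 0.5 and 0.3 guide values above as lower bounds of bands, which is an assumption rather than a fixed rule.

```python
import math

def correlation_r(pairs):
    """Return R = s / t for a list of (x, y) pairs."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    sum_x_sq = sum(x ** 2 for x, _ in pairs)
    sum_y_sq = sum(y ** 2 for _, y in pairs)

    s = n * sum_xy - sum_x * sum_y
    t = math.sqrt((n * sum_x_sq - sum_x ** 2) * (n * sum_y_sq - sum_y ** 2))
    if t == 0:
        # Happens when x or y never varies, so R is undefined.
        raise ValueError("R is undefined when a variable is constant")
    return s / t

def describe(r):
    """Label R using the rough guide values (assumed band boundaries)."""
    size = abs(r)
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    elif size >= 0.3:
        strength = "weak"
    else:
        return "little or no correlation"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

# Hypothetical (height, shoe size) pairs; only (75, 12) is from the article.
data = [(75, 12), (70, 10), (66, 8), (68, 9), (72, 11)]
r = correlation_r(data)
print(round(r, 4), describe(r))
```

For the made-up sample, **R** is 110 ÷ √12,200, which is very close to 1 and is labeled a strong positive correlation.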