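A common way to show how two variables are associated, such as study time and course success, is the correlation. Ranging from -1.0 to +1.0, a correlation indicates the direction and strength of the linear relationship between two variables. Positive values mean that as one variable increases, the other tends to increase. Negative values mean that as one increases, the other tends to decrease. Values near 0 mean there is little or no linear relationship.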

For some research questions, one of the variables is continuous. An example is the number of hours a student studies for an examination, which can range from 0 to over 90 hours a week. The other variable is dichotomous, such as whether or not the student passed the exam. In situations like this, the appropriate statistic is the point-biserial correlation.
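For reference, the procedure in the Calculation section computes the quantity below. Here $M_1$ and $M_0$ are the averages of Variable X for cases with Y = 1 and Y = 0, $n_1$ and $n_0$ are the numbers of those cases, $n = n_1 + n_0$, and $s_X$ is the standard deviation of X over all $n$ cases, computed with $n - 1$ in the denominator. The symbols are introduced here only for illustration. The result equals the ordinary (Pearson) correlation between X and the 0/1 codes of Y.

$$
r_{pb} = \frac{M_1 - M_0}{s_X}\sqrt{\frac{n_1 n_0}{n(n-1)}},
\qquad
s_X = \sqrt{\frac{n\sum X^2 - \left(\sum X\right)^2}{n(n-1)}}
$$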

## Preparation

Arrange your data in a table with three columns, either on paper or in a computer spreadsheet. The columns are Case Number (such as “Student #1,” “Student #2,” and so forth), Variable X (such as “Total Hours Studied”) and Variable Y (such as “Passed Exam”). For each case, Variable Y is either 1 (the student passed the exam) or 0 (the student failed). You may use a spreadsheet program for this step.
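If you keep the table in a program rather than on paper, one minimal way to represent it is sketched below in Python. The `Case` type, its field names and the `check_table` helper are hypothetical illustrations. They are not part of any particular spreadsheet or statistics package. The check simply enforces the 0/1 coding of Variable Y described above.

```python
from typing import NamedTuple


class Case(NamedTuple):
    """One row of the data table (hypothetical field names)."""
    case_number: str  # e.g. "Student #1"
    x: float          # Variable X, e.g. total hours studied
    y: int            # Variable Y: 1 = passed, 0 = failed


def check_table(rows: list[Case]) -> None:
    """Raise ValueError if any row's Variable Y is not coded 0 or 1."""
    for row in rows:
        if row.y not in (0, 1):
            raise ValueError(f"{row.case_number}: Y must be 0 or 1, got {row.y!r}")


# Hypothetical example rows, not data from the article.
table = [
    Case("Student #1", 8.0, 1),
    Case("Student #2", 3.5, 0),
    Case("Student #3", 10.0, 1),
]
check_table(table)  # passes silently because every Y is 0 or 1
```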

Remove outlier data. For example, if four-fifths of the students studied between 3 and 10 hours for the exam, throw out data from students who did not study at all, or who studied over 20 hours.
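A sketch of this step in Python follows, assuming the data are stored as (X, Y) pairs. The `drop_outliers` function and the cutoff rule passed to it are hypothetical. The rule shown mirrors the example above, excluding students who studied 0 hours or more than 20 hours. Choosing the rule for your own data remains a judgment call.

```python
from typing import Callable

Row = tuple[float, int]  # (Variable X, Variable Y)


def drop_outliers(rows: list[Row], keep: Callable[[float], bool]) -> list[Row]:
    """Return only the rows whose Variable X satisfies the `keep` rule."""
    return [(x, y) for x, y in rows if keep(x)]


# Hypothetical data: (hours studied, passed?)
rows = [(0.0, 0), (4.0, 0), (7.5, 1), (9.0, 1), (25.0, 1)]

# Drop students who did not study at all or who studied over 20 hours.
kept = drop_outliers(rows, keep=lambda x: 0 < x <= 20)
print(kept)  # [(4.0, 0), (7.5, 1), (9.0, 1)]
```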

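Count your cases to verify that you have enough for a test of the correlation to be statistically significant and to have sufficient statistical power. How many you need depends on how strong a correlation you expect to find. As a rough guide, if you have fewer than about 25 to 70 cases, it is not worth calculating a correlation.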
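One way to put a number on "enough" is a standard power approximation based on Fisher's z-transformation, sketched below. This is a swapped-in technique, not the article's method or Cohen's tables. Its answers can differ from published tables by a case or two. The function name `cases_needed` and its defaults (two-sided test, alpha = 0.05, power = 0.80) are assumptions for illustration.

```python
import math
from statistics import NormalDist


def cases_needed(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of cases needed to detect a population correlation r.

    Fisher z approximation for a two-sided test of "no correlation":
        n = ((z_(1 - alpha/2) + z_(power)) / atanh(r)) ** 2 + 3
    """
    if not 0 < abs(r) < 1:
        raise ValueError("r must be strictly between -1 and 1, and not 0")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # normal quantile for the desired power
    c = abs(math.atanh(r))                         # Fisher z of the expected correlation
    return math.ceil(((z_alpha + z_power) / c) ** 2 + 3)


# Expected correlation of 0.3, alpha = 0.05, power = 0.80.
print(cases_needed(0.3))  # 85
```

Smaller expected correlations need many more cases, so any single cutoff is only a rough guide.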

Have two different people make the same data table independently, and see if there are any differences. Resolve any discrepancies before proceeding with the calculations.
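If both people enter the data electronically, the comparison can be automated. The sketch below assumes each table is stored as a dictionary keyed by case number. The function `find_discrepancies` and the sample entries are hypothetical. It lists every case whose values differ or that appears in only one table, so the two people can go back to the raw records.

```python
def find_discrepancies(table_a: dict, table_b: dict) -> list:
    """Return case numbers whose (X, Y) entries differ or appear in only one table."""
    all_cases = set(table_a) | set(table_b)
    return sorted(c for c in all_cases if table_a.get(c) != table_b.get(c))


# Hypothetical entries from two people typing the same raw data.
entered_by_first = {"Student #1": (8.0, 1), "Student #2": (3.5, 0), "Student #3": (10.0, 1)}
entered_by_second = {"Student #1": (8.0, 1), "Student #2": (5.3, 0), "Student #3": (10.0, 1)}

print(find_discrepancies(entered_by_first, entered_by_second))  # ['Student #2']
```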

## Calculation

Your result must fit into the range between +1.0 and -1.0, inclusive. Values like +0.45 or -0.22 are fine. Values like 16.4 or -32.6 are mathematically impossible; if you get something like this, you have made a mistake somewhere.

Follow Step 3 precisely: subtract the result of Step 2 from the result of Step 1, not the other way around. Reversing the order flips the sign of the correlation.

Step 1. Calculate the average of the values of Variable X where Y = 1. That is, for all cases where Y = 1, add up the values of Variable X and divide by the number of those cases. In our example, this is the average total hours studied by students who passed the exam; let’s say it’s 10.

Step 2. Calculate the average of the values of Variable X where Y = 0. That is, for all cases where Y = 0, add up the values of Variable X and divide by the number of those cases. Here, this is the average total hours studied by students who failed; let’s say it’s 3.

Step 3. Subtract the result of Step 2 from the result of Step 1. Here, 10 – 3 = 7.
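Steps 1 through 3 can be sketched in Python as follows, assuming X and Y are stored in two parallel lists. The function name and the five-student data set are hypothetical. The data were chosen so the group averages match the example's 10 and 3.

```python
def mean_difference(xs: list[float], ys: list[int]) -> float:
    """Steps 1-3: average X where Y = 1 minus average X where Y = 0."""
    passed = [x for x, y in zip(xs, ys) if y == 1]
    failed = [x for x, y in zip(xs, ys) if y == 0]
    step1 = sum(passed) / len(passed)  # average hours for students who passed
    step2 = sum(failed) / len(failed)  # average hours for students who failed
    return step1 - step2               # Step 3: Step 1 minus Step 2, in that order


# Hypothetical data: hours studied (X) and passed the exam (Y).
xs = [9, 11, 10, 2, 4]
ys = [1, 1, 1, 0, 0]
print(mean_difference(xs, ys))  # 7.0
```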

Step 4. Multiply the number of cases you used in Step 1 by the number of cases you used in Step 2. If 40 students passed the exam and 20 failed, this is 40 x 20 = 800.

Step 5. Multiply the total number of cases by one less than that number. Here, 60 students in total took the exam, so this figure is 60 x 59 = 3,540.

Step 6. Divide the result of Step 4 by the result of Step 5. Here, 800 / 3,540 = 0.226.

Step 7. Calculate the square root of the result of Step 6, using a calculator or a computer spreadsheet. Here, that would be 0.475.
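Steps 4 through 7 depend only on the group sizes. A minimal Python sketch, with a hypothetical function name, reproduces the example's figures for 40 students who passed and 20 who failed.

```python
import math


def proportion_factor(n1: int, n0: int) -> float:
    """Steps 4-7: sqrt(n1 * n0 / (n * (n - 1))), where n = n1 + n0."""
    n = n1 + n0
    step4 = n1 * n0          # e.g. 40 x 20 = 800
    step5 = n * (n - 1)      # e.g. 60 x 59 = 3,540
    step6 = step4 / step5    # e.g. 800 / 3,540
    return math.sqrt(step6)  # Step 7


print(round(proportion_factor(40, 20), 3))  # 0.475
```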

Step 8. Square each value of Variable X, and add up all the squares.

Step 9. Multiply the result of Step 8 by the total number of cases. Here, you would multiply the result of Step 8 by 60.

Step 10. Add up the values of Variable X over all the cases. Here, you would add up the total hours studied by every student in the sample.

Step 11. Square the result of Step 10.

Step 12. Subtract the result of Step 11 from the result of Step 9.

Step 13. Divide the result of Step 12 by the result of Step 5.

Step 14. Calculate the square root of the result of Step 13, using a calculator or a computer spreadsheet. This is the standard deviation of Variable X across all cases, computed with one less than the number of cases in the denominator.
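Steps 8 through 14 compute the standard deviation of X. The sketch below follows them literally and compares the result with Python's built-in `statistics.stdev`, which uses the same n − 1 denominator. The function name and data are hypothetical.

```python
import math
import statistics


def std_dev_of_x(xs: list[float]) -> float:
    """Steps 8-14: standard deviation of X over all cases (n - 1 denominator)."""
    n = len(xs)
    step8 = sum(x * x for x in xs)   # sum of the squared X values
    step9 = n * step8
    step10 = sum(xs)                 # sum of the X values
    step11 = step10 ** 2
    step12 = step9 - step11
    step13 = step12 / (n * (n - 1))  # divide by the Step 5 figure
    return math.sqrt(step13)         # Step 14


xs = [9, 11, 10, 2, 4]  # hypothetical hours studied
print(std_dev_of_x(xs), statistics.stdev(xs))  # the two values agree
```

This shortcut formula is convenient by hand. However, it can lose precision on a computer when the X values are very large relative to their spread, so a library routine such as `statistics.stdev` is the safer choice in software.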

Step 15. Divide the result of Step 3 by the result of Step 14.

Step 16. Multiply the result of Step 15 by the result of Step 7. This is the value of the point-biserial correlation.

#### Tips

- Print out all these steps. Next to each step in the “Calculation” section, write down the result you get.
- Calculate the correlation once, then take a break and calculate it again. If the two results differ substantially, there has been a mistake somewhere along the line.
- See Cohen’s “Power Primer” for guidance on how many cases you need for a correlation to be statistically significant with sufficient statistical power (see References).

About the Author

Based in New York City, Mark Koltko-Rivera has been writing psychology-related articles since 1987. His articles have appeared in such journals as “Psychotherapy” and “Journal of Humanistic Psychology.” Koltko-Rivera is a Fellow of the American Psychological Association. He holds a Doctor of Philosophy in counseling psychology from New York University.

Photo Credits

Calculator image by Alhazm Salemi from Fotolia.com