At one time or another, you've probably used spreadsheet programs to find the best linear equation that fits a given set of data points -- an operation called simple linear regression. If you've ever wondered exactly how the spreadsheet program completes the calculation, then don't worry, it's not magic. You can actually find the line of best fit yourself without a spreadsheet program by just plugging in numbers using your calculator. Unfortunately, the formula is complicated, but it can be broken down into easy, manageable steps.
Prepare the Data
Compile your data into a table. Write the x-values in one column and y-values in another. Determine how many rows, e.g., how many data points or x,y values, you have in your table.
Add two more columns to the table. Designate one column as "x squared" and the other as "xy," for x times y.
Fill in the x-squared column by multiplying each value of x times itself, or squaring it. For example, 2 squared is 4, because 2 x 2 = 4.
Fill in the xy column by multiplying each value of x against the corresponding value of y. If x is 10 and y is 3, then 10 x 3 = 30.
Add up all the numbers in the x column and write the sum down at the bottom of the x column. Do the same for the other three columns. You will now use these sums to find a linear function of the form y = Mx + B, where M and B are constants.
Find M
Multiply the number of points in your data set by the sum of the xy column. If the sum of the xy column is 200, for example, and the number of data points is 10, the result would be 2000.
Multiply the sum of the x column by the sum of the y column. If the sum of the x column is 20 and the sum of the y column is 100, your answer would be 2000.
Subtract the result in Step 2 from the result in Step 1. In the example your result would be 0.
Multiply the number of data points in your data set by the sum of the x-squared column. If your number of data points is 10 and the sum of your x-squared column is 60, your answer would be 600.
Square the sum of the x column and subtract it from your result in Step 4. If the sum of the x column is 20, 20 squared would be 400, so 600 - 400 is 200.
Divide your result from Step 3 by your result from Step 5. In the example, the outcome would be 0, since 0 divided by any number is 0. M = 0.
Find B and Solve the Equation
- Pencil
- Paper
- Calculator
Are you curious to know how the formula you just used is derived? It's not actually as difficult as you might think, although it does involve some calculus (partial derivatives). The first link under the References section will give you some insight if you are interested.
Many graphing calculators and spreadsheet programs are designed to automatically calculate linear regression formulas for you, although the steps you'll need to get your spreadsheet program/graphing calculator to execute this operation will depend on the model/brand. Consult the user's manual for instructions.
Note that the formula you have derived is a line of best fit. That does not mean that it will pass through every single data point -- in fact, it is unlikely that it will. It will, however, be the best possible linear equation for the data set you used.
Multiply the sum of the x-squared column by the sum of the y column. In the example, the sum of the x-squared column is 60 and the sum of the y column is 100, so 60 x 100 = 6000.
Multiply the sum of the x column by the sum of the xy column. If the sum of the x column is 20 and the sum of the xy column is 200 then 20 x 200 = 4000.
Subtract your answer in Step 2 from your answer in Step 1: 6000 - 4000 = 2000.
Multiply the number of data points in your data set by the sum of the x-squared column. If your number of data points is 10 and the sum of your x-squared column is 60, your answer would be 600.
Square the sum of the x column and subtract it from your result in Step 4. If the sum of the x column is 20, then 20 squared would be 400, so 600 - 400 is 200.
Divide your result from Step 3 by your result from Step 5. In this example, 2000 / 200 would be 10, so you now know that B is 10.
Write out the linear equation you have derived by using the form y = Mx + B. Plug in the values you have calculated for M and B. In the example, M = 0 and B = 10, so y = 0x + 10 or y = 10.
Things You'll Need
Tips
Warnings
References
- University of Minnesota Department of Psychology: The Mathematical Derivation of Least Squares
- "An Introduction to Error Analysis"; John R. Taylor; 1997
Tips
- Are you curious to know how the formula you just used is derived? It's not actually as difficult as you might think, although it does involve some calculus (partial derivatives). The first link under the References section will give you some insight if you are interested.
- Many graphing calculators and spreadsheet programs are designed to automatically calculate linear regression formulas for you, although the steps you'll need to get your spreadsheet program/graphing calculator to execute this operation will depend on the model/brand. Consult the user's manual for instructions.
Warnings
- Note that the formula you have derived is a line of best fit. That does not mean that it will pass through every single data point -- in fact, it is unlikely that it will. It will, however, be the best possible linear equation for the data set you used.
About the Author
Based in San Diego, John Brennan has been writing about science and the environment since 2006. His articles have appeared in "Plenty," "San Diego Reader," "Santa Barbara Independent" and "East Bay Monthly." Brennan holds a Bachelor of Science in biology from the University of California, San Diego.
Photo Credits
Jupiterimages/Photos.com/Getty Images