The least squares regression line (LSRL) is the straight line that best predicts one variable (y) from another (x). It is "best" in the sense that it minimizes the sum of the squared vertical distances between the data points and the line. Once both variables have been standardized, the LSRL passes through the point (0,0) and has a slope equal to the correlation coefficient, r, of the data. Thus, you can calculate the LSRL by finding the correlation coefficient and standardizing the data.
Find the Correlation Coefficient
Arrange your data so that it is easy to work with. Use a spreadsheet or matrix to separate your data into its x-values and y-values, keeping them linked (i.e. make sure each data point’s x-value and y-value are in the same row or column).
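A minimal Python sketch of this arrangement, assuming the data start out as a list of (x, y) pairs. The variable names and sample values are hypothetical and only illustrate keeping each x-value aligned with its y-value.

```python
# Hypothetical sample data: each (x, y) tuple is one data point.
points = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]

# Split into two parallel lists; position i in xs and ys refers to the same point.
xs = [x for x, _ in points]
ys = [y for _, y in points]

print(xs)  # [1, 2, 3, 4, 5]
print(ys)  # [2, 4, 5, 4, 5]
```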
Find the cross products of the x-values and y-values. Multiply the x-value and y-value for each point together. Sum these resulting values. Call the result “sxy.”
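The cross-product step can be sketched in Python as follows, assuming the x-values and y-values are stored in two parallel lists (the sample values are hypothetical). `zip` pairs each x-value with its matching y-value.

```python
xs = [1, 2, 3, 4, 5]  # hypothetical x-values
ys = [2, 4, 5, 4, 5]  # hypothetical y-values, paired by position with xs

# Multiply each x by its matching y, then add up the products.
sxy = sum(x * y for x, y in zip(xs, ys))
print(sxy)  # 66
```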
Sum the x-values and y-values separately. Call these two resulting values “sx” and “sy,” respectively.
Count the number of data points. Call this value “n.”
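For the same hypothetical sample data, the separate sums and the count look like this in Python. This is a sketch that assumes equal-length parallel lists, one entry per data point.

```python
xs = [1, 2, 3, 4, 5]  # hypothetical x-values
ys = [2, 4, 5, 4, 5]  # hypothetical y-values

sx = sum(xs)  # sum of the x-values
sy = sum(ys)  # sum of the y-values
n = len(xs)   # number of data points (x, y pairs)
print(sx, sy, n)  # 15 20 5
```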
Find the sums of squares. Square every x-value and every y-value (that is, multiply each value by itself). Call the new sets of data “x2” and “y2” for the x-values and y-values. Sum all of the x2 values and call the result “sx2.” Sum all of the y2 values and call the result “sy2.”
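A short Python sketch of the sums of squares, again using hypothetical sample values. The list names x2 and y2 follow the labels used in the step.

```python
xs = [1, 2, 3, 4, 5]  # hypothetical x-values
ys = [2, 4, 5, 4, 5]  # hypothetical y-values

x2 = [x * x for x in xs]  # each x-value squared
y2 = [y * y for y in ys]  # each y-value squared
sx2 = sum(x2)
sy2 = sum(y2)
print(sx2, sy2)  # 55 86
```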
Subtract sx*sy/n from sxy. Call the result “num.”
Compute the value sx2-(sx^2)/n. Call the result “A.”
Compute the value sy2-(sy^2)/n. Call the result “B.”
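The three intermediate quantities can be computed together as in the Python sketch below, which uses hypothetical sample data. “num” is algebraically the same as the sum of (x - mean of x) * (y - mean of y). Likewise, “A” and “B” are the sums of squared deviations from the mean for x and y.

```python
xs = [1, 2, 3, 4, 5]  # hypothetical x-values
ys = [2, 4, 5, 4, 5]  # hypothetical y-values

n = len(xs)
sx, sy = sum(xs), sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))
sx2 = sum(x * x for x in xs)
sy2 = sum(y * y for y in ys)

num = sxy - sx * sy / n  # cross-product sum corrected for the means
A = sx2 - sx ** 2 / n    # sum of squared deviations for x
B = sy2 - sy ** 2 / n    # sum of squared deviations for y
print(num, A, B)  # 6.0 10.0 6.0
```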
Multiply A by B and take the square root of the product, which can be written as (A*B)^(1/2). Label the result “denom.”
Calculate the correlation coefficient, “r.” The value of “r” equals “num” divided by “denom,” which can be written as num / denom.
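Putting the steps so far together, a minimal Python sketch of the correlation coefficient might look like this. The function name `correlation` is a hypothetical helper, not a library API. It assumes at least two points and does not handle the case where all x-values (or all y-values) are identical. In that case “denom” is 0 and Python raises ZeroDivisionError.

```python
import math

def correlation(xs, ys):
    """Hypothetical helper: Pearson r from the shortcut sums in the steps above."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    num = sxy - sx * sy / n
    A = sx2 - sx ** 2 / n
    B = sy2 - sy ** 2 / n
    denom = math.sqrt(A * B)  # (A*B)^(1/2)
    return num / denom        # ZeroDivisionError if A or B is 0

r = correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical sample data
print(round(r, 4))  # 0.7746
```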
Standardize the Data and Write the LSRL
Find the means of the x-values and y-values. Add all of the x-values together and divide the result by “n.” Call this “mx.” Do the same for the y-values, calling the result “my.”
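A Python sketch of the means, using the same hypothetical sample data. Dividing with `/` returns a float even when the sums are integers.

```python
xs = [1, 2, 3, 4, 5]  # hypothetical x-values
ys = [2, 4, 5, 4, 5]  # hypothetical y-values

n = len(xs)
mx = sum(xs) / n  # mean of the x-values
my = sum(ys) / n  # mean of the y-values
print(mx, my)  # 3.0 4.0
```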
Find the standard deviations for the x-values and y-values. Subtract each data set’s mean from its values. For example, every x-value, “xdat,” becomes “xdat - mx,” and every y-value, “ydat,” becomes “ydat - my.” Square these deviations. Add the squared deviations for each group (x and y) separately and divide each sum by “n.” Take the square root of each result to get the standard deviation for that group. Call the standard deviation for the x-values “sdx” and that for the y-values “sdy.” (Dividing by “n” gives the population standard deviation. Dividing by “n - 1” gives the sample standard deviation. Either choice leaves the slope of the standardized regression line equal to “r,” as long as you use the same divisor for both groups.)
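A minimal Python sketch of the standard deviations with divisor “n” (population standard deviation), as described in the step. The sample data are hypothetical. Python’s standard library `statistics.pstdev` computes the same quantity.

```python
import math

xs = [1, 2, 3, 4, 5]  # hypothetical x-values
ys = [2, 4, 5, 4, 5]  # hypothetical y-values

n = len(xs)
mx = sum(xs) / n
my = sum(ys) / n

# Square the deviations from the mean, add them, divide by n, take the square root.
sdx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
sdy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
print(round(sdx, 4), round(sdy, 4))  # 1.4142 1.0954
```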
Standardize the data. Subtract “mx” from every x-value and divide each result by “sdx.” Call these standardized values “x_”. Do the same for the y-values: subtract “my” from every y-value and divide each result by “sdy.” Call these values “y_”. Each standardized data set has a mean of 0 and a standard deviation of 1.
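The standardization step can be sketched in Python as below, using hypothetical sample data and population standard deviations. The lists `x_` and `y_` follow the names used in the step.

```python
import math

xs = [1, 2, 3, 4, 5]  # hypothetical x-values
ys = [2, 4, 5, 4, 5]  # hypothetical y-values

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sdx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
sdy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)

# Subtract the mean, then divide by the standard deviation.
x_ = [(x - mx) / sdx for x in xs]
y_ = [(y - my) / sdy for y in ys]
print([round(v, 3) for v in x_])  # [-1.414, -0.707, 0.0, 0.707, 1.414]
```

Each standardized list should now have a mean of about 0 and a population standard deviation of about 1. Floating-point rounding can leave tiny residues.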
Write the regression line: “y_^ = r * x_”, where “^” stands for “hat” (a predicted value) and “r” is the correlation coefficient found earlier.
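A Python sketch that ties the steps together and checks the claim that, for standardized data, the least-squares slope through the origin equals r. It makes several assumptions. The sample data are hypothetical. For self-containment, r is computed as the average of the products x_ * y_, which matches the earlier formula when population standard deviations are used. The helpers `predict_standardized` and `predict` are hypothetical names. `predict` applies the standard rearrangement of the line back into original units, y^ = my + r * (sdy / sdx) * (x - mx).

```python
import math

xs = [1, 2, 3, 4, 5]  # hypothetical x-values
ys = [2, 4, 5, 4, 5]  # hypothetical y-values

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sdx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
sdy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
x_ = [(x - mx) / sdx for x in xs]
y_ = [(y - my) / sdy for y in ys]

# With population SDs, r is the mean of the products of standardized values.
r = sum(a * b for a, b in zip(x_, y_)) / n

def predict_standardized(z):
    """Hypothetical helper: the LSRL in standardized units, y_^ = r * x_."""
    return r * z

def predict(x):
    """Hypothetical helper: the same line expressed in the original units."""
    return my + r * (sdy / sdx) * (x - mx)

# Least-squares slope of a line through the origin fitted to (x_, y_).
slope = sum(a * b for a, b in zip(x_, y_)) / sum(a * a for a in x_)
print(round(r, 4), round(slope, 4))  # 0.7746 0.7746
print(round(predict(6), 2))  # 5.8
```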