When a set of data contains two variables that may relate, such as the heights and weights of individuals, regression analysis finds a mathematical function that best approximates the relationship. The sum of residuals is a measure of how good a job the function does.
Residuals
In regression analysis, we choose one variable to be the “explanatory variable,” which we will call x, and the other to be the “response variable” which we will call y. Regression analysis creates the function y = f(x) that best predicts the response variable from its associated explanatory variable. If x[i] is one of the explanatory variables, and y[i] its response variable, then the residual is the error, or difference between the actual value of y[i] and the predicted value of y[i]. In other words, residual = y[i] - f(x[i]).
Example
A set of data contains the heights in centimeters and weights in kilograms of 5 people: [(152,54), (165,65), (175,100), (170,80), (140, 45)]. A quadratic fit of weight, w, for height, h, is w = f(h) = 1160 -15.5_h + 0.054_h^2. The residuals are (in kg): [2.38, 7.65, 1.25, 5.60, 3.40]. The sum of residuals is 15.5 kg.
Linear Regression
The simplest kind of regression is linear regression, in which the mathematical function is a straight line of the form y = m*x + b. In this case, the sum of residuals is 0 by definition.
References
About the Author
Ariel Balter started out writing, editing and typesetting, changed gears for a stint in the building trades, then returned to school and earned a PhD in physics. Since that time, Balter has been a professional scientist and teacher. He has a vast area of expertise including cooking, organic gardening, green living, green building trades and many areas of science and technology.
Photo Credits
DragonImages/iStock/Getty Images