When a set of data contains two variables that may be related, such as the heights and weights of individuals, regression analysis finds a mathematical function that best approximates the relationship. The sum of residuals is one simple measure of how well the function fits the data.
In regression analysis, we choose one variable to be the “explanatory variable,” which we will call x, and the other to be the “response variable,” which we will call y. Regression analysis creates the function y = f(x) that best predicts the response variable from its associated explanatory variable. If x[i] is one observed value of the explanatory variable and y[i] is the corresponding observed response, then the residual is the error, or difference between the actual value of y[i] and the predicted value f(x[i]). In other words, residual = y[i] - f(x[i]). A residual is positive when the function underestimates the actual value and negative when it overestimates it. The sum of residuals adds these differences over every data point.
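A set of data contains the heights in centimeters and weights in kilograms of 5 people: [(152, 54), (165, 65), (175, 100), (170, 80), (140, 45)]. A quadratic fit of weight, w, for height, h, is w = f(h) = 1160 - 15.5*h + 0.054*h^2. The residuals, each computed as actual weight minus predicted weight, are (in kg): [2.38, -7.65, -1.25, -5.60, -3.40]. The sum of residuals is about -15.5 kg. The negative total shows that this curve overestimates weight for four of the five people.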
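The arithmetic in this example can be checked with a short script. This is only a minimal sketch: the names predicted_weight and residuals are illustrative and not part of any library, and the coefficients are the rounded ones quoted in the text. It uses residual = actual - predicted, so people who weigh less than the curve predicts get a negative residual.

```python
# Data from the example: (height in cm, weight in kg)
data = [(152, 54), (165, 65), (175, 100), (170, 80), (140, 45)]

def predicted_weight(h):
    """Quadratic fit quoted in the text: w = 1160 - 15.5*h + 0.054*h^2."""
    return 1160 - 15.5 * h + 0.054 * h ** 2

def residuals(points, f):
    """Return actual minus predicted for every (x, y) pair."""
    return [y - f(x) for x, y in points]

res = residuals(data, predicted_weight)
total = sum(res)

# Print each residual rounded to two decimals, then the rounded sum.
print([round(r, 2) for r in res])
print(round(total, 1))
```

Only the first person has a positive residual here, and the total comes out close to -15.5 kg.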
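The simplest kind of regression is linear regression, in which the mathematical function is a straight line of the form y = m*x + b. When the line is fitted by least squares and includes the intercept b, the sum of residuals is always 0. This is a consequence of how the line is chosen rather than of the definition of a residual: the positive and negative residuals cancel exactly.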
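The zero sum for a least-squares line can be seen numerically. The sketch below assumes the ordinary least-squares formulas for slope and intercept and reuses the five height and weight pairs from the earlier example. The helper fit_line is a hypothetical name written for this illustration.

```python
# Same five (height in cm, weight in kg) pairs as in the earlier example
data = [(152, 54), (165, 65), (175, 100), (170, 80), (140, 45)]

def fit_line(points):
    """Ordinary least-squares slope m and intercept b for y = m*x + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    m = sxy / sxx
    b = mean_y - m * mean_x  # forces the line through (mean_x, mean_y)
    return m, b

m, b = fit_line(data)
line_residuals = [y - (m * x + b) for x, y in data]
line_total = sum(line_residuals)

# Floating-point rounding leaves a tiny nonzero remainder, so compare
# against a small tolerance instead of testing for exactly 0.
print(abs(line_total) < 1e-9)
```

Because the intercept is chosen so that the line passes through the point of means, the individual residuals are not zero but their total is, up to rounding error.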