# Residual in Statistics

When you build models in statistics, you usually test them to make sure they match real-world situations. The residual is a number that helps you determine how close your theorized model is to the phenomenon in the real world. Residuals are not hard to understand: they are just numbers that represent how far a data point is from what it “should be” according to the model's prediction.

## Mathematical Definition

Mathematically, a residual is the difference between an observed data point and the expected (or estimated) value for that data point. The formula for a residual is R = O - E, where “O” is the observed value and “E” is the expected value. Positive values of R therefore indicate observations higher than expected, and negative values indicate observations lower than expected. For example, you might have a statistical model that says when a man’s weight is 140 pounds, his height should be 6 feet, or 72 inches. When you go out and collect data, you might find someone who weighs 140 pounds but is 5 feet 9 inches, or 69 inches. The residual is then 69 inches minus 72 inches, giving you a value of negative 3 inches. In other words, the observed data point is 3 inches below the expected value.
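The height example can be reproduced in a few lines of Python. This is only a minimal sketch: the function name `residual` and the variable names are chosen here for illustration and are not part of any library.

```python
def residual(observed, expected):
    """Return the residual R = O - E."""
    return observed - expected


# Values from the height example in the text (both in inches).
expected_height_in = 72  # model's prediction for a 140-pound man
observed_height_in = 69  # height actually measured

r = residual(observed_height_in, expected_height_in)
print(r)  # -3, i.e. 3 inches below the expected value
```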

## Checking Models

Residuals are especially useful when you want to check whether your theorized model works in the real world. When you create a model and calculate its expected values, you are theorizing. But when you collect data, you might find that the data don't match the model. One way to detect this mismatch between your model and the real world is to calculate residuals. For example, if your residuals are consistently far from zero, meaning the observed values are consistently far from your estimated values, your model might not have a strong underlying theory. An easy way to use residuals in this way is to plot them.
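As a rough sketch of this kind of check, the snippet below computes a residual for each observation and then summarizes how far the residuals are from zero on average. Everything specific in it is an assumption made for illustration: the linear model `expected_height` is hypothetical (it was chosen only so that 140 pounds maps to 72 inches, as in the earlier example), and the observations are made-up numbers, not real data.

```python
def expected_height(weight_lb):
    """Hypothetical model: expected height in inches for a given weight in pounds."""
    return 44 + weight_lb / 5


# Made-up (weight in pounds, observed height in inches) pairs, for illustration only.
observations = [(120, 67), (130, 71), (140, 69), (150, 75), (160, 74)]

# Residual for each observation: observed minus expected.
residuals = [height - expected_height(weight) for weight, height in observations]

# Mean absolute residual: the average distance from zero, ignoring sign.
mean_abs_residual = sum(abs(r) for r in residuals) / len(residuals)

print(residuals)          # [-1.0, 1.0, -3.0, 1.0, -2.0]
print(mean_abs_residual)  # 1.6
```

Whether an average miss of 1.6 inches counts as "far" depends on the problem; the number is only meaningful relative to the scale of the data and the precision you need.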

## Plotting Residuals

When you calculate the residuals, you end up with a list of numbers, which is hard to interpret on its own. Plotting the residuals can often reveal patterns that help you determine whether the model is a good fit. Two properties are worth checking in a residual plot. First, residuals for a good model should be scattered on both sides of zero. That is, the plot should have about the same number of negative residuals as positive residuals. Second, residuals should appear to be random. If you see a clear pattern in your residual plot, such as a linear trend or a curve, your original model could have an error.
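The first property can be checked by counting signs, and a crude text plot is often enough to eyeball the second. The sketch below uses only the Python standard library; the helper names `sign_counts` and `text_plot` and the residual values are hypothetical, chosen for illustration. In practice a plotting library would give a clearer picture.

```python
def sign_counts(residuals):
    """Count positive and negative residuals (zeros are ignored)."""
    positive = sum(1 for r in residuals if r > 0)
    negative = sum(1 for r in residuals if r < 0)
    return positive, negative


def text_plot(residuals, scale=2):
    """Draw one row per residual: '|' marks zero and '*' marks the residual."""
    # Largest horizontal offset, in characters, on either side of zero.
    width = max(round(abs(r) * scale) for r in residuals)
    rows = []
    for r in residuals:
        row = [" "] * (2 * width + 1)
        row[width] = "|"
        row[width + round(r * scale)] = "*"  # overwrites '|' when r rounds to 0
        rows.append("".join(row).rstrip())
    return "\n".join(rows)


# Made-up residuals, for illustration only.
residuals = [1.0, -2.0, 0.5, -1.0, 2.0, -0.5]

print(sign_counts(residuals))  # (3, 3): balanced around zero
print(text_plot(residuals))
```

Here the stars fall on both sides of the zero line with no obvious trend from top to bottom. If they drifted steadily to one side or traced a curve, that would be the kind of pattern the text warns about.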