Multiple regression is used to examine the relationship between several independent variables and a dependent variable. Multiple regression models let you analyze the relative influence of these independent, or predictor, variables on the dependent, or criterion, variable, but the often complex data sets involved can lead to false conclusions if they are not analyzed properly.
Examples of Multiple Regression
A real estate agent could use multiple regression to analyze the value of houses. For example, she could use as independent variables the size of the houses, their ages, the number of bedrooms, the average home price in the neighborhood and the proximity to schools. By fitting a multiple regression model with these factors as predictors and the price of each home as the criterion variable, she could then see how each factor relates to price.
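As a rough illustration of what "fitting" the model means, the sketch below estimates an ordinary least squares regression of price on the five predictors above. It assumes NumPy is available, and the data are entirely hypothetical: the column names (`size_sqft`, `school_km`, and so on) and the "true" coefficients used to generate prices are made up for the example, not drawn from any real market.

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Ordinary least squares fit of y on the columns of X plus an intercept.

    Returns (intercept, coefficients), one coefficient per column of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]

# Hypothetical, synthetic data: not real housing figures.
rng = np.random.default_rng(0)
n = 200
size_sqft = rng.uniform(800, 3500, n)
age_years = rng.uniform(0, 60, n)
bedrooms = rng.integers(1, 6, n)
neighborhood_avg = rng.uniform(150_000, 600_000, n)
school_km = rng.uniform(0.1, 10, n)

# Assumed relationship, used only to generate the example prices.
price = (50_000 + 120 * size_sqft - 800 * age_years + 10_000 * bedrooms
         + 0.5 * neighborhood_avg - 3_000 * school_km
         + rng.normal(0, 20_000, n))

X = np.column_stack([size_sqft, age_years, bedrooms, neighborhood_avg, school_km])
intercept, coefs = fit_multiple_regression(X, price)
names = ["size_sqft", "age_years", "bedrooms", "neighborhood_avg", "school_km"]
for name, b in zip(names, coefs):
    print(f"{name:>17}: {b:,.2f}")
```

Each estimated coefficient is the expected change in price for a one-unit change in that predictor with the others held fixed; with this synthetic data the estimates should land near the assumed values.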
Another example is someone in human resources determining the salaries of management positions, with salary as the criterion variable. The predictor variables could be each manager's seniority, average number of hours worked, number of people managed and departmental budget.
Advantages of Multiple Regression
There are two main advantages to analyzing data using a multiple regression model. The first is the ability to determine the relative influence of one or more predictor variables on the criterion variable. The real estate agent could find that the size of the homes and the number of bedrooms are strongly correlated with price, while proximity to schools shows no correlation at all, or even a negative correlation if the area is primarily a retirement community.
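Raw coefficients are hard to compare when predictors use different units (square feet versus bedrooms versus kilometers). One common way to compare relative influence is to fit the model on standardized (z-scored) variables. The sketch below assumes NumPy and hypothetical synthetic data in which price is generated from size and bedrooms only, so distance to school should come out near zero. Standardized coefficients are only one option and can still mislead when predictors are strongly correlated with each other.

```python
import numpy as np

def standardized_coefficients(X, y):
    """Fit OLS on z-scored predictors and response.

    Each coefficient is the expected change in y, in standard deviations,
    for a one-standard-deviation change in that predictor with the others
    held fixed, so predictors measured in different units share one scale."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    # No intercept column needed: every z-scored variable has mean zero.
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return beta

# Hypothetical, synthetic data: price depends on size and bedrooms,
# but not on distance to the nearest school (an assumption for the example).
rng = np.random.default_rng(1)
n = 300
size_sqft = rng.uniform(800, 3500, n)
bedrooms = rng.integers(1, 6, n)
school_km = rng.uniform(0.1, 10, n)
price = 100 * size_sqft + 15_000 * bedrooms + rng.normal(0, 25_000, n)

X = np.column_stack([size_sqft, bedrooms, school_km])
betas = standardized_coefficients(X, price)
for name, b in zip(["size_sqft", "bedrooms", "school_km"], betas):
    print(f"{name:>10}: {b:+.3f}")
```

Because the coefficients are unit-free, rescaling a predictor (say, square feet to square meters) leaves them unchanged.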
The second advantage is the ability to identify outliers, or anomalies. For example, while reviewing the management salary data, the human resources manager could find that the number of hours worked, the department size and its budget all correlate strongly with salary, while seniority does not. Alternatively, it could be that all of the listed predictor variables account well for every salary being examined, except for one manager who appears to be overpaid compared with the others.
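One simple way to surface such an anomaly is to fit the model and look for observations whose residual (actual minus predicted salary) is unusually large. The sketch below assumes NumPy, hypothetical synthetic manager data, and a rule-of-thumb cutoff of three residual standard errors; the cutoff and the choice of manager index 17 as the inflated salary are arbitrary assumptions. More careful diagnostics, such as studentized residuals or Cook's distance, exist, and a large residual only says the salary departs from the model's pattern, not why.

```python
import numpy as np

def flag_outliers(X, y, threshold=3.0):
    """Fit OLS with an intercept and flag observations whose residual exceeds
    `threshold` times the residual standard error."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    dof = len(y) - design.shape[1]  # n minus the number of fitted parameters
    sigma = np.sqrt(residuals @ residuals / dof)
    scaled = residuals / sigma
    return np.flatnonzero(np.abs(scaled) > threshold), scaled

# Hypothetical, synthetic manager data (not real salaries).
rng = np.random.default_rng(42)
n = 60
seniority = rng.uniform(1, 25, n)   # years
hours = rng.uniform(40, 60, n)      # average hours per week
team_size = rng.integers(3, 40, n)  # people managed
budget = rng.uniform(0.5, 10, n)    # departmental budget, $ millions
# Assumed relationship, used only to generate the example salaries.
salary = (40_000 + 1_500 * seniority + 800 * hours + 1_000 * team_size
          + 4_000 * budget + rng.normal(0, 5_000, n))
salary[17] += 60_000  # one manager paid far more than the pattern predicts

X = np.column_stack([seniority, hours, team_size, budget])
flagged, scaled = flag_outliers(X, salary)
print("flagged managers:", flagged, "scaled residuals:", np.round(scaled[flagged], 1))
```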
Disadvantages of Multiple Regression
Any disadvantage of using a multiple regression model usually comes down to the data being used. Two examples are using incomplete data and falsely concluding that a correlation implies causation.
When reviewing home prices, for example, suppose the real estate agent looked at only 10 homes, seven of which were purchased by young parents. In this case, the apparent relationship between proximity to schools and sale price may lead her to believe that proximity affects the sale price of all homes in the community. This illustrates the pitfalls of incomplete data. Had she used a larger sample, she could have found that, out of 100 homes sold, only ten percent of home values were related to proximity to a school. If she had used the buyers' ages as a predictor variable, she could have found that younger buyers were willing to pay more for homes in the community than older buyers.
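The buyer-age point can be sketched with synthetic data. The example below assumes NumPy and a hypothetical scenario in which younger buyers both buy closer to schools and pay more, while distance to school has no direct effect on price. Regressing price on distance alone produces a sizable coefficient; adding buyer age as a second predictor pulls that coefficient toward zero. The variable names and numbers are illustrative assumptions only.

```python
import numpy as np

def ols_slopes(X, y):
    """Return the OLS slope coefficients (intercept fitted but dropped)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1:]

rng = np.random.default_rng(7)
n = 500
buyer_age = rng.uniform(25, 75, n)
# Assumption: younger buyers tend to buy closer to schools.
school_km = 0.1 * buyer_age + rng.normal(0, 0.5, n)
# Assumption: price depends on buyer age (younger buyers pay more),
# not on distance to school.
price = 400_000 - 2_000 * buyer_age + rng.normal(0, 20_000, n)

school_only = ols_slopes(school_km.reshape(-1, 1), price)
with_age = ols_slopes(np.column_stack([school_km, buyer_age]), price)
print(f"school_km slope, school_km only:       {school_only[0]:,.0f}")
print(f"school_km slope, with buyer_age added: {with_age[0]:,.0f}")
```

The first fit mistakes a correlation for an effect; the second attributes the price differences to the variable that actually generated them in this simulation.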
In the example of management salaries, suppose there was one outlier who had a smaller budget, less seniority and fewer personnel to manage, but was paid more than anyone else. The HR manager could look at the data and conclude that this individual is being overpaid. However, that conclusion would be wrong if he didn't take into account that this manager was in charge of the company's website and had a highly coveted skill set in network security.