One problem that can arise in multiple regression analysis is multicollinearity. Multicollinearity occurs when two or more of the independent variables of a multiple regression model are highly correlated. Technically, if two of the independent variables are correlated, we have collinearity; when three or more independent variables are correlated, we have multicollinearity. However, the two terms are frequently used interchangeably. The reality of business research is that most of the time some correlation between predictors (independent variables) will be present. The problem of multicollinearity arises when the inter-correlation between predictor variables is high. This relationship causes several other problems, particularly in the interpretation of the analysis.
1. It is difficult, if not impossible, to interpret the estimates of the regression coefficients.
2. Inordinately small t values for the regression coefficients may result.
3. The standard deviations of regression coefficients are overestimated.
4. The algebraic sign of estimated regression coefficients may be the opposite of what would be expected for a particular predictor variable.
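A simple first screen for high inter-correlation among predictors is to compute the pairwise correlations before fitting the model. The following is a minimal sketch in plain Python. The data values, the variable names, and the 0.8 cutoff are illustrative assumptions, not figures from the text.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_correlated(predictors, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold.

    The 0.8 default is an assumed rule of thumb, not a value from the text.
    """
    names = list(predictors)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(predictors[a], predictors[b])
            if abs(r) >= threshold:
                flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical, made-up data for eight employees (illustration only).
predictors = {
    "age":             [25, 31, 38, 42, 47, 52, 58, 63],
    "years_on_job":    [2, 7, 12, 18, 20, 27, 30, 38],
    "years_education": [16, 12, 18, 14, 16, 12, 18, 14],
}

# Only the ("age", "years_on_job") pair is flagged, with r of about 0.994.
print(flag_correlated(predictors))
```

Pairwise correlations can miss multicollinearity that involves three or more predictors jointly, so a check like this is only a preliminary screen.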
The problem of multicollinearity can arise in regression analysis in a variety of business research situations. For example, suppose a model is being developed to predict salaries in a given industry. Independent variables such as years of education, age, years in management, experience on the job, and years of tenure with the firm might be considered as predictors. Several of these variables are clearly correlated (virtually all of them have something to do with number of years, or time) and yield redundant information. As another example, suppose a financial regression model is being developed to predict bond market rates by such independent variables as the Dow Jones average, prime interest rates, GNP, producer price index, and consumer price index. Several of these predictors are likely to be inter-correlated.
The problem of multicollinearity can also affect the t values that are used to evaluate the regression coefficients. Because multicollinearity among predictors can result in an overestimation of the standard deviation of the regression coefficients, the t values tend to be understated when multicollinearity is present. In some regression models containing multicollinearity, all t values are non-significant even though the overall F value for the model is highly significant.
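The pattern of small t values alongside a large overall F can be reproduced on a tiny data set. The sketch below is written in plain Python under stated assumptions: the eight observations are deliberately constructed (not real data) so that x2 is almost a copy of x1, and the helper names `invert` and `ols` are hypothetical. The critical values in the comments are the standard two-tailed 5% table values for the stated degrees of freedom.

```python
from math import sqrt


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(columns, y):
    """Least-squares fit of y on the predictor columns plus an intercept.

    Returns (coefficients, t_values, F); index 0 of each list is the intercept.
    """
    n, k = len(y), len(columns)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = k + 1
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # error sum of squares
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # regression sum of squares
    mse = sse / (n - p)
    std_err = [sqrt(mse * inv[a][a]) for a in range(p)]
    t_values = [b / s for b, s in zip(beta, std_err)]
    return beta, t_values, (ssr / k) / mse


# Constructed illustration data: x2 differs from x1 by only +/- 0.5.
x1 = [16, 16, 12, 12, 4, 4, 8, 8]
x2 = [16.5, 16.5, 11.5, 11.5, 4.5, 4.5, 7.5, 7.5]
y = [39.5, 35.5, 30.5, 26.5, 15.5, 11.5, 22.5, 18.5]

beta_both, t_both, f_both = ols([x1, x2], y)
beta_alone, t_alone, f_alone = ols([x1], y)

# Both predictors: t is about 0.56 for each (critical t, 5 df = 2.571),
# yet F is about 50.2 (critical F, 2 and 5 df = 5.79).
print([round(t, 2) for t in t_both[1:]], round(f_both, 1))
# x1 alone: t is about 10.6 (critical t, 6 df = 2.447).
print(round(t_alone[1], 1))
```

In this constructed case x1 is a strong predictor on its own, but once its near-duplicate x2 enters the model neither coefficient is individually significant, even though the model as a whole clearly is.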
Many of the problems created by multicollinearity are interpretation problems. The business researcher should be alert to the potential for multicollinearity among the predictors in the model and view the model outcome in light of that potential. The problem of multicollinearity is not a simple one to overcome. However, several methods offer an approach to the problem.
Stepwise regression is one way to reduce the problem of multicollinearity. The search process enters the variables one at a time and compares the new variable to those already in solution. If a new variable is entered and the t values on old variables become non-significant, the old variables are dropped out of solution. In this manner, it is more difficult for the problem of multicollinearity to affect the regression analysis. Of course, because of multicollinearity, some important predictors may not enter into the analysis.
Other techniques are available to attempt to control for the problem of multicollinearity. One is the variance inflation factor (VIF), which is computed by conducting a regression analysis to predict an independent variable from the other independent variables.
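The regression of one independent variable on the others can be sketched as follows. Under the usual definition, the variance inflation factor for predictor i is 1 / (1 - R_i^2), where R_i^2 is the coefficient of determination from regressing predictor i on the remaining predictors. This plain-Python sketch assumes that definition; the helper names and the three data columns are hypothetical and constructed for illustration, with x2 nearly duplicating x1 and x3 unrelated to both.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(target, others):
    """R^2 from regressing `target` on the columns in `others` plus an intercept."""
    n = len(target)
    X = [[1.0] + [col[i] for col in others] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * t for row, t in zip(X, target)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    mean = sum(target) / n
    sse = sum((t - f) ** 2 for t, f in zip(target, fitted))
    sst = sum((t - mean) ** 2 for t in target)
    return 1.0 - sse / sst


def vif(predictors):
    """Variance inflation factor 1 / (1 - R_i^2) for each named predictor.

    Divides by zero if a predictor is an exact linear combination of the others.
    """
    result = {}
    for name, column in predictors.items():
        others = [c for other, c in predictors.items() if other != name]
        result[name] = 1.0 / (1.0 - r_squared(column, others))
    return result


# Constructed illustration data (not from the text).
predictors = {
    "x1": [16, 16, 12, 12, 4, 4, 8, 8],
    "x2": [16.5, 16.5, 11.5, 11.5, 4.5, 4.5, 7.5, 7.5],
    "x3": [13, 7, 13, 7, 13, 7, 13, 7],
}

factors = vif(predictors)
# x1 and x2 each come out at about 81, x3 at about 1.
print({name: round(value, 1) for name, value in factors.items()})
```

A value near 1 means the predictor shares almost no information with the others. A commonly cited rule of thumb, which is an assumption here rather than a statement from the text, treats values above 10 as a sign of severe multicollinearity.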