Statistics made easy
8th revised edition (March 2026) - many illustrative examples - only €8.99
Multicollinearity
Author: Dr. Hannah Volk-Jesussek
In a regression analysis, multicollinearity occurs when two or more predictor variables (independent variables) show a high correlation. This can lead to the regression coefficients being unstable and no longer interpretable.
Why is Multicollinearity a Problem?
Multicollinearity is a problem because it inflates the standard errors of the affected regression coefficients and thereby distorts the statistical significance of the independent variables.
A main goal of regression is to determine the relationship between each independent variable and the dependent variable. However, when variables are highly correlated, it may no longer be possible to determine exactly which influence comes from which variable. The p-values of the regression coefficients can therefore no longer be interpreted reliably.
With multicollinearity, the regression coefficients can change greatly even when the data change only slightly or new variables are added to the model.
Is Multicollinearity always a Problem?
Multicollinearity only affects the independent variables that are highly correlated. If you are interested in other variables that do not exhibit multicollinearity, then you can interpret them normally.
If you are using the regression model to make predictions, multicollinearity does not affect the quality of the prediction. It only affects the individual coefficients and their p-values.
How to avoid Multicollinearity?
To avoid multicollinearity, there must be no linear dependence between the predictors. Linear dependence occurs, for example, when one variable is a multiple of another. In that case the two variables are perfectly correlated, one variable explains 100% of the variance of the other, and including both in the regression model adds no value. Conversely, if the independent variables are completely uncorrelated, there is no multicollinearity.
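A minimal sketch of this perfect-dependence case (the data and variable names are purely illustrative): when one predictor is an exact multiple of another, their correlation is 1.

```python
import numpy as np

# Hypothetical data: x2 is an exact multiple of x1, so the two predictors
# are perfectly correlated and x2 carries no information beyond x1.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = 3 * x1

print(np.corrcoef(x1, x2)[0, 1])  # correlation of (approximately) 1
```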
In practice, a perfect linear correlation hardly ever occurs, which is why we speak of multicollinearity when individual variables are highly, but not perfectly, correlated with each other. In this case the effects of the individual variables can no longer be clearly separated.
Note that the regression coefficients can then no longer be interpreted in a meaningful way, but prediction with the regression model is still possible.
Multicollinearity Test
Since there is almost always some multicollinearity in real data, diagnostic measures were introduced to quantify it. To test for multicollinearity, a new regression model is created for each independent variable: the original dependent variable is left out, and one of the independent variables takes its place as the dependent variable.
These auxiliary regressions test how well each independent variable can be represented by the other independent variables. If one independent variable can be represented very well by the others, this is a sign of multicollinearity.
For example, if x1 can be composed entirely of the other variables, the regression model cannot determine b1 or the other coefficients: the coefficients are not uniquely identified, because infinitely many combinations of them fit the data equally well.
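The auxiliary-regression idea can be sketched as follows (all data and variable names are illustrative): x1 is regressed on the remaining predictors, and the resulting coefficient of determination shows how well they reproduce it.

```python
import numpy as np

# Illustrative data: x2 is almost a multiple of x1, so x1 and x2 are
# highly collinear; x3 is an independent predictor.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 2 * x1 + rng.normal(scale=0.1, size=100)
x3 = rng.normal(size=100)

# Auxiliary regression: drop the original dependent variable, make x1 the
# criterion, and regress it on the remaining predictors (plus an intercept).
A = np.column_stack([np.ones(100), x2, x3])
coef, *_ = np.linalg.lstsq(A, x1, rcond=None)
residuals = x1 - A @ coef
r_squared = 1 - (residuals ** 2).sum() / ((x1 - x1.mean()) ** 2).sum()
print(round(r_squared, 3))  # close to 1: x1 is almost fully explained -> multicollinearity
```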
Tolerance value
To find out whether multicollinearity is present, the tolerance of the individual predictors is considered. The tolerance Ti for the i-th predictor is calculated as

Ti = 1 − Ri²
To calculate Ri², a new regression model is created, as discussed above: the i-th predictor serves as the criterion (dependent variable) and the remaining predictors as explanatory variables. Ri² is the coefficient of determination of this auxiliary regression and measures how well the i-th predictor can be represented by the other predictors.
A tolerance value Ti below 0.1 is considered critical: multicollinearity is present, because more than 90% of the variance of the i-th predictor can then be explained by the other predictors.
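As a sketch under illustrative data (all names hypothetical), the tolerance of each predictor can be computed directly from its auxiliary regression:

```python
import numpy as np

def tolerance(X, i):
    """Tolerance Ti = 1 - Ri² of predictor i, where Ri² comes from
    regressing column i of X on all other columns (with an intercept)."""
    y = X[:, i]
    A = np.column_stack([np.ones(len(y)), np.delete(X, i, axis=1)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    r2 = 1 - ((y - A @ coef) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return 1 - r2

# Hypothetical predictors: x2 nearly duplicates x1, x3 is independent.
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

print(tolerance(X, 0))  # well below 0.1 -> critical
print(tolerance(X, 2))  # close to 1 -> unproblematic
```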
VIF Multicollinearity
Another measure used to test for multicollinearity is the VIF (variance inflation factor). The VIF for the i-th predictor is calculated as

VIFi = 1 / (1 − Ri²) = 1 / Ti
The higher the VIF value, the more severe the multicollinearity. In the VIF test, values above 10 are considered critical; this corresponds exactly to a tolerance below 0.1.
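Because the VIF is simply the reciprocal of the tolerance, the two thresholds describe the same cutoff. A tiny worked example (the Ri² value of 0.95 is made up for illustration):

```python
# VIF is the reciprocal of the tolerance: VIFi = 1 / (1 - Ri²) = 1 / Ti.
r_squared = 0.95               # illustrative Ri² from an auxiliary regression
tolerance = 1 - r_squared      # 0.05 -> below 0.1, critical
vif = 1 / (1 - r_squared)      # 20   -> above 10, critical
print(tolerance, vif)
```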