What Is a Variance Inflation Factor (VIF)?
A variance inflation factor (VIF) is a measure of the amount of multicollinearity in a set of multiple regression variables. Mathematically, the VIF for a regression model variable is equal to the ratio of the overall model variance to the variance of a model that includes only that single independent variable. This ratio is calculated for each independent variable. A high VIF indicates that the associated independent variable is highly collinear with the other variables in the model.
- A variance inflation factor (VIF) provides a measure of multicollinearity among the independent variables in a multiple regression model.
- Detecting multicollinearity is important because while multicollinearity does not reduce the explanatory power of the model, it does reduce the statistical significance of the independent variables.
- A large variance inflation factor (VIF) on an independent variable indicates a highly collinear relationship to the other variables that should be considered or adjusted for in the structure of the model and selection of independent variables.
Understanding a Variance Inflation Factor (VIF)
A variance inflation factor is a tool to help identify the degree of multicollinearity. Multiple regression is used when a person wants to test the effect of multiple variables on a particular outcome. The dependent variable is the outcome that is being acted upon by the independent variables—the inputs into the model. Multicollinearity exists when there is a linear relationship, or correlation, between one or more of the independent variables or inputs.
The Problem of Multicollinearity
Multicollinearity creates a problem in the multiple regression model because the inputs are all influencing each other. Therefore, they are not actually independent, and it is difficult to test how much the combination of the independent variables affects the dependent variable, or outcome, within the regression model.
While multicollinearity does not reduce a model's overall predictive power, it can produce estimates of the regression coefficients that are not statistically significant. In a sense, it can be thought of as a kind of double-counting in the model.
In statistical terms, a multiple regression model where there is high multicollinearity will make it more difficult to estimate the relationship between each of the independent variables and the dependent variable. In other words, when two or more independent variables are closely related or measure almost the same thing, then the underlying effect that they measure is being accounted for twice (or more) across the variables. It becomes difficult or impossible to say which variable is really influencing the independent variable.
Small changes in the data used or in the structure of the model equation can produce large and erratic changes in the estimated coefficients on the independent variables. This is a problem because the goal of many econometric models is to test exactly this sort of statistical relationship between the independent variables and the dependent variable.
Tests to Solve Multicollinearity
To ensure the model is properly specified and functioning correctly, there are tests that can be run for multicollinearity. The variance inflation factor is one such measuring tool. Using variance inflation factors helps to identify the severity of any multicollinearity issues so that the model can be adjusted. Variance inflation factor measures how much the behavior (variance) of an independent variable is influenced, or inflated, by its interaction/correlation with the other independent variables.
Variance inflation factors allow a quick measure of how much a variable is contributing to the standard error in the regression. When significant multicollinearity issues exist, the variance inflation factor will be very large for the variables involved. After these variables are identified, several approaches can be used to eliminate or combine collinear variables, resolving the multicollinearity issue.
Formula and Calculation of VIF
The formula for VIF is:
Where Ri2 represents the unadjusted coefficient of determination for regressing the ith independent variable on the remaining ones.
What Can VIF Tell You?
When Ri2 is equal to 0, and therefore, when VIF or tolerance is equal to 1, the ith independent variable is not correlated to the remaining ones, meaning that multicollinearity does not exist.
In general terms,
- VIF equal to 1 = variables are not correlated
- VIF between 1 and 5 = variables are moderately correlated
- VIF greater than 5 = variables are highly correlated
The higher the VIF, the higher the possibility that multicollinearity exists, and further research is required. When VIF is higher than 10, there is significant multicollinearity that needs to be corrected.
Example of Using VIF
For example, suppose that an economist wants to test whether there is a statistically significant relationship between the unemployment rate (independent variable) and the inflation rate (dependent variable). Including additional independent variables that are related to the unemployment rate, such as new initial jobless claims, would be likely to introduce multicollinearity into the model.
The overall model might show strong, statistically sufficient explanatory power, but be unable to identify if the effect is mostly due to the unemployment rate or to the new initial jobless claims. This is what the VIF would detect, and it would suggest possibly dropping one of the variables out of the model or finding some way to consolidate them to capture their joint effect depending on what specific hypothesis the researcher is interested in testing.
What Is a Good VIF Value?
As a rule of thumb, a VIF of three or below is not a cause for concern. As VIF increases, the less reliable your regression results are going to be.
What Does a VIF of 1 Mean?
A VIF equal to one means variables are not correlated and multicollinearity does not exist in the regression model.
What Is VIF Used for?
VIF measures the strength of the correlation between the independent variables in regression analysis. This correlation is known as multicollinearity, which can cause problems for regression models.
The Bottom Line
While a moderate amount of multicollinearity is acceptable in a regression model, a higher multicollinearity can be a cause for concern.
Two measures can be taken to correct high multicollinearity, First, one or more of the highly correlated variables can be removed, as the information provided by these variables is redundant. The second method is to use principal components analysis or partial least square regression instead of OLS regression, which can respectively reduce the variables to a smaller set with no correlation, or create new uncorrelated variables. This will improve the predictability of a model.