Multicollinearity: Meaning, Examples and FAQ

What Is Multicollinearity?

Multicollinearity is the occurrence of high intercorrelations among two or more independent variables in a multiple regression model. Multicollinearity can lead to skewed or misleading results when a researcher or analyst attempts to determine how effectively each independent variable predicts or explains the dependent variable in a statistical model.

In general, multicollinearity can lead to wider confidence intervals that produce less reliable probabilities in terms of the effect of independent variables in a model.

Key Takeaways

  • Multicollinearity is a statistical concept where several independent variables in a model are correlated.
  • Two variables are considered to be perfectly collinear if their correlation coefficient is +/- 1.0.
  • Multicollinearity among independent variables will result in less reliable statistical inferences.
  • It is better to use independent variables that are not correlated or repetitive when building multiple regression models that use two or more variables.
  • The existence of multicollinearity in a data set can lead to less reliable results due to larger standard errors.

Understanding Multicollinearity

Statistical analysts use multiple regression models to predict the value of a specified dependent variable based on the values of two or more independent variables. The dependent variable is sometimes referred to as the outcome, target, or criterion variable.

An example is a multivariate regression model that attempts to anticipate stock returns based on items such as price-to-earnings ratios (P/E ratios), market capitalization, past performance, or other data. The stock return is the dependent variable and the various bits of financial data are the independent variables.

Multicollinearity in a multiple regression model indicates that collinear independent variables are related in some fashion, although the relationship may or may not be causal. For example, past performance might be related to market capitalization, as stocks that have performed well in the past will have increasing market values.

Causes of Multicollinearity

Multicollinearity can exist when two independent variables are highly correlated. It can also happen if an independent variable is computed from other variables in the data set or if two independent variables provide similar and repetitive results.
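Both causes can be seen directly in a correlation matrix of the predictors. The sketch below uses synthetic data with hypothetical names (`assets`, `liabilities`, `book_value`): because the third column is computed from the other two, it is necessarily correlated with them.

```python
import numpy as np

# Synthetic predictor data (hypothetical names for illustration only).
# "book_value" is computed directly from the other two columns, which
# guarantees it is correlated with them.
rng = np.random.default_rng(42)
assets = rng.normal(100, 10, 500)
liabilities = rng.normal(60, 5, 500)
book_value = assets - liabilities  # derived from the other two predictors

X = np.column_stack([assets, liabilities, book_value])
corr = np.corrcoef(X, rowvar=False)  # pairwise correlation matrix
print(np.round(corr, 2))
```

The off-diagonal entries involving the derived column are far from zero, while the two independently generated columns remain nearly uncorrelated.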

Effects of Multicollinearity

Multicollinearity does not bias the regression coefficient estimates themselves, but it inflates the standard errors of some or all of the coefficients, making the estimates imprecise and unreliable. As a result, it can be hard to determine how the independent variables individually influence the dependent variable.
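This effect can be illustrated with a small Monte Carlo sketch on synthetic data: when two predictors are nearly collinear, repeated fits produce a much wider spread of coefficient estimates than when the predictors are independent, even though the average estimate stays near the true value.

```python
import numpy as np

# Monte Carlo sketch (illustrative synthetic data, not real market data):
# compare the spread of an OLS coefficient estimate with and without
# collinearity between the two predictors.
rng = np.random.default_rng(1)
n, reps = 200, 300

def fit_first_coef(collinear):
    x1 = rng.normal(size=n)
    if collinear:
        x2 = x1 + rng.normal(scale=0.05, size=n)  # near-duplicate of x1
    else:
        x2 = rng.normal(size=n)                    # independent of x1
    y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)   # true coefficients are 1, 1
    X = np.column_stack([x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # ordinary least squares
    return beta[0]

spread_collinear = np.std([fit_first_coef(True) for _ in range(reps)])
spread_independent = np.std([fit_first_coef(False) for _ in range(reps)])
print(spread_collinear, spread_independent)
```

The spread under collinearity is many times larger, which is exactly what inflated standard errors mean in practice.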

Detecting Multicollinearity

A statistical technique called the variance inflation factor (VIF) can be used to detect and measure the amount of collinearity in a multiple regression model. VIF measures how much the variance of an estimated regression coefficient is inflated compared to when the predictor variables are not linearly related. A VIF of 1 means that a variable is not correlated with the other predictors; a VIF between 1 and 5 indicates that variables are moderately correlated; and a VIF between 5 and 10 indicates that variables are highly correlated.
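A minimal sketch of the VIF calculation, using the standard result that the VIFs are the diagonal entries of the inverse of the predictors' correlation matrix (the data and variable names here are synthetic):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of predictor matrix X.

    Uses the identity that the VIFs equal the diagonal of the inverse
    of the predictors' correlation matrix.
    """
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Synthetic example: x2 is strongly collinear with x1; x3 is unrelated.
rng = np.random.default_rng(0)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.3, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
print(np.round(vif(X), 2))
```

The collinear pair shows VIFs well above 5, while the unrelated predictor stays near 1.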

Eliminating Multicollinearity

One of the most common ways of eliminating the problem of multicollinearity is to first identify collinear independent variables and then remove all but one.

It is also possible to eliminate multicollinearity by combining two or more collinear variables into a single variable. Statistical analysis can then be conducted to study the relationship between the specified dependent variable and only a single independent variable.
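One simple way to combine collinear variables, sketched below on synthetic data with hypothetical names, is to average their z-scores into a single composite predictor (a lightweight alternative to techniques such as principal component analysis):

```python
import numpy as np

# Synthetic example: two hypothetical momentum readings that are
# near-duplicates of one another, hence collinear.
rng = np.random.default_rng(7)
n = 500
momentum_a = rng.normal(size=n)
momentum_b = momentum_a + rng.normal(scale=0.2, size=n)

def zscore(v):
    """Standardize a variable to mean 0 and standard deviation 1."""
    return (v - v.mean()) / v.std()

# Merge the two collinear predictors into one composite variable by
# averaging their z-scores.
composite = (zscore(momentum_a) + zscore(momentum_b)) / 2
print(np.corrcoef(momentum_a, momentum_b)[0, 1])
```

The regression can then use `composite` in place of the two original predictors, removing the collinearity while retaining the shared signal.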

The statistical inferences from a model that contains multicollinearity may not be dependable.

Examples of Multicollinearity

In Investing

For investing, multicollinearity is a common consideration when performing technical analysis to predict probable future price movements of a security, such as a stock or a commodity future.

Market analysts want to avoid using technical indicators that are collinear in that they are based on very similar or related inputs; they tend to reveal similar predictions regarding the dependent variable of price movement. Instead, the analysis must be based on markedly different independent variables so that each examines the market from a distinct analytical viewpoint.

An example of a potential multicollinearity problem is performing technical analysis only using several similar indicators.

Noted technical analyst John Bollinger, creator of the Bollinger Bands indicator, notes that "a cardinal rule for the successful use of technical analysis requires avoiding multicollinearity amid indicators." To solve the problem, analysts avoid using two or more technical indicators of the same type. Instead, they analyze a security using one type of indicator, such as a momentum indicator, and then do a separate analysis using a different type of indicator, such as a trend indicator.

For example, stochastics, the relative strength index (RSI), and Williams %R are all momentum indicators that rely on similar inputs and are likely to produce similar results. In this case, it is better to remove all but one of the indicators or find a way to merge several of them into just one indicator, while also adding a trend indicator that is not likely to be highly correlated with the momentum indicator.

In Biology

Multicollinearity is also observed in many other contexts. One such context is human biology. For example, an individual's blood pressure is collinear not only with age, but also with weight, stress, and pulse.

How Can One Deal With Multicollinearity?

To reduce the amount of multicollinearity found in a model, one can remove the specific variables that are identified as the most collinear. You can also try to combine or transform the offending variables to lower their correlation. If that does not work or is infeasible, there are modified regression models that better deal with multicollinearity, such as ridge regression, principal component regression, or partial least squares regression.
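Ridge regression has a simple closed form, which makes it easy to sketch. The example below, on synthetic data, adds a penalty term to the usual least-squares normal equations; the penalized estimate stays stable even when the two predictors are nearly perfectly collinear:

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge estimate: solve (X'X + lam*I) beta = X'y.

    With lam = 0 this reduces to ordinary least squares.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Synthetic example: x2 is nearly a perfect copy of x1.
rng = np.random.default_rng(3)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
y = x1 + x2 + rng.normal(size=n)  # true coefficients are 1 and 1
X = np.column_stack([x1, x2])

beta_ols = ridge(X, y, 0.0)    # plain least squares: unstable split between x1, x2
beta_ridge = ridge(X, y, 10.0) # penalized: coefficients shared stably
print(beta_ols, beta_ridge)
```

The penalty cannot tell the model which of two near-identical predictors deserves the weight, so it splits the effect between them rather than letting the coefficients swing wildly in opposite directions.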

What Is Perfect Collinearity?

Perfect collinearity exists when there is an exact linear relationship between two independent variables in a model. This corresponds to a correlation coefficient of either +1.0 or -1.0.
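A quick demonstration: a variable and any exact linear transform of it are perfectly collinear, with correlation exactly +1.0 (or -1.0 when the slope is negative).

```python
import numpy as np

# A variable and an exact linear transform of it are perfectly collinear.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pos = 2.0 * x + 3.0    # positive slope -> correlation of +1.0
y_neg = -0.5 * x + 1.0   # negative slope -> correlation of -1.0
print(np.corrcoef(x, y_pos)[0, 1], np.corrcoef(x, y_neg)[0, 1])
```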

Why Is Multicollinearity a Problem?

Multicollinearity is a problem because it produces regression model results that are less reliable. This is due to wider confidence intervals (larger standard errors) that can lower the statistical significance of regression coefficients.

The Bottom Line

Multicollinearity exists whenever an independent variable is highly correlated with one or more of the other independent variables in a multiple regression equation. Multicollinearity is a problem because it will make the statistical inferences less reliable. However, the Variance Inflation Factor (VIF) can provide information about which variable or variables are redundant, and thus the variables that have a high VIF can be removed.
