What Is Multicollinearity?
Multicollinearity is the occurrence of high intercorrelations among two or more independent variables in a multiple regression model. It can produce skewed or misleading results when a researcher or analyst attempts to determine how effectively each independent variable predicts or explains the dependent variable in a statistical model.
In general, multicollinearity leads to wider confidence intervals and therefore less reliable estimates of the effect of each independent variable in a model.
- Multicollinearity is a statistical concept where several independent variables in a model are correlated.
- Two variables are considered to be perfectly collinear if their correlation coefficient is +/- 1.0.
- Multicollinearity among independent variables will result in less reliable statistical inferences.
- It is better to use independent variables that are not correlated or repetitive when building multiple regression models that use two or more variables.
- The existence of multicollinearity in a data set can lead to less reliable results due to larger standard errors.
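The larger standard errors mentioned in the takeaways above can be demonstrated with a small simulation. The following sketch uses illustrative, randomly generated data (none of the variables come from a real data set): it fits ordinary least squares once with a single predictor and once with a nearly duplicate predictor added, and compares the standard error of the same coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Two nominally independent predictors that are in fact highly correlated,
# e.g. past performance and market capitalization in the stock example.
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # x2 is nearly a copy of x1
y = 2.0 * x1 + rng.normal(size=n)

def ols_se(X, y):
    """Return OLS coefficient standard errors for design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))

ones = np.ones(n)
se_single = ols_se(np.column_stack([ones, x1]), y)[1]     # x1 alone
se_collinear = ols_se(np.column_stack([ones, x1, x2]), y)[1]  # x1 with collinear x2

print(f"SE of x1 coefficient, x1 alone:        {se_single:.3f}")
print(f"SE of x1 coefficient, with collinear x2: {se_collinear:.3f}")
```

With the collinear predictor included, the standard error of the x1 coefficient is inflated many times over, which is exactly why the model's inferences become less reliable.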
Statistical analysts use multiple regression models to predict the value of a specified dependent variable based on the values of two or more independent variables. The dependent variable is sometimes referred to as the outcome, target, or criterion variable.
An example is a multivariate regression model that attempts to anticipate stock returns based on items such as price-to-earnings ratios (P/E ratios), market capitalization, past performance, or other data. The stock return is the dependent variable and the various bits of financial data are the independent variables.
Multicollinearity in a multiple regression model indicates that collinear independent variables are related in some fashion, although the relationship may or may not be causal. For example, past performance might be related to market capitalization, as stocks that have performed well in the past will have increasing market values.
In other words, multicollinearity can exist when two independent variables are highly correlated. It can also happen if an independent variable is computed from other variables in the data set or if two independent variables provide similar and repetitive results.
One of the most common ways of eliminating the problem of multicollinearity is to first identify collinear independent variables and then remove all but one.
It is also possible to eliminate multicollinearity by combining two or more collinear variables into a single variable. Statistical analysis can then be conducted to study the relationship between the specified dependent variable and only a single independent variable.
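One simple way to combine collinear variables, sketched below with illustrative simulated data, is to standardize them and average them into a single composite variable (principal component analysis is a more formal alternative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Two collinear variables (illustrative simulated data).
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.2, size=n)

# Standardize each variable, then average them into one composite predictor.
z1 = (x1 - x1.mean()) / x1.std()
z2 = (x2 - x2.mean()) / x2.std()
composite = (z1 + z2) / 2

# The regression can now use `composite` in place of both x1 and x2.
```

The composite retains nearly all of the information shared by the two original variables while removing the collinearity between them.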
The statistical inferences from a model that contains multicollinearity may not be dependable.
Examples of Multicollinearity
Market analysts want to avoid technical indicators that are collinear, meaning they are based on very similar or related inputs and therefore tend to produce similar predictions about the dependent variable of price movement. Instead, market analysis should rely on markedly different independent variables so that the market is examined from distinct analytical viewpoints.
An example of a potential multicollinearity problem is performing technical analysis only using several similar indicators.
Noted technical analyst John Bollinger, creator of the Bollinger Bands indicator, notes that "a cardinal rule for the successful use of technical analysis requires avoiding multicollinearity amid indicators." To solve the problem, analysts avoid using two or more technical indicators of the same type. Instead, they analyze a security using one type of indicator, such as a momentum indicator, and then do a separate analysis using a different type of indicator, such as a trend indicator.
For example, stochastics, the relative strength index (RSI), and Williams %R are all momentum indicators that rely on similar inputs and are likely to produce similar results. In this case, it is better to remove all but one of the indicators or find a way to merge several of them into just one indicator, while also adding a trend indicator that is not likely to be highly correlated with the momentum indicator.
Multicollinearity is also observed in many other contexts. One such context is human biology. For example, an individual's blood pressure is collinear not only with age, but also with weight, stress, and pulse.
How Do You Detect Multicollinearity?
A statistical technique called the variance inflation factor (VIF) is used to detect and measure the amount of collinearity in a multiple regression model.
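The VIF for a given variable is 1 / (1 − R²), where R² comes from regressing that variable on all the other independent variables. The sketch below computes it directly with NumPy on illustrative simulated data (a common rule of thumb, not stated in the article, treats a VIF above 5 or 10 as a sign of problematic collinearity):

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor of column j of X: 1 / (1 - R^2),
    where R^2 comes from regressing column j on the other columns."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])  # add intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(0)
n = 400
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.3, size=n)   # collinear with x1
x3 = rng.normal(size=n)                    # unrelated to the others
X = np.column_stack([x1, x2, x3])

for j in range(3):
    print(f"VIF of column {j}: {vif(X, j):.2f}")
```

The two collinear columns show a large VIF, while the unrelated column stays near the minimum value of 1.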
How Can One Deal With Multicollinearity?
To reduce the amount of multicollinearity found in a model, one can remove the specific variables identified as the most collinear. You can also try to combine or transform the offending variables to lower their correlation. If that does not work or is unattainable, there are modified regression models that better deal with multicollinearity, such as ridge regression, principal component regression, or partial least squares regression.
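Ridge regression handles collinearity by adding a penalty that shrinks and stabilizes the coefficient estimates. A minimal closed-form sketch, using illustrative simulated data and an arbitrary penalty value:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)  # nearly perfectly collinear with x1
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

X = np.column_stack([x1, x2])

def ridge(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam*I)^-1 X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

beta_ols = ridge(X, y, 0.0)     # lam = 0 reduces to plain OLS, unstable here
beta_ridge = ridge(X, y, 10.0)  # the penalty shrinks and stabilizes estimates

print("OLS coefficients:  ", beta_ols)
print("Ridge coefficients:", beta_ridge)
```

With collinear predictors, individual OLS coefficients can swing wildly from sample to sample; the ridge penalty trades a small amount of bias for much lower variance.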
What Is Perfect Collinearity?
Perfect collinearity exists when one independent variable is an exact linear function of another, so that their correlation coefficient is exactly +1.0 or -1.0.