What Is Multicollinearity?
Multicollinearity is the occurrence of high intercorrelations among two or more independent variables in a multiple regression model. It can lead to skewed or misleading results when a researcher or analyst tries to determine how well each independent variable predicts or explains the dependent variable in a statistical model. In general, multicollinearity leads to wider confidence intervals and less reliable probability values (p-values) for the independent variables. That is, the statistical inferences about individual variables from a model with multicollinearity may not be dependable.
Statistical analysts use multiple regression models to predict the value of a specified dependent variable based on the values of two or more independent variables. The dependent variable is sometimes referred to as the outcome, target, or criterion variable. An example is a multiple regression model that attempts to anticipate stock returns based on items like price-to-earnings ratios, market capitalization, past performance, or other data. The stock return is the dependent variable, and the various pieces of financial data are the independent variables.
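The effect on confidence intervals described above can be illustrated with a small simulation. The sketch below is not part of the original discussion: it uses synthetic data, NumPy, and a hypothetical helper named `ols_standard_errors` to fit the same kind of model twice, once with a second predictor that is independent of the first and once with one that is correlated at roughly 0.98 with it. The coefficients, sample size, and variable names are all assumptions chosen for illustration.

```python
import numpy as np


def ols_standard_errors(X, y):
    """Fit ordinary least squares with an intercept.

    Returns the coefficient estimates and their standard errors.
    Hypothetical helper written for this illustration only.
    """
    X1 = np.column_stack([np.ones(len(y)), X])      # add intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)   # least-squares fit
    resid = y - X1 @ beta
    dof = X1.shape[0] - X1.shape[1]                 # residual degrees of freedom
    sigma2 = resid @ resid / dof                    # residual variance estimate
    cov = sigma2 * np.linalg.inv(X1.T @ X1)         # coefficient covariance
    return beta, np.sqrt(np.diag(cov))


rng = np.random.default_rng(42)   # fixed seed so the run is repeatable
n = 200                           # assumed sample size

x1 = rng.normal(size=n)
noise = rng.normal(size=n)

# Case A: second predictor is independent of the first.
x2_indep = rng.normal(size=n)
# Case B: second predictor is built to correlate about 0.98 with x1.
rho = 0.98
x2_collinear = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)

# Same assumed "true" relationship in both cases.
y_a = 1 + 2 * x1 + 3 * x2_indep + noise
y_b = 1 + 2 * x1 + 3 * x2_collinear + noise

_, se_indep = ols_standard_errors(np.column_stack([x1, x2_indep]), y_a)
_, se_collinear = ols_standard_errors(np.column_stack([x1, x2_collinear]), y_b)

print("std. error of x1 coefficient, independent predictors:", se_indep[1])
print("std. error of x1 coefficient, collinear predictors:  ", se_collinear[1])
```

A larger standard error translates directly into a wider confidence interval for that coefficient, which is why the estimate for each collinear variable becomes harder to trust even when the model as a whole still fits well.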
- Multicollinearity is a statistical concept where independent variables in a model are highly correlated with one another.
- Multicollinearity among independent variables tends to result in less reliable statistical inferences.
- It is better to use independent variables that are not correlated or repetitive when building multiple regression models.
Multicollinearity in a multiple regression model indicates that collinear independent variables are related in some fashion, although the relationship may or may not be causal. For example, past performance might be related to market capitalization, as stocks that have performed well in the past will have increasing market values. In other words, multicollinearity can exist when two independent variables are highly correlated. It can also happen if an independent variable is computed from other variables in the data set or if two independent variables provide similar and repetitive results.
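One common way to check for the situations described above is the variance inflation factor (VIF), which for each variable equals 1 / (1 - R²), where R² comes from regressing that variable on all the others. The sketch below is an illustration on synthetic data, not a method taken from the text. It relies on the fact that the VIFs are the diagonal entries of the inverse of the predictors' correlation matrix. The function name `variance_inflation_factors` and the variables `x1` through `x4` are hypothetical, and `x3` is deliberately computed from `x1` and `x2` to mimic a derived variable.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return one VIF per column of X.

    Uses the identity VIF_j = [R^-1]_jj, where R is the correlation
    matrix of the columns. Hypothetical helper for illustration.
    """
    corr = np.corrcoef(X, rowvar=False)   # columns are variables
    return np.diag(np.linalg.inv(corr))


rng = np.random.default_rng(0)
n = 500                                   # assumed sample size

x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# x3 is computed from x1 and x2 plus a little noise. An exact sum would
# make the correlation matrix singular, so it could not be inverted.
x3 = x1 + x2 + 0.1 * rng.normal(size=n)
x4 = rng.normal(size=n)                   # unrelated to the others

X = np.column_stack([x1, x2, x3, x4])
vif = variance_inflation_factors(X)

for name, value in zip(["x1", "x2", "x3", "x4"], vif):
    print(f"{name}: VIF = {value:.1f}")
```

A widely quoted rule of thumb treats a VIF above roughly 5 or 10 as a sign of problematic multicollinearity, though the appropriate cutoff depends on the application. In this sketch the derived variable and the two variables it was built from should show very large values, while the unrelated variable stays close to 1.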
One of the most common ways of eliminating the problem of multicollinearity is to first identify collinear independent variables and then remove all but one. It is also possible to eliminate multicollinearity by combining two or more collinear variables into a single variable. Statistical analysis can then be conducted to study the relationship between the specified dependent variable and the single retained or combined variable, along with any other independent variables that are not collinear with it.
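Both remedies can be sketched in a few lines. The example below is a minimal illustration on synthetic data and makes several assumptions that are not in the text: a correlation cutoff of 0.9, a greedy rule that keeps the first variable of each correlated group, and a simple average of standardized values as the way to combine variables. The helper names `drop_correlated` and `combine_standardized` are hypothetical.

```python
import numpy as np


def drop_correlated(X, names, threshold=0.9):
    """Keep a column only if its absolute correlation with every
    column already kept is below the threshold (greedy, in order)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return X[:, keep], [names[j] for j in keep]


def combine_standardized(X, cols):
    """Merge the given columns into one composite variable by
    averaging their z-scores."""
    block = X[:, cols]
    z = (block - block.mean(axis=0)) / block.std(axis=0)
    return z.mean(axis=1)


rng = np.random.default_rng(1)
n = 300                                   # assumed sample size

a = rng.normal(size=n)
b = a + 0.1 * rng.normal(size=n)          # nearly a copy of a
c = rng.normal(size=n)                    # unrelated to a and b

X = np.column_stack([a, b, c])
names = ["a", "b", "c"]

# Remedy 1: remove all but one of the collinear variables.
X_reduced, kept_names = drop_correlated(X, names, threshold=0.9)
print("kept:", kept_names)

# Remedy 2: combine the collinear variables into a single one.
composite = combine_standardized(X, cols=[0, 1])
X_combined = np.column_stack([composite, c])
print("combined design matrix shape:", X_combined.shape)
```

Which variable to keep, or how to weight variables when combining them, is a modeling decision. The greedy ordering and equal weighting used here are only the simplest possible choices.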
Example of Multicollinearity
For investing, multicollinearity is a common consideration when performing technical analysis to predict probable future price movements of a security, such as a stock or a commodity future. Market analysts want to avoid using technical indicators that are collinear, meaning that they are based on very similar or related inputs and therefore tend to produce similar predictions regarding the dependent variable of price movement. Instead, market analysis should be based on markedly different independent variables so that the market is examined from different, independent analytical viewpoints.
Technical analyst John Bollinger, creator of the Bollinger Bands indicator, notes that "a cardinal rule for the successful use of technical analysis requires avoiding multicollinearity amid indicators."
To solve the problem, analysts avoid using two or more technical indicators of the same type. Instead, they analyze a security using one type of indicator, such as a momentum indicator, and then do a separate analysis using a different type of indicator, such as a trend indicator.
An example of a potential multicollinearity problem is performing technical analysis using only several similar indicators, such as stochastics, the relative strength index (RSI), and Williams %R, which are all momentum indicators that rely on similar inputs and are likely to produce similar results. In this case, it is better to remove all but one of the indicators or find a way to merge several of them into just one indicator, while also adding a trend indicator that is not likely to be highly correlated with the momentum indicator.
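The overlap among these momentum indicators can be checked directly. The sketch below is an illustration on a synthetic random-walk price series, not on real market data. It assumes a 14-period lookback for all three indicators and uses a simple-average form of RSI rather than Wilder's smoothed version. The function names are hypothetical. With the same lookback, Williams %R is by definition the stochastic %K shifted down by 100, so the two carry identical information.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def stochastic_k(high, low, close, n=14):
    """Stochastic %K: where the close sits in the n-period range (0 to 100)."""
    hh = sliding_window_view(high, n).max(axis=1)   # highest high
    ll = sliding_window_view(low, n).min(axis=1)    # lowest low
    return 100.0 * (close[n - 1:] - ll) / (hh - ll)


def williams_r(high, low, close, n=14):
    """Williams %R: same range, measured from the top (-100 to 0)."""
    hh = sliding_window_view(high, n).max(axis=1)
    ll = sliding_window_view(low, n).min(axis=1)
    return -100.0 * (hh - close[n - 1:]) / (hh - ll)


def rsi_simple(close, n=14):
    """RSI using simple averages of gains and losses (an assumption;
    Wilder's original uses a smoothed average)."""
    delta = np.diff(close)
    gains = np.where(delta > 0, delta, 0.0)
    losses = np.where(delta < 0, -delta, 0.0)
    avg_gain = sliding_window_view(gains, n).mean(axis=1)
    avg_loss = sliding_window_view(losses, n).mean(axis=1)
    return 100.0 * avg_gain / (avg_gain + avg_loss)


# Synthetic prices: a random walk with made-up intraday highs and lows.
rng = np.random.default_rng(0)
close = 100 + np.cumsum(rng.normal(0, 1, 500))
high = close + rng.uniform(0, 1, 500)
low = close - rng.uniform(0, 1, 500)

k = stochastic_k(high, low, close)
r = williams_r(high, low, close)
rsi = rsi_simple(close)

# RSI needs one extra bar for the first price change, so drop the first
# %K and %R value to align all three series on the same dates.
corr = np.corrcoef(np.vstack([k[1:], r[1:], rsi]))

print("corr(%K, %R): ", corr[0, 1])
print("corr(%K, RSI):", corr[0, 2])
```

Because %K and %R differ only by a constant, their correlation is 1 apart from floating-point rounding, so including both in one analysis adds nothing. The RSI correlation depends on the data, and on real price series it should be measured rather than assumed.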