What Is Heteroskedasticity?
In statistics, heteroskedasticity (or heteroscedasticity) happens when the standard deviations of a predicted variable, monitored over different values of an independent variable or as related to prior time periods, are non-constant. With heteroskedasticity, the tell-tale sign upon visual inspection of the residual errors is that they will tend to fan out over time, as depicted in the image below.
Heteroskedasticity often arises in two forms: conditional and unconditional. Conditional heteroskedasticity identifies nonconstant volatility related to prior period's (e.g., daily) volatility. Unconditional heteroskedasticity refers to general structural changes in volatility that are not related to prior period volatility. Unconditional heteroskedasticity is used when future periods of high and low volatility can be identified.
- In statistics, heteroskedasticity (or heteroscedasticity) happens when the standard deviations of a predicted variable, monitored over different values of an independent variable or over prior time periods, are non-constant.
- With heteroskedasticity, the tell-tale sign upon visual inspection of the residual errors is that they will tend to fan out over time, as depicted in the image above.
- Heteroskedasticity is a violation of the assumptions for linear regression modeling, and so it can impact the validity of econometric analysis or financial models like CAPM.
While heteroskedasticity does not cause bias in the coefficient estimates, it does make them less precise; lower precision increases the likelihood that the coefficient estimates are further from the correct population value.
The Basics of Heteroskedasticity
In finance, conditional heteroskedasticity is often seen in the prices of stocks and bonds. The level of volatility of these securities cannot be predicted over any period. Unconditional heteroskedasticity can be used when discussing variables that have identifiable seasonal variability, such as electricity usage.
As it relates to statistics, heteroskedasticity (also spelled heteroscedasticity) refers to error variance, or scatter, that depends on at least one independent variable within a particular sample. These variations can be used to calculate the margin of error between data sets, such as expected results and actual results, as they provide a measure of the deviation of data points from the mean value.
For a dataset to be considered relevant, the majority of the data points must be within a particular number of standard deviations from the mean as described by Chebyshev’s theorem, also known as Chebyshev’s inequality. This provides guidelines regarding the probability of a random variable differing from the mean.
Based on the number of standard deviations specified, a random variable has a particular probability of falling within that range. For example, it may be required that a range of two standard deviations contain at least 75% of the data points for the dataset to be considered valid. Variances outside the minimum requirement are often attributed to issues of data quality.
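The 75% figure follows from Chebyshev's inequality, which states that at least 1 - 1/k^2 of the observations lie within k standard deviations of the mean. The sketch below computes that bound and compares it with the observed share in a small sample. The function names and the sample values are hypothetical and chosen only for illustration.

```python
import statistics


def chebyshev_lower_bound(k):
    """Minimum share of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1.0 - 1.0 / k**2


def fraction_within(data, k):
    """Observed share of points within k population standard deviations of the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return sum(abs(x - mean) <= k * sd for x in data) / len(data)


# Hypothetical sample with one extreme value, for illustration only.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]

bound = chebyshev_lower_bound(2)       # 0.75, the 75% figure for two standard deviations
observed = fraction_within(sample, 2)  # the observed share cannot fall below the bound
print(bound, observed)
```

The bound holds for any distribution with a finite variance, which is why it is loose: well-behaved data usually sit far above it.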
The opposite of heteroskedastic is homoskedastic. Homoskedasticity refers to a condition in which the variance of the residual term is constant or nearly so. Homoskedasticity is one assumption of linear regression modeling. It is needed to ensure that the estimates are as precise as possible, that the prediction limits for the dependent variable are valid, and that confidence intervals and p-values for the parameters are valid.
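One informal way to check the constant-variance assumption is to fit a regression line, sort the residuals by the independent variable, and compare the residual variance in the upper half with that in the lower half. The sketch below does this on simulated data. It is loosely in the spirit of the Goldfeld-Quandt test but is a simplification, not that test itself, and every name and parameter in it is an assumption made for illustration.

```python
import random


def ols_fit(x, y):
    """Return (slope, intercept) of a simple least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_square(values):
    """Average squared value, used here as a residual variance estimate."""
    return sum(v * v for v in values) / len(values)


def residual_variance_ratio(x, y):
    """Residual variance for the upper half of x divided by that for the lower half."""
    slope, intercept = ols_fit(x, y)
    pairs = sorted(zip(x, y))  # order observations by x
    resid = [yi - (intercept + slope * xi) for xi, yi in pairs]
    half = len(resid) // 2
    return mean_square(resid[half:]) / mean_square(resid[:half])


rng = random.Random(0)  # fixed seed so the simulation is repeatable
x = [1 + 9 * i / 399 for i in range(400)]            # evenly spaced from 1 to 10
y_fan = [2 + 3 * xi + rng.gauss(0, xi) for xi in x]  # error spread grows with x
y_flat = [2 + 3 * xi + rng.gauss(0, 5) for xi in x]  # error spread is constant

ratio_fan = residual_variance_ratio(x, y_fan)
ratio_flat = residual_variance_ratio(x, y_flat)
print(ratio_fan, ratio_flat)
```

A ratio far above 1 suggests residuals that fan out as x grows, while a ratio near 1 is consistent with homoskedasticity. A formal test would attach a significance level to that comparison.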
The Types of Heteroskedasticity
Unconditional heteroskedasticity is predictable and can relate to variables that are cyclical by nature. This can include higher retail sales reported during the traditional holiday shopping period or the increase in air conditioner repair calls during warmer months.
When the shifts are not traditionally seasonal, changes in the variance can be tied directly to the occurrence of particular events or predictive markers. One example is the increase in smartphone sales that follows the release of a new model: the activity is cyclical around the event but not necessarily determined by the season.
Heteroskedasticity can also relate to cases where the data approach a boundary, where the variance must necessarily be smaller because the boundary restricts the range of the data.
Conditional heteroskedasticity is not predictable by nature. There is no telltale sign that leads analysts to believe data will become more or less scattered at any point in time. Often, financial products are considered subject to conditional heteroskedasticity as not all changes can be attributed to specific events or seasonal changes.
A common application of conditional heteroskedasticity is to stock markets, where the volatility today is strongly related to volatility yesterday. This model explains periods of persistent high volatility and low volatility.
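The dependence of today's volatility on yesterday's can be sketched with an ARCH(1) process, in which the variance of each return is a constant plus a multiple of the previous squared return. The text does not name a specific model, so ARCH(1) is an assumption here, as are the parameter values omega and alpha and all function names.

```python
import random


def simulate_arch1(n, omega=0.2, alpha=0.4, seed=0):
    """Simulate n returns whose variance depends on the previous squared return."""
    rng = random.Random(seed)
    returns, variances = [], []
    prev = 0.0
    for _ in range(n):
        var_t = omega + alpha * prev**2  # conditional variance for this period
        r = rng.gauss(0, var_t**0.5)     # return drawn with that variance
        variances.append(var_t)
        returns.append(r)
        prev = r
    return returns, variances


def lag1_autocorr(series):
    """Sample autocorrelation between each value and the one before it."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in series)
    return num / den


returns, variances = simulate_arch1(5000)
squared = [r * r for r in returns]

acf_returns = lag1_autocorr(returns)  # the returns themselves show little correlation
acf_squared = lag1_autocorr(squared)  # squared returns are positively correlated
print(acf_returns, acf_squared)
```

The positive autocorrelation in squared returns is the volatility clustering described above: a large move, in either direction, raises the variance of the next period's return.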
Heteroskedasticity and Financial Modeling
Heteroskedasticity is an important concept in regression modeling, and in the investment world, regression models are used to explain the performance of securities and investment portfolios. The most well-known of these is the Capital Asset Pricing Model (CAPM), which explains the performance of a stock in terms of its volatility relative to the market as a whole. Extensions of this model have added other predictor variables such as size, momentum, quality, and style (value versus growth).
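In regression terms, CAPM relates a stock's excess return to the market's excess return, and the slope of that line (beta) measures the stock's volatility relative to the market. A minimal sketch of the estimate is shown here. The return figures are made up, and the asset series is constructed so that its beta is 1.5 by design.

```python
def capm_regression(asset_excess, market_excess):
    """Return (alpha, beta) from regressing asset excess returns on market excess returns."""
    n = len(market_excess)
    mean_a = sum(asset_excess) / n
    mean_m = sum(market_excess) / n
    cov = sum((a - mean_a) * (m - mean_m)
              for a, m in zip(asset_excess, market_excess))
    var = sum((m - mean_m) ** 2 for m in market_excess)
    beta = cov / var                 # sensitivity to market moves
    alpha = mean_a - beta * mean_m   # return not explained by the market
    return alpha, beta


# Hypothetical monthly excess returns, for illustration only.
market = [0.02, -0.01, 0.03, 0.00, -0.02, 0.01]
asset = [1.5 * m for m in market]  # built to move 1.5 times as much as the market

alpha, beta = capm_regression(asset, market)
print(alpha, beta)
```

With real data the points do not fall exactly on a line, and the residuals from this regression are where heteroskedasticity would show up.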
These predictor variables have been added because they explain or account for variance in the dependent variable, portfolio performance, that CAPM alone leaves unexplained. For example, developers of CAPM were aware that their model failed to explain an interesting anomaly: high-quality stocks, which were less volatile than low-quality stocks, tended to perform better than CAPM predicted. CAPM says that higher-risk stocks should outperform lower-risk stocks.
In other words, high-volatility stocks should beat lower-volatility stocks. But high-quality stocks, which are less volatile, tended to perform better than predicted by CAPM.
Later, other researchers extended CAPM (which had already been extended to include other predictor variables such as size, style, and momentum) to include quality as an additional predictor variable, also known as a "factor." With this factor now included in the model, the performance anomaly of low-volatility stocks was accounted for. These models, known as multi-factor models, form the basis of factor investing and smart beta.