What Is Spurious Correlation?
In statistics, a spurious correlation (or spuriousness) is a connection between two variables that appears to be causal but is not. Any observed dependence between the variables is due either to chance or to an unseen confounder that is related to both.
- Spurious correlation, or spuriousness, occurs when two factors appear causally related to one another but are not.
- The appearance of a causal relationship is often due to similar movement on a chart that turns out to be coincidental or caused by a third "confounding" factor.
- Spurious correlation can be caused by small sample sizes or arbitrary endpoints.
- Statisticians and scientists use careful statistical analysis to identify spurious relationships.
- Confirming a causal relationship requires a study that controls for all plausible confounding variables.
Understanding Spurious Correlation
Spurious relationships will initially appear to show that one variable directly affects another, but that is not the case. This misleading correlation is often caused by a third factor that is not apparent at the time of examination, sometimes called a confounding factor.
When two random variables track each other closely on a graph, it is easy to suspect a correlation in which a change in one variable causes a change in the other. Even setting aside causation, which is a separate question, the chart can lead the reader to believe that the movement of variable A is linked to the movement of variable B, or vice versa.
However, closer statistical examination may show that the aligned movements are coincidental or caused by a third factor that affects the two variables. This is a spurious correlation. Research conducted with small sample sizes or arbitrary endpoints is particularly susceptible to spuriousness.
The most obvious way to spot a spurious relationship in research findings is to use common sense. Just because two things occur together and appear to be linked does not mean that no other factors are at work. To know for sure, however, the research methods must be critically examined.
In studies, all variables that might impact the findings should be included in the statistical model to control their impact on the dependent variable.
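The effect of controlling for a confounder can be illustrated with a small simulation. The sketch below is only an illustration on synthetic data, not a method taken from any particular study: the variables `x`, `y`, and `z`, the helper functions, and the noise levels are all assumptions chosen for the example. Here `z` drives both `x` and `y`, which never influence each other, and the correlation is measured before and after removing the part of each variable explained by `z`.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def residuals(ys, zs):
    """Residuals from a simple least-squares regression of ys on zs."""
    n = len(ys)
    mean_y = sum(ys) / n
    mean_z = sum(zs) / n
    slope = (sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
             / sum((z - mean_z) ** 2 for z in zs))
    intercept = mean_y - slope * mean_z
    return [y - (intercept + slope * z) for y, z in zip(ys, zs)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical confounder z drives both x and y; x and y never touch each other.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

# Raw correlation: looks like x and y are related.
raw_r = pearson(x, y)

# Partial correlation: correlate what is left of x and y after removing z.
partial_r = pearson(residuals(x, z), residuals(y, z))

print(f"correlation of x and y:               {raw_r:.2f}")
print(f"correlation after controlling for z:  {partial_r:.2f}")
```

In this setup the raw correlation is clearly positive, while the correlation that remains after accounting for `z` is close to zero. Real data are rarely this clean, and a confounder that is not measured cannot be controlled for in this way.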
Many spurious relationships can be identified by using common sense. If a correlation is found, there is usually more than one variable at play, and the variables are often not immediately obvious.
Spurious Correlation Examples
Interesting correlations are easy to find, but many will turn out to be spurious. Three examples are the skirt length theory, the Super Bowl indicator, and a suggested correlation between race and college completion rates.
- Skirt Length Theory: Originating in the 1920s, the skirt length theory holds that skirt lengths and stock market direction are correlated. If skirt lengths are long, the theory says the stock market is bearish. If skirt lengths are short, the market is bullish.
- Super Bowl Indicator: In late January, there is often chatter about the so-called Super Bowl indicator, which suggests that a win by the American Football Conference team likely means that the stock market will go down in the coming year, whereas a victory by the National Football Conference team portends a rise in the market. Since the beginning of the Super Bowl era, the indicator has been accurate around 74% of the time, or 40 out of the 54 years, according to OpenMarkets. It is a fun conversation piece but probably not something a serious financial advisor would recommend as an investment strategy for clients.
- Educational Attainment and Race: Social scientists have focused on identifying which variables impact educational attainment. According to government research, 56% of White 25- to 29-year-olds had completed a college degree in 2019, compared to just 36% of Black individuals of the same age. The implication is that race has a causal effect on college completion rates.
However, it may not be race itself that impacts educational attainment. The results may also be due to the effects of racism in society, which could be the third "hidden" variable. Racism impacts people of color, placing them at a disadvantage educationally and economically. For example, the schools in non-white communities face greater challenges and receive less funding, parents in non-white populations have lower-paying jobs and fewer resources to devote to their children's education, and many families live in food deserts and suffer from malnutrition. Racism, rather than race, might be viewed as a causal variable that impacts educational attainment.
How to Spot Spurious Correlation?
Statisticians and other scientists who analyze data must be on the lookout for spurious relationships all the time. They use numerous methods to identify them, including:
- Ensuring a proper representative sample
- Obtaining an adequate sample size
- Being wary of arbitrary endpoints
- Controlling for as many outside variables as possible
- Testing against a null hypothesis and checking for a low p-value, which indicates statistical significance
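To see why sample size matters, consider how often two completely unrelated variables look strongly correlated purely by chance. The following is a minimal sketch on simulated data; the sample sizes (5 and 100), the threshold of 0.7, and the function name `share_of_strong_correlations` are arbitrary choices made for illustration.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def share_of_strong_correlations(sample_size, trials, threshold, rng):
    """Fraction of trials in which two independent samples have |r| >= threshold."""
    hits = 0
    for _ in range(trials):
        # Both samples are drawn independently, so any correlation is pure chance.
        xs = [rng.gauss(0, 1) for _ in range(sample_size)]
        ys = [rng.gauss(0, 1) for _ in range(sample_size)]
        if abs(pearson(xs, ys)) >= threshold:
            hits += 1
    return hits / trials


rng = random.Random(42)  # fixed seed so the run is repeatable
small = share_of_strong_correlations(sample_size=5, trials=2000, threshold=0.7, rng=rng)
large = share_of_strong_correlations(sample_size=100, trials=2000, threshold=0.7, rng=rng)

print(f"n = 5:   {small:.1%} of unrelated pairs have |r| >= 0.7")
print(f"n = 100: {large:.1%} of unrelated pairs have |r| >= 0.7")
```

With only five observations, a noticeable share of unrelated pairs clears the threshold, while with 100 observations it practically never happens. This is the intuition behind both the sample-size check and the p-value: they ask how surprising the observed correlation would be if there were no real relationship.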
What Is an Example of Correlation but Not Causation?
An example of a correlation is that more sleep is associated with better performance during the day. Although there is a correlation, there is not necessarily causation. More sleep may not be the reason an individual performs better; for example, they might be using a new software tool that is increasing their productivity. To establish causation, a study must provide evidence of a causal relationship between sleep and performance.
What Is Spurious Regression?
A spurious regression is a statistical model that shows misleading evidence of a linear relationship; in other words, it reports a spurious correlation between independent non-stationary variables.
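A common way to illustrate this is with random walks, which are a simple kind of non-stationary series. The sketch below is an illustration under assumptions, not a formal test: the walk length, number of trials, and the 0.5 threshold are arbitrary, and comparing first differences is a standard remedy that is introduced here for contrast rather than taken from the text above.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def random_walk(steps, rng):
    """Cumulative sum of independent Gaussian shocks (a non-stationary series)."""
    level = 0.0
    path = []
    for _ in range(steps):
        level += rng.gauss(0, 1)
        path.append(level)
    return path


def first_differences(series):
    """Period-to-period changes, which recover the stationary shocks."""
    return [b - a for a, b in zip(series, series[1:])]


rng = random.Random(7)  # fixed seed so the run is repeatable
trials, steps, threshold = 500, 200, 0.5

strong_in_levels = 0
strong_in_diffs = 0
for _ in range(trials):
    # Two walks generated independently: there is no true relationship.
    a = random_walk(steps, rng)
    b = random_walk(steps, rng)
    if abs(pearson(a, b)) >= threshold:
        strong_in_levels += 1
    if abs(pearson(first_differences(a), first_differences(b))) >= threshold:
        strong_in_diffs += 1

share_levels = strong_in_levels / trials
share_diffs = strong_in_diffs / trials
print(f"|r| >= 0.5 in levels:      {share_levels:.1%} of independent pairs")
print(f"|r| >= 0.5 in differences: {share_diffs:.1%} of independent pairs")
```

Independent random walks frequently appear strongly correlated when compared in levels, yet the apparent relationship disappears once the series are differenced. That gap is the signature of a spurious regression.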
What Is False Causality?
False causality refers to the assumption that one thing causes another simply because there is a relationship between them. For example, we may assume that Harry has been training hard to become a faster runner because his race times have improved. However, the reality might be that Harry's race times have improved because he has new running shoes made with the latest technology. The initial assumption was a false causality.