What Is Spurious Correlation?

In statistics, a spurious correlation, or spuriousness, refers to a connection between two variables that appears causal but is not. Spurious relationships initially appear to show that one variable directly affects another, but that is not the case. The apparent link is often caused by a third factor, sometimes called a confounding factor, that is not apparent at the time of examination.

Key Takeaways

  • Spurious correlation, or spuriousness, is when two factors appear causally related but are not.
  • The appearance of a causal relationship is often due to similar movement on a chart that turns out to be coincidental or caused by a third "confounding" factor.
  • Spurious correlation can be caused by small sample sizes or arbitrary endpoints.
  • Statisticians and scientists use careful statistical analysis to identify spurious relationships.
  • Confirming a causal relationship requires a study that controls for all possible variables.

How Spurious Correlation Works

When two random variables track each other closely on a graph, it is easy to suspect a causal relationship, in which a change in one variable causes a change in the other. Even setting causation aside, which is a separate topic, this observation can lead the reader of the chart to believe that the movement of variable A is linked to the movement of variable B, or vice versa.

However, closer statistical examination may show that the aligned movements are coincidental or caused by a third factor that affects the two variables. This is a spurious correlation. Research conducted with small sample sizes or arbitrary endpoints is particularly susceptible to spuriousness.
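To make the "third factor" idea concrete, here is a minimal Python simulation. It assumes a hypothetical hidden variable Z that drives both X and Y, while X and Y are never computed from each other. The raw correlation between X and Y comes out strong, but the partial correlation, which removes the linear influence of Z from both variables, falls to roughly zero. The helper names `pearson` and `partial_corr` are illustrative and not taken from any particular library.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def partial_corr(xs, ys, zs):
    """First-order partial correlation of X and Y, controlling for Z."""
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / ((1 - r_xz**2) * (1 - r_yz**2)) ** 0.5


rng = random.Random(42)
n = 5_000

# Hypothetical hidden factor Z drives both X and Y.
# X has no effect on Y, and Y has no effect on X.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_xy = pearson(x, y)
r_xy_given_z = partial_corr(x, y, z)
print(f"corr(X, Y)     = {r_xy:.2f}")          # strong, but spurious
print(f"corr(X, Y | Z) = {r_xy_given_z:.2f}")  # near zero once Z is controlled
```

In this setup corr(X, Y) lands near 0.8 only because both variables share Z; once Z is accounted for, the apparent relationship disappears.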

Examples of Spurious Correlation

Interesting correlations are easy to find, but many will turn out to be spurious. Three examples are the skirt length theory, the Super Bowl indicator, and a suggested correlation between race and college completion rates.

The Skirt Length Theory

Originating in the 1920s, the skirt length theory holds that skirt lengths and stock market direction are correlated. According to the theory, long skirts signal a bearish market, while short skirts signal a bullish one.

The Super Bowl Indicator

In late January, there is often chatter about the so-called Super Bowl indicator, which suggests that a win by the American Football Conference team likely means that the stock market will go down in the coming year, whereas a victory by the National Football Conference team portends a rise in the market.

Since the beginning of the Super Bowl era, the indicator has been accurate around 74% of the time, or 40 out of the 54 years, according to OpenMarkets. It is a fun conversation piece but probably not something a serious financial advisor would recommend as an investment strategy for clients.
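A hit rate like 40 out of 54 sounds impressive, but it helps to ask how often chance alone produces one. The sketch below makes a simplifying, hypothetical assumption that each market year is an independent 50/50 coin flip (real market years are not). It first computes the exact binomial probability that a single meaningless indicator scores at least 40 of 54, then generates 1,000 meaningless indicators and keeps the best one. The probability for any one indicator is tiny, yet the best of many random indicators usually still scores well above 60%, which is why an indicator found by sifting through many candidates deserves skepticism.

```python
import math
import random

YEARS = 54  # seasons counted in the article
HITS = 40   # seasons the indicator "called" correctly

# Exact chance that one coin-flip indicator gets at least HITS of YEARS right.
p_single = sum(math.comb(YEARS, k) for k in range(HITS, YEARS + 1)) / 2**YEARS
print(f"P(one random indicator >= {HITS}/{YEARS}) = {p_single:.5f}")

# Hypothetical assumption: market direction each year is a fair coin flip.
rng = random.Random(7)
market_up = [rng.random() < 0.5 for _ in range(YEARS)]

# Try many meaningless indicators and keep the best hit count.
n_indicators = 1_000
best = 0
for _ in range(n_indicators):
    signal = [rng.random() < 0.5 for _ in range(YEARS)]
    hits = sum(s == m for s, m in zip(signal, market_up))
    best = max(best, hits)

print(f"Best of {n_indicators} random indicators: {best}/{YEARS} ({best / YEARS:.0%})")
```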

Educational Attainment and Race

Social scientists have focused on identifying which variables affect educational attainment. According to EducationData.org, in 2019, White 25- to 29-year-olds were 55% more likely than their Black counterparts to have completed college. Taken at face value, the data might seem to imply that race has a causal effect on college completion rates; however, it is not race itself that affects educational attainment but the effects of racism in society, which is the third, "hidden" variable.

Racism impacts people of color, placing them at a disadvantage educationally and economically. For example, the schools in non-White communities face greater challenges and receive less funding, parents in non-White populations have lower-paying jobs and fewer resources to devote to their children's education, and many families live in food deserts and suffer from malnutrition. Racism, then, is a causal variable that impacts educational attainment, not race.

How to Spot Spurious Correlations

Statisticians and other scientists who analyze data must be on the lookout for spurious relationships all the time. They use numerous methods to identify them, including:

  • Ensuring a properly representative sample
  • Obtaining an adequate sample size
  • Being wary of arbitrary endpoints
  • Controlling for as many outside variables as possible
  • Using a null hypothesis and checking whether the p-value is small enough to reject it
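The importance of sample size is easy to check by simulation. The Python sketch below draws two variables that are independent by construction and records how often their sample correlation exceeds 0.5 in absolute value at several sample sizes. The 0.5 threshold, the sample sizes, and the number of trials are arbitrary choices for illustration, and `pearson` is a small helper written for this example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)
trials = 1_000
share_big = {}  # sample size -> share of trials with |r| > 0.5

for n in (5, 30, 500):
    big = 0
    for _ in range(trials):
        # xs and ys are drawn independently, so the true correlation is zero.
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        if abs(pearson(xs, ys)) > 0.5:
            big += 1
    share_big[n] = big / trials
    print(f"n = {n:>3}: |r| > 0.5 in {share_big[n]:.1%} of trials")
```

With only five observations, a "strong" correlation between unrelated variables turns up in a large share of trials; with hundreds of observations it essentially never does.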

Many spurious relationships can be identified by using common sense. When a correlation is found, other variables are often at play, and they are not always immediately obvious.

Spurious Correlations FAQs

How do you know if a correlation is spurious?

The most obvious way to spot a spurious relationship in research findings is to use common sense. Just because two things occur together and appear to be linked does not mean that no other factors are at work. To know for sure, however, the research methods must be critically examined. In studies, all variables that might affect the findings should be included in the statistical model to control for their impact on the dependent variable.
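One common way to include a variable "in the statistical model" is multiple regression. The minimal sketch below assumes a hypothetical confounder Z that drives both X and Y, with no direct effect of X on Y. Regressing Y on X alone yields a sizable slope; adding Z as a second regressor pulls the coefficient on X toward zero. The `ols_one` and `ols_two` helpers solve the least-squares problems directly and are written only for this illustration; in practice a statistics package would normally be used.

```python
import random


def ols_one(y, x):
    """Slope b of y = a + b*x (simple least-squares regression)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def ols_two(y, x1, x2):
    """Slopes (b1, b2) of y = a + b1*x1 + b2*x2, via centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 * s12
    # Cramer's rule on the 2x2 system [[s11, s12], [s12, s22]] @ [b1, b2] = [s1y, s2y]
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


rng = random.Random(3)
n = 5_000
# Hypothetical confounder z affects both x and y; x has NO effect on y.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [2 * zi + rng.gauss(0, 0.5) for zi in z]

naive_slope = ols_one(y, x)  # Z omitted from the model
b_x, b_z = ols_two(y, x, z)  # Z included as a control
print(f"Y on X only:  slope on X = {naive_slope:.2f}")
print(f"Y on X and Z: slope on X = {b_x:.2f}, slope on Z = {b_z:.2f}")
```

In this setup the X-only slope comes out around 1.6, while the controlled coefficient on X is close to zero and the coefficient on Z is close to its true value of 2.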

What is an example of correlation but not causation?

An example is the observation that people who get more sleep tend to perform better during the day. Although there is a correlation, there is not necessarily causation. More sleep may not be the reason an individual performs better; for example, they might be using a new software tool that is increasing their productivity. To establish causation, there must be evidence from a study that shows a causal relationship between sleep and performance.

What is the meaning of spurious regression?

Spurious regression is a regression model that produces misleading statistical evidence of a linear relationship; in other words, it shows a spurious correlation between independent, non-stationary variables.
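A standard way to see spurious regression is to regress one random walk on another. In the Python sketch below, the two series in each trial are generated independently, so any "significant" slope is spurious. Regressing the levels typically yields |t| > 1.96 in most trials, far more often than the roughly 5% a valid test should give, while regressing the period-to-period changes (first differences, which are stationary for a random walk) brings the rate back near 5%. The series length, trial count, and 1.96 cutoff are illustrative choices, and the helper functions are hypothetical names written for this example.

```python
import random


def slope_t_stat(xs, ys):
    """Conventional OLS t-statistic for the slope of ys on xs (with intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    se_b = (sse / (n - 2) / sxx) ** 0.5
    return b / se_b


def random_walk(n, rng):
    """Non-stationary series: each value is the previous one plus a random shock."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        path.append(level)
    return path


def diffs(v):
    """First differences v[t] - v[t-1]."""
    return [b - a for a, b in zip(v, v[1:])]


rng = random.Random(1)
trials, n = 500, 100
sig_levels = sig_diffs = 0
for _ in range(trials):
    x, y = random_walk(n, rng), random_walk(n, rng)  # independent by construction
    if abs(slope_t_stat(x, y)) > 1.96:
        sig_levels += 1
    if abs(slope_t_stat(diffs(x), diffs(y))) > 1.96:
        sig_diffs += 1

print(f"'Significant' slope, levels:      {sig_levels / trials:.0%} of trials")
print(f"'Significant' slope, differences: {sig_diffs / trials:.0%} of trials")
```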

What Is an Example of False Causality?

False causality occurs when we are quick to assume that one thing causes something else because we’ve noticed a relationship between them. For example, because Harry's race times have improved, we may assume that he has been training hard to become a faster runner. However, the reality might be that Harry's race times have improved because he has new running shoes made with the latest technology. The initial assumption was a false causality.