## What Is Statistical Significance?

Statistical significance refers to the claim that a set of observed data is not the result of chance but can instead be attributed to a specific cause. Statistical significance is important for academic disciplines and practitioners that rely heavily on analyzing data and research, such as economics, finance, investing, medicine, physics, and biology.

Statistical significance can be considered strong or weak. When analyzing a data set and running the necessary tests to discern whether one or more variables have an effect on an outcome, strong statistical significance supports the conclusion that the results are real and not caused by luck or chance. Simply stated, if a p-value is small, then the result is considered more reliable.

### Key Takeaways

- Statistical significance refers to the claim that a result from data generated by testing or experimentation is likely to be attributable to a specific cause.
- A high degree of statistical significance indicates that an observed relationship is unlikely to be due to chance.
- The calculation of statistical significance is subject to a certain degree of error.
- Statistical significance can be misinterpreted when researchers do not use language carefully in reporting their results.
- Several types of significance tests are used depending on the research being conducted.

Problems arise in tests of statistical significance because researchers are usually working with samples of larger populations and not the populations themselves. As a result, the samples must be representative of the population, so the data contained in the sample must not be biased in any way. In most sciences, including economics, a result may be considered statistically significant if it has a confidence level of 95% (or sometimes 99%).

## Understanding Statistical Significance

The calculation of statistical significance (significance testing) is subject to a certain degree of error. Even if data appear to have a strong relationship, researchers must account for the possibility that an apparent correlation arose due to random chance or a sampling error.

Sample size is an important component of statistical significance in that larger samples are less prone to flukes. Only randomly chosen, representative samples should be used in significance testing. The level at which one can accept whether an event is statistically significant is known as the significance level.
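The point about larger samples being less prone to flukes can be illustrated with a small simulation. This is a minimal sketch using only Python's standard library. The function name `spread_of_sample_means`, the seed, and the choice of a standard normal population are assumptions made for illustration.

```python
import random
import statistics

def spread_of_sample_means(sample_size, trials=500, seed=0):
    """Standard deviation of the sample mean across repeated random samples."""
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # Draw one random sample from a population with mean 0 and sd 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)

small = spread_of_sample_means(10)
large = spread_of_sample_means(1000)
print(f"n=10:   sample means vary by about {small:.3f}")
print(f"n=1000: sample means vary by about {large:.3f}")
```

The sample mean from the larger sample wanders much less from the true value, which is why a fluke result is less likely with more data.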

Researchers use a measurement known as the p-value to determine statistical significance: if the p-value falls below the significance level, then the result is statistically significant. In common tests that compare averages, the p-value is a function of the means, standard deviations, and sizes of the data samples.

The p-value indicates the probability of observing a result at least as extreme as the one obtained, assuming chance alone is responsible for the result. If this probability is small, then the researcher can conclude that some other factor could be responsible for the observed data.
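One way to make "assuming chance alone is responsible" concrete is a permutation test, which shuffles the group labels many times and counts how often chance alone produces a difference as large as the observed one. The sketch below is one possible approach, not the only way to obtain a p-value. The function name and the two data lists are hypothetical.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate a two-sided p-value for a difference in means by shuffling labels."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # Under "chance alone", group membership is arbitrary, so reshuffle it.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate from being exactly zero.
    return (at_least_as_extreme + 1) / (n_shuffles + 1)

# Hypothetical measurements for a treated group and a control group.
treated = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4, 5.7, 5.2]
control = [4.2, 4.5, 4.1, 4.8, 4.4, 4.0, 4.6, 4.3]
p = permutation_p_value(treated, control)
print(f"estimated p-value: {p:.4f}")
```

A small estimate here means that random relabeling rarely reproduces a gap as large as the one observed.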

The complement of the significance level, calculated as 1 minus the significance level, is the confidence level. It indicates the degree of confidence that the statistical result did not occur by chance or by sampling error. The customary confidence level in many statistical tests is 95%, leading to a customary significance level, or p-value threshold, of 5%.
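The relationship between the two levels, and the decision rule built on them, can be written out in a few lines. The function names below are illustrative, not part of any library.

```python
def confidence_level(significance_level):
    """Confidence level is 1 minus the significance level."""
    return 1.0 - significance_level

def is_statistically_significant(p_value, significance_level=0.05):
    """A result is significant when the p-value falls below the significance level."""
    return p_value < significance_level

print(confidence_level(0.05))
print(is_statistically_significant(0.03))        # tested at the 5% level
print(is_statistically_significant(0.03, 0.01))  # tested at the stricter 1% level
```

The same p-value can be significant at one level and not at another, which is why the significance level should be chosen before looking at the results.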

"P-hacking" is the practice of exhaustively comparing many different sets of data in search of a statistically significant result. This is subject to reporting bias because the researchers only report favorable results–not negative ones.

## Special Considerations

Statistical significance does not always indicate practical significance: a statistically significant result may not be applicable to real-world business situations. In addition, statistical significance can be misinterpreted when researchers do not use language carefully in reporting their results. The fact that a result is statistically significant does not imply that it is *not* the result of chance, just that this is less likely to be the case.

Just because two data series hold a strong correlation with one another does not imply causation. For example, the number of movies in which the actor Nicolas Cage stars in a given year is very highly correlated with the number of accidental drownings in swimming pools. But this correlation is spurious since there is no theoretical causal claim that can be made.
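Spurious correlations of this kind are easy to produce. The sketch below does not use the actual film or drowning figures. It generates two made-up series that share nothing except an upward trend over time and shows that their correlation is still very high. The `pearson_r` helper and the series parameters are assumptions for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)
years = range(50)
# Two unrelated series: each is just a time trend plus its own independent noise.
series_a = [2.0 * t + rng.gauss(0, 5) for t in years]
series_b = [0.5 * t + rng.gauss(0, 2) for t in years]

r = pearson_r(series_a, series_b)
print(f"correlation between unrelated trending series: {r:.2f}")
```

Neither series influences the other. The shared time trend alone is enough to generate a strong correlation.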

Another problem that may arise with statistical significance is that past data, and the results from that data, whether statistically significant or not, may not reflect ongoing or future conditions. In investing, this may manifest itself in a pricing model breaking down during times of financial crisis as correlations change and variables do not interact as usual. Statistical significance can also help an investor discern whether one asset pricing model is better than another.

## Types of Statistical Significance Tests

Several types of significance tests are used depending on the research being conducted. For example, tests can be employed for one, two, or more data samples of various sizes for averages, variances, proportions, paired or unpaired data, or different data distributions.

There are also different approaches to significance testing, depending on the type of data that is available. Ronald Fisher is credited with formulating one of the most flexible approaches, as well as setting the norm for significance at *p* < 0.05. Because most of the work can be done after the data have already been collected, this method remains popular for short-term or ad-hoc research projects.

Seeking to build on Fisher's method, Jerzy Neyman and Egon Pearson ended up developing an alternative approach. This method requires more work to be done before the data are collected, but it allows researchers to design their study in a way that controls the probability of reaching false conclusions.
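One common form this up-front work takes today is a sample size (power) calculation. The sketch below uses a standard normal-approximation formula for comparing two group means. It is not taken from Neyman and Pearson's own papers, and it assumes a two-sided test, equal group sizes, and a known standard deviation. The function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect, sigma, alpha=0.05, power=0.80):
    """Approximate observations needed per group to detect a given mean difference."""
    # Critical value controlling the false-positive rate (two-sided test).
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Critical value controlling the false-negative rate (1 - power).
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sigma / effect) ** 2
    return math.ceil(n)

n_needed = sample_size_per_group(effect=0.5, sigma=1.0)
print(f"observations needed per group: {n_needed}")
```

Fixing `alpha` and `power` before collecting data is what lets the researcher bound the probability of both kinds of false conclusion.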

### Null Hypothesis Testing

Statistical significance is used in null hypothesis testing where researchers attempt to support their theories by rejecting other explanations. Although the method is sometimes misunderstood, it remains the most popular method of data testing in medicine, psychology, and other fields.

The most common null hypothesis is that the parameter in question is equal to zero (typically indicating that a variable has zero effect on the outcome of interest). If researchers reject the null hypothesis with a confidence of 95% or better, they can claim that an observed relationship is statistically significant. Null hypotheses can also be tested for the equality of effect for two or more alternative treatments.
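As a rough illustration, the sketch below tests the null hypothesis that a mean effect is zero using a large-sample z-approximation. For small samples a t-test would normally be used instead. The function name and the simulated data are assumptions made for the example.

```python
import math
import random
import statistics
from statistics import NormalDist

def z_test_mean_zero(sample, alpha=0.05):
    """Test H0: population mean = 0, using a large-sample normal approximation."""
    n = len(sample)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    z = mean / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Reject H0 when the p-value falls below the significance level.
    return p_value, p_value < alpha

# Simulated measurements with a true mean effect of 0.5 (hypothetical data).
rng = random.Random(1)
effects = [rng.gauss(0.5, 1.0) for _ in range(100)]
p_value, reject_null = z_test_mean_zero(effects)
print(f"p-value: {p_value:.4f}, reject null at 95% confidence: {reject_null}")
```

Rejecting the null at `alpha=0.05` corresponds to the 95% confidence threshold described above.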

Contrary to popular misconception, a high level of statistical significance cannot prove that a hypothesis is true or false. In reality, statistical significance measures the likelihood that an observed outcome would have occurred, assuming that the null hypothesis is true.

Rejection of the null hypothesis, even with a very high degree of statistical significance, can never *prove* something. It can only add support to an existing hypothesis. On the other hand, failure to reject a null hypothesis is often grounds to dismiss a hypothesis.

Additionally, an effect can be statistically significant but have only a very small impact. For example, it may be statistically significant that companies that use two-ply toilet paper in their bathrooms have more productive employees, but the improvement in the absolute productivity of each worker is likely to be minuscule.
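The gap between statistical and practical significance is easiest to see with a very large sample. The sketch below computes a two-sample z-test p-value and a standardized effect size (Cohen's d) from summary numbers. The inputs are invented, and the test assumes equal group sizes and a known standard deviation.

```python
import math

def z_test_from_summary(mean_diff, sigma, n_per_group):
    """Return (two-sided p-value, Cohen's d) for a difference between two group means."""
    std_err = sigma * math.sqrt(2 / n_per_group)
    z = mean_diff / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # Cohen's d expresses the difference in units of standard deviation.
    cohens_d = mean_diff / sigma
    return p_value, cohens_d

# Hypothetical: a difference of 1% of a standard deviation, one million workers per group.
p_large_n, d = z_test_from_summary(mean_diff=0.01, sigma=1.0, n_per_group=1_000_000)
print(f"p-value: {p_large_n:.2e}, effect size d: {d}")

# The same tiny difference with only 100 workers per group.
p_small_n, _ = z_test_from_summary(mean_diff=0.01, sigma=1.0, n_per_group=100)
print(f"p-value with n=100: {p_small_n:.2f}")
```

With enough data, even a negligible difference produces a very small p-value, so the effect size should be reported alongside it.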

*Correction–May 15, 2022:* This article has been edited to highlight potential fallacies in significance testing.