What Is Statistical Significance?

Statistical significance refers to the claim that a result from data generated by testing or experimentation is not likely to occur randomly or by chance, but is instead likely to be attributable to a specific cause. Having statistical significance is important for academic disciplines or practitioners that rely heavily on analyzing data and research, such as economics, finance, investing, medicine, physics, and biology.

Statistical significance can be considered strong or weak. When analyzing a data set and running the tests needed to discern whether one or more variables have an effect on an outcome, strong statistical significance helps support the claim that the results are real and not caused by luck or chance. Simply stated, a result with high significance is considered more reliable.

Problems arise in tests of statistical significance because researchers usually work with samples of larger populations rather than the populations themselves. As a result, the samples must be representative of the population, so the data contained in the sample must not be biased in any way. In most sciences, including economics, a result is conventionally treated as statistically significant if the claim can be made at a confidence level of 95% (or sometimes 99%).

Understanding Statistical Significance

The calculation of statistical significance (significance testing) is subject to a certain degree of error. The researcher must define in advance the acceptable probability of a sampling error, which exists in any test that does not include the entire population. Sample size is an important component of statistical significance in that larger samples are less prone to flukes. Only random, representative samples should be used in significance testing. The threshold against which a result is judged to be statistically significant is known as the significance level.
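One way to see why larger samples are less prone to flukes is the standard error of a sample mean, which shrinks with the square root of the sample size. The sketch below is illustrative only: the function name `standard_error` and the standard deviation of 10 are assumptions chosen for the example, not values from any study.

```python
import math


def standard_error(sample_sd: float, n: int) -> float:
    """Standard error of a sample mean: sd / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return sample_sd / math.sqrt(n)


# Hypothetical standard deviation of 10 units, for illustration only.
# Quadrupling the sample size halves the standard error.
errors = {n: standard_error(10.0, n) for n in (25, 100, 400)}
for n, se in errors.items():
    print(f"n={n:>3}  standard error={se:.2f}")
```

Because the sample mean varies less as the sample grows, a difference of a given size is less likely to be an accident of which observations happened to be drawn.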

Researchers use a quantity known as the p-value, which is computed from a test statistic, to discern whether a result is statistically significant: if the p-value falls below the significance level, it is. In tests that compare averages, the p-value is a function of the means, standard deviations, and sizes of the data samples.

The p-value is the probability of obtaining a result at least as extreme as the one observed if chance or sampling error alone were at work, that is, if there were no actual difference or relationship. A small p-value therefore indicates that the data would be unlikely if nothing real were going on. The p-value must fall under the significance level for the results to be considered statistically significant. The complement of the significance level, calculated as 1 minus the significance level, is the confidence level. It indicates the degree of confidence that the statistical result did not occur by chance or by sampling error. The customary confidence level in many statistical tests is 95 percent, leading to a customary significance level, or p-value threshold, of 5 percent.
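As a rough illustration of how a p-value is compared with the significance level, the sketch below computes a two-sided p-value for a difference between two sample means using a large-sample normal approximation (a z-test). This is one simple choice of test, not the only one, and the function name `two_sample_p_value` and all of the summary numbers are hypothetical.

```python
import math
from statistics import NormalDist

ALPHA = 0.05  # significance level; the confidence level is 1 - ALPHA = 0.95


def two_sample_p_value(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)  # standard error of the difference
    z = (mean_a - mean_b) / se                         # test statistic
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical summary statistics: same means and standard deviations,
# but two different sample sizes per group.
p_small = two_sample_p_value(10.5, 2.0, 100, 10.0, 2.0, 100)
p_large = two_sample_p_value(10.5, 2.0, 400, 10.0, 2.0, 400)

for label, p in (("n=100 per group", p_small), ("n=400 per group", p_large)):
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"{label}: p={p:.4f} -> {verdict} at the {ALPHA:.0%} level")
```

With these made-up inputs, the same half-unit difference fails to clear the 5 percent threshold with 100 observations per group but clears it with 400, which shows how the p-value depends on sample size as well as on the means and standard deviations.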

Special Considerations

Statistical significance does not always indicate practical significance, meaning that statistically significant results may not be large or relevant enough to apply to real-world business situations. In addition, statistical significance can be misinterpreted when researchers do not use language carefully in reporting their results. The fact that a result is statistically significant does not imply that it is not random, just that the probability of its being random is greatly reduced.

A strong correlation between two data series does not imply causation. For example, the number of movies in which the actor Nicolas Cage stars in a given year is very highly correlated with the number of accidental drownings in swimming pools. But this correlation is spurious, since no plausible causal mechanism links the two.

Another problem that may arise with statistical significance is that past data, and the results from that data, whether statistically significant or not, may not reflect ongoing or future conditions. In investing, this may manifest itself in a pricing model breaking down during times of financial crisis as correlations change and variables do not interact as usual. Statistical significance can also help an investor discern whether one asset pricing model is better than another.

Types of Statistical Significance Tests

Several types of significance tests are used depending on the research being conducted. For example, tests can be employed for one, two, or more data samples of various sizes; for averages, variances, and proportions; for paired or unpaired data; or for different data distributions. All of these tests involve what is called a null hypothesis, and establishing significance is often the goal of hypothesis testing in statistics. The most common null hypothesis is that the variable in question is equal to zero (typically indicating that it has zero effect on the outcome of interest). If researchers can reject the null hypothesis with a confidence of 95 percent or better, they can invoke statistical significance. Null hypotheses can also be tested for the equality (rather than equal to zero) of effect for two or more alternative treatments, for example, between a drug and a placebo in a clinical trial.
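To make the drug-versus-placebo case concrete, here is a minimal sketch of a two-proportion z-test, in which the null hypothesis is that both groups have the same underlying success rate. The function name `two_proportion_z_test` and the trial counts are invented for illustration; a real clinical analysis would follow a pre-specified protocol and might use a different test.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) under the null of equal proportions."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Under the null hypothesis both groups share one rate, so pool them.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical trial: 60 of 100 patients improve on the drug,
# 45 of 100 improve on the placebo.
z, p_value = two_proportion_z_test(60, 100, 45, 100)
reject_null = p_value < 0.05  # 95 percent confidence
print(f"z={z:.2f}, p={p_value:.4f}, reject null: {reject_null}")
```

In this made-up trial the p-value falls below 5 percent, so the null hypothesis of equal effect would be rejected at the 95 percent confidence level.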

Rejection of the null hypothesis, even with a very high degree of statistical significance, can never prove something; it can only add support to an existing hypothesis. On the other hand, failure to reject a null hypothesis is often grounds for dismissal of a hypothesis.

A statistical significance test shares much of the same mathematics as that of computing a confidence interval. One way to interpret statistical significance is that the confidence interval for the effect, at a level of, say, 95 percent or 99 percent, does not contain the value zero. Even if a variable is found to be statistically significant, it must still make sense in the real world. Additionally, an effect can be statistically significant but have only a very small impact; for example, it may be very unlikely due to chance that companies that use two-ply toilet paper in their bathrooms have more productive employees, but the improvement on the absolute productivity of each worker is likely to be minuscule.
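The link between confidence intervals and significance, and the gap between statistical and practical significance, can both be sketched with a confidence interval for a difference in means. The code below uses a normal approximation; the function name `diff_confidence_interval` and the productivity figures are hypothetical and chosen only to show a tiny effect measured on a very large sample.

```python
import math
from statistics import NormalDist


def diff_confidence_interval(mean_a, sd_a, n_a, mean_b, sd_b, n_b, confidence=0.95):
    """Confidence interval for mean_a - mean_b (normal approximation)."""
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    # Critical value, about 1.96 for a 95 percent confidence level.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = mean_a - mean_b
    return diff - z_crit * se, diff + z_crit * se


# Hypothetical productivity scores: 100.2 versus 100.0, with 50,000
# workers in each group.
low, high = diff_confidence_interval(100.2, 5.0, 50_000, 100.0, 5.0, 50_000)
excludes_zero = low > 0 or high < 0
print(f"95% CI for the difference: ({low:.3f}, {high:.3f})")
print(f"Interval excludes zero: {excludes_zero}")
```

Here the interval excludes zero, so the difference is statistically significant at the 5 percent level, yet the whole interval sits at a fraction of one percent of the baseline score, which may be too small to matter in practice.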