What Is P-Value?

In statistics, the p-value is the probability of obtaining results at least as extreme as the observed results of a statistical hypothesis test, assuming that the null hypothesis is correct. The p-value is used as an alternative to rejection points to provide the smallest level of significance at which the null hypothesis would be rejected. A smaller p-value means that there is stronger evidence in favor of the alternative hypothesis.

Key Takeaways

  • A p-value measures the probability that a difference at least as large as the one observed could have occurred by random chance alone, assuming the null hypothesis is true.
  • The lower the p-value, the greater the statistical significance of the observed difference.
  • A p-value can be used as an alternative to, or in addition to, pre-selected confidence levels for hypothesis testing.

How Is P-Value Calculated?

P-values are usually found using p-value tables or spreadsheet or statistical software. These calculations are based on the assumed or known probability distribution of the specific statistic being tested. P-values are calculated from the deviation between the observed value and a chosen reference value, given the probability distribution of the statistic, with a greater difference between the two values corresponding to a lower p-value. Mathematically, the p-value is calculated using integral calculus as the area under the probability distribution curve for all values of the statistic that are at least as far from the reference value as the observed value is, relative to the total area under the probability distribution curve. In a nutshell, the greater the difference between the observed value and the reference value, the less likely it is that the difference is due to simple random chance, and this is reflected by a lower p-value.
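As a rough illustration of the area-under-the-curve calculation described above, the sketch below assumes the test statistic is normally distributed under the null hypothesis. The function name `two_sided_p_value` and its parameters are hypothetical, chosen only for this example; statistics that follow other distributions would need a different curve.

```python
from statistics import NormalDist


def two_sided_p_value(observed, reference, std_error):
    """Two-sided p-value, ASSUMING the statistic is normally distributed.

    The result is the area in both tails of the standard normal curve
    that lies at least as far from the reference value as the observed
    value does.
    """
    # Distance between observed and reference values in standard errors.
    z = (observed - reference) / std_error
    # Area in one tail beyond |z|, doubled to cover both tails.
    one_tail_area = 1.0 - NormalDist().cdf(abs(z))
    return 2.0 * one_tail_area


# A small deviation versus a large one (illustrative inputs only).
small_gap = two_sided_p_value(observed=1.0, reference=0.0, std_error=1.0)
large_gap = two_sided_p_value(observed=3.0, reference=0.0, std_error=1.0)

print(f"{small_gap:.4f}")  # 0.3173
print(f"{large_gap:.4f}")  # 0.0027
```

The larger deviation from the reference value yields the smaller p-value, which matches the relationship described in the paragraph above.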

P-Value Approach to Hypothesis Testing

The p-value approach to hypothesis testing uses the calculated probability to determine whether there is evidence to reject the null hypothesis. The null hypothesis, also known as the conjecture, is the initial claim about a population (or data-generating process). The alternative hypothesis states whether the population parameter differs from the value stated in the conjecture.

In practice, the significance level is stated in advance to determine how small the p-value must be in order to reject the null hypothesis. Because different researchers use different levels of significance when examining a question, a reader may sometimes have difficulty comparing results from two different tests. P-values provide a solution to this problem.

For example, suppose a study comparing returns from two particular assets were undertaken by different researchers who used the same data but different significance levels. The researchers might come to opposite conclusions regarding whether the assets differ. Suppose one researcher used a confidence level of 90% and the other required a confidence level of 95% to reject the null hypothesis, and the p-value of the observed difference between the two returns was 0.08 (corresponding to a confidence level of 92%). The first researcher would then find that the two assets have a difference that is statistically significant, while the second would find no statistically significant difference between the returns.

To avoid this problem, the researchers could report the p-value of the hypothesis test and allow the reader to interpret the statistical significance themselves. This is called a p-value approach to hypothesis testing. An independent observer could note the p-value, and decide for herself whether that represents a statistically significant difference or not.
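The two-researcher example above can be sketched in a few lines. This is only an illustration: the helper name `rejects_null` is hypothetical, and it assumes the usual convention that the significance level equals one minus the confidence level.

```python
def rejects_null(p_value, confidence_level):
    """Return True if the null hypothesis is rejected at this confidence level.

    Assumes significance level = 1 - confidence level, and rejection
    when the p-value falls below the significance level.
    """
    significance_level = 1.0 - confidence_level
    return p_value < significance_level


p_value = 0.08  # p-value from the example in the text

# Same data and same p-value, but different pre-selected confidence levels.
for confidence_level in (0.90, 0.95):
    decision = "reject" if rejects_null(p_value, confidence_level) else "do not reject"
    print(f"confidence {confidence_level:.0%}: {decision} the null hypothesis")
```

Reporting the p-value of 0.08 directly lets each reader apply whichever threshold they consider appropriate.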

Real-World Example of P-Value

Assume an investor claims that their investment portfolio's performance is equivalent to that of the Standard & Poor's (S&P) 500 Index. To test this claim, the investor conducts a two-tailed test. The null hypothesis states that the portfolio's returns are equivalent to the S&P 500's returns over a specified period, while the alternative hypothesis states that the portfolio's returns and the S&P 500's returns are not equivalent. (If the investor conducted a one-tailed test, the alternative hypothesis would state a single direction: that the portfolio's returns are less than the S&P 500's returns, or that they are greater.)

A p-value hypothesis test does not necessarily make use of a pre-selected confidence level at which the investor should reject the null hypothesis that the returns are equivalent. Instead, it provides a measure of how much evidence there is to reject the null hypothesis. The smaller the p-value, the greater the evidence against the null hypothesis. Thus, if the investor finds that the p-value is 0.001, there is strong evidence against the null hypothesis, and the investor can confidently conclude that the portfolio's returns and the S&P 500's returns are not equivalent.
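A two-tailed test of this kind might be sketched as follows. Everything specific here is an assumption made for illustration: the monthly return figures are invented, the helper `paired_z_test` is hypothetical, and the normal (z) approximation is a simplification. With a sample this small, a t-distribution would normally be the more appropriate choice.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical monthly returns in percent, invented for illustration only.
portfolio = [1.2, -0.4, 2.1, 0.8, -1.5, 1.9, 0.3, 1.1, -0.2, 1.6, 0.7, 1.4]
sp500 = [1.0, -0.6, 1.8, 0.9, -1.2, 1.5, 0.2, 0.8, -0.5, 1.3, 0.6, 1.0]


def paired_z_test(a, b):
    """Two-tailed test that the mean paired difference is zero.

    Uses a normal approximation (an assumption of this sketch) and
    requires the paired differences to have non-zero variance.
    """
    diffs = [x - y for x, y in zip(a, b)]
    # Standard error of the mean difference.
    std_error = stdev(diffs) / sqrt(len(diffs))
    z = mean(diffs) / std_error
    # Two-tailed: deviations in either direction count as extreme.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


z_stat, p_value = paired_z_test(portfolio, sp500)
print(f"z = {z_stat:.2f}, p-value = {p_value:.4f}")
```

The printed p-value is then read as described above: the smaller it is, the stronger the evidence against the claim that the two return series are equivalent.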

Though this does not provide an exact threshold as to when the investor should accept or reject the null hypothesis, it does have another very practical advantage. P-value hypothesis testing offers a direct way to compare the relative confidence that the investor can have when choosing among multiple different types of investments or portfolios, relative to a benchmark such as the S&P 500. For example, for two portfolios, A and B, whose performance differs from the S&P 500 with p-values of 0.10 and 0.01, respectively, the investor can be much more confident that portfolio B, with the lower p-value, will actually show consistently different results.