What is the 'Empirical Rule'?
The empirical rule is a statistical rule stating that, for a normal distribution, almost all data fall within three standard deviations of the mean. Broken down, the empirical rule shows that about 68% of the data fall within one standard deviation of the mean, about 95% within two standard deviations, and about 99.7% within three standard deviations.
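The three percentages can be checked against the normal distribution itself. The following is a minimal sketch using only Python's standard library; the function name fraction_within is a hypothetical helper chosen for illustration, and it relies on the standard identity that the share of a normal distribution within k standard deviations of the mean equals erf(k / sqrt(2)).

```python
import math

def fraction_within(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Compare with the rounded 68-95-99.7 figures quoted by the rule
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.4f}")
```

The computed values (roughly 0.6827, 0.9545, and 0.9973) show that the figures quoted by the rule are rounded approximations.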
BREAKING DOWN 'Empirical Rule'
The empirical rule is often referred to as the three-sigma rule or the 68-95-99.7 rule. It is most often used in statistics for forecasting final outcomes. After a standard deviation is calculated, and before exact data can be collected, the rule can be used as a rough estimate of the outcome of the impending data. This estimate is useful in the meantime because gathering the appropriate data may be time-consuming or even impossible. The empirical rule is also used as a rough way to test a distribution's "normality". If too many data points fall outside the three standard deviation boundaries, this could suggest that the distribution is not normal.
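The rough normality check described above might be sketched as follows. This is an illustration under stated assumptions, not a formal statistical test: the helper name shares_within and the simulated samples are hypothetical, and the sketch compares all three bands rather than only the three standard deviation boundary.

```python
import random
import statistics

def shares_within(data):
    """Return the share of values within 1, 2, and 3 sample standard deviations of the mean."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    return {
        k: sum(abs(x - mean) <= k * sd for x in data) / len(data)
        for k in (1, 2, 3)
    }

rng = random.Random(0)  # fixed seed so the illustration is repeatable
normal_sample = [rng.gauss(0, 1) for _ in range(10_000)]     # simulated normal data
uniform_sample = [rng.uniform(0, 1) for _ in range(10_000)]  # simulated non-normal data

print("normal :", shares_within(normal_sample))
print("uniform:", shares_within(uniform_sample))
```

The normal sample should land close to 68%, 95%, and 99.7%, while the uniform sample puts only about 58% of its values within one standard deviation, which is the kind of mismatch that hints at a non-normal distribution.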
Empirical Rule Examples
Imagine that the lifespans of a population of animals in a zoo are known to be normally distributed. The average animal lives to be 13.1 years old and the standard deviation of lifespans is 1.5 years. If someone wants to know the probability that an animal will live longer than 14.6 years, they could use the empirical rule. Knowing the distribution's mean is 13.1 years old, the following age ranges occur for each standard deviation:
One standard deviation: (13.1 - 1.5) to (13.1 + 1.5), or 11.6 to 14.6
Two standard deviations: (13.1 - 2 x 1.5) to (13.1 + 2 x 1.5), or 10.1 to 16.1
Three standard deviations: (13.1 - 3 x 1.5) to (13.1 + 3 x 1.5), or 8.6 to 17.6
The person solving this problem needs to calculate the total probability of the animal living 14.6 years or longer. The empirical rule shows that 68% of the distribution lies within one standard deviation, in this case, from 11.6 to 14.6 years. Thus, the remaining 32% of the distribution lies outside this range. Because the normal distribution is symmetric, half of that 32% lies above 14.6 and half lies below 11.6. So the probability of the animal living more than 14.6 years is 16% (32% divided by two).
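The arithmetic in this example can be reproduced in a few lines. The sketch below uses the mean and standard deviation given in the example; the variable names are illustrative, and the comparison against the exact normal tail (via statistics.NormalDist from Python's standard library) is an addition for context rather than part of the rule.

```python
from statistics import NormalDist

mean, sd = 13.1, 1.5  # lifespan figures from the example above

# Ranges covered by one, two, and three standard deviations
for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    print(f"{k} standard deviation(s): {low:.1f} to {high:.1f}")

# Empirical-rule estimate: the 32% outside one standard deviation is split evenly
rule_estimate = (1 - 0.68) / 2

# Exact upper-tail probability from the normal CDF, for comparison
exact = 1 - NormalDist(mean, sd).cdf(mean + sd)

print(f"rule estimate: {rule_estimate:.3f}, exact: {exact:.3f}")
```

The rule gives 16%, while the exact tail probability is slightly lower, at roughly 15.9%, because 68% is itself a rounded figure.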
As another example, assume instead that the average animal in the zoo lives to 10 years of age, with a standard deviation of 1.4 years. Assume the zookeeper is attempting to figure out the probability of an animal living more than 7.2 years. This distribution looks as follows:
One standard deviation: 8.6 to 11.4 years
Two standard deviations: 7.2 to 12.8 years
Three standard deviations: 5.8 to 14.2 years
The empirical rule states that 95% of the distribution lies within two standard deviations. Thus, 5% lies outside of two standard deviations: half above 12.8 years and half below 7.2 years. The probability of living more than 7.2 years is therefore the 95% inside the range plus the half of the remainder that lies above 12.8 years:
95% + (5% / 2) = 97.5%
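Both zoo examples follow the same pattern, which can be captured in a small helper. This is a sketch under stated assumptions: the names WITHIN and prob_above are hypothetical, the helper only handles thresholds that sit exactly one, two, or three standard deviations from the mean, and the exact normal probability is included only for comparison.

```python
from statistics import NormalDist

# Rounded shares quoted by the empirical rule
WITHIN = {1: 0.68, 2: 0.95, 3: 0.997}

def prob_above(k: int) -> float:
    """Empirical-rule estimate of P(X > mean + k * sd) for k in -3..-1 or 1..3."""
    tail = (1 - WITHIN[abs(k)]) / 2  # share in a single tail
    return tail if k > 0 else 1 - tail

mean, sd, threshold = 10.0, 1.4, 7.2   # figures from the second example
k = round((threshold - mean) / sd)     # threshold is two standard deviations below the mean

estimate = prob_above(k)
exact = 1 - NormalDist(mean, sd).cdf(threshold)

print(f"rule estimate: {estimate:.3f}, exact: {exact:.3f}")
```

Calling prob_above(1) reproduces the 16% from the first example, and prob_above(-2) reproduces the 97.5% computed here; the exact normal probability for the second example is roughly 97.7%.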