The normal distribution formula is based on two simple parameters, mean and standard deviation, which quantify the characteristics of a given dataset. While the mean indicates the “central” or average value of the entire dataset, the standard deviation indicates the “spread” or variation of data points around that mean value.
Consider the following 2 datasets:
Dataset 1 = {10, 10, 10, 10, 10, 10, 10, 10, 10, 10}
Dataset 2 = {6, 8, 10, 12, 14, 14, 12, 10, 8, 6}
For Dataset 1, mean = 10 and standard deviation (stddev) = 0
For Dataset 2, mean = 10 and standard deviation (stddev) = 2.83
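The two figures above can be reproduced with a short sketch. It assumes the stddev quoted here is the population standard deviation (dividing by n rather than n - 1), which is what yields 2.83 for Dataset 2; the variable names are illustrative only.

```python
from statistics import mean, pstdev

# The two datasets from the text.
dataset1 = [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
dataset2 = [6, 8, 10, 12, 14, 14, 12, 10, 8, 6]

for name, data in (("Dataset 1", dataset1), ("Dataset 2", dataset2)):
    # pstdev is the population standard deviation (divides by n).
    print(f"{name}: mean = {mean(data):.2f}, stddev = {pstdev(data):.2f}")
# Dataset 1: mean = 10.00, stddev = 0.00
# Dataset 2: mean = 10.00, stddev = 2.83
```

Using the sample standard deviation (`statistics.stdev`) instead would give a slightly larger value for Dataset 2, so the choice of divisor matters for small datasets.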
Let’s plot these values for Dataset 1:
Similarly for Dataset 2:
The red horizontal line in both of the above graphs indicates the “mean” or average value of each dataset (10 in both cases). The pink arrows in the second graph indicate the spread or variation of data values from the mean value. This is represented by the standard deviation value of 2.83 in the case of Dataset 2. Since all values in Dataset 1 are the same (10 each) with no variation, the stddev value is zero, and hence no pink arrows are applicable.
The stddev value has a few significant characteristics which are extremely helpful in data analysis. For a normal distribution, the data values are symmetrically distributed on either side of the mean. For any normally distributed dataset, plotting a graph with stddev on the horizontal axis and the number of data values on the vertical axis gives the following graph.
Properties of a Normal Distribution
- The normal curve is symmetrical about the mean;
- The mean is at the middle and divides the area into two halves;
- The total area under the curve is equal to 1, whatever the values of the mean and stddev;
- The distribution is completely described by its mean and stddev
As can be seen from the above graph, stddev represents the following:
- 68.3% of data values are within 1 standard deviation of the mean (-1 to +1)
- 95.4% of data values are within 2 standard deviations of the mean (-2 to +2)
- 99.7% of data values are within 3 standard deviations of the mean (-3 to +3)
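These three percentages can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module (available from Python 3.8); the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, stddev 1

def share_within(k):
    """Share of values lying within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} stddev: {share_within(k):.1%}")
# within 1 stddev: 68.3%
# within 2 stddev: 95.4%
# within 3 stddev: 99.7%
```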
The area under the bell-shaped curve over a given range gives the probability of a value falling in that range:
- less than X, e.g. probability of data values being less than 70
- greater than X, e.g. probability of data values being greater than 95
- between X1 and X2, e.g. probability of data values between 65 and 85
where X is a value of interest (examples below).
Plotting and calculating the area is not always convenient, as different datasets will have different mean and stddev values. To provide a uniform method for easy calculations that applies to real-world problems, the standard conversion to Z-values was introduced; these values form part of the Normal Distribution Table.
Z = (X - mean)/stddev, where X is the random variable.
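As a rough illustration of the conversion, the sketch below applies the formula to Dataset 2 from earlier. The function name `z_score` and its parameter names are hypothetical, and the population standard deviation is assumed, as in the figures quoted above.

```python
from statistics import mean, pstdev

def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

dataset2 = [6, 8, 10, 12, 14, 14, 12, 10, 8, 6]
mu, sigma = mean(dataset2), pstdev(dataset2)

# After conversion the values are centred on 0 with a spread of 1.
z_values = [z_score(x, mu, sigma) for x in dataset2]
print(mean(z_values), pstdev(z_values))  # approximately 0 and 1
```

Whatever the original mean and stddev, the converted values end up on the same scale, which is what lets a single table serve every dataset.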
Basically, this conversion forces the mean and stddev to be standardized to 0 and 1 respectively, which enables a standard defined set of Z-values (from the Normal Distribution Table) to be used for easy calculations. Each entry in the table is the area under the curve between the mean (z = 0) and the given z-value. A snapshot of the standard z-value table containing probability values is as follows:
| z | 0.00 | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 |
|---|---|---|---|---|---|---|---|
| 0.0 | 0.00000 | 0.00399 | 0.00798 | 0.01197 | 0.01595 | 0.01994 | … |
| 0.1 | 0.03983 | 0.04380 | 0.04776 | 0.05172 | 0.05567 | 0.05962 | … |
| 0.2 | 0.07926 | 0.08317 | 0.08706 | 0.09095 | 0.09483 | 0.09871 | … |
| 0.3 | 0.11791 | 0.12172 | 0.12552 | 0.12930 | 0.13307 | 0.13683 | … |
| 0.4 | 0.15542 | 0.15910 | 0.16276 | 0.16640 | 0.17003 | 0.17364 | … |
| 0.5 | 0.19146 | 0.19497 | 0.19847 | 0.20194 | 0.20540 | 0.20884 | … |
| 0.6 | 0.22575 | 0.22907 | 0.23237 | 0.23565 | 0.23891 | 0.24215 | … |
| 0.7 | 0.25804 | 0.26115 | 0.26424 | 0.26730 | 0.27035 | 0.27337 | … |
| … | … | … | … | … | … | … | … |
To find the probability for a z-value of 0.239865, first round it to 2 decimal places (i.e. 0.24). Then look up the units and first decimal digit (0.2) in the rows and the second decimal digit (the remaining 0.04) in the columns. That leads to the value 0.09483.
The full normal distribution table, with probability values precise to 5 decimal places (including those for negative z-values), can be found here.
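The same lookup can be sketched in code. This assumes the table stores the area between the mean and z, so each entry equals the standard normal cumulative probability minus 0.5; `table_value` is a hypothetical helper, not part of any library.

```python
from statistics import NormalDist

def table_value(z):
    """Area between the mean (z = 0) and z, as a z-table would list it."""
    z = round(z, 2)  # the table is indexed to 2 decimal places
    return NormalDist().cdf(z) - 0.5

print(f"{table_value(0.239865):.5f}")  # 0.09483
```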
Let’s see some real-life examples. The heights of individuals in a large group follow a normal distribution pattern. Assume that we have a set of 100 individuals whose heights are recorded, and that the mean and stddev are calculated to be 66 and 6 inches respectively.
Here are a few sample questions which can be easily answered using the z-value table:
- What is the probability that a person in the group is 70 inches or less?
The question is to find the cumulative value P(X<=70), i.e. in the entire dataset of 100, what share of values lies between 0 and 70.
Let’s first convert the X-value of 70 to the equivalent Z-value.
Z = (X - mean)/stddev = (70-66)/6 = 4/6 = 0.66667 ≈ 0.67 (rounded to 2 decimal places)
We now look up Z = 0.67 in the z-table (row 0.6, column 0.07), which gives 0.24857.
Taken alone, this would suggest a 24.857% probability that an individual in the group is 70 inches or less.
But hang on, the above is incomplete. Remember, we are looking for the probability of all possible heights up to 70, i.e. from 0 to 70. The table value only gives the portion from the mean to the desired value (i.e. 66 to 70). We need to include the other half, from 0 to 66, to arrive at the correct answer.
Since 0 to 66 represents half of the distribution (i.e. one extreme to the mean at the midpoint), its probability is simply 0.5.
Hence the correct probability of a person being 70 inches or less = 0.24857 + 0.5 = 0.74857 = 74.857%
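A minimal sketch of this two-part calculation follows. It mirrors the table method by rounding z to 2 decimal places first, and the variable names are illustrative only.

```python
from statistics import NormalDist

mean_height, stddev_height = 66, 6

# Step 1: convert X = 70 to a z-value, rounded as for a table lookup.
z = round((70 - mean_height) / stddev_height, 2)  # 0.67

# Step 2: area from the mean to z (what the table lists).
mean_to_z = NormalDist().cdf(z) - 0.5

# Step 3: add the lower half of the distribution.
p_at_most_70 = 0.5 + mean_to_z
print(f"{p_at_most_70:.5f}")  # 0.74857
```

Calling `NormalDist(66, 6).cdf(70)` directly skips the rounding of z, so its result differs slightly in the third decimal place.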
Graphically (by calculating the area), these are the two summed regions representing the solution:
- What is the probability that a person is 75 inches or higher?
i.e. find the complementary cumulative probability P(X>=75).
Z = (X - mean)/stddev = (75-66)/6 = 9/6 = 1.5
P(Z >= 1.5) = 1 - P(Z <= 1.5) = 1 - (0.5+0.43319) = 0.06681 = 6.681%
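The complement rule can be checked the same way. This sketch works directly on the height distribution (mean 66, stddev 6) rather than converting to z first; `heights` is just an illustrative name.

```python
from statistics import NormalDist

heights = NormalDist(mu=66, sigma=6)

# P(X >= 75) is everything not covered by the cumulative P(X <= 75).
p_at_least_75 = 1 - heights.cdf(75)
print(f"{p_at_least_75:.5f}")  # 0.06681
```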
- What is the probability of a person being in between 52 inches and 67 inches?
Find P(52<=X<=67).
P(52<=X<=67) = P[(52-66)/6 <= Z <= (67-66)/6] = P(-2.33 <= Z <= 0.17)
= P(Z <= 0.17) - P(Z <= -2.33) = (0.5+0.06749) - (0.5-0.49010) = 0.56749 - 0.00990 = 0.55759 = 55.759%
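For a range between two values, the probability is the difference of two cumulative probabilities. The sketch below rounds both z-values to 2 decimal places to stay comparable with a table lookup; the names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()
mean_height, stddev_height = 66, 6

# Convert both bounds to z-values, rounded as for a table lookup.
z_low = round((52 - mean_height) / stddev_height, 2)   # -2.33
z_high = round((67 - mean_height) / stddev_height, 2)  # 0.17

# P(z_low <= Z <= z_high) = P(Z <= z_high) - P(Z <= z_low)
p_between = std_normal.cdf(z_high) - std_normal.cdf(z_low)
print(f"{p_between:.5f}")  # 0.55759
```

Note that the lower bound is negative, so its cumulative probability is the small left-hand tail (about 0.0099), not a value near 0.5.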
This normal distribution table (and its z-values) is commonly used for probability calculations on expected price moves of stocks and indices in the stock market. It is used in range-based trading, in identifying uptrends or downtrends and support or resistance levels, and in other technical indicators based on the normal distribution concepts of mean and standard deviation.