Quantitative Methods - Standard Deviation And Variance
Range and Mean Absolute Deviation
The range is the simplest measure of dispersion, the extent to which the data varies from its measure of central tendency. Dispersion or variability is a concept covered extensively in the CFA curriculum, as it emphasizes risk, or the chances that an investment will not achieve its expected outcome. If any investment has two dimensions - one describing risk, one describing reward - then we must measure and present both dimensions to gain an idea of the true nature of the investment. Mean return describes the expected reward, while the measures of dispersion describe the risk.
Range is simply the highest observation minus the lowest observation. For data that is sorted, it should be easy to locate maximum/minimum values and compute the range. The appeal of range is that it is simple to interpret and easy to calculate; the drawback is that by using just two values, it can be misleading if there are extreme values that turn out to be very rare, and it may not fairly represent the entire distribution (all of the outcomes).
Mean Absolute Deviation (MAD)
MAD improves upon range as an indicator of dispersion by using all of the data. It is calculated by:
1. Taking the difference between each observed value and the mean, which is the deviation
2. Using the absolute value of each deviation, adding all deviations together
3. Dividing by n, the number of observations.
To illustrate, we take an example of six mid-cap mutual funds, on which the five-year annual returns are +10.1%, +7.7%, +5.0%, +12.3%, +12.2% and +10.9%.
Range = Maximum - Minimum = (+12.3%) - (+5.0%) = 7.3%
Mean absolute deviation starts by finding the mean: (10.1% + 7.7% + 5.0% + 12.3% + 12.2% + 10.9%)/6 = 9.7%.
Each of the six observations deviates from the 9.7% mean; the absolute deviation ignores the +/- sign.
1st: |10.1 - 9.7| = 0.4
2nd: |7.7 - 9.7| = 2.0
3rd: |5.0 - 9.7| = 4.7
4th: |12.3 - 9.7| = 2.6
5th: |12.2 - 9.7| = 2.5
6th: |10.9 - 9.7| = 1.2
Next, the absolute deviations are summed and divided by 6: (0.4 + 2.0 + 4.7 + 2.6 + 2.5 + 1.2)/6 = 13.4/6 = 2.2333, or 2.2% when rounded.
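The range and MAD arithmetic for the six funds can be checked with a minimal Python sketch. The function names `value_range` and `mean_absolute_deviation` are illustrative choices, not part of any library.

```python
# Five-year annual returns of the six mid-cap funds, in percent.
returns = [10.1, 7.7, 5.0, 12.3, 12.2, 10.9]


def value_range(values):
    """Highest observation minus lowest observation."""
    return max(values) - min(values)


def mean_absolute_deviation(values):
    """Average of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


print(round(value_range(returns), 1))              # 7.3
print(round(mean_absolute_deviation(returns), 2))  # 2.23
```

The results are rounded only for display, since binary floating point cannot represent values such as 10.1 exactly.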
Variance (σ²) is a measure of dispersion that in practice can be easier to apply than mean absolute deviation because it removes +/- signs by squaring the deviations.
Returning to the example of mid-cap mutual funds, we had six deviations. To compute variance, we take the square of each deviation, add the terms together and divide by the number of observations.
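|Observation||Value||Deviation from +9.7%||Square of Deviation|
|1||+10.1%||+0.4||0.16|
|2||+7.7%||-2.0||4.00|
|3||+5.0%||-4.7||22.09|
|4||+12.3%||+2.6||6.76|
|5||+12.2%||+2.5||6.25|
|6||+10.9%||+1.2||1.44|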
Variance = (0.16 + 4.0 + 22.09 + 6.76 + 6.25 + 1.44)/6 = 6.7833. Variance is not in the same units as the underlying data. In this case, it's expressed as 6.7833% squared - difficult to interpret unless you are a mathematical expert (percent squared?).
Standard deviation (σ) is the square root of the variance, or (6.7833)^(1/2) = 2.60%. Standard deviation is expressed in the same units as the data, which makes it easier to interpret. It is the most frequently used measure of dispersion.
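As a rough check on the population figures, the sketch below squares each deviation by hand and compares the result with `statistics.pvariance` from the Python standard library. The helper name `population_variance` is an illustrative assumption.

```python
import math
import statistics

# Five-year annual returns of the six mid-cap funds, in percent.
returns = [10.1, 7.7, 5.0, 12.3, 12.2, 10.9]


def population_variance(values):
    """Sum of squared deviations from the mean, divided by n."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


pop_var = population_variance(returns)
pop_sd = math.sqrt(pop_var)  # standard deviation is the square root of variance

print(round(pop_var, 4))  # 6.7833
print(round(pop_sd, 2))   # 2.6

# Cross-check the hand-rolled result against the standard library.
print(math.isclose(pop_var, statistics.pvariance(returns)))  # True
```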
Our calculations above were done for a population of six mutual funds. In practice, an entire population is often impossible or impractical to observe, so we use sampling techniques to estimate the population variance and standard deviation. The sample variance formula is very similar to the population variance formula, with one exception: instead of dividing by n observations (where n = population size), we divide by (n - 1) degrees of freedom, where n = sample size. So in our mutual fund example, if the six funds were described as a sample from a larger database of mid-cap funds, we would compute variance by dividing by n - 1 = 5.
Sample variance (s²) = (0.16 + 4.0 + 22.09 + 6.76 + 6.25 + 1.44)/(6 - 1) = 40.70/5 = 8.14
Sample Standard Deviation (s)
Sample standard deviation is the square root of sample variance:
(8.14)^(1/2) = 2.85%.
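If the six funds are treated as a sample, the same figures can be reproduced with `statistics.variance` and `statistics.stdev` from the Python standard library, which divide by n - 1. This is a small sketch for checking the arithmetic.

```python
import statistics

# The six mid-cap fund returns, now treated as a sample (percent).
returns = [10.1, 7.7, 5.0, 12.3, 12.2, 10.9]

sample_var = statistics.variance(returns)  # divides by n - 1
sample_sd = statistics.stdev(returns)      # square root of the sample variance

print(round(sample_var, 2))  # 8.14
print(round(sample_sd, 2))   # 2.85
```

Dividing by 5 instead of 6 makes the sample variance (8.14) larger than the population variance (6.7833) for the same data.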
Standard deviation is widely used because, unlike variance, it is expressed in the same units as the original data, so it is easy to interpret and can be marked directly on distribution graphs (e.g. the normal distribution).
Semivariance and Target Semivariance
Semivariance is a risk measure that focuses on downside risk, and is defined as the average squared deviation below the mean. Computing a semivariance starts by using only those observations below the mean; that is, any observations at or above the mean are ignored. From there, the process is similar to computing variance. If a return distribution is symmetric, semivariance is exactly half of the variance. If the distribution is negatively skewed, semivariance can be more than half of the variance. The idea behind semivariance is to focus on negative outcomes.
Target semivariance is a variation of this concept, considering only those squared deviations below a certain target. For example, if a mutual fund has a mean quarterly return of +3.6%, we may wish to focus only on quarters where the outcome is -5% or lower. Target semivariance eliminates all quarters above -5%. From there, the process of computing target semivariance follows the same procedure as other variance measures.
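The sketch below illustrates both measures on the six mid-cap fund returns. It rests on two assumptions the text does not spell out: the sum of squared downside deviations is divided by the total number of observations n (some texts divide by a different count, such as n - 1), and target semivariance measures deviations from the target rather than from the mean. The 8.0% target is a hypothetical value chosen only so that some observations fall below it.

```python
# Five-year annual returns of the six mid-cap funds, in percent.
returns = [10.1, 7.7, 5.0, 12.3, 12.2, 10.9]


def semivariance(values):
    """Squared deviations below the mean, averaged over all n observations.

    Assumption: the divisor is n; other conventions exist.
    """
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values if v < mean) / n


def target_semivariance(values, target):
    """Squared deviations below a target, averaged over all n observations.

    Assumption: deviations are measured from the target, and the divisor is n.
    """
    n = len(values)
    return sum((v - target) ** 2 for v in values if v < target) / n


print(round(semivariance(returns), 4))              # 4.3483
print(round(target_semivariance(returns, 8.0), 4))  # 1.515
```

Only the 7.7% and 5.0% funds fall below the 9.7% mean, so the semivariance is (4.0 + 22.09)/6 under the divisor assumed here.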
Chebyshev's inequality states that the proportion of observations within k standard deviations of an arithmetic mean is at least 1 - 1/k², for all k > 1.
|# of Standard Deviations from Mean (k)||Chebyshev's Inequality||Minimum % of Observations|
|2||1 - 1/(2)², or 1 - 1/4, or 3/4||75 (.75)|
|3||1 - 1/(3)², or 1 - 1/9, or 8/9||89 (.8889)|
|4||1 - 1/(4)², or 1 - 1/16, or 15/16||94 (.9375)|
Given that at least 75% of observations fall within two standard deviations, if a distribution has an annual mean return of 10% and a standard deviation of 5%, we can state that in at least 75% of the years, the return will be between 0% and 20%. In at most 25% of the years, it will be either below 0% or above 20%. Likewise, because at least 89% of observations fall within three standard deviations, in at least 89% of the years the return will be within a range of -5% to +25%. At most about 11% of the time it won't.
Later we will learn that for so-called normal distributions, we expect about 95% of the observations to fall within two standard deviations. Chebyshev's inequality is more general and doesn't assume a normal distribution; that is, it applies to a distribution of any shape.
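The bound and the return intervals discussed above can be reproduced with a short sketch. The function name `chebyshev_lower_bound` is illustrative, and the mean and standard deviation are the 10% and 5% figures from the example.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of observations within k standard deviations."""
    if k <= 1:
        raise ValueError("Chebyshev's inequality is stated here for k > 1")
    return 1 - 1 / k**2


mean, sd = 10.0, 5.0  # annual mean return and standard deviation, in percent

for k in (2, 3, 4):
    low, high = mean - k * sd, mean + k * sd
    print(k, round(chebyshev_lower_bound(k), 4), low, high)
# 2 0.75 0.0 20.0
# 3 0.8889 -5.0 25.0
# 4 0.9375 -10.0 30.0
```

Each proportion is a floor, not an estimate: the actual share of observations inside the interval may be much higher, as it is for a normal distribution.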
Coefficient of Variation
The coefficient of variation (CV) helps the analyst interpret relative dispersion. In other words, a calculated standard deviation value is just a number. Does this number indicate high or low dispersion? The coefficient of variation helps describe standard deviation in terms of its proportion to its mean by this formula:
CV = s / X̄
Where: s = sample standard deviation, X̄ = sample mean
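As an illustration, the formula can be applied to the six mid-cap fund returns used earlier, treated as a sample. The text does not work this example itself, so take it as a sketch. The function name `coefficient_of_variation` is an assumption.

```python
import statistics

# The six mid-cap fund returns, treated as a sample (percent).
returns = [10.1, 7.7, 5.0, 12.3, 12.2, 10.9]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(values) / statistics.mean(values)


print(round(coefficient_of_variation(returns), 3))  # 0.294
```

A CV of about 0.29 means the standard deviation is roughly 29% of the mean return. Because the ratio is unit-free, it can be compared across data sets with different means.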
The Sharpe ratio is a measure of the risk-reward tradeoff of an investment security or portfolio. It starts by defining excess return, or the percentage rate of return of a security above the risk-free rate. In this view, the risk-free rate is a minimum rate that any security should earn. Higher rates are available provided one assumes higher risk.
The Sharpe ratio is calculated by dividing the excess return by the standard deviation of return.
Sharpe ratio = [(mean return) - (risk-free return)] / standard deviation of return
Example: Sharpe Ratio
If an emerging-markets fund has a historic mean return of 18.2% and a standard deviation of 12.1%, and the return on three-month T-bills (our proxy for a risk-free rate) was 2.3%, the Sharpe ratio = (18.2 - 2.3)/12.1 = 15.9/12.1 = 1.31. In other words, for every 1% of additional risk we accept by investing in this emerging-markets fund, we are rewarded with an excess return of 1.31%. Part of the reason the Sharpe ratio has become popular is that it is an easy-to-understand and appealing concept for both practitioners and investors.
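The emerging-markets example can be checked with a minimal sketch. The function name `sharpe_ratio` and its parameter names are illustrative.

```python
def sharpe_ratio(mean_return, risk_free_rate, std_dev):
    """Excess return over the risk-free rate per unit of standard deviation."""
    return (mean_return - risk_free_rate) / std_dev


# Emerging-markets fund from the example; all inputs are in percent.
print(round(sharpe_ratio(18.2, 2.3, 12.1), 2))  # 1.31
```

The subtraction happens before the division, which is why the parentheses around the excess return matter when writing the formula out by hand.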