What Is the Winsorized Mean?
The winsorized mean is a method of averaging that first replaces the smallest and largest values in a data set with the observations closest to them. This limits the effect of outliers, or abnormal extreme values, on the calculation.
Once the values have been replaced, the arithmetic mean formula is used to calculate the winsorized mean.
- The winsorized mean is an averaging method that involves replacing the smallest and largest values of a data set with the observations closest to them.
- It mitigates the effects of outliers by replacing them with less extreme values.
- The winsorized mean is not the same as the trimmed mean, which involves removing data points as opposed to replacing them—although the results of the two tend to be close.
Formula for the Winsorized Mean
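$$\text{Winsorized Mean} = \frac{(n+1)\,x_{n+1} + x_{n+2} + \dots + x_{N-n-1} + (n+1)\,x_{N-n}}{N}$$
where:
x_1 ≤ x_2 ≤ … ≤ x_N = The data points sorted in ascending order
n = The number of largest and smallest data points to be replaced by the observation closest to them
N = Total number of data points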
Winsorized means are expressed in two ways. A "kth-order" winsorized mean refers to the replacement of the "k" smallest and "k" largest observations, where "k" is an integer. An "X%" winsorized mean involves replacing a given percentage of values, split between both ends of the data.
The winsorized mean is achieved by replacing the smallest and largest data points, then summing all the data points and dividing the sum by the total number of data points.
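The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `winsorized_mean` and its parameter `k` (the number of values replaced at each end) are assumptions made for this example.

```python
def winsorized_mean(values, k):
    """Mean after replacing the k smallest and k largest values
    with the nearest observations that are kept."""
    data = sorted(values)
    n = len(data)
    # At least one observation must survive to serve as a replacement value.
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= 2 * k < len(values)")
    if k > 0:
        # Overwrite the k smallest values with the (k+1)-th smallest.
        data[:k] = [data[k]] * k
        # Overwrite the k largest values with the (k+1)-th largest.
        data[n - k:] = [data[n - k - 1]] * k
    # Sum all data points and divide by the total number of data points.
    return sum(data) / n
```

With `k = 0` nothing is replaced and the function returns the ordinary arithmetic mean.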
What Does the Winsorized Mean Tell You?
The winsorized mean is less sensitive to outliers than the arithmetic mean because it replaces them with less extreme values. However, if a distribution has fat tails, replacing the highest and lowest values will have little effect because of the high degree of variability in the distribution.
One major downside of winsorized means is that they introduce some bias into the data set. Reducing the influence of outliers makes the average more robust, but it also removes information about the underlying data.
Example of How to Use Winsorized Mean
Let's calculate the winsorized mean for the following data set: 1, 5, 7, 8, 9, 10, 34. In this example, we assume the winsorized mean is in the first order, in which we replace the smallest and largest values with their nearest observations.
The data set now appears as follows: 5, 5, 7, 8, 9, 10, 10. Taking an arithmetic average of the new set produces a winsorized mean of 7.7, or (5 + 5 + 7 + 8 + 9 + 10 + 10) divided by 7. Note that the arithmetic mean would have been higher—10.6. The winsorized mean effectively reduces the influence of the 34 value as an outlier.
Or consider a 20% winsorized mean that takes the top 10% and bottom 10% and replaces them with their next closest value. We will winsorize the following data set: 2, 4, 7, 8, 11, 14, 18, 23, 23, 27, 35, 40, 49, 50, 55, 60, 61, 61, 62, 75. The two smallest and two largest data points—20% of the 20 data points—will be replaced with their next closest value. Thus, the new data set is as follows: 7, 7, 7, 8, 11, 14, 18, 23, 23, 27, 35, 40, 49, 50, 55, 60, 61, 61, 61, 61. The winsorized mean is 33.9, or the total of the data (678) divided by the total number of data points (20).
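The percentage-based example can be checked with a short Python sketch. The function name `winsorized_mean_pct` is hypothetical, and the code assumes the convention used in this article: the stated percentage is split evenly between the two ends, and the count per end is rounded down when it is not a whole number.

```python
def winsorized_mean_pct(values, percent):
    """X% winsorized mean, with X split evenly between both tails."""
    data = sorted(values)
    n = len(data)
    # Number of values to replace at each end (assumption: round down).
    k = int(n * percent / 200)
    if 2 * k >= n:
        raise ValueError("percent is too large for this data set")
    # The nearest retained observations become the lower and upper limits.
    low, high = data[k], data[n - k - 1]
    clamped = [min(max(x, low), high) for x in data]
    return sum(clamped) / n


data = [2, 4, 7, 8, 11, 14, 18, 23, 23, 27,
        35, 40, 49, 50, 55, 60, 61, 61, 62, 75]
print(round(winsorized_mean_pct(data, 20), 1))  # 33.9, matching the example
```

Clamping every value to the two limits gives the same result as replacing the k smallest and k largest points one by one, because only those points fall outside the limits.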
Winsorized Mean vs. Trimmed Mean
The winsorized mean involves modifying data points, while the trimmed mean involves removing data points. It is common for the winsorized mean and trimmed mean to be close or sometimes equal in value to each other.
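To make the difference concrete, the sketch below computes both statistics for the seven-value data set used earlier. The function names and the parameter `k` (values affected at each end) are assumptions for illustration only.

```python
def trimmed_mean(values, k):
    """Mean after removing the k smallest and k largest values."""
    data = sorted(values)
    if k < 0 or 2 * k >= len(data):
        raise ValueError("k must satisfy 0 <= 2 * k < len(values)")
    kept = data[k:len(data) - k]
    # The divisor shrinks because the removed points are gone.
    return sum(kept) / len(kept)


def winsorized_mean(values, k):
    """Mean after replacing the k smallest and k largest values."""
    data = sorted(values)
    n = len(data)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= 2 * k < len(values)")
    low, high = data[k], data[n - k - 1]
    # The divisor stays at n because every point is kept, only modified.
    return sum(min(max(x, low), high) for x in data) / n


sample = [1, 5, 7, 8, 9, 10, 34]
print(round(trimmed_mean(sample, 1), 2))     # (5 + 7 + 8 + 9 + 10) / 5
print(round(winsorized_mean(sample, 1), 2))  # (5 + 5 + 7 + 8 + 9 + 10 + 10) / 7
```

For this data set the two results differ by less than 0.1, which illustrates how close the two measures tend to be.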