What Is the Winsorized Mean?

The winsorized mean is a method of averaging that first replaces the smallest and largest values in a data set with the observations closest to them. This limits the effect of outliers, or abnormally extreme values, on the calculation. After the values are replaced, the arithmetic mean formula is used to calculate the winsorized mean.

Key Takeaways

  • The winsorized mean is an averaging method that involves replacing the smallest and largest values of a data set with the observations closest to them.
  • It mitigates the effects of outliers by replacing them with less extreme values.
  • The winsorized mean is not the same as the trimmed mean, which involves removing data points as opposed to replacing them—although the results of the two tend to be close.

Formula for the Winsorized Mean

\begin{aligned}
&\text{Winsorized Mean} = \frac{n\,x_{(n+1)} + \sum_{i=n+1}^{N-n} x_{(i)} + n\,x_{(N-n)}}{N}\\
&\textbf{where:}\\
&x_{(i)} = \text{The } i\text{-th smallest observation in the data set}\\
&n = \text{The number of largest and smallest data points to be replaced by the observation closest to them}\\
&N = \text{Total number of data points}
\end{aligned}

Winsorized means are expressed in two ways. A "k-th order" winsorized mean replaces the k smallest and k largest observations, where k is an integer. An "X%" winsorized mean replaces a given percentage of values at both ends of the data.

The winsorized mean is achieved by replacing the smallest and largest data points, then summing all the data points and dividing the sum by the total number of data points.
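The procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name winsorized_mean, its parameter k (the number of values replaced at each end), and the sample data are assumptions made for this example.

```python
def winsorized_mean(values, k):
    """Mean after replacing the k smallest and k largest values
    with the nearest values that are kept."""
    data = sorted(values)
    n = len(data)
    # Assumption: at least one value must remain unreplaced.
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= k < len(values) / 2")
    if k > 0:
        # Replace the k smallest values with the (k+1)-th smallest.
        data[:k] = [data[k]] * k
        # Replace the k largest values with the (k+1)-th largest.
        data[n - k:] = [data[n - k - 1]] * k
    # Ordinary arithmetic mean of the modified data.
    return sum(data) / n


# Hypothetical sample: 3 and 90 are replaced by 12 and 17.
sample = [3, 12, 14, 15, 17, 90]
print(winsorized_mean(sample, 1))  # 14.5
```

With k set to 0 nothing is replaced, so the function returns the ordinary arithmetic mean.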

What Does the Winsorized Mean Tell You?

The winsorized mean is less sensitive to outliers than the arithmetic average because it replaces them with less extreme values. However, if a distribution has fat tails, replacing the highest and lowest values will have little effect because of the high degree of variability in the distribution.

Example of How to Use Winsorized Mean

Let's calculate the winsorized mean for the following data set: 1, 5, 7, 8, 9, 10, 34. In this example, we use a first-order winsorized mean, in which the single smallest value and the single largest value are each replaced with their nearest observation.

The data set now appears as follows: 5, 5, 7, 8, 9, 10, 10. Taking an arithmetic average of the new set produces a winsorized mean of 7.7, or (5 + 5 + 7 + 8 + 9 + 10 + 10) divided by 7. Note that the arithmetic mean of the original data would have been higher, at 10.6. The winsorized mean effectively reduces the influence of the value 34 as an outlier.

Or consider a 20% winsorized mean, which replaces the top 10% and the bottom 10% of the values with their next closest value. We will winsorize the following data set: 2, 4, 7, 8, 11, 14, 18, 23, 23, 27, 35, 40, 49, 50, 55, 60, 61, 61, 62, 75. The two smallest and the two largest data points (each pair is 10% of the 20 data points) will be replaced with their next closest value. Thus, the new data set is as follows: 7, 7, 7, 8, 11, 14, 18, 23, 23, 27, 35, 40, 49, 50, 55, 60, 61, 61, 61, 61. The winsorized mean is 33.9, or the total of the data (678) divided by the total number of data points (20).
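The percentage form can be sketched the same way. The snippet below is an illustration under stated assumptions: the function name winsorized_mean_pct is hypothetical, the percentage is read as the total across both tails (so 20 means 10% at each end, matching the example above), and a fractional count is rounded down, which is one possible convention among several.

```python
def winsorized_mean_pct(values, total_percent):
    """Winsorized mean where total_percent is split evenly
    between the lower and upper tails."""
    data = sorted(values)
    n = len(data)
    # Number of values replaced at each end.
    # Assumption: fractional counts are rounded down.
    k = int(n * total_percent / 200)
    if 2 * k >= n:
        raise ValueError("percentage too large for this data set")
    if k > 0:
        data[:k] = [data[k]] * k              # lower tail
        data[n - k:] = [data[n - k - 1]] * k  # upper tail
    return sum(data) / n


data_set = [2, 4, 7, 8, 11, 14, 18, 23, 23, 27,
            35, 40, 49, 50, 55, 60, 61, 61, 62, 75]
print(round(winsorized_mean_pct(data_set, 20), 1))  # 33.9
```

For comparison, the arithmetic mean of the same data set before any replacement is 34.25 (685 divided by 20).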

Winsorized Mean vs. Trimmed Mean

The winsorized mean involves modifying data points, while the trimmed mean involves removing data points. It is common for the winsorized mean and trimmed mean to be close, or sometimes equal, in value to one another.
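The difference between the two can be seen by running both on the first example data set from this article. This is a rough sketch only: the helper names trimmed_mean and winsorized_mean and the parameter k (values affected at each end) are assumptions made for illustration.

```python
def trimmed_mean(values, k):
    """Mean after removing the k smallest and k largest values."""
    data = sorted(values)
    kept = data[k:len(data) - k]  # extreme values are dropped
    return sum(kept) / len(kept)


def winsorized_mean(values, k):
    """Mean after replacing the k smallest and k largest values
    with the nearest values that are kept."""
    data = sorted(values)
    kept = data[k:len(data) - k]
    # Extreme values are replaced, so the count stays the same.
    modified = [kept[0]] * k + kept + [kept[-1]] * k
    return sum(modified) / len(modified)


data_set = [1, 5, 7, 8, 9, 10, 34]
print(round(trimmed_mean(data_set, 1), 2))     # 7.8
print(round(winsorized_mean(data_set, 1), 2))  # 7.71
```

Both helpers assume k is small enough that at least one value is kept. The trimmed mean divides by five values and the winsorized mean by seven, yet the two results are close, and both sit well below the unadjusted mean of 10.6.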

Limitations of the Winsorized Mean

One major downside of winsorized means is that they naturally introduce some bias into the data set. Reducing the influence of outliers makes the average more robust, but it also removes information about the underlying data.