What Are Nonparametric Statistics?
Nonparametric statistics refers to statistical methods in which the data are not assumed to come from a prescribed model determined by a small number of parameters; examples of such models include the normal distribution and the linear regression model. Nonparametric methods are often applied to ordinal data, which convey a ranking or order rather than numerical magnitudes. For example, a survey recording consumer preferences on a scale from like to dislike produces ordinal data.
Nonparametric statistics includes nonparametric descriptive statistics, statistical models, inference, and statistical tests. The model structure of nonparametric models is not specified a priori but is instead determined from data. The term nonparametric is not meant to imply that such models completely lack parameters, but rather that the number and nature of the parameters are flexible and not fixed in advance. A histogram is an example of a nonparametric estimate of a probability distribution.
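To make the histogram example concrete, here is a minimal sketch of a histogram used as a density estimate. The helper name `histogram_density`, the bin width, and the sample values are illustrative assumptions, not taken from any particular library or data set.

```python
import math
from collections import Counter

def histogram_density(data, bin_width):
    """Return {left bin edge: density} so that the bar areas sum to 1."""
    n = len(data)
    # Assign each observation to the bin whose left edge is at or just below it.
    counts = Counter(math.floor(x / bin_width) * bin_width for x in data)
    # Dividing by n * bin_width turns counts into a density estimate.
    return {left: count / (n * bin_width) for left, count in sorted(counts.items())}

sample = [1.2, 1.9, 2.1, 2.4, 2.8, 3.3, 3.7, 5.6]  # made-up observations
density = histogram_density(sample, bin_width=1.0)
for left, height in density.items():
    print(f"[{left:.1f}, {left + 1.0:.1f}): {height:.3f}")
```

No distributional shape is specified in advance. The number of non-empty bins, and therefore the number of estimated heights, depends on the data and the chosen bin width.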
Key Takeaways
- Nonparametric statistics are easy to use, but when the assumptions of a parametric model hold, they do not offer the precision of that model.
- This type of analysis is often best suited to questions about the order of observations, where the results will likely stay the same even if the underlying numerical values change.
Understanding Nonparametric Statistics
Parametric statistics works with parameters such as the mean, standard deviation, variance, and Pearson correlation. This form of statistics uses the observed data to estimate the parameters of an assumed distribution. Under parametric statistics, data are often assumed to come from a normal distribution with unknown parameters μ (population mean) and σ² (population variance), which are then estimated using the sample mean and sample variance.
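As a small illustration of the parametric approach, the sketch below estimates μ and σ² from a sample and plugs them into a normal model. The sample values are made up for the example, and the variable names are assumptions.

```python
import statistics

sample = [4.1, 5.3, 4.8, 6.0, 5.2, 4.6]  # made-up observations

# Estimate the two parameters of the assumed normal distribution.
mu_hat = statistics.mean(sample)          # sample mean, estimates mu
sigma2_hat = statistics.variance(sample)  # sample variance (n - 1 denominator), estimates sigma^2

# Once the two parameters are fixed, the whole distribution is determined.
fitted = statistics.NormalDist(mu_hat, sigma2_hat ** 0.5)
print(f"mean = {mu_hat:.3f}, variance = {sigma2_hat:.3f}")
print(f"P(X < 4.0) under the fitted normal = {fitted.cdf(4.0):.3f}")
```

Everything the fitted model says about the data follows from these two numbers, which is what makes the approach parametric.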
Nonparametric statistics makes no assumption about the sample size or whether the observed data is quantitative.
Nonparametric statistics does not assume that data is drawn from a normal distribution. Instead, the shape of the distribution is estimated from the data. While there are many situations in which a normal distribution can be assumed, there are also scenarios in which the true data-generating process is far from normally distributed.
Examples of Nonparametric Statistics
In the first example, consider a financial analyst who wishes to estimate the value-at-risk (VaR) of an investment. The analyst gathers earnings data from hundreds of similar investments over a similar time horizon. Rather than assume that the earnings follow a normal distribution, the analyst uses a histogram of the data to estimate the distribution nonparametrically. The 5th percentile of this histogram then provides the analyst with a nonparametric estimate of VaR.
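The percentile step in this example can be sketched in a few lines. The function name `historical_var` and the synthetic returns are assumptions for illustration only; the calculation reads the 5th percentile directly from the observed values instead of from a fitted distribution.

```python
import random
import statistics

def historical_var(returns, level_pct=5):
    """Empirical percentile of the returns; no distribution is assumed."""
    # quantiles(..., n=100) returns the 99 cut points between percentiles.
    cut_points = statistics.quantiles(returns, n=100, method="inclusive")
    return cut_points[level_pct - 1]

random.seed(42)
# Synthetic, left-skewed stand-in for the earnings the analyst would gather.
returns = [random.gauss(0.05, 0.10) - random.expovariate(20) for _ in range(500)]

var_5 = historical_var(returns)
print(f"5th-percentile return (nonparametric VaR estimate): {var_5:.3f}")
```

Because the estimate comes straight from the ordered data, skewness or heavy tails in the earnings are reflected automatically, at the cost of needing enough observations in the tail.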
For a second example, consider a researcher who wants to know whether average hours of sleep are linked to how frequently one falls ill. Because many people get sick rarely, if at all, while a few get sick far more often than most, the distribution of illness frequency is clearly non-normal, being right-skewed and outlier-prone. Thus, rather than use a method that relies on a normality assumption, as classical regression analysis does, the researcher decides to use a method that does not, such as quantile regression analysis.
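Quantile regression works by minimizing an asymmetric absolute loss, often called the check or pinball loss, instead of squared error. The sketch below shows only the simplest intercept-only case, with no sleep predictor, to illustrate why the fit targets a quantile and is not pulled around by outliers the way a mean is. The data and the helper names `pinball_loss` and `best_constant` are made up for this illustration.

```python
import statistics

def pinball_loss(residual, q):
    """Check loss: weight q on positive residuals, (1 - q) on negative ones."""
    return q * residual if residual >= 0 else (q - 1) * residual

def best_constant(values, q):
    """Constant that minimizes total pinball loss (searched over the data points)."""
    return min(values, key=lambda c: sum(pinball_loss(v - c, q) for v in values))

illness_days = [0, 0, 1, 1, 2, 2, 3, 4, 9, 21, 30]  # made-up, right-skewed data

median_fit = best_constant(illness_days, q=0.5)  # targets the median
upper_fit = best_constant(illness_days, q=0.9)   # targets the 90th percentile
print("mean:", round(statistics.mean(illness_days), 2))
print("q=0.5 fit:", median_fit, "| q=0.9 fit:", upper_fit)
```

A full quantile regression replaces the constant with a linear function of the predictors and minimizes the same loss, which in practice would be done with a statistics package.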
Special Considerations
Nonparametric statistics have gained appreciation because of their ease of use. Because no distributional parameters need to be specified, the methods can be applied to a wider variety of data. This type of statistics can be used when the mean, sample size, standard deviation, or estimates of other related parameters are not available.
Since nonparametric statistics makes fewer assumptions about the sample data, its application is wider in scope than that of parametric statistics. In cases where parametric testing is more appropriate, nonparametric methods will be less efficient. This is because nonparametric methods discard some of the information available in the data, for example by replacing values with their ranks, whereas parametric methods use all of it.
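One way to see what information is discarded is to look at ranks, which many nonparametric tests use in place of the raw values. The sketch below is illustrative only: the `ranks` helper and the income figures are assumptions, and ties are not handled.

```python
import math

def ranks(values):
    """Rank each value from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

incomes = [32.0, 45.5, 38.2, 250.0, 41.1]      # made-up values with one outlier
more_extreme = [32.0, 45.5, 38.2, 2500.0, 41.1]  # same data, outlier ten times larger

print(ranks(incomes))
# Ranks are unchanged by making the outlier larger or by taking logarithms.
print(ranks(incomes) == ranks(more_extreme))
print(ranks(incomes) == ranks([math.log(x) for x in incomes]))
```

The ranks are robust to the outlier and to any order-preserving rescaling, but they no longer record how far apart the observations are. That lost magnitude information is what a correctly specified parametric method would exploit.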