What Is Goodness-of-Fit?
The term goodness-of-fit refers to a statistical test that determines how well sample data fit a hypothesized distribution, such as the normal distribution. Put simply, it tests whether a sample is skewed or represents the data you would expect to find in the actual population.
Goodness-of-fit measures the discrepancy between the observed values and the values expected under the model, for example a normal distribution. There are multiple methods to determine goodness-of-fit, including the chi-square test.
Key Takeaways
- A goodness-of-fit test is a statistical test that tries to determine whether a set of observed values matches those expected under the applicable model.
- It can show you whether your sample data fit an expected set of data from a population with a given distribution, such as the normal distribution.
- There are multiple types of goodness-of-fit tests, but the most common is the chi-square test.
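- The chi-square goodness-of-fit test compares observed counts of categorical data with the counts expected under a hypothesized distribution; the related chi-square test for independence determines if a relationship exists between categorical variables.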
- The Kolmogorov-Smirnov test determines whether a sample comes from a specific distribution of a population.
Understanding Goodness-of-Fit
Goodness-of-fit tests are statistical methods that make inferences about observed values. For instance, you can determine whether a sample group is truly representative of the entire population. As such, they determine how actual values are related to the predicted values in a model. When used in decision-making, goodness-of-fit tests make it easier to predict trends and patterns in the future.
As noted above, there are several types of goodness-of-fit tests. They include the chi-square test, which is the most common, as well as the Kolmogorov-Smirnov test and the Shapiro-Wilk test. The tests are normally conducted using computer software, but statisticians can also carry them out by hand using the formula specific to each test.
To conduct the test, you need a certain variable, along with an assumption of how it is distributed. You also need a data set with clear and explicit values, such as:
- The observed values, which are derived from the actual data set
- The expected values, which are taken from the assumptions made
- The total number of categories in the set
Goodness-of-fit tests are commonly used to test for the normality of residuals or to determine whether two samples are gathered from identical distributions.
Establishing an Alpha Level
In order to interpret a goodness-of-fit test, it's important for statisticians to establish an alpha level, also called the significance level, against which the test's p-value is compared. The p-value refers to the probability of getting results at least as extreme as the observed results, assuming that the null hypothesis is correct. In a goodness-of-fit test, the null hypothesis asserts that the data follow the assumed distribution, and the alternative hypothesis asserts that they do not. In a test for independence, the null hypothesis asserts that no relationship exists between variables, and the alternative hypothesis asserts that a relationship exists.
For the chi-square test, the frequency of the observed values is measured and used with the expected values and the degrees of freedom to calculate the chi-square statistic and its p-value. If the p-value is lower than alpha, the null hypothesis is rejected, indicating that the observed data do not fit the expected distribution (or, in a test for independence, that a relationship exists between the variables).
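As a minimal sketch of this decision rule, the snippet below computes a chi-square statistic for three categories and compares its p-value with alpha. The counts are invented for illustration, and the helper names are hypothetical. The p-value formula used here, exp(-x/2), is the exact upper-tail probability only for two degrees of freedom (three categories); other cases would normally be handled by statistical software.

```python
import math

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_two_df(statistic):
    """Upper-tail probability of a chi-square variable with exactly 2 degrees of freedom."""
    return math.exp(-statistic / 2)

alpha = 0.05                 # significance level, chosen before running the test
observed = [50, 30, 20]      # hypothetical counts in three categories
expected = [40, 40, 20]      # counts implied by the assumed distribution

statistic = chi_square_statistic(observed, expected)
p_value = p_value_two_df(statistic)
reject_null = p_value < alpha

print(f"chi-square = {statistic:.2f}, p-value = {p_value:.4f}, reject null: {reject_null}")
```

With these made-up counts the p-value is above 0.05, so the null hypothesis is not rejected at that level.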
Types of Goodness-of-Fit Tests
Chi-Square Test
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
The chi-square goodness-of-fit test, a close relative of the chi-square test for independence, is an inferential statistics method that tests the validity of a claim made about a population based on a random sample.
Used exclusively for data that is separated into classes (bins), it requires a sufficient sample size to produce accurate results. But it doesn't indicate the type or intensity of the relationship. For instance, it does not conclude whether the relationship is positive or negative.
To calculate a chi-square goodness-of-fit statistic, first set the desired alpha level of significance. So if your confidence level is 95% (or 0.95), then the alpha is 0.05. Next, identify the categorical variables to test, then define hypothesis statements about the relationships between them.
Categories must be mutually exclusive in order to qualify for the chi-square test. And the chi-square goodness-of-fit test should not be used directly on continuous data; such data must first be grouped into classes.
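The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a replacement for statistical software: the die-roll counts are invented, the function name is hypothetical, and the critical values are the standard 5% table values for one to six degrees of freedom.

```python
# Upper 5% critical values of the chi-square distribution (standard table values),
# keyed by degrees of freedom.
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}

def chi_square_goodness_of_fit(observed, expected_proportions):
    """Return the chi-square statistic and degrees of freedom (categories - 1)."""
    total = sum(observed)
    expected = [total * p for p in expected_proportions]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1

# Hypothetical data: 60 rolls of a die, tested against the claim that it is fair.
observed_rolls = [8, 9, 12, 11, 6, 14]
fair_die = [1 / 6] * 6

statistic, df = chi_square_goodness_of_fit(observed_rolls, fair_die)
critical = CRITICAL_VALUES_05[df]
reject_null = statistic > critical

print(f"chi-square = {statistic:.2f}, df = {df}, critical = {critical}, reject: {reject_null}")
```

Here the statistic falls well below the critical value, so these invented rolls give no evidence against a fair die at the 5% level.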
Kolmogorov-Smirnov (K-S) Test
$$D = \max_{1 \le i \le N} \left( F(Y_i) - \frac{i-1}{N},\; \frac{i}{N} - F(Y_i) \right)$$
Named after Russian mathematicians Andrey Kolmogorov and Nikolai Smirnov, the Kolmogorov-Smirnov (K-S) test is a statistical method that determines whether a sample is from a specific distribution within a population.
This test, which is recommended for large samples (e.g., over 2000), is non-parametric. That means the distribution of its test statistic does not depend on which distribution is being tested. The null hypothesis is that the sample comes from the specified distribution, such as the normal distribution. The test can reject this hypothesis or fail to reject it, but it cannot prove it.
Like chi-square, it uses a null and alternative hypothesis and an alpha level of significance. Null indicates that the data follow a specific distribution within the population, and alternative indicates that the data do not follow a specific distribution within the population. The alpha is used to determine the critical value used in the test. But unlike the chi-square test, the Kolmogorov-Smirnov test applies to continuous distributions.
The calculated test statistic is often denoted as D. It determines whether the null hypothesis is rejected. If D is greater than the critical value at alpha, the null hypothesis is rejected. If D is less than the critical value, the null hypothesis is not rejected.
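To make the D statistic concrete, here is a small sketch that evaluates the formula above against a standard normal distribution using only the Python standard library. The two samples are constructed for illustration rather than drawn from real data, and the function names are hypothetical. The critical value 1.36 / sqrt(N) is the usual large-sample approximation at alpha = 0.05 for a fully specified distribution, so treat it as a rough guide.

```python
import math
from statistics import NormalDist

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and the hypothesized `cdf`."""
    ordered = sorted(sample)
    n = len(ordered)
    gaps = []
    for i, y in enumerate(ordered, start=1):
        f = cdf(y)
        gaps.append(f - (i - 1) / n)   # gap just below the i-th step
        gaps.append(i / n - f)         # gap just above the i-th step
    return max(gaps)

def approximate_critical_value_05(n):
    """Large-sample approximation to the 5% critical value of D."""
    return 1.36 / math.sqrt(n)

standard_normal = NormalDist(mu=0.0, sigma=1.0)
n = 50
# Evenly spaced normal quantiles: fits the standard normal closely by construction.
close_fit = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
# The same values shifted by one unit, so they no longer match the standard normal.
shifted = [y + 1.0 for y in close_fit]

d_close = ks_statistic(close_fit, standard_normal.cdf)
d_shifted = ks_statistic(shifted, standard_normal.cdf)
critical = approximate_critical_value_05(n)

print(f"D (close fit) = {d_close:.3f}, D (shifted) = {d_shifted:.3f}, critical = {critical:.3f}")
```

The shifted sample produces a D above the approximate critical value and would be rejected, while the close-fitting sample would not.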
The Anderson-Darling (A-D) Test
$$S = \sum_{i=1}^{N} \frac{2i-1}{N} \left[ \ln F(Y_i) + \ln\left(1 - F(Y_{N+1-i})\right) \right]$$
The Anderson-Darling (A-D) test is a variation on the K-S test, but gives more weight to the tails of the distribution. The K-S test is more sensitive to differences that may occur closer to the center of the distribution, while the A-D test is more sensitive to variations observed in the tails. Because tail risk and the idea of "fat tails" are prevalent in financial markets, the A-D test can give more power in financial analyses.
Like the K-S test, the A-D test produces a statistic, denoted as A², which is computed from the sum S as A² = −N − S and compared against a critical value to decide whether to reject the null hypothesis.
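The sketch below evaluates the sum S and the statistic A² = −N − S for a sample tested against a standard normal distribution. It is a bare-bones illustration with constructed samples and a hypothetical function name. It stops at the statistic, since the appropriate critical values depend on the distribution being tested and on whether its parameters were estimated from the data.

```python
import math
from statistics import NormalDist

def anderson_darling_statistic(sample, cdf):
    """Compute A^2 = -N - S for `sample` against a fully specified `cdf`."""
    ordered = sorted(sample)
    n = len(ordered)
    s = 0.0
    for i in range(1, n + 1):
        f_low = cdf(ordered[i - 1])    # F(Y_i)
        f_high = cdf(ordered[n - i])   # F(Y_{N+1-i})
        s += (2 * i - 1) / n * (math.log(f_low) + math.log(1.0 - f_high))
    return -n - s

standard_normal = NormalDist()
n = 50
# Constructed samples: evenly spaced normal quantiles, and the same values shifted by 1.
close_fit = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
shifted = [y + 1.0 for y in close_fit]

a2_close = anderson_darling_statistic(close_fit, standard_normal.cdf)
a2_shifted = anderson_darling_statistic(shifted, standard_normal.cdf)

print(f"A^2 (close fit) = {a2_close:.4f}, A^2 (shifted) = {a2_shifted:.4f}")
```

A larger A² indicates a worse fit, and the logarithms of F and 1 − F are what give observations in the tails their extra weight.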
Shapiro-Wilk (S-W) Test
$$W = \frac{\left( \sum_{i=1}^{n} a_i x_{(i)} \right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
The Shapiro-Wilk (S-W) test determines if a sample follows a normal distribution. The test only checks for normality when using a sample with one variable of continuous data and is recommended for small sample sizes up to 2000.
The Shapiro-Wilk test is closely related to a probability plot called the Q-Q plot, which plots two sets of quantiles, each arranged from smallest to largest, against each other. If both sets of quantiles came from the same distribution, the points fall approximately on a straight line.
The Q-Q plot is used to estimate the variance. Using the Q-Q plot variance along with the estimated variance of the population, one can determine if the sample belongs to a normal distribution. If the quotient of both variances equals or is close to 1, the null hypothesis is not rejected. If it is considerably lower than 1, the null hypothesis can be rejected.
Just like the tests mentioned above, this one uses alpha and forms two hypotheses: null and alternative. The null hypothesis states that the sample comes from the normal distribution, whereas the alternative hypothesis states that the sample does not come from the normal distribution.
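The Q-Q idea behind the test can be illustrated with a short sketch. Note that this is not the Shapiro-Wilk statistic itself, whose coefficients require tabulated values or a library routine such as `scipy.stats.shapiro`. It is a simpler, related quantity: the squared correlation between the ordered sample and the corresponding normal quantiles, which is close to 1 when the Q-Q plot is nearly linear. The plotting positions (i − 0.5)/n, the function names, and both samples are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist, mean

def qq_points(sample):
    """Pair each ordered sample value with the matching standard normal quantile."""
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, ordered

def squared_correlation(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

n = 30
z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
normal_like = [5.0 + 2.0 * v for v in z]   # constructed to be normal in shape
skewed = [math.exp(v) for v in z]          # constructed to be right-skewed

r2_normal = squared_correlation(*qq_points(normal_like))
r2_skewed = squared_correlation(*qq_points(skewed))

print(f"Q-Q squared correlation: normal-like = {r2_normal:.4f}, skewed = {r2_skewed:.4f}")
```

As with W, values near 1 are consistent with normality and noticeably lower values point away from it.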
Other Goodness-of-Fit Tests
Aside from the more common types of tests mentioned above, there are numerous other goodness-of-fit tests an analyst can use:
- The Bayesian information criterion (BIC) is a statistical measure used for model selection among a finite set of models. Rather than a hypothesis test, the BIC is a score that balances the complexity of a model with its goodness-of-fit to the data.
- The Cramer-von Mises criterion (CVM) is a goodness-of-fit test that is used to assess how well a set of observed data fits a hypothesized probability distribution. Often used in economics, engineering, or finance, it is based on the cumulative distribution function of the observed data and the hypothesized distribution.
- The Akaike information criterion (AIC) is a measure of the relative quality of a statistical model for a given set of data, and it provides a trade-off between the goodness-of-fit of the model and its complexity. It's based on information theory and measures the amount of information lost by a model when it is used to approximate the true underlying distribution of the data.
- The Hosmer-Lemeshow test compares the expected frequencies of a binary outcome with the observed frequencies of that outcome in different groups or intervals. The groups are typically formed by dividing the predicted probabilities of the outcome into ten groups or bins.
- Kuiper's test is similar to the Kolmogorov-Smirnov test, but it is more sensitive to differences in the tails of the distribution.
- Moran's I test or Moran's Index is a statistical test used to assess spatial autocorrelation in data. Spatial autocorrelation is a measure of the degree to which observations of a variable are similar or dissimilar in space.
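For the two information criteria in this list, the standard definitions are AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of estimated parameters, n is the number of observations, and ln L is the maximized log-likelihood. The sketch below applies them to two hypothetical models. The log-likelihood values are invented for illustration.

```python
import math

def aic(log_likelihood, num_parameters):
    """Akaike information criterion: lower values indicate a better trade-off."""
    return 2 * num_parameters - 2 * log_likelihood

def bic(log_likelihood, num_parameters, num_observations):
    """Bayesian information criterion: penalizes parameters more heavily as n grows."""
    return num_parameters * math.log(num_observations) - 2 * log_likelihood

n_obs = 100
# Hypothetical fitted models: (maximized log-likelihood, number of parameters).
simple_model = (-150.0, 2)
complex_model = (-148.0, 5)

aic_simple, aic_complex = aic(*simple_model), aic(*complex_model)
bic_simple = bic(*simple_model, n_obs)
bic_complex = bic(*complex_model, n_obs)

print(f"AIC: simple = {aic_simple:.1f}, complex = {aic_complex:.1f}")
print(f"BIC: simple = {bic_simple:.1f}, complex = {bic_complex:.1f}")
```

In this made-up case the complex model fits slightly better, but not by enough to offset its three extra parameters, so both criteria favor the simpler model.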
A very general rule of thumb, most often applied to the chi-square test, is to require that every group within a goodness-of-fit test have an expected count of at least five data points. This ensures that sufficient information is fed into the test to determine the distribution.
Importance of Goodness-of-Fit Tests
Goodness-of-fit tests are important in statistics for many reasons. First, they provide a way to assess how well a statistical model fits a set of observed data. The main importance of running a goodness-of-fit test is to determine whether the observed data are consistent with the assumed statistical model. By extension, a goodness-of-fit test may be useful in choosing between different models which may better fit the data.
Goodness-of-fit tests can also help to identify outliers or market abnormalities that may be affecting the fit of the model. Outliers can have a large impact on the model fit and may need to be removed or dealt with separately. Sometimes, outliers are not easily identifiable until they have been integrated into an analytical model.
Goodness-of-fit tests can also provide information about the variability of the data and the estimated parameters of the model. This information can be useful for making predictions and understanding the behavior of the system being modeled. Based on the data being fed into the model, it may be necessary to refine the model specific to the dataset being tested, the residuals being calculated, and the p-value for potentially extreme data.
Goodness-of-Fit Test vs. Independence Test
Goodness-of-fit tests and independence tests both compare observed data with what a hypothesis predicts, so it may be easy to confuse the two. However, each is designed to answer a different question.
A goodness-of-fit test is used to evaluate how well a set of observed data fits a particular probability distribution. On the other hand, an independence test is used to assess the relationship between two variables. It is used to test whether there is any association between two variables. The primary purpose of an independence test is to see whether a change in one variable is related to a change in another variable.
An independence test is typically used when the research question is focused on understanding the relationship between two variables and whether they are related or independent. In many cases, an independence test is pointed towards two specific variables (e.g., is smoking associated with lung cancer?). Note that such a test can show an association but cannot by itself establish that one variable causes the other. On the other hand, a goodness-of-fit test is used on an entire set of observed data to evaluate the appropriateness of a specific model.
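To show how an independence test differs in practice, the sketch below runs a chi-square test for independence on a two-by-two table. Unlike a goodness-of-fit test, the expected counts are not supplied by an assumed distribution; they are derived from the table's own row and column totals. The counts are invented, the function name is hypothetical, and no continuity correction is applied.

```python
def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Hypothetical counts: rows are two groups, columns are two outcomes.
table = [[30, 10],
         [20, 40]]

statistic, df = chi_square_independence(table)
critical_05 = 3.841   # standard 5% critical value for 1 degree of freedom
reject_independence = statistic > critical_05

print(f"chi-square = {statistic:.2f}, df = {df}, reject independence: {reject_independence}")
```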
Goodness-of-Fit Example
Here's a hypothetical example to show how the goodness-of-fit test works.
Suppose a small community gym operates under the assumption that the highest attendance is on Mondays, Tuesdays, and Saturdays, average attendance is on Wednesdays and Thursdays, and lowest attendance is on Fridays and Sundays. Based on these assumptions, the gym employs a certain number of staff members each day to check in members, clean facilities, offer training services, and teach classes.
But the gym isn't performing well financially, and the owner wants to know if these attendance assumptions and staffing levels are correct. The owner decides to count the number of gym attendees each day for six weeks. They can then compare the gym's assumed attendance with its observed attendance using, for example, a chi-square goodness-of-fit test.
Now that they have the new data, they can determine how to best manage the gym and improve profitability.
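A sketch of how the owner's comparison might look is shown below. Every number in it is hypothetical: the article gives no attendance figures, so both the assumed daily shares and the six-week counts are invented purely to illustrate the calculation. The p-value uses a closed form that is exact only when the degrees of freedom are even, which holds here (seven days, six degrees of freedom).

```python
import math

def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_even_df(statistic, df):
    """Exact upper-tail chi-square probability, valid only for a positive even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = statistic / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
# Hypothetical assumed share of weekly visits per day (high, average, and low days).
assumed_shares = [0.18, 0.18, 0.14, 0.14, 0.09, 0.18, 0.09]
# Hypothetical visit counts per weekday, totalled over the six-week count.
observed_visits = [150, 140, 160, 150, 130, 140, 130]

total_visits = sum(observed_visits)
expected_visits = [total_visits * share for share in assumed_shares]

statistic = chi_square_statistic(observed_visits, expected_visits)
df = len(days) - 1
p_value = p_value_even_df(statistic, df)
reject_assumption = p_value < 0.05

print(f"chi-square = {statistic:.2f}, df = {df}, p-value = {p_value:.2e}")
print("Attendance assumptions rejected" if reject_assumption else "No evidence against assumptions")
```

With these invented counts, attendance is far more even across the week than the owner assumed, so the test rejects the staffing assumptions.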
What Does Goodness-of-Fit Mean?
Goodness-of-fit is a statistical hypothesis test used to see how closely observed data mirror expected data. Goodness-of-fit tests can help determine if a sample follows a normal distribution, if categorical variables are related, or if random samples are from the same distribution.
Why Is Goodness-of-Fit Important?
Goodness-of-fit tests help determine if observed data align with what is expected. Decisions can be made based on the outcome of the hypothesis test conducted. For example, a retailer wants to know what product offering appeals to young people. The retailer surveys a random sample of old and young people to identify which product is preferred. Using a chi-square test at the 5% significance level, they find a statistically significant relationship between age group and preference for product A. Based on these results, and provided the sample is representative, the retailer could conclude that young adults in the wider population tend to prefer product A. Retail marketers can use this to reform their campaigns.
What Is Goodness-of-Fit in the Chi-Square Test?
The chi-square test checks whether relationships exist between categorical variables and whether the sample represents the whole. It estimates how closely the observed data mirror the expected data, or how well they fit.
How Do You Do the Goodness-of-Fit Test?
The goodness-of-fit test consists of different testing methods. The goal of the test will help determine which method to use. For example, if the goal is to test normality on a relatively small sample, the Shapiro-Wilk test may be suitable. If you want to determine whether a sample came from a specific distribution within a population, the Kolmogorov-Smirnov test can be used. Each test uses its own unique formula. However, they have commonalities, such as a null hypothesis and level of significance.
The Bottom Line
Goodness-of-fit tests determine how well sample data fit what is expected of a population. From the sample data, an observed value is gathered and compared to the calculated expected value using a discrepancy measure. There are different goodness-of-fit hypothesis tests available depending on what outcome you're seeking.
Choosing the right goodness-of-fit test largely depends on what you want to know about a sample and how large the sample is. For example, if you want to know whether observed values for categorical data match the expected values, use chi-square. If you want to know whether a small sample follows a normal distribution, the Shapiro-Wilk test might be advantageous. There are many tests available to determine goodness-of-fit.