## What Is a Chi-Square Statistic?

A chi-square (*χ*^{2}) statistic measures how well a model's expected results compare with actual observed data. The data used in calculating a chi-square statistic must be random, raw, mutually exclusive, drawn from independent variables, and drawn from a large enough sample. For example, the results of tossing a fair coin meet these criteria.

Chi-square tests are often used to test hypotheses. The chi-square statistic compares the size of any discrepancies between the expected results and the actual results, given the size of the sample and the number of variables in the relationship.

For these tests, degrees of freedom, which depend on the number of categories or variables being compared, determine the critical value that the statistic must exceed before a given null hypothesis can be rejected. As with any statistic, the larger the sample size, the more reliable the results.

### Key Takeaways

- A chi-square (*χ*^{2}) statistic is a measure of the difference between the observed and expected frequencies of the outcomes of a set of events or variables.
- Chi-square is useful for analyzing such differences in categorical variables, especially those nominal in nature.
- *χ*^{2} depends on the size of the difference between observed and expected values, the degrees of freedom, and the sample size.
- *χ*^{2} can be used to test whether two variables are related or independent of one another.
- It can also be used to test the goodness-of-fit between an observed distribution and a theoretical distribution of frequencies.

## The Formula for Chi-Square Is

$\begin{aligned}&\chi^2_c = \sum \frac{(O_i - E_i)^2}{E_i} \\&\textbf{where:}\\&c=\text{Degrees of freedom}\\&O=\text{Observed value(s)}\\&E=\text{Expected value(s)}\end{aligned}$
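As a rough illustration, the formula can be written as a short Python function. The function name `chi_square_statistic` and the three-category counts below are hypothetical, chosen only to show the arithmetic; they do not come from any real data set.

```python
def chi_square_statistic(observed, expected):
    """Sum (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("every expected frequency must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories (illustration only).
observed = [30, 50, 20]
expected = [25, 50, 25]

statistic = chi_square_statistic(observed, expected)
print(statistic)  # (30-25)^2/25 + (50-50)^2/50 + (20-25)^2/25 = 2.0
```

For a goodness-of-fit comparison like this one, the degrees of freedom would be the number of categories minus one, which is 2 here.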

## What Does a Chi-Square Statistic Tell You?

There are two main kinds of chi-square tests: the test of independence, which asks a question of relationship, such as, "Is there a relationship between student gender and course choice?"; and the goodness-of-fit test, which asks something like "How well does the coin in my hand match a theoretically fair coin?"

Chi-square analysis is applied to categorical variables and is especially useful when those variables are nominal (where order doesn't matter, like marital status or gender).

### Independence

When considering student gender and course choice, a *χ*^{2} test for independence could be used. To do this test, the researcher would collect data on the two chosen variables (gender and courses picked) and then compare the frequencies at which male and female students select among the offered classes using the formula given above and a *χ*^{2} statistical table.

If there is no relationship between gender and course selection (that is, if they are independent), then male and female students should be expected to select each offered course at approximately the same rate. Equivalently, the proportion of male and female students in any selected course should be approximately equal to the proportion of male and female students in the sample.

A *χ*^{2} test for independence can tell us how likely it is that random chance can explain any observed difference between the actual frequencies in the data and these theoretical expectations.
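A minimal sketch of this calculation in Python follows. Under independence, the expected count for each cell is its row total times its column total, divided by the grand total. The enrollment counts and the function names `expected_counts` and `chi_square_independence` are invented for illustration.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return the chi-square statistic and degrees of freedom for a contingency table."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # Degrees of freedom: (number of rows - 1) * (number of columns - 1).
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


# Hypothetical enrollment counts (invented for illustration).
# Rows: male, female. Columns: three offered courses.
observed = [
    [20, 30, 50],
    [30, 30, 40],
]

statistic, dof = chi_square_independence(observed)
print(f"chi-square = {statistic:.3f}, degrees of freedom = {dof}")
```

With 2 degrees of freedom, the standard table value at the 5% significance level is 5.991, so a statistic of about 3.111 would not be large enough to reject independence for these made-up counts.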

### Goodness-of-Fit

*χ*^{2} provides a way to test how well a sample of data matches the (known or assumed) characteristics of the larger population that the sample is intended to represent. This is known as goodness of fit.

If the sample data do not fit the expected properties of the population that we are interested in, then we would not want to use this sample to draw conclusions about the larger population.

## Example

Consider an imaginary coin with exactly a 50/50 chance of landing heads or tails and a real coin that you toss 100 times. If the real coin is fair, then it will also have an equal probability of landing on either side, and the expected result of tossing it 100 times is that heads will come up 50 times and tails will come up 50 times.

In this case, *χ*^{2} can tell us how well the actual results of 100 coin flips compare to the theoretical model that a fair coin will give 50/50 results. The actual tosses could come up 50/50, or 60/40, or even 90/10. The farther the actual results of the 100 tosses are from 50/50, the poorer the fit of this set of tosses to the theoretical expectation, and the more likely we are to conclude that this coin is not actually fair.
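The three outcomes mentioned above can be run through the formula directly. This is only a sketch: the helper name `coin_chi_square` is hypothetical, and it assumes the expected count for each side is half the number of tosses.

```python
def coin_chi_square(heads, tosses=100):
    """Chi-square statistic for a coin-toss result against a fair 50/50 model."""
    tails = tosses - heads
    expected = tosses / 2  # a fair coin is expected to land on each side half the time
    return ((heads - expected) ** 2 + (tails - expected) ** 2) / expected


# The three outcomes mentioned in the text: 50/50, 60/40, and 90/10.
for heads in (50, 60, 90):
    print(heads, 100 - heads, coin_chi_square(heads))
```

The statistic is 0 for a 50/50 result, 4.0 for 60/40, and 64.0 for 90/10. With two outcomes there is 1 degree of freedom, for which the standard table value at the 5% significance level is 3.841, so both the 60/40 and the 90/10 results would exceed it.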

## When to Use a Chi-Square Test

A chi-square test is used to help determine whether observed results are in line with expected results, and to assess whether any differences between them could plausibly be due to chance.

A chi-square test is appropriate for this when the data being analyzed are from a random sample, and when the variable in question is a categorical variable. A categorical variable is one that consists of selections such as type of car, race, educational attainment, male or female, or how much somebody likes a political candidate (from very much to very little).

These types of data are often collected via survey responses or questionnaires. Therefore, chi-square analysis is often most useful in analyzing this type of data.

## How to Perform a Chi-Square Test

These are the basic steps whether you are performing a goodness of fit test or a test of independence:

- Create a table of the observed and expected frequencies;
- Use the formula to calculate the chi-square value;
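- Find the critical chi-square value for the test's degrees of freedom and chosen significance level, using a chi-square value table or statistical software;
- Compare the calculated chi-square value with the critical value;
- Reject the null hypothesis if the calculated value exceeds the critical value; otherwise, fail to reject it.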
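The steps above can be sketched in Python for a goodness-of-fit test. The die-roll counts and the function name `goodness_of_fit_test` are hypothetical. The small lookup table holds standard chi-square critical values at the 5% significance level and stands in for a printed table or statistical software.

```python
# Critical values of the chi-square distribution at the 5% significance level,
# keyed by degrees of freedom (standard table values).
CRITICAL_VALUES_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def goodness_of_fit_test(observed, expected):
    """Run the basic steps of a goodness-of-fit test at the 5% significance level."""
    # Steps 1-2: pair observed with expected frequencies and compute the statistic.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Step 3: look up the critical value for (number of categories - 1) degrees of freedom.
    degrees_of_freedom = len(observed) - 1
    critical_value = CRITICAL_VALUES_05[degrees_of_freedom]
    # Steps 4-5: compare the two values and decide.
    reject_null = statistic > critical_value
    return statistic, critical_value, reject_null


# Hypothetical die: 60 rolls, so a fair die is expected to show each face 10 times.
observed = [8, 12, 9, 11, 13, 7]
expected = [10] * 6

statistic, critical_value, reject_null = goodness_of_fit_test(observed, expected)
print(f"chi-square = {statistic:.2f}, critical value = {critical_value}")
print("reject null hypothesis" if reject_null else "fail to reject null hypothesis")
```

For these invented counts the statistic is about 2.8, well below the critical value for 5 degrees of freedom, so the null hypothesis of a fair die would not be rejected. A test of independence follows the same steps, with expected frequencies derived from the row and column totals.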

## Limitations of the Chi-Square Test

The chi-square test is sensitive to sample size. With a very large sample, even a trivially small difference between observed and expected frequencies can appear statistically significant.
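A small sketch makes this sensitivity concrete. It compares two hypothetical samples that have the same 52%/48% split against an even 50/50 expectation; the counts and the helper name `two_category_chi_square` are assumptions made for illustration.

```python
def two_category_chi_square(count_a, count_b):
    """Chi-square statistic against an even 50/50 split between two categories."""
    expected = (count_a + count_b) / 2
    return ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected


# Hypothetical samples with the same 52%/48% split but different sizes.
small_sample = two_category_chi_square(52, 48)      # 100 observations
large_sample = two_category_chi_square(5200, 4800)  # 10,000 observations

print(round(small_sample, 2), round(large_sample, 2))
```

The proportions are identical, yet the statistic grows from 0.16 to 16.0 as the sample grows a hundredfold. Against the 5% critical value of 3.841 for 1 degree of freedom, only the larger sample would be judged significant.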

In addition, the chi-square test cannot establish whether one variable has a causal relationship with another. It can only establish whether two variables are related.

## What Is a Chi-Square Test Used For?

Chi-square is a statistical test used to examine the differences between categorical variables from a random sample in order to judge goodness of fit between expected and observed results.

## Who Uses Chi-Square Analysis?

Since chi-square applies to categorical variables, it is most often used by researchers who are studying survey response data. This type of research can range from demography to consumer and marketing research to political science and economics.

## Is Chi-Square Analysis Used When the Independent Variable Is Nominal or Ordinal?

A nominal variable is a categorical variable whose categories differ in quality but have no inherent order. For instance, asking somebody their favorite color would produce a nominal variable. Asking somebody which age group they fall into, on the other hand, would produce ordinal data, because the categories have a natural order. Chi-square is best applied to nominal data.

## The Bottom Line

There are two types of chi-square tests: the test of independence and the test of goodness of fit. Both are used to determine the validity of a hypothesis or an assumption. The result is a piece of evidence that can be used to make a decision. For example:

In a test of independence, a company may want to evaluate whether its new product, an herbal supplement that promises to give people an energy boost, is reaching the people who are most likely to be interested. It is being advertised on websites related to sports and fitness, on the assumption that active and health-conscious people are most likely to buy it. It does an extensive poll that is intended to evaluate interest in the product by demographic group. The poll suggests no correlation between interest in this product and the most health-conscious people.

In a test of goodness of fit, a marketing professional is considering launching a new product that the company believes will be irresistible to women over 45. The company has conducted product testing panels of 500 potential buyers of the product. The marketing professional has information about the age and gender of the test panels. This allows the construction of a chi-square test showing the distribution by age and gender of the people who said they would buy the product. The result will show whether or not the likeliest buyer is a woman over 45. If the test shows that men over 45 or women between 18 and 44 are just as likely to buy the product, the marketing professional will revise the advertising, promotion, and placement of the product to appeal to this wider group of customers.