The chi-squared test refers to a class of statistical tests in which the sampling distribution of the test statistic, under the null hypothesis, is a chi-squared distribution. When used without further qualification, the term usually refers to Pearson's chi-squared test, which tests whether an observed distribution could plausibly have arisen from an expected distribution (under some assumption), or whether that assumption is likely to be wrong.
A common use of the chi-squared test is to test for independence between two data sets (more precisely, two categorical variables recorded for the same observations). For instance, in a survey in which the ages of participants are recorded, a chi-squared test can be used to determine whether age affects the survey responses, or whether the two are independent (in which case one would expect the responses to be distributed in roughly the same way across all age groups). The test is also commonly used to test whether a population follows a specific distribution; for instance, the test can be used to determine whether a die is fair, or whether a city has an equal number of men and women.
Consider a coin flipped 100 times. If the coin were fair (landing heads and tails with equal probability), the expected result would be 50 heads and 50 tails. However, the probability of obtaining exactly this result is low (about 8%); 53 heads, for instance, would not generally be surprising. A result of 92 heads, however, would certainly suggest that the coin is not in fact fair.
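To put numbers on this intuition, the exact probabilities for a fair coin can be computed from the binomial distribution. The following is a minimal sketch using only Python's standard library; the helper name `prob_heads` is illustrative and not part of any library.

```python
from math import comb

def prob_heads(heads, flips=100):
    """Probability of exactly `heads` heads in `flips` tosses of a fair coin."""
    return comb(flips, heads) / 2 ** flips

# Exactly 50 heads: the single most likely outcome, yet still uncommon.
p_exactly_50 = prob_heads(50)

# 92 or more heads: sum the upper tail of the binomial distribution.
p_at_least_92 = sum(prob_heads(h) for h in range(92, 101))

print(f"P(exactly 50 heads)  = {p_exactly_50:.4f}")
print(f"P(at least 92 heads) = {p_at_least_92:.3e}")
```

The first probability is roughly 0.08, while the second is so small that 92 heads is effectively impossible for a fair coin.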
The chi-squared test provides a way to test if an observed result (the number of heads) could feasibly have arisen randomly, or whether the original assumption (in this case of fairness) must be wrong.
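The chi-squared statistic is defined by

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},$$

where $O_i$ is the number of observations of type $i$, $E_i$ is the expected number of observations of type $i$, and $k$ is the number of types (categories). The key to the chi-squared test is that, when the expected distribution is correct, the chi-squared statistic is well approximated by a chi-squared distribution (the distribution of a sum of squares of independent standard normal variables) with a properly chosen number of degrees of freedom. The approximation arises because, for large samples, the observed counts are approximately multivariate normal.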
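As a small illustration, the statistic (the sum over all categories of the squared difference between observed and expected counts, divided by the expected count) can be computed directly for the coin example. This is a sketch; the function name `chi_squared_statistic` is chosen here for illustration.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Fair coin flipped 100 times: 50 heads and 50 tails are expected.
expected = [50, 50]

stat_53 = chi_squared_statistic([53, 47], expected)  # 2 * 3**2 / 50
stat_92 = chi_squared_statistic([92, 8], expected)   # 2 * 42**2 / 50

print(stat_53, stat_92)
```

A result of 53 heads gives a statistic of about 0.36, while 92 heads gives about 70.56; larger values indicate a larger departure from the expected counts.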
Because of this approximation, a number of conditions (detailed in the next section) need to hold in order for the test to be valid. Should they hold, the chi-squared test proceeds as follows:
- Calculate the chi-squared statistic $\chi^2$, defined above.
- Determine the number of degrees of freedom of the statistic. This depends on the particular expected distribution, but is usually $k - 1$ (where $k$ is the number of categories).
- Select a confidence level, usually either 95% or 99%.
- Determine the critical value of the $\chi^2$-distribution with that number of degrees of freedom and the confidence level chosen above. Essentially, this is the value at which the portion of the chi-squared distribution lying below it reaches the desired confidence level.
- Compare the chi-squared statistic to the critical value. If it is below the critical value, the null hypothesis is not rejected. If it is above the critical value, the null hypothesis is rejected, and the expected distribution is probably wrong.
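The steps above can be sketched in code. The version below uses only the standard library: it evaluates the chi-squared cumulative distribution through the power series for the regularized incomplete gamma function and finds the critical value by bisection. All function names are illustrative, and in practice a statistics library would normally be used instead (for example, SciPy provides `scipy.stats.chi2.ppf` for critical values).

```python
import math

def chi2_cdf(x, df):
    """CDF of the chi-squared distribution with `df` degrees of freedom.

    Uses the power series of the regularized lower incomplete gamma
    function P(a, t) with a = df / 2 and t = x / 2.
    """
    if x <= 0:
        return 0.0
    a, t = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= t / (a + n)
        total += term
        n += 1
    return total * math.exp(-t + a * math.log(t) - math.lgamma(a))

def chi2_critical_value(confidence, df):
    """Value below which the distribution holds `confidence` of its mass."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, df) < confidence:  # widen the bracket
        hi *= 2.0
    for _ in range(100):                  # bisection
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return hi

def chi_squared_test(observed, expected, confidence=0.95):
    """Return (statistic, critical value, reject?) assuming k - 1 degrees of freedom."""
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    critical = chi2_critical_value(confidence, df)
    return statistic, critical, statistic > critical

# Coin example: 53 heads versus 92 heads out of 100 flips.
for heads in (53, 92):
    stat, crit, reject = chi_squared_test([heads, 100 - heads], [50, 50])
    print(f"{heads} heads: statistic={stat:.2f}, critical={crit:.3f}, reject={reject}")
```

With one degree of freedom and a 95% confidence level the critical value is about 3.841, so fairness is not rejected for 53 heads but is rejected for 92 heads.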
Intuitively, the test relies on the fact that if the expected distribution is indeed correct, then by the central limit theorem the differences between the observed and expected counts are approximately (multivariate) normally distributed, so the sum of their scaled squares approximately follows a chi-squared distribution. If the chi-squared statistic is larger than the critical value, then it is unlikely to have occurred under this assumption, and thus the assumption is likely to be false.
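The chi-squared test can also be used to test for independence between two data sets, where each "observation" is defined as the value of two outcomes arranged in a contingency table. In this case, the chi-squared statistic runs over all cells of the table:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{i,j} - E_{i,j})^2}{E_{i,j}},$$

where $r$ and $c$ are the number of categories in the first set (the rows of the table) and in the second set (the columns), respectively. Under independence, the expected count of a cell is $E_{i,j} = R_i C_j / N$, where $R_i$ is the total of row $i$, $C_j$ is the total of column $j$, and $N$ is the total number of observations. The number of degrees of freedom is $(r-1)(c-1)$.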
As an independence test, a confidence level is again chosen in advance, most commonly 95%. If the chi-squared statistic exceeds the critical value under these conditions, the independence assumption can be rejected, and the two data sets are unlikely to be independent.
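A short sketch of the independence test follows. The expected count of each cell is taken to be (row total × column total) / (grand total), which is what independence implies, and the degrees of freedom are (rows − 1)(columns − 1). The survey counts below are made up purely for illustration, and the function name `independence_statistic` is hypothetical.

```python
def independence_statistic(table):
    """Return (chi-squared statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in cell (i, j) if rows and columns are independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Made-up survey: rows are two age groups, columns are "yes" / "no" responses.
table = [[30, 20],
         [20, 30]]

statistic, df = independence_statistic(table)

# Standard table value for 1 degree of freedom at the 95% confidence level.
CRITICAL_95_DF1 = 3.841

print(statistic, df, statistic > CRITICAL_95_DF1)
```

Here every expected count is 25, the statistic is 4.0 with one degree of freedom, and since 4.0 exceeds 3.841 independence would be rejected at the 95% level for this invented data.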
The chi-squared test can be applied to determine whether a die is fair, i.e. shows each of 1, 2, 3, 4, 5, and 6 with equal probability.
Suppose that after 96 rolls of a die, the die has shown 24 1s, 15 2s, 14 3s, 16 4s, 14 5s, and 13 6s. Is the die unfair?
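With 96 rolls, a fair die is expected to show each face $96/6 = 16$ times. This can be tabulated as follows:

| $i$ | $O_i$ | $E_i$ | $O_i - E_i$ | $\frac{(O_i - E_i)^2}{E_i}$ |
|---|---|---|---|---|
| 1 | 24 | 16 | 8 | 4 |
| 2 | 15 | 16 | -1 | 0.0625 |
| 3 | 14 | 16 | -2 | 0.25 |
| 4 | 16 | 16 | 0 | 0 |
| 5 | 14 | 16 | -2 | 0.25 |
| 6 | 13 | 16 | -3 | 0.5625 |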
so the chi-squared statistic is $\chi^2 = 4 + 0.0625 + 0.25 + 0 + 0.25 + 0.5625 = 5.125$. The number of degrees of freedom is $6 - 1 = 5$, and the chi-squared distribution with 5 degrees of freedom and 95% confidence level has critical value $11.070$. Since the chi-squared statistic is less than the critical value, this observation does not provide enough evidence to reject the null hypothesis of fairness.
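If the observation were instead 29 1s, 8 2s, 12 3s, 17 4s, 14 5s, and 16 6s, the table would be

| $i$ | $O_i$ | $E_i$ | $O_i - E_i$ | $\frac{(O_i - E_i)^2}{E_i}$ |
|---|---|---|---|---|
| 1 | 29 | 16 | 13 | 10.5625 |
| 2 | 8 | 16 | -8 | 4 |
| 3 | 12 | 16 | -4 | 1 |
| 4 | 17 | 16 | 1 | 0.0625 |
| 5 | 14 | 16 | -2 | 0.25 |
| 6 | 16 | 16 | 0 | 0 |

so the chi-squared statistic would be $\chi^2 = 15.875$. This exceeds the 95% critical value of $11.070$, so it is sufficient to reject the null hypothesis at the 95% confidence level.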
However, the critical value at 99.9% confidence level is 20.515, so this is not sufficient to reject the null hypothesis at the 99.9% confidence level.
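The arithmetic in both die examples can be checked with a few lines of code. This sketch recomputes the per-face contributions and compares the totals against the two critical values quoted in the text; the variable names are illustrative.

```python
# Observed counts for faces 1..6 in the two scenarios (96 rolls each).
rolls = {
    "first": [24, 15, 14, 16, 14, 13],
    "second": [29, 8, 12, 17, 14, 16],
}

# Critical values quoted in the text for 5 degrees of freedom.
CRITICAL = {"95%": 11.070, "99.9%": 20.515}

results = {}
for name, observed in rolls.items():
    expected = sum(observed) / len(observed)  # 96 / 6 = 16 per face for a fair die
    contributions = [(o - expected) ** 2 / expected for o in observed]
    results[name] = sum(contributions)
    verdicts = {level: results[name] > c for level, c in CRITICAL.items()}
    print(name, contributions, results[name], verdicts)
```

The first set of rolls yields 5.125, below both critical values; the second yields 15.875, which exceeds the 95% value but not the 99.9% value.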
Because the chi-squared distribution only approximates the true distribution of the statistic, several assumptions are necessary for the chi-squared test to be valid:
- The observations must be a simple random sample of the population (i.e. possible outcomes); every member of the population must have equal probability of being chosen (though generalized forms exist for weighted data).
- The sample size must be sufficiently large. As with all statistical tests, a small sample size may lead to a type II error (failing to reject a null hypothesis that is in fact false).
- The expected value of each cell must be sufficiently large. The commonly used rule is that all cells of a $2 \times 2$ table should have expected value at least 5, and larger tables should have at least 80% of cells with expected value at least 5, as well as no cells with 0 expected value.
- Independence: each observation should be independent of the others, meaning that the chi-squared test cannot be used to test correlated data.
It is important to check these assumptions because the chi-squared statistic can be computed regardless of whether they are satisfied; the results, however, can be misleading if they are not.
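The expected-count rule of thumb is easy to check mechanically before running the test. The sketch below assumes the rule applies to a table of expected counts given as a list of rows, and treats the "all cells" requirement as applying to 2 × 2 tables; the function name `expected_counts_ok` is hypothetical.

```python
def expected_counts_ok(expected_table):
    """Check the rule of thumb on expected cell counts.

    2 x 2 tables: every expected count must be at least 5.
    Larger tables: at least 80% of expected counts must be at least 5.
    In all cases, no expected count may be 0.
    """
    cells = [e for row in expected_table for e in row]
    if any(e <= 0 for e in cells):
        return False
    large = sum(1 for e in cells if e >= 5)
    is_two_by_two = len(expected_table) == 2 and len(expected_table[0]) == 2
    if is_two_by_two:
        return large == len(cells)
    return large >= 0.8 * len(cells)

print(expected_counts_ok([[25, 25], [25, 25]]))           # all cells large
print(expected_counts_ok([[6, 4], [10, 10]]))             # 2 x 2 with one small cell
print(expected_counts_ok([[10, 10, 10], [10, 10, 4]]))    # 5 of 6 cells are large
```

A check like this only covers the expected-count condition; random sampling and independence of observations depend on how the data were collected and cannot be verified from the table alone.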