Chi-Square Tests: Testing Independence and Fit
You survey 500 people. 250 men, 250 women. You ask: "Do you support Policy X?"
Results:
- Men: 120 yes, 130 no
- Women: 180 yes, 70 no
Are gender and opinion independent? Or does opinion depend on gender?
You can't use a t-test (the outcome isn't continuous). You can't use ANOVA (same reason). You need a test for categorical data.
That's the chi-square test. It answers: "Do these categories associate?" Or: "Does this distribution match what we expect?"
It's everywhere. Medical studies (disease rates by demographic). Marketing (purchase behavior by age group). Genetics (observed vs. expected allele frequencies).
This article explains the two main chi-square tests: test of independence and goodness of fit.
Chi-Square Test of Independence
Question: Are two categorical variables independent?
Null hypothesis: The variables are independent. Knowing one tells you nothing about the other.
Alternative: The variables are associated. They depend on each other.
The Logic
Build a contingency table (cross-tabulation):
| | Support | Oppose | Total |
|---|---|---|---|
| Men | 120 | 130 | 250 |
| Women | 180 | 70 | 250 |
| Total | 300 | 200 | 500 |
If gender and opinion are independent, you'd expect support rates to be the same for men and women.
Overall support rate: 300/500 = 60%.
Expected values if independent:
- Men supporting: 250 × 0.60 = 150
- Men opposing: 250 × 0.40 = 100
- Women supporting: 250 × 0.60 = 150
- Women opposing: 250 × 0.40 = 100
Observed values:
- Men supporting: 120 (30 below expected)
- Women supporting: 180 (30 above expected)
The chi-square statistic measures how far observed deviates from expected:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where O = observed, E = expected.
Larger $\chi^2$: Bigger deviation from independence. Stronger evidence of association.
Smaller $\chi^2$: Observed is close to expected. Consistent with independence.
Calculating Chi-Square
For our example:
$$\chi^2 = \frac{(120-150)^2}{150} + \frac{(130-100)^2}{100} + \frac{(180-150)^2}{150} + \frac{(70-100)^2}{100}$$
$$= \frac{900}{150} + \frac{900}{100} + \frac{900}{150} + \frac{900}{100} = 6 + 9 + 6 + 9 = 30$$
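The sum above is easy to verify directly. A quick sketch in Python with NumPy, using the observed and expected counts from the table:

```python
import numpy as np

# Cell order: men-support, men-oppose, women-support, women-oppose
observed = np.array([120, 130, 180, 70])
expected = np.array([150, 100, 150, 100])

# Chi-square statistic: sum of (O - E)^2 / E over all cells
chi_sq = ((observed - expected) ** 2 / expected).sum()
print(chi_sq)  # 30.0
```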
Degrees of freedom: $(r-1)(c-1)$ where $r$ = rows, $c$ = columns. Here: $(2-1)(2-1) = 1$.
For $\chi^2 = 30$ with df = 1, the p-value is less than 0.0001.
Conclusion: Reject the null hypothesis of independence. Gender and opinion are associated: in this sample, women supported the policy at a higher rate (72%) than men (48%).
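In practice you rarely run this by hand. SciPy's `chi2_contingency` computes the expected counts, statistic, degrees of freedom, and p-value in one call. A sketch for the table above; note that `correction=False` disables the Yates continuity correction, so the statistic matches the hand calculation for this 2×2 table:

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[120, 130],   # men: support, oppose
                  [180, 70]])   # women: support, oppose

chi2, p, df, expected = chi2_contingency(table, correction=False)
print(chi2, df)   # 30.0 1
print(p < 0.0001) # True
print(expected)   # [[150. 100.], [150. 100.]]
```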
Chi-Square Goodness of Fit
Question: Does the observed distribution match an expected distribution?
Example:
Mendel's pea genetics predicted a 3:1 ratio of dominant to recessive traits. He observed 315 dominant, 101 recessive (total 416).
Expected under 3:1:
- Dominant: 416 × 0.75 = 312
- Recessive: 416 × 0.25 = 104
Observed:
- Dominant: 315
- Recessive: 101
$$\chi^2 = \frac{(315-312)^2}{312} + \frac{(101-104)^2}{104} = 0.029 + 0.087 = 0.116$$
Degrees of freedom: $k - 1 = 2 - 1 = 1$.
p-value ≈ 0.73.
Conclusion: Fail to reject the null. The observed counts are consistent with the 3:1 ratio.
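SciPy's `chisquare` runs the same goodness-of-fit test directly from the observed and expected counts. A sketch for the Mendel data above:

```python
from scipy.stats import chisquare

observed = [315, 101]
expected = [416 * 0.75, 416 * 0.25]  # 312, 104 under the 3:1 ratio

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(round(stat, 3))  # 0.115
print(round(p, 2))     # 0.73
```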
Assumptions
Chi-square tests assume:
1. Independent observations. One person's response doesn't affect another's.
2. Expected frequencies ≥ 5. If any cell has expected count < 5, chi-square is unreliable. Use Fisher's exact test instead.
3. Random sampling. Sample should be representative of population.
4. Categorical data. Don't use chi-square on continuous data (bin it first, or use a different test).
Effect Size: Cramér's V
Chi-square tells you if variables are associated. But how strongly?
Cramér's V:
$$V = \sqrt{\frac{\chi^2}{n \times (k-1)}}$$
Where:
- $n$ = sample size
- $k$ = min(rows, columns)

Rough benchmarks: small effect $V \approx 0.1$, medium $V \approx 0.3$, large $V \approx 0.5$.
Always report V alongside $\chi^2$ and p.
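Cramér's V falls straight out of the χ² statistic. A minimal helper (the function name `cramers_v` is my own), applied to the gender/opinion table:

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(table):
    """Cramér's V for a contingency table (no continuity correction)."""
    table = np.asarray(table)
    chi2 = chi2_contingency(table, correction=False)[0]
    n = table.sum()
    k = min(table.shape)  # min(rows, columns)
    return np.sqrt(chi2 / (n * (k - 1)))

v = cramers_v([[120, 130], [180, 70]])
print(round(v, 3))  # 0.245 -- a small-to-medium association
```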
Fisher's Exact Test: When Sample Size Is Small
If expected frequencies are < 5, chi-square is inaccurate.
Fisher's exact test calculates the exact probability of the observed table (and more extreme tables) under independence.
It's computationally intensive but exact. Use it for small samples or rare events.
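SciPy implements Fisher's exact test for 2×2 tables. A sketch with a hypothetical small study (the counts are invented for illustration) where every expected cell count falls below 5:

```python
from scipy.stats import fisher_exact

# Hypothetical small study: 2/10 treated vs 6/10 control had the outcome
table = [[2, 8],
         [6, 4]]

odds_ratio, p = fisher_exact(table)  # two-sided by default
print(round(p, 2))  # 0.17
```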
McNemar's Test: Paired Data
Chi-square assumes independence. But what if you measure the same people twice (before/after)?
McNemar's test handles paired categorical data.
Example: 100 people. Measure opinion before and after an intervention.
| | After: Support | After: Oppose |
|---|---|---|
| Before: Support | 40 | 10 |
| Before: Oppose | 30 | 20 |
McNemar's test ignores the 60 people who didn't change and asks whether the 40 who did (10 support→oppose, 30 oppose→support) are split evenly between the two directions.
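An exact McNemar test reduces to a binomial test on the discordant pairs: under the null, each of the 40 switchers is equally likely to have switched in either direction. A sketch using SciPy's `binomtest` (statsmodels also ships a dedicated `mcnemar` function):

```python
from scipy.stats import binomtest

b = 10  # support -> oppose
c = 30  # oppose -> support

# Exact McNemar: are the 40 switches split 50/50?
result = binomtest(b, n=b + c, p=0.5)
print(result.pvalue < 0.05)  # True: the changes are lopsided

# Classic chi-square version of the McNemar statistic
mcnemar_stat = (b - c) ** 2 / (b + c)
print(mcnemar_stat)  # 10.0
```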
Further Reading
- Agresti, A. (2018). An Introduction to Categorical Data Analysis (3rd ed.). Wiley.
- Fisher, R. A. (1922). "On the interpretation of χ² from contingency tables, and the calculation of P." Journal of the Royal Statistical Society, 85(1), 87-94.
This is Part 12 of the Statistics series. Next: "Statistics Synthesis."