Chi-Square Tests
Why It Matters
Much real-world data is categorical: an email is spam or not, a patient responds to treatment or not, a user clicks an ad or doesn't. T-tests and ANOVA assume a continuous, approximately normally distributed outcome, so they do not apply to category counts. Chi-square tests fill this gap by analyzing frequency data in contingency tables and comparing observed distributions to theoretical ones. In machine learning, chi-square tests are used for feature selection: identifying which categorical features are most associated with the target variable before training a classifier.
Overview
Chi-square tests address two fundamental questions about categorical data. The goodness of fit test determines whether a single categorical variable follows a specified distribution (e.g., is a die fair?). The test of independence determines whether two categorical variables are associated (e.g., is gender associated with voting preference?). Both use the same test statistic: the sum of squared differences between observed and expected frequencies, each divided by the expected frequency. When expected frequencies are small ($E < 5$), the chi-square approximation breaks down and Fisher's exact test should be used instead. Effect size is measured by Cramér's V.
Key Concepts
Chi-Square Goodness of Fit
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
Here,
- $O_i$ = Observed frequency in category $i$
- $E_i$ = Expected frequency under $H_0$: $n \cdot p_i$
- $k$ = Number of categories
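As a minimal sketch of the statistic defined above, the following uses only the Python standard library. The function name `chi_square_gof` and the counts are hypothetical, chosen for illustration; in practice a library routine such as `scipy.stats.chisquare` would normally be used instead.

```python
def chi_square_gof(observed, probs):
    """Return (statistic, df) for a goodness of fit test.

    `probs` are the fully specified category probabilities under H0.
    """
    if len(observed) != len(probs):
        raise ValueError("observed and probs must have the same length")
    n = sum(observed)
    expected = [n * p for p in probs]  # E_i = n * p_i
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1     # df = k - 1 when no parameters are estimated


# Hypothetical counts for three categories with hypothesized shares 50/30/20.
stat, df = chi_square_gof([48, 35, 17], [0.5, 0.3, 0.2])
print(f"chi2 = {stat:.3f}, df = {df}")
```

The statistic is then compared with a chi-square critical value for the returned degrees of freedom.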
Expected Frequency (Independence)
$$E_{ij} = \frac{R_i \cdot C_j}{N}$$
Here,
- $R_i$ = Marginal row total for row $i$
- $C_j$ = Marginal column total for column $j$
- $N$ = Grand total of all observations
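The expected counts under independence follow directly from the row totals, column totals, and grand total. Here is a short sketch, assuming the table is given as a list of rows; the helper name `expected_frequencies` and the counts are hypothetical.

```python
def expected_frequencies(table):
    """Expected counts under independence: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2x3 table of counts (rows: groups, columns: outcomes).
observed = [[20, 30, 50],
            [30, 30, 40]]
for row in expected_frequencies(observed):
    print([round(e, 2) for e in row])
```

Because both rows have the same total in this illustration, both rows of expected counts come out identical.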
Chi-Square Test of Independence
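$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
Here,
- $df$ = $(r-1)(c-1)$ degrees of freedom, where $r$ and $c$ are the numbers of rows and columns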
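The independence statistic sums the squared difference between observed and expected counts, divided by the expected count, over every cell of the table. A minimal standard-library sketch follows; the function name `chi_square_independence` and the counts are hypothetical, and `scipy.stats.chi2_contingency` is the usual library route.

```python
def chi_square_independence(table):
    """Return (statistic, df) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical 2x3 table of counts.
stat, df = chi_square_independence([[20, 30, 50], [30, 30, 40]])
print(f"chi2 = {stat:.3f}, df = {df}")
```

No continuity correction is applied in this sketch, so for 2×2 tables its result can differ from library defaults that apply one.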
Cramér's V (Effect Size)
$$V = \sqrt{\frac{\chi^2}{N\,(k - 1)}}$$
Here,
- $k$ = $\min(r, c)$, the smaller of the number of rows and the number of columns
- $V$ = Ranges from 0 (no association) to 1 (perfect association)
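Cramér's V rescales the chi-square statistic by the sample size and the smaller table dimension, so it can be computed in one line once the statistic is known. The sketch below assumes the statistic has already been computed; the function name `cramers_v` and the input values are hypothetical.

```python
import math


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V = sqrt(chi2 / (N * (min(r, c) - 1)))."""
    smaller_dim = min(n_rows, n_cols)
    if smaller_dim < 2:
        raise ValueError("the table needs at least 2 rows and 2 columns")
    return math.sqrt(chi2 / (n * (smaller_dim - 1)))


# Hypothetical inputs: chi2 = 12.5 from a 3x4 table with 250 observations.
v = cramers_v(chi2=12.5, n=250, n_rows=3, n_cols=4)
print(f"V = {v:.3f}")
```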
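Fisher's Exact Test (2×2)
$$P = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,N!}$$
Here,
- $a, b, c, d$ = Cell counts in the 2×2 table
- $N$ = Grand total
- $P$ = Probability of the observed table with the margins held fixed; the p-value sums $P$ over all tables at least as extreme as the observed one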
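To make the exact test concrete, here is a sketch that computes the probability of a single 2×2 table and then a two-sided p-value. It assumes one common two-sided convention (sum the probabilities of all tables that are no more likely than the observed one); other conventions exist. The function names and the example counts are hypothetical.

```python
from math import comb


def table_probability(a, b, c, d):
    """Hypergeometric probability of one 2x2 table with fixed margins.

    Algebraically equal to (a+b)!(c+d)!(a+c)!(b+d)! / (a! b! c! d! N!).
    """
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value: total probability of tables no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    p_observed = table_probability(a, b, c, d)
    p_value = 0.0
    # x is the top-left cell; the margins determine the other three cells.
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_probability(x, row1 - x, col1 - x, row2 - col1 + x)
        if p <= p_observed * (1 + 1e-7):  # small tolerance for floating-point ties
            p_value += p
    return min(p_value, 1.0)


# Hypothetical 2x2 table: [[3, 1], [1, 3]].
print(f"p = {fisher_exact_two_sided(3, 1, 1, 3):.4f}")
```

Library implementations such as `scipy.stats.fisher_exact` are preferable in practice; this version is meant only to show where the formula enters.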
Degrees of Freedom
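| Test | Formula | Example |
|---|---|---|
| Goodness of Fit (known $p_i$) | $df = k - 1$ | 6-sided die: $df = 5$ |
| Goodness of Fit ($m$ estimated params) | $df = k - 1 - m$ | 10 bins, estimating $\mu$ and $\sigma$: $df = 10 - 1 - 2 = 7$ |
| Independence | $df = (r-1)(c-1)$ | $3 \times 3$ table: $df = 4$ |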
Minimum Expected Frequency Rule
- No expected frequency should be less than 1
- No more than 20% of expected frequencies should be less than 5
- For tables violating this, use Fisher's exact test
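The rule of thumb above is easy to check mechanically once the expected counts are available. A small sketch, assuming the expected counts are passed as a list of rows; the function name `expected_counts_ok` and the example tables are hypothetical.

```python
def expected_counts_ok(expected):
    """Return True if no expected count is below 1 and at most 20% are below 5."""
    cells = [e for row in expected for e in row]
    if min(cells) < 1:
        return False
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return share_below_5 <= 0.20


# Hypothetical expected-count tables.
print(expected_counts_ok([[25.0, 30.0, 45.0], [25.0, 30.0, 45.0]]))  # all cells large
print(expected_counts_ok([[2.5, 7.5], [7.5, 22.5]]))                 # 1 of 4 cells below 5
```

When the check fails for a 2×2 table, Fisher's exact test is the fallback named above.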
Cramér's V Benchmarks
| $\min(r, c) - 1$ | Small | Medium | Large |
|---|---|---|---|
| 1 | 0.10 | 0.30 | 0.50 |
| 2 | 0.07 | 0.21 | 0.35 |
| 3+ | 0.06 | 0.17 | 0.29 |
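The benchmark table can be turned into a small lookup. This is an illustrative sketch: the thresholds are copied from the table above, while the function name `label_cramers_v` and the label "negligible" for values below the "small" threshold are assumptions, not part of the table.

```python
# Thresholds (small, medium, large) keyed by min(r, c) - 1; the last row covers 3 and above.
BENCHMARKS = {
    1: (0.10, 0.30, 0.50),
    2: (0.07, 0.21, 0.35),
    3: (0.06, 0.17, 0.29),
}


def label_cramers_v(v, n_rows, n_cols):
    """Map Cramér's V to a qualitative label using the benchmark table."""
    smaller_dim = min(n_rows, n_cols)
    if smaller_dim < 2:
        raise ValueError("the table needs at least 2 rows and 2 columns")
    small, medium, large = BENCHMARKS[min(smaller_dim - 1, 3)]
    if v >= large:
        return "large"
    if v >= medium:
        return "medium"
    if v >= small:
        return "small"
    return "negligible"  # assumed label; the table does not name this range


# Hypothetical value: V = 0.158 from a 3x4 table.
print(label_cramers_v(0.158, 3, 4))
```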
Quick Example
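Goodness of Fit: Is the Die Fair?
A die rolled 120 times: [18, 15, 22, 17, 20, 28]. Expected: 20 per face.
$\chi^2 = \frac{4 + 25 + 4 + 9 + 0 + 64}{20} = 5.30$ with $df = 5$; at $\alpha = 0.05$ the critical value is $\chi^2_{0.05,\,5} = 11.07$. Since $5.30 < 11.07$, fail to reject $H_0$. The data are consistent with a fair die ($p \approx 0.38$).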
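The die example can be checked numerically. The sketch below recomputes the statistic from the counts given above and obtains a p-value from a hand-rolled chi-square upper-tail function (a series expansion of the regularized incomplete gamma function). The helper name `chi2_sf` and the significance level of 0.05 are assumptions for illustration; `scipy.stats.chisquare` would give the same quantities in one call.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with `df` degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function,
    which is adequate for the moderate values of x used here.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


observed = [18, 15, 22, 17, 20, 28]           # counts from the example
expected = sum(observed) / len(observed)      # 20 per face under H0
stat = sum((o - expected) ** 2 / expected for o in observed)
df = len(observed) - 1
p_value = chi2_sf(stat, df)

alpha = 0.05  # assumed significance level
print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```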
Test of Independence
Gender vs. voting preference (300 voters). After computing the expected frequencies and the chi-square statistic, the test rejects $H_0$: gender and voting preference are significantly associated.
The largest contributor to the statistic is the "Other gender / Party C" cell.
Key Takeaways
Summary: Chi-Square Tests
- Goodness of Fit: Tests whether a single categorical variable matches an expected distribution. $df = k - 1$.
- Test of Independence: Tests whether two categorical variables are associated. $df = (r-1)(c-1)$.
- Expected Frequencies: Must be $\geq 5$ in at least 80% of cells, and never below 1. Use Fisher's exact test when this fails.
- Effect Size: Always report Cramér's V alongside the p-value, because with large samples even trivial associations become "significant."
- Feature Selection: The chi-square independence test is a common filter method for categorical features, for example in NLP and spam detection.
- Assumptions: Categorical data, independent observations, random sampling, sufficient expected frequencies.
- Ordinal Data: Chi-square ignores ordering. Use trend tests (Cochran-Armitage) for more power with ordinal data.
Deep Dive
For detailed explanations, worked examples, and Python implementations, explore the dedicated statistics lessons:
Chi-Square Distribution
- Chi-Square Distribution: PDF, properties, relationship to Normal, critical values, and simulation
Goodness of Fit
- Chi-Square Goodness of Fit: Testing whether data matches a specified distribution with full worked examples
Test of Independence
- Chi-Square Test of Independence: Contingency tables, expected frequencies, post-hoc analysis, and standardized residuals
Related Tests
- F-Test for Equality of Variances: Comparing variances of two groups using the F-distribution
- Levene's Test: Robust alternative to the F-test for equality of variances
Related Topics
- Hypothesis Testing: The framework underlying chi-square tests
- Nonparametric Methods: Distribution-free alternatives when assumptions are violated
- Logistic Regression: Models the same associations chi-square detects, and also accommodates continuous predictors