Chi-Square Tests

ℹ️ Why It Matters

Much real-world data is categorical: an email is spam or not, a patient responds to treatment or not, a user clicks an ad or doesn't. T-tests and ANOVA assume a continuous, approximately normally distributed outcome, so they cannot be applied when the outcome is a category. Chi-square tests fill this gap by analyzing frequency data in contingency tables and comparing observed distributions to theoretical ones. In machine learning, chi-square tests are used for feature selection: identifying which categorical features are most associated with the target variable before training a classifier.

Overview

Chi-square tests address two fundamental questions about categorical data. The goodness of fit test determines whether a single categorical variable follows a specified distribution (e.g., is a die fair?). The test of independence determines whether two categorical variables are associated (e.g., is gender associated with voting preference?). Both use the same test statistic: the sum of squared differences between observed and expected frequencies, each divided by its expected frequency. When expected frequencies are small ($< 5$), the chi-square approximation breaks down and Fisher's exact test should be used instead. The p-value only indicates whether an association exists; its strength is measured by Cramér's V.

Key Concepts

Chi-Square Goodness of Fit

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

Here,

$O_i$ = Observed frequency in category $i$
$E_i$ = Expected frequency under $H_0$: $n \cdot p_i$
$k$ = Number of categories

Expected Frequency (Independence)

$$E_{ij} = \frac{R_i \cdot C_j}{n}$$

Here,

$R_i$ = Marginal row total for row $i$
$C_j$ = Marginal column total for column $j$
$n$ = Grand total of all observations
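The expected-frequency formula can be sketched in a few lines of NumPy. The 2×3 table below is hypothetical data invented for illustration, not taken from this lesson.

```python
import numpy as np

# Hypothetical 2x3 table of observed counts (rows: groups, columns: outcomes).
observed = np.array([[30, 20, 10],
                     [20, 30, 40]])

row_totals = observed.sum(axis=1)  # R_i
col_totals = observed.sum(axis=0)  # C_j
n = observed.sum()                 # grand total

# E_ij = R_i * C_j / n, computed for every cell at once via an outer product.
expected = np.outer(row_totals, col_totals) / n
print(expected)
```

The expected table keeps the same row and column totals as the observed one, which makes a quick sanity check on any hand calculation.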

Chi-Square Test of Independence

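$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

Here,

$O_{ij}$ = Observed frequency in row $i$, column $j$
$E_{ij}$ = Expected frequency under independence
$r$, $c$ = Number of rows and columns
$df$ = $(r-1)(c-1)$ degrees of freedom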
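As a rough illustration, the test of independence can be run with `scipy.stats.chi2_contingency`, which returns the statistic, p-value, degrees of freedom, and expected frequencies. The table is hypothetical data chosen so the arithmetic is easy to follow by hand.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x3 contingency table (not data from this lesson).
observed = np.array([[30, 20, 10],
                     [20, 30, 40]])

# Yates' continuity correction is only applied by SciPy when df == 1,
# so it has no effect on this 2x3 table.
chi2_stat, p_value, dof, expected = chi2_contingency(observed)

print(f"chi2 = {chi2_stat:.3f}, df = {dof}, p = {p_value:.5f}")
```

By hand, the first row contributes $5 + 0 + 5$ and the second $3.33 + 0 + 3.33$, giving $\chi^2 \approx 16.67$ with $df = 2$.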

Cramér's V (Effect Size)

$$V = \sqrt{\frac{\chi^2}{n \cdot (k^* - 1)}}$$

Here,

$k^*$ = $\min(r, c)$, the smaller of the number of rows and columns
$V$ = Ranges from 0 (no association) to 1 (perfect association)
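A minimal sketch of Cramér's V in plain Python, assuming the table is given as a list of rows. The helper names `chi_square_statistic` and `cramers_v` and the example table are illustrative, not part of any library.

```python
import math

def chi_square_statistic(table):
    """Return (chi-square statistic, grand total) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n

def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (n * (min(r, c) - 1)))."""
    stat, n = chi_square_statistic(table)
    k_star = min(len(table), len(table[0]))
    return math.sqrt(stat / (n * (k_star - 1)))

# Hypothetical 2x3 table (not data from this lesson).
table = [[30, 20, 10],
         [20, 30, 40]]
v = cramers_v(table)
print(round(v, 3))  # about 0.333
```

For this table $\chi^2 \approx 16.67$, $n = 150$, and $k^* = 2$, so $V = \sqrt{16.67 / 150} \approx 0.33$.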

Fisher's Exact Test (2×2)

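$$P = \frac{\binom{a+b}{a}\binom{c+d}{c}}{\binom{n}{a+c}}$$

Here,

$a, b, c, d$ = Cell counts in the 2×2 table (first row $a, b$; second row $c, d$)
$n$ = Grand total, $a + b + c + d$
$P$ = Probability of the observed table given fixed row and column totals; the p-value sums $P$ over all tables at least as extreme as the observed one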
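The formula gives the probability of a single table, so a p-value needs one more step. The sketch below uses only the standard library and one common two-sided convention (sum every table with the same margins that is no more likely than the observed one). The function names are illustrative, and the example counts are the classic 4-and-4 tea-tasting layout, used here as assumed data.

```python
from math import comb

def table_probability(a, b, c, d):
    """Hypergeometric probability of one 2x2 table with fixed margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

def fisher_two_sided(a, b, c, d):
    """Two-sided p-value: sum the probabilities of all tables with the
    same margins whose probability does not exceed the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    p_observed = table_probability(a, b, c, d)
    lowest = max(0, col1 - row2)   # smallest feasible top-left count
    highest = min(row1, col1)      # largest feasible top-left count
    p_value = 0.0
    for x in range(lowest, highest + 1):
        p = table_probability(x, row1 - x, col1 - x, row2 - (col1 - x))
        # Small relative tolerance guards against floating-point ties.
        if p <= p_observed * (1 + 1e-7):
            p_value += p
    return p_value

p_single = table_probability(3, 1, 1, 3)
p_two_sided = fisher_two_sided(3, 1, 1, 3)
print(round(p_single, 4), round(p_two_sided, 4))
```

Here the observed table has probability $16/70$, and the two-sided p-value is $34/70 \approx 0.486$, because the tables with top-left counts 0, 1, 3, and 4 are all no more likely than the observed one.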

Degrees of Freedom

Test	Formula	Example
Goodness of Fit (known $p_i$ )	$k - 1$	6-sided die: $df = 5$
Goodness of Fit (estimated params)	$k - 1 - m$	10 bins, estimating μ and σ: $df = 7$
Independence $r \times c$	$(r-1)(c-1)$	$3 \times 4$ table: $df = 6$

Minimum Expected Frequency Rule

No expected frequency should be less than 1
No more than 20% of expected frequencies should be less than 5
For $2 \times 2$ tables violating this, use Fisher's exact test
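The rule above is easy to check programmatically before trusting a chi-square p-value. This is a small sketch in plain Python; `check_expected_counts` is a hypothetical helper name, and both example tables of expected counts are made up.

```python
def check_expected_counts(expected):
    """Apply the minimum expected frequency rule to a table of expected counts."""
    cells = [value for row in expected for value in row]
    share_below_5 = sum(value < 5 for value in cells) / len(cells)
    none_below_1 = all(value >= 1 for value in cells)
    return {
        "min_expected": min(cells),
        "share_below_5": share_below_5,
        # Rule: no cell below 1 and at most 20% of cells below 5.
        "ok": none_below_1 and share_below_5 <= 0.20,
    }

large_sample = check_expected_counts([[20, 20, 20], [30, 30, 30]])
small_sample = check_expected_counts([[2.5, 7.5], [7.5, 22.5]])
print(large_sample["ok"], small_sample["ok"])  # True False
```

In the second table one of four cells (25%) falls below 5, so the rule fails; since it is a 2×2 table, Fisher's exact test would be the fallback.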

Cramér's V Benchmarks

$df^*$	Small	Medium	Large
1	0.10	0.30	0.50
2	0.07	0.21	0.35
3	0.06	0.17	0.29

Here,

$df^*$ = $\min(r-1, c-1)$; the thresholds continue to shrink slightly for $df^* > 3$

Quick Example

📝Goodness of Fit: Is the Die Fair?

A die is rolled 120 times, giving face counts [18, 15, 22, 17, 20, 28]. Under $H_0$ (fair die), the expected count is 20 per face.

$$\chi^2 = \frac{(18-20)^2}{20} + \frac{(15-20)^2}{20} + \cdots + \frac{(28-20)^2}{20} = 5.30$$

$df = 5$, critical value $\chi^2_{0.05, 5} = 11.07$. Since $5.30 < 11.07$, fail to reject $H_0$: the data give no evidence that the die is unfair ($p \approx 0.38$).
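The die example can be reproduced with SciPy as a quick check. This sketch assumes `scipy` is installed; `scipy.stats.chisquare` uses equal expected frequencies by default, which matches the fair-die hypothesis.

```python
from scipy.stats import chi2, chisquare

observed = [18, 15, 22, 17, 20, 28]  # face counts from the example

# With no f_exp argument, every category is assumed equally likely (20 each).
result = chisquare(observed)
statistic, p_value = result.statistic, result.pvalue

# Critical value at alpha = 0.05 with df = k - 1 = 5.
critical_value = chi2.ppf(0.95, df=len(observed) - 1)

print(f"chi2 = {statistic:.2f}, p = {p_value:.2f}, critical = {critical_value:.2f}")
```

The squared deviations are 4, 25, 4, 9, 0, and 64, which sum to 106; dividing by 20 gives the 5.30 reported above.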

📝Test of Independence

Gender vs. voting preference (300 voters). After computing expected frequencies and the chi-square statistic: $\chi^2 = 36.73$, $df = 4$, $p < 0.001$.

Reject $H_0$: gender and voting preference are significantly associated. The largest contributor is the "Other gender / Party C" cell ($\chi^2_{contribution} = 20.8$).

Key Takeaways

📋Summary: Chi-Square Tests

Goodness of Fit: Tests whether a single categorical variable matches an expected distribution. $df = k - 1$.
Test of Independence: Tests whether two categorical variables are associated. $df = (r-1)(c-1)$.
Expected Frequencies: At least 80% of cells should have expected counts $\geq 5$, and none should fall below 1. For $2 \times 2$ tables that fail this rule, use Fisher's exact test.
Effect Size: Always report Cramér's V alongside the p-value, because with a large enough sample even trivial associations become "significant."
Feature Selection: The chi-square independence test is a standard filter method for categorical features, for example in NLP and spam detection.
Assumptions: Categorical data, independent observations, random sampling, sufficient expected frequencies.
Ordinal Data: Chi-square ignores ordering. Use trend tests (Cochran-Armitage) for more power with ordinal data.

Deep Dive

For detailed explanations, worked examples, and Python implementations, explore the dedicated statistics lessons:

Chi-Square Distribution

Chi-Square Distribution — PDF, properties, relationship to Normal, critical values, and simulation

Goodness of Fit

Chi-Square Goodness of Fit — Testing whether data matches a specified distribution with full worked examples

Test of Independence

Chi-Square Test of Independence — Contingency tables, expected frequencies, post-hoc analysis, and standardized residuals

Related Tests

F-Test for Equality of Variances — Comparing variances of two groups using the F-distribution
Levene's Test — Robust alternative to F-test for equality of variances
