ANOVA
âšī¸ Why It Matters
When comparing means across three or more groups, running multiple t-tests inflates the Type I error rate. With 3 groups, there are 3 pairwise comparisons, each at Îą = 0.05, giving a family-wise error rate of about 14%. ANOVA (Analysis of Variance) tests whether any group means differ while controlling the overall error rate at Îą. It is the standard method for experiments with multiple treatment conditions, A/B/n testing, and any scenario involving categorical predictors with continuous outcomes.
Overview
ANOVA partitions total variance in the data into between-group variance (differences attributable to the treatment) and within-group variance (random noise within groups). The F-statistic is the ratio of between-group to within-group variance: . A large F indicates that group means are more spread than expected by chance. One-way ANOVA tests means across levels of a single factor. Two-way ANOVA tests two factors and their interaction. ANOVA is an omnibus test â it only tells you that at least one group differs, not which groups differ. Post-hoc tests (Tukey's HSD, Bonferroni) identify specific pairwise differences after a significant F-test.
Key Concepts
F-Statistic (One-Way ANOVA)
Here,
- =Sum of squares between groups: $\sum n_j(\bar{x}_j - \bar{x})^2$
- =Sum of squares within groups: $\sum\sum(x_{ij} - \bar{x}_j)^2$
- =Number of groups
- =Total number of observations
F-Distribution
Here,
- =Numerator degrees of freedom (between groups)
- =Denominator degrees of freedom (within groups)
Effect Size: Eta-Squared
Here,
- =Total sum of squares: $SS_{between} + SS_{within}$
Two-Way ANOVA Model
Here,
- =Main effect of factor A
- =Main effect of factor B
- =Interaction effect between factors A and B
- =Random error term
ANOVA Assumptions
- Independence: Observations are independent within and across groups. Violations (e.g., repeated measures) require different tests.
- Normality: Residuals are approximately normally distributed. Robust to violations with large (CLT).
- Homogeneity of variances: Groups have equal population variances. Test with Levene's test. If violated, use Welch's ANOVA.
Sum of Squares Decomposition
- : Variation due to group differences (explainable)
- : Variation within groups (unexplainable noise)
- : Proportion of total variance explained by groups
Post-Hoc Tests
| Test | When to Use | Controls | Conservativeness |
|---|---|---|---|
| Tukey's HSD | All pairwise comparisons | Family-wise error | Moderate |
| Bonferroni | Few planned comparisons | Family-wise error | Conservative |
| Scheffe | All possible contrasts | Family-wise error | Most conservative |
| Dunnett | Each group vs. control | Family-wise error | Moderate |
Quick Example
đOne-Way ANOVA
Three drugs tested: , , .
Using the F-distribution: . Reject : at least one drug differs.
ANOVA is omnibus â we know something differs, but not what. Post-hoc Tukey HSD identifies which specific pairs differ. Without post-hoc, we cannot claim Drug A is better than Drug B.
đTwo-Way ANOVA
Testing effect of drug (A vs B) and dosage (low vs high) on recovery time. Two-way ANOVA tests three hypotheses simultaneously:
- Does drug type matter? (main effect of drug)
- Does dosage matter? (main effect of dosage)
- Does the effect of drug depend on dosage? (interaction)
If the interaction is significant, the effect of drug depends on dosage level â you cannot interpret main effects in isolation.
đPost-Hoc Analysis
ANOVA with 4 groups yields , . Reject : at least one group differs.
Tukey's HSD reveals: Group A vs C (), Group A vs D (), Group B vs C (), Group B vs D (). Groups C and D don't differ from each other ().
Assumption Checking in Python
from scipy import stats
# Levene's test for equal variances
_, p_levene = stats.levene(group1, group2, group3)
print(f"Levene's test p-value: {p_levene:.3f}")
# Shapiro-Wilk test for normality of residuals
_, p_normal = stats.shapiro(residuals)
print(f"Normality test p-value: {p_normal:.3f}")
If Levene's test is significant (), use Welch's ANOVA or the nonparametric Kruskal-Wallis test instead.
Key Takeaways
đSummary: ANOVA
- Purpose: Compare means across 3+ groups while controlling family-wise Type I error.
- F-statistic: Between-group variance / Within-group variance. Large F â reject .
- Hypotheses: : all group means equal. : at least one differs.
- Post-hoc: ANOVA is omnibus â use Tukey's HSD or Bonferroni to find which groups differ.
- Assumptions: Normality, equal variances, independence. Levene's test checks homoscedasticity.
- Effect Size: = proportion of variance explained by the group factor.
- Relationship to t-test: One-way ANOVA with 2 groups is equivalent to a two-sample t-test ().
- Two-Way ANOVA: Tests main effects of two factors and their interaction term.
- Nonparametric Alternative: Kruskal-Wallis test when ANOVA assumptions are violated.
Deep Dive
For detailed explanations, worked examples, and Python implementations, explore the dedicated statistics lessons:
ANOVA
- One-Way ANOVA â Complete derivation, F-distribution, sum of squares decomposition, post-hoc tests, and Python implementation
Related Tests
- F-Test for Equality of Variances â Testing whether two groups have equal variances using the F-distribution
- Levene's Test â Robust test for homogeneity of variances when normality is questionable
- Two-Sample t-Test â ANOVA with 2 groups is equivalent to a t-test ()
Related Topics
- Hypothesis Testing â The framework underlying ANOVA
- Multiple Testing Problem â Why post-hoc corrections are necessary
- Bonferroni Correction â Controlling family-wise error rate
- Nonparametric Methods â Kruskal-Wallis as the nonparametric alternative to ANOVA
- Effect Size â Measuring practical significance alongside statistical significance