← Math|51 of 100
Statistics

ANOVA

Master one-way and two-way ANOVA for comparing multiple group means.

📂 Comparing Groups📖 Lesson 51 of 100🎓 Free Course

Advertisement

ANOVA

â„šī¸ Why It Matters

When comparing means across three or more groups, running multiple t-tests inflates the Type I error rate. With 3 groups, there are 3 pairwise comparisons, each at Îą = 0.05, giving a family-wise error rate of about 14%. ANOVA (Analysis of Variance) tests whether any group means differ while controlling the overall error rate at Îą. It is the standard method for experiments with multiple treatment conditions, A/B/n testing, and any scenario involving categorical predictors with continuous outcomes.


Overview

ANOVA partitions total variance in the data into between-group variance (differences attributable to the treatment) and within-group variance (random noise within groups). The F-statistic is the ratio of between-group to within-group variance: F=MSbetween/MSwithinF = MS_{between} / MS_{within}. A large F indicates that group means are more spread than expected by chance. One-way ANOVA tests means across levels of a single factor. Two-way ANOVA tests two factors and their interaction. ANOVA is an omnibus test — it only tells you that at least one group differs, not which groups differ. Post-hoc tests (Tukey's HSD, Bonferroni) identify specific pairwise differences after a significant F-test.


Key Concepts

F-Statistic (One-Way ANOVA)

F=MSbetweenMSwithin=SSbetween/(k−1)SSwithin/(N−k)F = \frac{\text{MS}_{between}}{\text{MS}_{within}} = \frac{SS_{between} / (k-1)}{SS_{within} / (N-k)}

Here,

  • SSbetweenSS_{between}=Sum of squares between groups: $\sum n_j(\bar{x}_j - \bar{x})^2$
  • SSwithinSS_{within}=Sum of squares within groups: $\sum\sum(x_{ij} - \bar{x}_j)^2$
  • kk=Number of groups
  • NN=Total number of observations

F-Distribution

FâˆŧFk−1,N−kF \sim F_{k-1, N-k}

Here,

  • k−1k-1=Numerator degrees of freedom (between groups)
  • N−kN-k=Denominator degrees of freedom (within groups)

Effect Size: Eta-Squared

Ρ2=SSbetweenSStotal\eta^2 = \frac{SS_{between}}{SS_{total}}

Here,

  • SStotalSS_{total}=Total sum of squares: $SS_{between} + SS_{within}$

Two-Way ANOVA Model

yijk=Îŧ+Îąi+βj+(ιβ)ij+Īĩijky_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}

Here,

  • Îąi\alpha_i=Main effect of factor A
  • βj\beta_j=Main effect of factor B
  • (ιβ)ij(\alpha\beta)_{ij}=Interaction effect between factors A and B
  • Īĩijk\epsilon_{ijk}=Random error term

ANOVA Assumptions

  1. Independence: Observations are independent within and across groups. Violations (e.g., repeated measures) require different tests.
  2. Normality: Residuals are approximately normally distributed. Robust to violations with large nn (CLT).
  3. Homogeneity of variances: Groups have equal population variances. Test with Levene's test. If violated, use Welch's ANOVA.

Sum of Squares Decomposition

SStotal=SSbetween+SSwithinSS_{total} = SS_{between} + SS_{within}
  • SSbetweenSS_{between}: Variation due to group differences (explainable)
  • SSwithinSS_{within}: Variation within groups (unexplainable noise)
  • Ρ2=SSbetween/SStotal\eta^2 = SS_{between} / SS_{total}: Proportion of total variance explained by groups

Post-Hoc Tests

TestWhen to UseControlsConservativeness
Tukey's HSDAll pairwise comparisonsFamily-wise errorModerate
BonferroniFew planned comparisonsFamily-wise errorConservative
ScheffeAll possible contrastsFamily-wise errorMost conservative
DunnettEach group vs. controlFamily-wise errorModerate

Quick Example

📝One-Way ANOVA

Three drugs tested: F=4.5F = 4.5, df1=2df_1 = 2, df2=87df_2 = 87.

Using the F-distribution: p≈0.013<0.05p \approx 0.013 < 0.05. Reject H0H_0: at least one drug differs.

ANOVA is omnibus — we know something differs, but not what. Post-hoc Tukey HSD identifies which specific pairs differ. Without post-hoc, we cannot claim Drug A is better than Drug B.

📝Two-Way ANOVA

Testing effect of drug (A vs B) and dosage (low vs high) on recovery time. Two-way ANOVA tests three hypotheses simultaneously:

  1. Does drug type matter? (main effect of drug)
  2. Does dosage matter? (main effect of dosage)
  3. Does the effect of drug depend on dosage? (interaction)

If the interaction is significant, the effect of drug depends on dosage level — you cannot interpret main effects in isolation.

📝Post-Hoc Analysis

ANOVA with 4 groups yields F=5.2F = 5.2, p=0.003p = 0.003. Reject H0H_0: at least one group differs.

Tukey's HSD reveals: Group A vs C (p=0.001p = 0.001), Group A vs D (p=0.02p = 0.02), Group B vs C (p=0.004p = 0.004), Group B vs D (p=0.03p = 0.03). Groups C and D don't differ from each other (p=0.91p = 0.91).

Assumption Checking in Python

from scipy import stats

# Levene's test for equal variances
_, p_levene = stats.levene(group1, group2, group3)
print(f"Levene's test p-value: {p_levene:.3f}")

# Shapiro-Wilk test for normality of residuals
_, p_normal = stats.shapiro(residuals)
print(f"Normality test p-value: {p_normal:.3f}")

If Levene's test is significant (p<0.05p < 0.05), use Welch's ANOVA or the nonparametric Kruskal-Wallis test instead.


Key Takeaways

📋Summary: ANOVA

  • Purpose: Compare means across 3+ groups while controlling family-wise Type I error.
  • F-statistic: Between-group variance / Within-group variance. Large F → reject H0H_0.
  • Hypotheses: H0H_0: all group means equal. H1H_1: at least one differs.
  • Post-hoc: ANOVA is omnibus — use Tukey's HSD or Bonferroni to find which groups differ.
  • Assumptions: Normality, equal variances, independence. Levene's test checks homoscedasticity.
  • Effect Size: Ρ2\eta^2 = proportion of variance explained by the group factor.
  • Relationship to t-test: One-way ANOVA with 2 groups is equivalent to a two-sample t-test (F=t2F = t^2).
  • Two-Way ANOVA: Tests main effects of two factors and their interaction term.
  • Nonparametric Alternative: Kruskal-Wallis test when ANOVA assumptions are violated.

Deep Dive

For detailed explanations, worked examples, and Python implementations, explore the dedicated statistics lessons:

ANOVA

  • One-Way ANOVA — Complete derivation, F-distribution, sum of squares decomposition, post-hoc tests, and Python implementation

Related Tests

Related Topics

Lesson Progress51 / 100