Bonferroni Correction
The simplest correction for multiple testing: with m tests, compare each p-value to α/m instead of α.
Equivalently, multiply each p-value by m (capping the result at 1) and compare it to the original α.
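The two forms of the rule can be checked side by side. The sketch below is a minimal illustration using only the standard library; the helper name `bonferroni` and the three p-values are hypothetical, chosen only to show that both forms lead to the same decisions.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (reject flags, adjusted p-values) for a family of m tests."""
    m = len(p_values)
    # Inflate each p-value by m, capped at 1, and keep alpha unchanged
    adjusted = [min(p * m, 1.0) for p in p_values]
    reject = [p_adj <= alpha for p_adj in adjusted]
    return reject, adjusted

# Hypothetical p-values, for illustration only
example_p = [0.010, 0.020, 0.300]
alpha = 0.05

reject, adjusted = bonferroni(example_p, alpha)

# The other form: leave the p-values alone and shrink the threshold to alpha / m
reject_by_threshold = [p <= alpha / len(example_p) for p in example_p]

for p, p_adj, r1, r2 in zip(example_p, adjusted, reject, reject_by_threshold):
    print(f"p = {p:.3f}  adjusted p = {p_adj:.3f}  "
          f"reject (adjusted p) = {r1}  reject (alpha/m) = {r2}")
```

Away from floating-point ties exactly at the threshold, the two decision columns always agree; only the first p-value survives the correction here.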
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests
np.random.seed(42)
# Post-hoc comparisons after one-way ANOVA (3 groups → 3 pairwise tests)
g1 = np.random.normal(50, 8, 30)
g2 = np.random.normal(55, 8, 30)
g3 = np.random.normal(58, 8, 30)
# All pairwise t-tests
pairs = [('G1 vs G2', g1, g2), ('G1 vs G3', g1, g3), ('G2 vs G3', g2, g3)]
p_values = []
for name, a, b in pairs:
    _, p = stats.ttest_ind(a, b)
    p_values.append(p)
    print(f"{name}: p = {p:.4f}")
# Bonferroni correction
m = len(p_values)
bonf_alpha = 0.05 / m
print(f"\nBonferroni α = 0.05/{m} = {bonf_alpha:.4f}")
for (name, _, _), p in zip(pairs, p_values):
    print(f"{name}: p={p:.4f} {'<' if p < bonf_alpha else '≥'} {bonf_alpha:.4f} → {'Reject' if p < bonf_alpha else 'Fail to reject'}")
# Compare methods
print("\n--- Method Comparison ---")
for method in ['bonferroni', 'holm', 'holm-sidak', 'fdr_bh']:
    reject, pvals_corr, _, _ = multipletests(p_values, alpha=0.05, method=method)
    # .tolist() converts NumPy booleans to plain Python bools for cleaner output
    print(f"{method:15}: reject = {reject.tolist()}, corrected p = {[f'{p:.4f}' for p in pvals_corr]}")
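To see what the 'holm' option does differently from 'bonferroni', here is a hand-rolled sketch of the Holm step-down adjustment. It is an illustration, not the statsmodels implementation; the helper name `holm_adjust` and the example p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multipliers shrink step by step: m, m-1, ..., 1
        running_max = max(running_max, (m - rank) * p_values[i])
        # The running maximum keeps adjusted p-values monotone; cap at 1
        adjusted[i] = min(running_max, 1.0)
    return adjusted

# Hypothetical p-values, for illustration only
example_p = [0.010, 0.020, 0.300]
m = len(example_p)

holm = holm_adjust(example_p)
bonf = [min(p * m, 1.0) for p in example_p]

for p, b, h in zip(example_p, bonf, holm):
    print(f"p = {p:.3f}  Bonferroni = {b:.3f}  Holm = {h:.3f}")
```

Only the smallest p-value gets the full multiplier m, so every Holm-adjusted value is at most its Bonferroni counterpart. With these numbers the second test is rejected at α = 0.05 under Holm but not under Bonferroni.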
When Bonferroni Is Too Conservative
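Bonferroni does not assume independence: it controls the family-wise error rate under any dependence structure. The price is that when tests are positively correlated it corrects more than necessary, which costs power (too many false negatives).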
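A small simulation can make the conservativeness concrete. The sketch below assumes m = 10 two-sided z-tests with every null hypothesis true, and an equicorrelated structure built from a shared latent factor (pairwise correlation rho). These settings and the helper name `simulate_fwer` are illustrative choices, not part of the original example.

```python
import math
import random

def simulate_fwer(m, rho, alpha=0.05, n_sim=20000, seed=0):
    """Estimate the family-wise error rate of Bonferroni under a global null."""
    rng = random.Random(seed)
    threshold = alpha / m
    families_with_false_positive = 0
    for _ in range(n_sim):
        shared = rng.gauss(0.0, 1.0)  # latent factor common to all m tests
        for _ in range(m):
            # corr(z_i, z_j) = rho for i != j, and each z_i is N(0, 1)
            z = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
            if p <= threshold:
                families_with_false_positive += 1
                break  # one false positive is enough for this family
    return families_with_false_positive / n_sim

fwer_indep = simulate_fwer(m=10, rho=0.0)
fwer_corr = simulate_fwer(m=10, rho=0.9)
print(f"Estimated FWER, independent tests (rho = 0.0): {fwer_indep:.4f}")
print(f"Estimated FWER, correlated tests  (rho = 0.9): {fwer_corr:.4f}")
```

With independent tests the estimate should sit just under the nominal 0.05. With rho = 0.9 it should fall well below that, which is the unused error budget that effective-number-of-tests methods try to recover.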
# Bonferroni is too conservative when tests are correlated
# Example: testing many highly correlated features
# Effective number of tests (Meff) for correlated tests
# Can use PCA-based methods to estimate Meff < m
correlation_matrix = np.array([[1.0, 0.8, 0.7],
                               [0.8, 1.0, 0.9],
                               [0.7, 0.9, 1.0]])
eigenvalues = np.linalg.eigvalsh(correlation_matrix)
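eigenvalues = np.clip(eigenvalues, 0, None)  # guard against tiny negative values from rounding
m_eff = np.sqrt(eigenvalues).sum() ** 2 / eigenvalues.sum()  # Meff, Galwey (2009): (Σ√λ)² / Σλ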
print("\nFor highly correlated tests (r≈0.8):")
print(f"m = 3 tests, but effective m ≈ {m_eff:.2f}")
print(f"More appropriate Bonferroni α = 0.05/{m_eff:.2f} = {0.05/m_eff:.4f}")
print(f"(vs strict Bonferroni α = 0.05/3 = {0.05/3:.4f})")
Key Takeaways
- Bonferroni: α_corrected = α/m, or equivalently multiply each p-value by m (capped at 1) and compare to α
- Conservative — best when any false positive is catastrophic (drug safety)
- Holm's stepdown is uniformly more powerful than Bonferroni
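- Overly conservative when tests are strongly correlated: use an effective-number-of-tests (Meff) adjustment based on the correlation matrix. Šidák is only marginally less strict and is exact only for independent tests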
- FDR methods (Benjamini-Hochberg) are preferred for exploratory research with many tests
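To illustrate the last point, here is a hand-rolled sketch of the Benjamini-Hochberg step-up adjustment next to Bonferroni. It is not the statsmodels implementation; the helper name `bh_adjust` and the five p-values are hypothetical, picked so the difference in rejections is visible.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Walk from the largest p-value down to the smallest
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, i in enumerate(order):
        rank = m - k  # ascending rank of p_values[i], from 1 to m
        # Scale by m / rank; the running minimum keeps the values monotone
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values, for illustration only
example_p = [0.005, 0.015, 0.025, 0.035, 0.500]
alpha = 0.05
m = len(example_p)

bh = bh_adjust(example_p)
bonf = [min(p * m, 1.0) for p in example_p]

n_bh = sum(p_adj <= alpha for p_adj in bh)
n_bonf = sum(p_adj <= alpha for p_adj in bonf)
print(f"Rejections at alpha = {alpha}: Benjamini-Hochberg = {n_bh}, Bonferroni = {n_bonf}")
```

The two procedures control different error rates. Bonferroni bounds the chance of any false positive, while Benjamini-Hochberg bounds the expected share of false positives among the rejections, which is why it can reject more hypotheses on the same data.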