False Discovery Rate (FDR) and Benjamini-Hochberg
The FDR is the expected proportion of rejected H₀s that are actually true (false discoveries):
\text{FDR} = E\left[\frac{\text{# False Rejections}}{\text{# Total Rejections}}\right]
The Benjamini-Hochberg (BH) procedure controls FDR at level q:
Algorithm:
- Sort p-values: p₍₁₎ ≤ p₍₂₎ ≤ ... ≤ p₍ₘ₎
- Find largest k such that p₍ₖ₎ ≤ k·q/m
- Reject all H₀₍ᵢ₎ for i = 1, ..., k
import numpy as np
from statsmodels.stats.multitest import multipletests
import matplotlib.pyplot as plt
np.random.seed(42)
m = 1000 # 1000 tests (e.g., gene expression analysis)
m0 = 800 # 800 truly null genes
m1 = 200 # 200 truly differentially expressed
# Simulate p-values
null_pvals = np.random.uniform(0, 1, m0) # null: uniform
alt_pvals = np.random.beta(0.5, 10, m1) # alternative: skewed toward 0
all_pvals = np.concatenate([null_pvals, alt_pvals])
true_labels = np.array([0]*m0 + [1]*m1) # 0=null, 1=alternative
# Sort for visualization
sort_idx = np.argsort(all_pvals)
sorted_pvals = all_pvals[sort_idx]
ranks = np.arange(1, m+1)
q = 0.05 # desired FDR level
# BH threshold line
bh_line = ranks * q / m
# Find BH cutoff
bh_mask = sorted_pvals <= bh_line
if bh_mask.any():
bh_cutoff = sorted_pvals[bh_mask].max()
n_rejected_bh = bh_mask.sum()
else:
bh_cutoff = 0
n_rejected_bh = 0
# Bonferroni for comparison
bonf_cutoff = 0.05 / m
n_rejected_bonf = (all_pvals <= bonf_cutoff).sum()
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# BH plot
axes[0].scatter(ranks, sorted_pvals, s=2, alpha=0.4, color='steelblue', label='p-values')
axes[0].plot(ranks, bh_line, 'r-', linewidth=2, label=f'BH line (slope=q/m={q/m:.5f})')
if bh_cutoff > 0:
axes[0].axhline(bh_cutoff, color='green', linewidth=2, linestyle='--',
label=f'BH cutoff = {bh_cutoff:.5f}')
axes[0].set_xlabel('Rank')
axes[0].set_ylabel('Sorted p-value')
axes[0].set_title(f'Benjamini-Hochberg Procedure
{n_rejected_bh} rejections at FDR ≤ {q}')
axes[0].legend()
# Histogram of p-values
axes[1].hist(all_pvals, bins=50, edgecolor='black', color='steelblue', alpha=0.7)
axes[1].axvline(bonf_cutoff, color='red', linewidth=2, label=f'Bonferroni ({n_rejected_bonf} rejected)')
axes[1].axvline(bh_cutoff, color='green', linewidth=2, linestyle='--', label=f'BH ({n_rejected_bh} rejected)')
axes[1].set_title('P-value Distribution
(spike near 0 = true alternatives)')
axes[1].legend()
plt.tight_layout()
plt.savefig('bh_procedure.png', dpi=150)
plt.show()
# Using statsmodels
reject_bh, pvals_adj_bh, _, _ = multipletests(all_pvals, alpha=0.05, method='fdr_bh')
reject_bonf, _, _, _ = multipletests(all_pvals, alpha=0.05, method='bonferroni')
print(f"Total tests: {m}, True alternatives: {m1}")
print(f"Bonferroni rejections: {reject_bonf.sum()}")
print(f"BH rejections: {reject_bh.sum()}")
# Estimate false discovery rate achieved
bh_false = np.sum(reject_bh & (true_labels == 0))
print(f"BH actual FDR: {bh_false/max(reject_bh.sum(),1):.4f} (target: {q})")
print(f"BH Power: {np.sum(reject_bh & (true_labels==1))/m1:.4f}")
Key Takeaways
- FDR controls the proportion of false positives among rejections — more lenient than FWER
- BH algorithm is simple: rank p-values, compare to kq/m threshold
- BH is more powerful than Bonferroni — especially for many tests
- Genomics, fMRI, metabolomics — all use FDR because thousands of tests are conducted
- q = 0.05 FDR means you expect 5% of your "discoveries" to be false
- Adjusted p-values from BH are the smallest q for which that test would still be rejected