Significance Levels
The significance level (α) is the threshold a p-value must fall below before we treat the evidence against H₀ as sufficient to reject it. Equivalently, it is the Type I error rate we agree to tolerate. It is set before the test, not after.
What α = 0.05 Actually Means
If we use α = 0.05 and H₀ is true:
- We will incorrectly reject H₀ in about 5% of repeated experiments
- On average, 1 in 20 such studies will produce a false positive by chance alone
- α is NOT the probability that H₀ is true, nor the probability that a particular result occurred by chance
```python
import numpy as np
from scipy import stats

# Demonstrate the false positive rate
np.random.seed(42)
n_experiments = 10000
alpha = 0.05
false_positives = 0

# H₀ is TRUE in all experiments
for _ in range(n_experiments):
    sample = np.random.normal(0, 1, 30)  # true μ = 0 = μ₀
    _, p = stats.ttest_1samp(sample, popmean=0)
    if p < alpha:
        false_positives += 1

print(f"H₀ always true. α = {alpha}")
print(f"False positives: {false_positives}/{n_experiments} = {false_positives/n_experiments:.4f}")
print(f"This should ≈ α = {alpha}")
```
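The simulation above fixes H₀ as true, so it cannot show why α is not the probability that H₀ is true given a significant result. The sketch below mixes true and false nulls to illustrate that point. The 80% share of true nulls and the effect size of 0.5 standard deviations are made-up assumptions chosen only for illustration, and the variable names are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 20000
n = 30
alpha = 0.05
share_h0_true = 0.8   # ASSUMPTION: 80% of tested hypotheses have H₀ true
effect_size = 0.5     # ASSUMPTION: true mean (in SD units) when H₀ is false

# Decide, experiment by experiment, whether H₀ holds
h0_true = rng.random(n_experiments) < share_h0_true
true_means = np.where(h0_true, 0.0, effect_size)

# One row per experiment, n observations each
samples = rng.normal(loc=true_means[:, None], scale=1.0, size=(n_experiments, n))
p_values = stats.ttest_1samp(samples, popmean=0, axis=1).pvalue
significant = p_values < alpha

# α describes this quantity: rejections among experiments where H₀ is true
false_positive_rate = significant[h0_true].mean()

# This is a different quantity: how often H₀ is true among the rejections
share_h0_among_significant = h0_true[significant].mean()

print(f"P(reject | H₀ true)     ≈ {false_positive_rate:.3f}")
print(f"P(H₀ true | rejected)   ≈ {share_h0_among_significant:.3f}")
```

Under these assumed numbers the first figure should sit near α, while the second should come out well above it. The second figure depends on the share of true nulls and on the test's power, neither of which α controls.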
Conventional Significance Levels by Field
| Field | Common α | Reason |
|---|---|---|
| Social science | 0.05 | Fisher's historical convention |
| Medicine (clinical) | 0.05 or 0.01 | Patient safety concerns |
| Psychology | 0.05 | Convention; the replication crisis has prompted proposals to move to 0.005 |
| Genetics (GWAS) | 5×10⁻⁸ | Millions of simultaneous tests |
| Particle physics | ≈ 3×10⁻⁷ (5σ, one-sided) | Extraordinary claims need extraordinary evidence |
| Quality control | Varies | Depends on cost of errors |
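Two of the thresholds in the table can be checked with a line of arithmetic each. The sketch below converts 5σ to a one-sided normal tail probability and shows the GWAS threshold as 0.05 divided across one million tests. The figure of one million is a round-number assumption, commonly quoted as the approximate number of independent tests in a genome-wide scan.

```python
from scipy import stats

# One-sided tail probability beyond 5 standard deviations of a standard normal
p_five_sigma = stats.norm.sf(5)

# Bonferroni-style threshold: 0.05 shared across the assumed number of tests
n_independent_tests = 1_000_000  # ASSUMPTION: round number for illustration
gwas_threshold = 0.05 / n_independent_tests

print(f"5σ one-sided p-value: {p_five_sigma:.2e}")
print(f"0.05 / {n_independent_tests:,} = {gwas_threshold:.0e}")
```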
Choosing α: A Decision Framework
```python
def recommend_alpha(false_positive_cost, false_negative_cost):
    """Recommend alpha from the relative costs of errors.

    The cut-offs (10, 3, 1) are rules of thumb, not derived quantities.
    """
    if false_negative_cost <= 0:
        raise ValueError("false_negative_cost must be positive")
    ratio = false_positive_cost / false_negative_cost

    if ratio > 10:
        alpha_rec = 0.01
        rationale = "False positives are very costly → use strict α"
    elif ratio > 3:
        alpha_rec = 0.05
        rationale = "Moderate concern about false positives → standard α"
    elif ratio > 1:
        alpha_rec = 0.10
        rationale = "False positives only slightly more costly → relaxed α"
    else:
        alpha_rec = 0.20
        rationale = "False negatives at least as costly → very relaxed α (e.g. screening)"

    print(f"FP cost / FN cost = {ratio:.1f}")
    print(f"Recommended α = {alpha_rec}")
    print(f"Rationale: {rationale}")
    return alpha_rec

print("=== Drug with severe side effects ===")
recommend_alpha(false_positive_cost=100, false_negative_cost=5)

print("\n=== Cancer screening test ===")
recommend_alpha(false_positive_cost=1, false_negative_cost=50)

print("\n=== Website A/B test ===")
recommend_alpha(false_positive_cost=2, false_negative_cost=3)
```
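Fixed cut-offs on the cost ratio are a rule of thumb. One way to make the trade-off explicit is to write down an expected cost and pick the α that minimizes it. The sketch below does this for a one-sided z-test under a deliberately simple model: expected cost = P(H₀)·α·(FP cost) + P(H₁)·β(α)·(FN cost). The function names `expected_cost` and `best_alpha`, the 50/50 prior, the effect size of 0.5 and the sample size of 30 are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats

def expected_cost(alpha, fp_cost, fn_cost, prior_prob_h0=0.5,
                  effect_size=0.5, n=30):
    """Expected cost of a one-sided z-test at level alpha (illustrative model)."""
    z_crit = stats.norm.ppf(1 - alpha)
    # Type II error rate when the true standardized effect is effect_size
    beta = stats.norm.cdf(z_crit - effect_size * np.sqrt(n))
    return prior_prob_h0 * alpha * fp_cost + (1 - prior_prob_h0) * beta * fn_cost

# Candidate significance levels to search over
alphas = np.linspace(0.001, 0.5, 500)

def best_alpha(fp_cost, fn_cost, **model_kwargs):
    """Return the candidate alpha with the lowest expected cost."""
    costs = expected_cost(alphas, fp_cost, fn_cost, **model_kwargs)
    return alphas[np.argmin(costs)]

for label, fp, fn in [("Drug with severe side effects", 100, 5),
                      ("Website A/B test", 2, 3),
                      ("Cancer screening test", 1, 50)]:
    print(f"{label}: cost-minimizing α ≈ {best_alpha(fp, fn):.3f}")
```

Under this model the ordering matches the heuristic: the costlier a false positive is relative to a false negative, the smaller the cost-minimizing α. The specific values depend heavily on the assumed effect size, sample size and prior, so they are best read as a sensitivity check rather than a recommendation.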
The Multiple Testing Problem
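If you run 20 independent tests, each at α = 0.05, you expect about 1 false positive (20 × 0.05) even if every H₀ is true. The probability of at least one false positive, the family-wise error rate (FWER), is 1 − 0.95²⁰ ≈ 0.64.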
```python
# Simulate multiple testing inflation
n_tests = 20
alpha = 0.05
n_simulations = 10000
any_false_positive = 0

for _ in range(n_simulations):
    # n_tests independent tests, with H₀ true in every one
    p_values = [stats.ttest_1samp(np.random.normal(0, 1, 30), 0)[1]
                for _ in range(n_tests)]
    if any(p < alpha for p in p_values):
        any_false_positive += 1

family_wise_error_rate = any_false_positive / n_simulations
expected_fwer = 1 - (1 - alpha)**n_tests  # exact only for independent tests

print(f"{n_tests} tests at α={alpha}")
print(f"Simulated FWER: {family_wise_error_rate:.4f}")
print(f"Theoretical FWER: 1 − (1 − {alpha})^{n_tests} = {expected_fwer:.4f}")

# Bonferroni correction
bonferroni_alpha = alpha / n_tests
print(f"\nBonferroni-corrected α = {alpha}/{n_tests} = {bonferroni_alpha:.4f}")
```
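The code above computes the Bonferroni-corrected α but does not check that it works. The sketch below reruns the simulation in vectorized form and compares the FWER with and without the correction. It also includes a minimal implementation of the Benjamini-Hochberg step-up rule, the usual procedure behind "FDR" corrections. The function name `benjamini_hochberg` and the example p-values are illustrative, not taken from any library.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_tests, n_simulations, n, alpha = 20, 5000, 30, 0.05

# All null hypotheses true; shape is (simulations, tests, observations)
data = rng.normal(0, 1, size=(n_simulations, n_tests, n))
p_values = stats.ttest_1samp(data, popmean=0, axis=2).pvalue

# Fraction of simulated families with at least one rejection
fwer_uncorrected = np.mean((p_values < alpha).any(axis=1))
fwer_bonferroni = np.mean((p_values < alpha / n_tests).any(axis=1))
print(f"FWER without correction: {fwer_uncorrected:.3f}")
print(f"FWER with Bonferroni:    {fwer_bonferroni:.3f}")

def benjamini_hochberg(p, q=0.05):
    """Boolean rejection mask under the Benjamini-Hochberg step-up rule."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The i-th smallest p-value is compared with q * i / m
    thresholds = q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank that meets its threshold
        reject[order[:k + 1]] = True      # reject it and every smaller p-value
    return reject

example_p = [0.001, 0.008, 0.039, 0.041, 0.6]  # made-up p-values
print(benjamini_hochberg(example_p))
```

Bonferroni controls the chance of any false positive and becomes conservative as the number of tests grows. Benjamini-Hochberg instead controls the expected share of false positives among the rejections, which usually costs less power when many tests are run.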
Critical Values at Common Significance Levels
```python
# For one-sample z-test (large n)
print("Critical values for z-test:")
print(f"{'α':<8} {'Two-tailed zc':<18} {'One-tailed zc'}")
for alpha in [0.10, 0.05, 0.025, 0.01, 0.005, 0.001]:
    zc_two = stats.norm.ppf(1 - alpha/2)
    zc_one = stats.norm.ppf(1 - alpha)
    print(f"{alpha:<8.3f} ±{zc_two:<17.4f} {zc_one:.4f}")
```
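The z critical values apply when n is large. For small samples with an estimated standard deviation, the t distribution gives somewhat larger cut-offs. The sketch below compares the two-tailed values, assuming n = 30 to match the sample size used in the simulations. The dictionary `crit` is a hypothetical helper used only to hold the results.

```python
from scipy import stats

n = 30       # ASSUMPTION: same sample size as the simulated experiments
df = n - 1   # degrees of freedom for a one-sample t-test

crit = {}    # level -> (z critical value, t critical value)
for level in [0.10, 0.05, 0.01]:
    z_crit = stats.norm.ppf(1 - level / 2)
    t_crit = stats.t.ppf(1 - level / 2, df)
    crit[level] = (z_crit, t_crit)
    print(f"α={level:<5} z=±{z_crit:.4f}  t(df={df})=±{t_crit:.4f}")
```

The t cut-off exceeds the z cut-off at every level, and the gap shrinks as the degrees of freedom grow.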
Key Takeaways
- α is set before the experiment; choosing it after seeing the results invalidates the error-rate guarantee
- α = 0.05 is a convention, not a law; choose it based on the costs of each error type
- α is the long-run false positive rate under repeated experiments with H₀ true
- Multiple testing inflates false positives; correct for it (e.g. Bonferroni for FWER, Benjamini-Hochberg for FDR) whenever you run a family of tests
- Overemphasis on p < 0.05 in publishing is widely cited as a contributor to the replication crisis
- Fix α before analysis and report it in your methods section