Significance Levels (α) — Choosing Alpha and What It Means

Significance Levels

The significance level (α) is the cutoff for the p-value: when p < α, we treat the evidence against H₀ as strong enough to reject it. α must be fixed before the data are analyzed, not chosen afterward.


What α = 0.05 Actually Means

If we use α = 0.05 and H₀ is true:

  • We will incorrectly reject H₀ in about 5% of repeated experiments (the Type I error rate)
  • In the long run, about 1 in 20 such studies will produce a false positive by chance alone
  • It is NOT the probability that H₀ is true, nor the probability that the result occurred by chance

import numpy as np
from scipy import stats

# Demonstrate false positive rate
np.random.seed(42)
n_experiments = 10000
alpha = 0.05
false_positives = 0

# H₀ is TRUE in all experiments
for _ in range(n_experiments):
    sample = np.random.normal(0, 1, 30)  # true μ = 0 = μ₀
    _, p = stats.ttest_1samp(sample, popmean=0)
    if p < alpha:
        false_positives += 1

print(f"H₀ always true. α = {alpha}")
print(f"False positives: {false_positives}/{n_experiments} = {false_positives/n_experiments:.4f}")
print(f"This should ≈ α = {alpha}")
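
The last bullet in the list above (α is not the probability that H₀ is true) can be checked with a short Bayes' rule calculation. This is a sketch under assumed numbers: the share of true nulls (`prior_h0`) and the test's power (`power`) are hypothetical values chosen for illustration, not figures from any study.

```python
def prob_h0_given_significant(alpha, power, prior_h0):
    """P(H0 true | p < alpha), computed with Bayes' rule."""
    false_pos = alpha * prior_h0        # true nulls that get rejected anyway
    true_pos = power * (1 - prior_h0)   # real effects that are detected
    return false_pos / (false_pos + true_pos)

alpha = 0.05
for prior_h0 in (0.5, 0.9):  # hypothetical shares of true nulls
    share = prob_h0_given_significant(alpha, power=0.8, prior_h0=prior_h0)
    print(f"P(H0) = {prior_h0:.1f} -> P(H0 true | significant) = {share:.3f}")
```

With the same α, the probability that a significant result is a false alarm changes with the prior and the power, which is why α alone cannot be read as P(H₀ true).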

Conventional Significance Levels by Field

Field                  Common α              Reason
Social science         0.05                  Fisher's historical convention
Medicine (clinical)    0.05 or 0.01          Patient safety concerns
Psychology             0.05                  Replication crisis has prompted proposals to lower it to 0.005
Genetics (GWAS)        5×10⁻⁸                Millions of simultaneous tests
Particle physics       ≈ 3×10⁻⁷ (5σ)         Extraordinary claims need extraordinary evidence
Quality control        Varies                Depends on cost of errors
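
The two smallest thresholds in the table can be reproduced with a few lines of arithmetic. The sketch below uses Python's standard-library `statistics.NormalDist` instead of SciPy so it runs without extra installs. It assumes the 5σ figure is a one-sided tail probability and that the GWAS threshold corresponds to a Bonferroni correction of 0.05 over roughly one million independent tests, which is the usual rationale given for it.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def one_sided_p(n_sigma):
    """Upper-tail probability beyond n_sigma standard deviations."""
    return std_normal.cdf(-n_sigma)

# Particle physics: the 5-sigma discovery threshold as a p-value
print(f"5 sigma, one-sided p = {one_sided_p(5):.2e}")

# GWAS: 0.05 spread over about one million independent tests (assumed count)
gwas_alpha = 0.05 / 1_000_000
gwas_z = std_normal.inv_cdf(1 - gwas_alpha / 2)  # equivalent two-sided z cutoff
print(f"GWAS alpha = {gwas_alpha:.1e}, two-sided |z| cutoff = {gwas_z:.2f}")
```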

Choosing α: A Decision Framework

def recommend_alpha(false_positive_cost, false_negative_cost):
    """Recommend alpha from the ratio of false-positive to false-negative cost.

    The cutoffs (10, 3, 1) are illustrative, not derived quantities.
    """
    if false_negative_cost <= 0:
        raise ValueError("false_negative_cost must be positive")
    ratio = false_positive_cost / false_negative_cost

    if ratio > 10:
        alpha_rec = 0.01
        rationale = "False positives are very costly → use strict α"
    elif ratio > 3:
        alpha_rec = 0.05
        rationale = "Moderate concern about false positives → standard α"
    elif ratio > 1:
        alpha_rec = 0.10
        rationale = "False positives only slightly costlier → relaxed α"
    else:
        alpha_rec = 0.20
        rationale = "False negatives at least as costly → very relaxed α (screening)"

    print(f"FP cost / FN cost = {ratio:.1f}")
    print(f"Recommended α = {alpha_rec}")
    print(f"Rationale: {rationale}")
    return alpha_rec

print("=== Drug with severe side effects ===")
recommend_alpha(false_positive_cost=100, false_negative_cost=5)

print("\n=== Cancer screening test ===")
recommend_alpha(false_positive_cost=1, false_negative_cost=50)

print("\n=== Website A/B test ===")
recommend_alpha(false_positive_cost=2, false_negative_cost=3)
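
A cost-ratio lookup ignores how likely H₀ is and how much power the test has. One possible refinement, sketched below, picks the α that minimizes expected cost per test. Everything beyond the two cost arguments is an assumption made for illustration: a one-sided z-test with known σ, a hypothetical standardized effect size (`effect`), sample size (`n`), and prior probability of H₀ (`prior_h0`). The helper names are invented for this sketch.

```python
from statistics import NormalDist

std_normal = NormalDist()

def power_one_sided_z(alpha, effect, n):
    """Power of a one-sided z-test for a standardized effect with n observations."""
    z_crit = std_normal.inv_cdf(1 - alpha)
    return 1 - std_normal.cdf(z_crit - effect * n ** 0.5)

def expected_cost(alpha, fp_cost, fn_cost, prior_h0, effect, n):
    """Average cost per test: false positives under H0 plus misses under H1."""
    miss_rate = 1 - power_one_sided_z(alpha, effect, n)
    return prior_h0 * alpha * fp_cost + (1 - prior_h0) * miss_rate * fn_cost

def cheapest_alpha(fp_cost, fn_cost, prior_h0=0.5, effect=0.5, n=50):
    """Grid-search alpha over 0.001..0.499 for the lowest expected cost."""
    grid = [i / 1000 for i in range(1, 500)]
    return min(grid, key=lambda a: expected_cost(a, fp_cost, fn_cost,
                                                 prior_h0, effect, n))

# Same three cost scenarios as above, with the assumed defaults
for label, fp, fn in [("Drug with severe side effects", 100, 5),
                      ("Cancer screening test", 1, 50),
                      ("Website A/B test", 2, 3)]:
    print(f"{label}: alpha ≈ {cheapest_alpha(fp, fn):.3f}")
```

The ordering of the three answers matches the lookup (strictest for the drug, most relaxed for screening), but the actual values depend heavily on the assumed effect size, sample size, and prior.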

The Multiple Testing Problem

If you run 20 independent tests, each at α = 0.05, you expect about 1 false positive (20 × 0.05) even if every H₀ is true, and the probability of at least one false positive is 1 − 0.95²⁰ ≈ 64%.

# Simulate multiple testing inflation
n_tests = 20
alpha = 0.05
n_simulations = 10000
any_false_positive = 0

for _ in range(n_simulations):
    p_values = [stats.ttest_1samp(np.random.normal(0, 1, 30), 0)[1]
                for _ in range(n_tests)]
    if any(p < alpha for p in p_values):
        any_false_positive += 1

family_wise_error_rate = any_false_positive / n_simulations
expected_fwer = 1 - (1 - alpha)**n_tests  # FWER = family-wise error rate

print(f"{n_tests} tests at α={alpha}")
print(f"Simulated FWER: {family_wise_error_rate:.4f}")
print(f"Theoretical FWER: {expected_fwer:.4f}")
print(f"1 − (1 − 0.05)²⁰ = {1-(0.95**20):.4f}")

# Bonferroni correction
bonferroni_alpha = alpha / n_tests
print(f"\nBonferroni-corrected α = {alpha}/{n_tests} = {bonferroni_alpha:.4f}")
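
Bonferroni controls the family-wise error rate, while the Benjamini-Hochberg step-up procedure controls the false discovery rate (FDR) and is usually less conservative. The sketch below implements both in plain Python so the steps are visible. The p-values are hypothetical, and the function names are illustrative rather than taken from a library.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject H0 where p <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg_reject(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * q
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff
    reject = [False] * m
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject

p_values = [0.001, 0.008, 0.012, 0.041, 0.20, 0.55]  # hypothetical results
print("Bonferroni rejections:", sum(bonferroni_reject(p_values)))
print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg_reject(p_values)))
```

For real analyses, a maintained implementation such as a statistics library's multiple-testing routine is a safer choice than hand-rolled code.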

Critical Values at Common Significance Levels

# For one-sample z-test (large n)
print("Critical values for z-test:")
print(f"{'α':<8} {'Two-tailed zc':<18} {'One-tailed zc'}")
for alpha in [0.10, 0.05, 0.025, 0.01, 0.005, 0.001]:
    zc_two = stats.norm.ppf(1 - alpha/2)
    zc_one = stats.norm.ppf(1 - alpha)
    print(f"{alpha:<8.3f} ±{zc_two:<17.4f} {zc_one:.4f}")
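
To show how a critical value is used, the sketch below runs a two-tailed z-test decision both ways: comparing |z| with the critical value and comparing the p-value with α. The two rules always agree. The sample numbers (mean 103, n = 64, σ = 10, μ₀ = 100) are hypothetical, and the function name is illustrative. It uses the standard-library `statistics.NormalDist` so it runs without SciPy.

```python
from statistics import NormalDist

std_normal = NormalDist()

def two_tailed_z_decision(z_stat, alpha=0.05):
    """Return (critical value, p-value, reject?) for a two-tailed z-test."""
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - std_normal.cdf(abs(z_stat)))
    return z_crit, p_value, abs(z_stat) > z_crit

# Hypothetical data: sample mean 103 from n = 64, H0: mu = 100, known sigma = 10
z_stat = (103 - 100) / (10 / 64 ** 0.5)

for a in (0.05, 0.01):
    z_crit, p_value, reject = two_tailed_z_decision(z_stat, alpha=a)
    print(f"α = {a}: z = {z_stat:.2f}, zc = ±{z_crit:.3f}, "
          f"p = {p_value:.4f}, reject H0: {reject}")
```

The same test statistic leads to rejection at α = 0.05 but not at α = 0.01, which is why α has to be chosen before looking at the result.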

Key Takeaways

  1. α is set before the experiment: choosing it after seeing the results invalidates the stated error rate
  2. α = 0.05 is a convention, not a law: choose it based on the costs of each error type
  3. α is the long-run false positive rate across repeated experiments in which H₀ is true
  4. Multiple testing inflates false positives: apply a correction (e.g., Bonferroni for FWER, Benjamini-Hochberg for FDR)
  5. Overemphasis on p < 0.05 in publishing is widely cited as a contributor to the replication crisis
  6. Fix α before the analysis and report it in your methods section
