Type I and Type II Errors — False Positives, False Negatives, Power


Type I and Type II Errors

In hypothesis testing, two types of mistakes are possible. Understanding them is essential for designing studies and interpreting results.


The Decision Matrix

Decision | H₀ True | H₀ False
Reject H₀ | ❌ Type I Error (α) | ✅ Correct (Power = 1−β)
Fail to Reject H₀ | ✅ Correct (1−α) | ❌ Type II Error (β)
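
The four cells of the matrix can be written as a small lookup. This is only an illustrative sketch: the function name `classify_outcome` and its arguments are hypothetical and do not appear elsewhere in this lesson.

```python
def classify_outcome(h0_is_true: bool, rejected_h0: bool) -> str:
    """Map (truth, decision) to one cell of the decision matrix."""
    if h0_is_true and rejected_h0:
        return "Type I error (false positive)"
    if h0_is_true and not rejected_h0:
        return "Correct (true negative)"
    if not h0_is_true and rejected_h0:
        return "Correct (true positive)"
    return "Type II error (false negative)"


# Print every combination of truth and decision.
for truth in (True, False):
    for decision in (True, False):
        label = classify_outcome(truth, decision)
        print(f"H0 true={truth!s:5} reject={decision!s:5} -> {label}")
```

In practice the truth is unknown, so a single study can never be labeled this way; the matrix describes long-run error rates over many hypothetical repetitions.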

Type I Error (α) — False Positive

Rejecting H₀ when it is actually true.

  • Probability = α (significance level) — set by the researcher
  • Example: Concluding a drug works when it actually doesn't
  • "Crying wolf" — alarming when there's no cause

Type II Error (β) — False Negative

Failing to reject H₀ when H₁ is actually true.

  • Probability = β, which depends on effect size, sample size, variability (σ), and α
  • Example: Concluding a drug doesn't work when it actually does
  • "Missing the signal" — failing to detect a real effect

Power = 1 − β

The probability of correctly detecting a true effect.
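
For a simple case, power can be computed directly rather than simulated. The sketch below assumes a right-tailed one-sample z-test with known σ, which is a simplification of the t-test used later in this lesson. The function name `z_test_power` is hypothetical, and the standard library's `statistics.NormalDist` is used so the example has no third-party dependencies.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(effect, n, sigma=1.0, alpha=0.05):
    """Power of a right-tailed one-sample z-test (sigma assumed known).

    effect is the true mean minus the H0 mean, in the same units as sigma.
    """
    z = NormalDist()                    # standard normal
    z_crit = z.inv_cdf(1 - alpha)       # rejection threshold on the z scale
    shift = effect * sqrt(n) / sigma    # how far H1 moves the z statistic
    beta = z.cdf(z_crit - shift)        # P(fail to reject | H1 true)
    return 1 - beta


power = z_test_power(effect=0.5, n=25)
print(f"beta = {1 - power:.3f}, power = {power:.3f}")
```

With an effect of 0.5σ and n = 25, this gives power of roughly 0.80. Setting `effect=0` returns α, since "power" under H₀ is just the false positive rate.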


Real-World Consequences

Field | Type I Error | Type II Error
Medicine | Approve useless drug | Miss effective drug
Criminal justice | Convict innocent person | Acquit guilty person
Quality control | Reject good batch | Ship defective batch
Spam filter | Block legitimate email | Allow spam through
Security | False alarm | Miss intrusion

Python Simulation

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

np.random.seed(42)

def simulate_errors(n_simulations=10000, n=30, alpha=0.05,
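                     true_mean_H0=0, true_mean_H1=0.5, sigma=1):
    """Simulate Type I and II error rates for a two-sided one-sample t-test.

    Runs n_simulations // 2 trials under H₀ and the same number under H₁.
    """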
    type_i_errors = 0
    type_ii_errors = 0
    
    for _ in range(n_simulations // 2):
        # H₀ is TRUE (true mean = 0)
        sample = np.random.normal(true_mean_H0, sigma, n)
        _, p = stats.ttest_1samp(sample, popmean=0)
        if p < alpha:
            type_i_errors += 1   # Falsely rejected H₀
        
        # H₀ is FALSE (true mean = 0.5, effect exists)
        sample = np.random.normal(true_mean_H1, sigma, n)
        _, p = stats.ttest_1samp(sample, popmean=0)
        if p >= alpha:
            type_ii_errors += 1  # Failed to detect effect
    
    n_half = n_simulations // 2
    type_i_rate = type_i_errors / n_half
    type_ii_rate = type_ii_errors / n_half
    power = 1 - type_ii_rate
    
    return type_i_rate, type_ii_rate, power

# Base case
ti, tii, pw = simulate_errors(n=30, alpha=0.05, true_mean_H1=0.5)
print(f"n=30, α=0.05, effect=0.5σ:")
print(f"  Type I rate (α): {ti:.4f} (nominal: 0.05)")
print(f"  Type II rate (β): {tii:.4f}")
print(f"  Power (1-β): {pw:.4f}")

# Show the tradeoff: reducing α increases β
print("\n--- α vs β tradeoff (n=30, effect=0.5σ) ---")
for a in [0.10, 0.05, 0.01, 0.001]:
    ti, tii, pw = simulate_errors(n=30, alpha=a, true_mean_H1=0.5)
    print(f"α={a:.3f}: β={tii:.3f}, Power={pw:.3f}")

# Show effect of sample size on power
print("\n--- Sample size vs Power (α=0.05, effect=0.5σ) ---")
for n in [10, 20, 30, 50, 100, 200]:
    ti, tii, pw = simulate_errors(n=n, alpha=0.05, true_mean_H1=0.5)
    bar = "█" * int(pw * 20)
    print(f"n={n:4d}: Power={pw:.3f} {bar}")
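
The rates printed by the simulation are estimates, so the simulated Type I rate will rarely equal the nominal α exactly. Each rate is a proportion over `n_simulations // 2` independent trials, so its Monte Carlo uncertainty follows the usual binomial standard error. The helper name `mc_standard_error` below is hypothetical; this is a rough sketch for judging whether a simulated rate is plausibly consistent with its nominal value.

```python
from math import sqrt


def mc_standard_error(rate, n_trials):
    """Binomial standard error of a proportion estimated from n_trials."""
    return sqrt(rate * (1 - rate) / n_trials)


# The default call uses 10000 simulations, i.e. 5000 trials per hypothesis.
n_trials = 10000 // 2
nominal_alpha = 0.05

se = mc_standard_error(nominal_alpha, n_trials)
low = nominal_alpha - 1.96 * se
high = nominal_alpha + 1.96 * se
print(f"SE of simulated Type I rate: {se:.4f}")
print(f"Approximate 95% range around nominal alpha: {low:.4f} to {high:.4f}")
```

A simulated Type I rate inside this range is consistent with α = 0.05; to narrow the range, increase `n_simulations`.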

Visualizing the Two Error Types

fig, ax = plt.subplots(figsize=(12, 6))

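# With n=25 the standard error is 0.2, so an effect of 0.5 keeps the two
# curves overlapping and leaves a visible Type II region (β ≈ 0.20).
# A much larger effect (e.g. 2) would give β ≈ 0 and hide the red area.
x = np.linspace(-1, 1.5, 1000)
mu0, mu1, sigma = 0, 0.5, 1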
n = 25
se = sigma / np.sqrt(n)

# Sampling distributions of the sample mean under H₀ and H₁
dist_H0 = stats.norm(loc=mu0, scale=se)
dist_H1 = stats.norm(loc=mu1, scale=se)

# Critical value for α=0.05 (right-tailed)
alpha = 0.05
cv = dist_H0.ppf(1 - alpha)

y0 = dist_H0.pdf(x)
y1 = dist_H1.pdf(x)

ax.plot(x, y0, 'b-', linewidth=2, label=f'H₀: μ={mu0}')
ax.plot(x, y1, 'r-', linewidth=2, label=f'H₁: μ={mu1}')
ax.axvline(cv, color='black', linestyle='--', linewidth=2, label=f'Critical value = {cv:.2f}')

# Type I error (α)
x_ti = x[x >= cv]
ax.fill_between(x_ti, dist_H0.pdf(x_ti), alpha=0.3, color='blue', label=f'Type I Error (α={alpha:.2f})')

# Type II error (β)
x_tii = x[x < cv]
beta = dist_H1.cdf(cv)
ax.fill_between(x_tii, dist_H1.pdf(x_tii), alpha=0.3, color='red', label=f'Type II Error (β={beta:.2f})')

# Power
x_pw = x[x >= cv]
ax.fill_between(x_pw, dist_H1.pdf(x_pw), alpha=0.3, color='green', label=f'Power = {1-beta:.2f}')

ax.set_xlabel('Sample mean')
ax.set_ylabel('Density')
ax.set_title(f'Type I & II Errors (n={n}, effect size = {mu1-mu0}σ, α={alpha})')
ax.legend(loc='upper right')
plt.tight_layout()
plt.savefig('type_i_ii_errors.png', dpi=150)
plt.show()

Controlling Error Rates

Goal | Strategy
Reduce Type I (α) | Use smaller α (0.01 vs 0.05)
Reduce Type II (β) | Increase n, use larger α, increase effect size
Increase power | Larger n, larger effect, smaller σ, one-tailed test
Balance both | Pre-study power analysis
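
A pre-study power analysis usually means solving for the sample size that reaches a target power at a chosen α. The sketch below uses the common normal-approximation formula n = ((z₁₋α/₂ + z₁₋β) · σ / effect)² for a two-sided one-sample test. The function name `required_n` is hypothetical, and because the formula ignores the extra uncertainty of estimating σ, a t-test needs slightly more observations than it reports.

```python
from math import ceil
from statistics import NormalDist


def required_n(effect, sigma=1.0, alpha=0.05, power=0.80):
    """Approximate n for a two-sided one-sample test (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_power = z.inv_cdf(power)           # quantile for the target power
    return ceil(((z_alpha + z_power) * sigma / effect) ** 2)


# Sample size needed to detect a 0.5σ effect at several power targets.
for target in (0.80, 0.90, 0.95):
    print(f"power={target:.2f}: n >= {required_n(0.5, power=target)}")
```

Halving the effect size roughly quadruples the required n, which is why small effects are expensive to detect.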

Key Takeaways

  1. Type I error (α): reject H₀ when true — false positive, set by researcher
  2. Type II error (β): fail to reject H₀ when false — false negative
  3. Power = 1 − β — probability of detecting a real effect
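  4. For a fixed sample size and effect size, reducing α increases β, so the two error rates trade off
  5. Increasing sample size reduces β at a fixed α; α itself is chosen by the researcher and does not shrink with n, but a larger n lets you use a smaller α without losing power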
  6. The consequences of each error type should guide α — in medicine, Type I can harm patients
