Type I and Type II Errors
In hypothesis testing, two types of mistakes are possible. Understanding them is essential for designing studies and interpreting results.
The Decision Matrix
| | H₀ True | H₀ False |
|---|---|---|
| Reject H₀ | ❌ Type I Error (α) | ✅ Correct (Power = 1−β) |
| Fail to Reject H₀ | ✅ Correct (1−α) | ❌ Type II Error (β) |
Type I Error (α) — False Positive
Rejecting H₀ when it is actually true.
- Probability = α (significance level) — set by the researcher
- Example: Concluding a drug works when it actually doesn't
- "Crying wolf" — alarming when there's no cause
Type II Error (β) — False Negative
Failing to reject H₀ when H₁ is actually true.
- Probability = β, which depends on effect size, sample size, variability (σ), and α
- Example: Concluding a drug doesn't work when it actually does
- "Missing the signal" — failing to detect a real effect
Power = 1 − β
The probability of correctly detecting a true effect.
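Under simplifying assumptions, β and power can be computed directly instead of being simulated. The sketch below assumes a right-tailed z-test with known σ, where H₀ states that the mean is 0. The helper name `z_test_power` and its parameters are made up for this illustration and do not come from any library.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Return (beta, power) for a right-tailed z-test with known sigma.

    Hypothetical helper for illustration: H0 says mean = 0,
    H1 says mean = effect (> 0).
    """
    se = sigma / sqrt(n)                             # standard error of the sample mean
    critical = NormalDist(0, se).inv_cdf(1 - alpha)  # reject H0 above this sample mean
    beta = NormalDist(effect, se).cdf(critical)      # P(fail to reject | H1 true)
    return beta, 1 - beta

beta, power = z_test_power(effect=0.5, sigma=1, n=25)
print(f"beta = {beta:.3f}, power = {power:.3f}")
```

With an effect of 0.5σ and n = 25, this setup gives power of roughly 0.80. Raising `n` or `effect` shrinks β, while lowering `alpha` enlarges it.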
Real-World Consequences
| Field | Type I Error | Type II Error |
|---|---|---|
| Medicine | Approve useless drug | Miss effective drug |
| Criminal justice | Convict innocent person | Acquit guilty person |
| Quality control | Reject good batch | Ship defective batch |
| Spam filter | Block legitimate email | Allow spam through |
| Security | False alarm | Miss intrusion |
Python Simulation
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
np.random.seed(42)
def simulate_errors(n_simulations=10000, n=30, alpha=0.05,
                    true_mean_H0=0, true_mean_H1=0.5, sigma=1):
    """Estimate Type I and Type II error rates of a two-sided one-sample t-test."""
    n_half = n_simulations // 2  # number of simulated samples per scenario
    type_i_errors = 0
    type_ii_errors = 0
    for _ in range(n_half):
        # H₀ is TRUE (sample drawn with mean = true_mean_H0)
        sample = np.random.normal(true_mean_H0, sigma, n)
        _, p = stats.ttest_1samp(sample, popmean=true_mean_H0)
        if p < alpha:
            type_i_errors += 1  # Falsely rejected H₀
        # H₀ is FALSE (sample drawn with mean = true_mean_H1, effect exists)
        sample = np.random.normal(true_mean_H1, sigma, n)
        _, p = stats.ttest_1samp(sample, popmean=true_mean_H0)
        if p >= alpha:
            type_ii_errors += 1  # Failed to detect effect
    type_i_rate = type_i_errors / n_half
    type_ii_rate = type_ii_errors / n_half
    power = 1 - type_ii_rate
    return type_i_rate, type_ii_rate, power
# Base case
ti, tii, pw = simulate_errors(n=30, alpha=0.05, true_mean_H1=0.5)
print("n=30, α=0.05, effect=0.5σ:")
print(f" Type I rate (α): {ti:.4f} (nominal: 0.05)")
print(f" Type II rate (β): {tii:.4f}")
print(f" Power (1-β): {pw:.4f}")
# Show the tradeoff: reducing α increases β
print("\n--- α vs β tradeoff (n=30, effect=0.5σ) ---")
for a in [0.10, 0.05, 0.01, 0.001]:
    ti, tii, pw = simulate_errors(n=30, alpha=a, true_mean_H1=0.5)
    print(f"α={a:.3f}: β={tii:.3f}, Power={pw:.3f}")
# Show effect of sample size on power
print("\n--- Sample size vs Power (α=0.05, effect=0.5σ) ---")
for n in [10, 20, 30, 50, 100, 200]:
    ti, tii, pw = simulate_errors(n=n, alpha=0.05, true_mean_H1=0.5)
    bar = "█" * int(pw * 20)
    print(f"n={n:4d}: Power={pw:.3f} {bar}")
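The simulated power values can be cross-checked against theory. For a two-sided one-sample t-test, the test statistic under H₁ follows a noncentral t distribution with n−1 degrees of freedom and noncentrality d·√n, where d is the standardized effect (mean shift divided by σ). The following is a minimal sketch using `scipy.stats.nct`; the helper name `analytic_power` is an assumption made for this example.

```python
import numpy as np
from scipy import stats

def analytic_power(n, d, alpha=0.05):
    """Power of a two-sided one-sample t-test (hypothetical helper).

    d is the standardized effect size (true mean shift divided by sigma).
    """
    df = n - 1
    ncp = d * np.sqrt(n)                      # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-sided critical value
    # Probability that |T| exceeds the critical value when H1 is true
    upper = stats.nct.sf(t_crit, df, ncp)
    lower = stats.nct.cdf(-t_crit, df, ncp)
    return upper + lower

for n in [10, 20, 30, 50, 100, 200]:
    print(f"n={n:4d}: analytic power={analytic_power(n, d=0.5):.3f}")
```

These values should land close to the simulated ones, with any gap attributable to Monte Carlo noise from the finite number of simulation runs.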
Visualizing the Two Error Types
fig, ax = plt.subplots(figsize=(12, 6))
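# Plot range covers both sampling distributions (standard error is 0.2 here)
x = np.linspace(-1, 1.5, 1000)
# Effect of 0.5σ, matching the simulation; a much larger effect would make β vanish from the plot
mu0, mu1, sigma = 0, 0.5, 1
n = 25
se = sigma / np.sqrt(n)
# Sampling distributions of the sample mean under each hypothesis
dist_H0 = stats.norm(loc=mu0, scale=se)
dist_H1 = stats.norm(loc=mu1, scale=se)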
# Critical value for α=0.05 (right-tailed)
alpha = 0.05
cv = dist_H0.ppf(1 - alpha)
y0 = dist_H0.pdf(x)
y1 = dist_H1.pdf(x)
ax.plot(x, y0, 'b-', linewidth=2, label=f'H₀: μ={mu0}')
ax.plot(x, y1, 'r-', linewidth=2, label=f'H₁: μ={mu1}')
ax.axvline(cv, color='black', linestyle='--', linewidth=2, label=f'Critical value = {cv:.2f}')
# Type I error (α)
x_ti = x[x >= cv]
ax.fill_between(x_ti, dist_H0.pdf(x_ti), alpha=0.3, color='blue', label=f'Type I Error (α={alpha:.2f})')
# Type II error (β)
x_tii = x[x < cv]
beta = dist_H1.cdf(cv)
ax.fill_between(x_tii, dist_H1.pdf(x_tii), alpha=0.3, color='red', label=f'Type II Error (β={beta:.2f})')
# Power
x_pw = x[x >= cv]
ax.fill_between(x_pw, dist_H1.pdf(x_pw), alpha=0.3, color='green', label=f'Power = {1-beta:.2f}')
ax.set_xlabel('Sample mean')
ax.set_ylabel('Density')
ax.set_title(f'Type I & II Errors (n={n}, effect size = {mu1-mu0}σ, α={alpha})')
ax.legend(loc='upper right')
plt.tight_layout()
plt.savefig('type_i_ii_errors.png', dpi=150)
plt.show()
Controlling Error Rates
| Goal | Strategy |
|---|---|
| Reduce Type I (α) | Use smaller α (0.01 vs 0.05) |
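| Reduce Type II (β) | Increase n, accept a larger α, or raise the standardized effect (stronger intervention, less measurement noise) |
| Increase power | Larger n, larger effect, smaller σ, one-tailed test (only if the direction is specified in advance) |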
| Balance both | Pre-study power analysis |
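A pre-study power analysis can be approximated in a few lines. The sketch below uses the normal approximation n ≈ ((z₁₋α/₂ + z₁₋β) / d)² for a two-sided one-sample test, where d is the standardized effect size. The function name `required_n` is hypothetical, and because the approximation ignores the extra uncertainty from estimating σ, an exact t-based calculation will typically ask for a few more observations.

```python
from math import ceil
from statistics import NormalDist

def required_n(d, alpha=0.05, power=0.80):
    """Approximate sample size for a two-sided one-sample test.

    Hypothetical helper using the normal approximation;
    d is the standardized effect size (mean shift / sigma).
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    z_beta = z.inv_cdf(power)             # quantile for the target power
    return ceil(((z_alpha + z_beta) / d) ** 2)

for d in [0.2, 0.5, 0.8]:
    print(f"d={d}: n ≈ {required_n(d)}")
```

Halving the effect size roughly quadruples the required sample size, which is why small effects are expensive to detect.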
Key Takeaways
- Type I error (α): reject H₀ when true — false positive, set by researcher
- Type II error (β): fail to reject H₀ when false — false negative
- Power = 1 − β — probability of detecting a real effect
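- For a fixed sample size and effect size, reducing α increases β, so the two error rates trade off
- Increasing sample size reduces β at a fixed α (α itself is chosen by the researcher), which makes it possible to lower α without giving up power
- The consequences of each error type should guide the choice of α: in medicine, a Type I error can expose patients to a useless drug, while a Type II error can withhold an effective one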