Heteroscedasticity
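Heteroscedasticity means the variance of the error term is not constant across observations. It violates the constant-variance (homoscedasticity) assumption behind the usual OLS standard errors, so t-tests and confidence intervals computed from them become unreliable.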
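In symbols, the two cases can be written as follows for a simple linear model. The notation ($\beta_0$, $\beta_1$, $\varepsilon_i$, $\sigma_i^2$) is introduced here for illustration only:

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_i + \varepsilon_i \\
\text{homoscedastic:}\quad \operatorname{Var}(\varepsilon_i \mid x_i) &= \sigma^2 \quad \text{for all } i \\
\text{heteroscedastic:}\quad \operatorname{Var}(\varepsilon_i \mid x_i) &= \sigma_i^2 \quad \text{varying with } i
\end{aligned}
```

The simulated data in the script that follows is one instance of the second case: the error standard deviation is 0.5 x_i, so σ_i² = 0.25 x_i².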
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
from statsmodels.stats.diagnostic import het_breuschpagan, het_white
np.random.seed(42)
n = 200
X = np.random.uniform(1, 10, n)
X_dm = sm.add_constant(X)
# Heteroscedastic errors: variance grows with X
y_hetero = 2 + 3*X + np.random.normal(0, 0.5*X, n)
# Homoscedastic errors (for comparison)
y_homo = 2 + 3*X + np.random.normal(0, 2, n)
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
for ax, y, label in [(axes[0], y_homo, 'Homoscedastic'),
                     (axes[1], y_hetero, 'Heteroscedastic')]:
    # Fit OLS, then plot residuals against fitted values
    model = sm.OLS(y, X_dm).fit()
    ax.scatter(model.fittedvalues, model.resid, alpha=0.5)
    ax.axhline(0, color='red', linestyle='--')
    ax.set_title(f'{label}: Residuals vs Fitted')
    ax.set_xlabel('Fitted Values')
    ax.set_ylabel('Residuals')
plt.tight_layout()
plt.savefig('heteroscedasticity.png', dpi=150)
plt.show()
# Detection tests
model_h = sm.OLS(y_hetero, X_dm).fit()
bp_stat, bp_p, _, _ = het_breuschpagan(model_h.resid, model_h.model.exog)
wh_stat, wh_p, _, _ = het_white(model_h.resid, model_h.model.exog)
print(f"Breusch-Pagan test: χ²={bp_stat:.4f}, p={bp_p:.6f}")
print(f"White's test: χ²={wh_stat:.4f}, p={wh_p:.6f}")
# Solution 1: Robust standard errors (HC3)
model_robust = sm.OLS(y_hetero, X_dm).fit(cov_type='HC3')
print("\nOLS with HC3 robust standard errors:")
print(model_robust.summary().tables[1])
# Solution 2: Log transformation (valid only when Y is strictly positive)
assert (y_hetero > 0).all(), "log transform requires strictly positive Y"
y_log = np.log(y_hetero)
model_log = sm.OLS(y_log, X_dm).fit()
bp_log, p_log, _, _ = het_breuschpagan(model_log.resid, model_log.model.exog)
print(f"\nAfter log(Y) transform: BP p-value = {p_log:.4f}")
Key Takeaways
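- Heteroscedasticity leaves OLS coefficient estimates unbiased but inefficient, and it biases the conventional standard errors, so tests and confidence intervals can mislead
- A residuals-vs-fitted plot is the primary visual diagnostic: look for a fan or trumpet shape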
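- Breusch-Pagan tests whether the error variance is a linear function of the regressors; White's test also includes their squares and cross-products, so it detects more general forms
- Robust SEs (HC3) are the easiest fix: the coefficients are unchanged, and inference remains asymptotically valid under heteroscedasticity
- A log transformation often reduces heteroscedasticity when Y is strictly positive and its spread grows with its level, but it also changes the model being estimated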