Autocorrelation in Regression
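Autocorrelation means the regression errors are correlated across time (or space). It violates the independence assumption of OLS: with exogenous regressors the coefficient estimates remain unbiased, but they are no longer efficient and the usual standard errors are biased.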
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.graphics.tsaplots import plot_acf
import matplotlib.pyplot as plt
np.random.seed(42)
n = 100
t = np.arange(n)
# Simulate time series regression with autocorrelated errors (AR(1))
X = t/10 + np.random.normal(0, 0.5, n)
rho = 0.7 # autocorrelation coefficient
epsilon = np.zeros(n)
epsilon[0] = np.random.normal(0, 1)
for i in range(1, n):
    epsilon[i] = rho * epsilon[i-1] + np.random.normal(0, 1)
y = 2 + 0.5*X + epsilon
X_dm = sm.add_constant(X)
model = sm.OLS(y, X_dm).fit()
# Durbin-Watson test
dw = durbin_watson(model.resid)
print(f"Durbin-Watson statistic = {dw:.4f}")
print(f"Interpretation: {'No autocorrelation' if 1.5<dw<2.5 else 'Positive autocorrelation' if dw<1.5 else 'Negative autocorrelation'}")
# ACF plot of residuals
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
axes[0].scatter(t, model.resid, alpha=0.6)
axes[0].axhline(0, color='red', linestyle='--')
axes[0].set_title('Residuals over Time')
plot_acf(model.resid, ax=axes[1], lags=20, alpha=0.05)
axes[1].set_title('ACF of Residuals (significant lags = autocorrelation)')
plt.tight_layout()
plt.savefig('autocorrelation.png', dpi=150)
plt.show()
# Fix: Newey-West robust standard errors (HAC)
model_nw = sm.OLS(y, X_dm).fit(cov_type='HAC', cov_kwds={'maxlags':5})
print("\nNewey-West HAC standard errors:")
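# X_dm is a NumPy array, so bse is positional: index 0 is the constant, index 1 the slope
print(f" β₁ SE (OLS): {model.bse[1]:.4f}")
print(f" β₁ SE (HAC): {model_nw.bse[1]:.4f}")
print(" With positive autocorrelation the HAC SE is typically larger, reflecting the extra uncertainty")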
Key Takeaways
- Durbin-Watson ≈ 2: no first-order autocorrelation; as rules of thumb, <1.5 suggests positive and >2.5 negative autocorrelation (the statistic ranges from 0 to 4)
- ACF plot shows the lag structure: which lags fall outside the confidence band?
- Consequences: standard errors are biased (usually too small → inflated t-stats)
- Newey-West HAC SEs correct inference without changing point estimates
- For strong autocorrelation: use ARIMA or add lagged Y as predictor