Quantile Regression — Beyond the Mean

Quantile Regression

OLS estimates the conditional mean of the response. Quantile regression estimates any conditional quantile, revealing how the entire distribution, not just the average, shifts with the predictors.

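The τ-th quantile regression fit minimizes the check (pinball) loss, where ŷ_i is the fitted τ-th conditional quantile for observation i:

$$\sum_{i: y_i \geq \hat{y}_i} \tau\,|y_i - \hat{y}_i| + \sum_{i: y_i < \hat{y}_i} (1-\tau)\,|y_i - \hat{y}_i|$$

Under-predictions are weighted by τ and over-predictions by 1 − τ, so for τ = 0.9 the fit is pushed upward until roughly 90% of the observations lie below it.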
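To make the loss concrete, here is a minimal sketch (not part of the lesson's own code) for the simplest case of a constant prediction with no predictors. The helper name `pinball_loss` and the toy exponential sample are assumptions chosen for illustration. The sketch checks numerically that the constant minimizing the loss coincides with the ordinary sample quantile.

```python
import numpy as np

def pinball_loss(y, y_hat, tau):
    """Check (pinball) loss: weight tau when y >= y_hat, weight (1 - tau) otherwise."""
    resid = y - y_hat
    # (tau - 1) * resid equals (1 - tau) * |resid| when resid is negative
    return np.sum(np.where(resid >= 0, tau * resid, (tau - 1) * resid))

rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=1001)  # skewed toy sample, illustrative only
tau = 0.9

# For a constant prediction c the loss is piecewise linear in c with kinks
# at the data points, so searching over the observed values is sufficient.
candidates = np.sort(y)
losses = np.array([pinball_loss(y, c, tau) for c in candidates])
best_c = candidates[np.argmin(losses)]

print(f"argmin of pinball loss: {best_c:.4f}")
print(f"np.quantile(y, tau):    {np.quantile(y, tau):.4f}")
```

Quantile regression applies the same idea, but replaces the constant with a linear function of the predictors.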

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

np.random.seed(42)
n = 300
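# Years of education: integers 10..19 (randint's upper bound is exclusive)
education = np.random.randint(10, 20, n)
# Log-wage noise has standard deviation 0.3 + 0.05*education, so the
# spread grows with education (income inequality widens)
wage = np.exp(1 + 0.08*education + np.random.normal(0, 0.3 + 0.05*education, n))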

df = pd.DataFrame({'wage': wage, 'education': education})

quantiles = [0.10, 0.25, 0.50, 0.75, 0.90]
qr_results = {q: smf.quantreg('wage ~ education', df).fit(q=q) for q in quantiles}

print("Education coefficient by quantile:")
for q, r in qr_results.items():
    coef = r.params['education']
    ci = r.conf_int().loc['education']
    print(f"  Q({q:.2f}): {coef:.4f} [{ci.iloc[0]:.4f}, {ci.iloc[1]:.4f}]")

# OLS for comparison
ols = smf.ols('wage ~ education', df).fit()
print(f"  OLS:     {ols.params['education']:.4f}")

# Visualization
fig, ax = plt.subplots(figsize=(10, 6))
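# Plot over the observed range only, to avoid extrapolating the fitted lines
educ_range = np.linspace(education.min(), education.max(), 100)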
new_data = pd.DataFrame({'education': educ_range})
colors = ['lightblue','steelblue','navy','coral','red']

for (q, r), col in zip(qr_results.items(), colors):
    ax.plot(educ_range, r.predict(new_data), color=col, linewidth=2, label=f'Q({q})')
ax.plot(educ_range, ols.predict(new_data), 'k--', linewidth=2.5, label='OLS mean')
ax.scatter(education, wage, alpha=0.2, s=15, color='gray')
ax.set_xlabel('Education (years)')
ax.set_ylabel('Wage ($)')
ax.set_title('Quantile Regression: Wage-Education Relationship')
ax.legend()
plt.tight_layout()
plt.savefig('quantile_regression.png', dpi=150)
plt.show()
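Because the simulation above generates log(wage) as 1 + 0.08·education plus normal noise with standard deviation 0.3 + 0.05·education, the population quantiles of log(wage) are exactly linear in education, with slope 0.08 + 0.05·z_τ, where z_τ is the standard normal quantile. The sketch below computes these implied slopes as a reference point. It assumes one is willing to look at the log scale: the example fits wage in levels, where the true quantile curves are exponential rather than linear, so the fitted coefficients printed above are linear approximations and will not match these numbers.

```python
from statistics import NormalDist

quantiles = [0.10, 0.25, 0.50, 0.75, 0.90]
std_normal = NormalDist()  # standard normal, mean 0 and sd 1

# Data-generating process from the simulation:
#   log(wage) = 1 + 0.08*education + (0.3 + 0.05*education) * Z,  Z ~ N(0, 1)
# so Q_tau(log wage | education)
#   = (1 + 0.3*z_tau) + (0.08 + 0.05*z_tau) * education
log_slopes = {}
for q in quantiles:
    z_tau = std_normal.inv_cdf(q)
    log_slopes[q] = 0.08 + 0.05 * z_tau
    print(f"  Q({q:.2f}): implied slope of log(wage) on education = {log_slopes[q]:.4f}")
```

The implied slope rises with τ, which is the pattern of heterogeneous effects the example is built to show. Fitting `smf.quantreg('np.log(wage) ~ education', df)` would be one way to compare estimates against these values, up to sampling noise.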

Key Takeaways

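  1. Quantile regression reveals heterogeneous effects: a predictor's coefficient can differ between low and high quantiles (here, low vs. high earners).
  2. τ = 0.5 gives median regression, the least absolute deviations (LAD) estimator, which is robust to outliers in the response.
  3. No distributional assumption on the errors (such as normality or constant variance) is needed.
  4. Applications: income inequality, growth charts, risk models such as Value at Risk (VaR).
  5. statsmodels.formula.api.quantreg fits quantile regression in Python.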
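The robustness claim for median regression can be checked with a small experiment. The following is a minimal sketch on made-up data (the toy line y = 2 + 3x, the noise level, and the three contaminated points are all assumptions for illustration): a few gross outliers in the response pull the OLS slope well away from the true value, while the τ = 0.5 fit stays close to it.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 + 3 * x + rng.normal(0, 0.5, size=x.size)  # true slope is 3
y[[40, 44, 48]] += 60                            # three gross outliers in the response

toy = pd.DataFrame({"x": x, "y": y})

# Same formula, two loss functions: squared error vs. absolute error (tau = 0.5)
ols_slope = smf.ols("y ~ x", toy).fit().params["x"]
lad_slope = smf.quantreg("y ~ x", toy).fit(q=0.5).params["x"]

print(f"OLS slope:    {ols_slope:.3f}")
print(f"Median slope: {lad_slope:.3f}")
```

This robustness applies to outliers in the response. Observations with extreme predictor values (high leverage) can still distort a median regression fit.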
