OLS Estimation — Deriving Regression Coefficients from Scratch

Regression AnalysisLinear RegressionFree Lesson

Advertisement

OLS Estimation: From First Principles

Ordinary Least Squares (OLS) finds β̂ that minimizes the sum of squared residuals:

β^=argminβyXβ2\hat{\boldsymbol{\beta}} = \underset{\boldsymbol{\beta}}{\arg\min} \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|^2

Taking the derivative and setting to zero gives the normal equations:

XTXβ^=XTy\mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}^T\mathbf{y}

β^=(XTX)1XTy\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}

import numpy as np
import pandas as pd
from scipy import stats

np.random.seed(42)

# Data
n = 50
X_raw = np.random.uniform(0, 10, n)
y = 3 + 2.5 * X_raw + np.random.normal(0, 2, n)

# Design matrix: [1 | X]
X = np.column_stack([np.ones(n), X_raw])

# Normal equations: β̂ = (X'X)⁻¹ X'y
XtX = X.T @ X
Xty = X.T @ y
beta_hat = np.linalg.solve(XtX, Xty)  # better than inv() for numerical stability

print(f"β̂₀ (intercept) = {beta_hat[0]:.4f}  (true: 3)")
print(f"β̂₁ (slope)     = {beta_hat[1]:.4f}  (true: 2.5)")

# Fitted values and residuals
y_hat = X @ beta_hat
residuals = y - y_hat

# Standard error of the regression (RMSE)
n_params = 2  # β₀ and β₁
SSR = residuals @ residuals
sigma2_hat = SSR / (n - n_params)
print(f"σ̂² = {sigma2_hat:.4f}")
print(f"σ̂ (RMSE) = {np.sqrt(sigma2_hat):.4f}")

# Variance-covariance matrix of β̂
Var_beta = sigma2_hat * np.linalg.inv(XtX)
SE_beta = np.sqrt(np.diag(Var_beta))
print(f"SE(β̂₀) = {SE_beta[0]:.4f}")
print(f"SE(β̂₁) = {SE_beta[1]:.4f}")

# t-statistics and p-values
t_stats = beta_hat / SE_beta
p_values = 2 * stats.t.sf(np.abs(t_stats), df=n-n_params)

print(f"\nt-test for β₁: t={t_stats[1]:.4f}, p={p_values[1]:.6f}")

# R-squared
SS_total = ((y - y.mean())**2).sum()
SS_resid = (residuals**2).sum()
R2 = 1 - SS_resid/SS_total
print(f"R² = {R2:.4f}")

# Verify with statsmodels
import statsmodels.api as sm
model = sm.OLS(y, X).fit()
np.testing.assert_allclose(beta_hat, model.params, rtol=1e-10)
print("\n✓ Manual OLS matches statsmodels exactly")

Properties of OLS Estimators

Gauss-Markov Theorem: Under the classical assumptions, OLS is BLUE:

  • Best (minimum variance)
  • Linear (linear function of y)
  • Unbiased (E[β̂] = β)
  • Estimator
# Demonstrate unbiasedness: repeat many times and show E[β̂] ≈ β
n_sims = 10000
beta1_estimates = []
for _ in range(n_sims):
    X_raw_s = np.random.uniform(0, 10, 50)
    y_s = 3 + 2.5*X_raw_s + np.random.normal(0, 2, 50)
    X_s = np.column_stack([np.ones(50), X_raw_s])
    b = np.linalg.solve(X_s.T@X_s, X_s.T@y_s)
    beta1_estimates.append(b[1])

print(f"True β₁ = 2.5")
print(f"E[β̂₁] over {n_sims:,} samples = {np.mean(beta1_estimates):.4f}")
print(f"Std(β̂₁) = {np.std(beta1_estimates):.4f}")

Key Takeaways

  1. OLS normal equations: β̂ = (XᵀX)⁻¹Xᵀy — the closed-form solution
  2. Gauss-Markov: OLS is BLUE when errors are i.i.d. with constant variance
  3. σ̂² = SSR/(n−p) — unbiased estimator of error variance (p = number of parameters)
  4. np.linalg.solve() is more stable than np.linalg.inv() for the normal equations
  5. The design matrix X includes a column of 1s for the intercept

Advertisement

Need Expert Statistics Help?

Get personalized tutoring, dissertation support, or statistical consulting.

Advertisement