Multiple Linear Regression — Theory and Python


Multiple Linear Regression

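Multiple linear regression extends simple linear regression to p predictors, with an intercept β₀, one slope βⱼ per predictor Xⱼ, and a random error term ε: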

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$$
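The coefficients are usually estimated by ordinary least squares, that is, by choosing the β values that minimize the sum of squared residuals. The block below is a minimal sketch of that idea using NumPy on a tiny, noise-free toy dataset. The data values and the names `x1`, `x2`, `beta_hat` are illustrative assumptions and are not part of the lesson's house price data.

```python
import numpy as np

# Hypothetical toy data: y is built exactly as 1 + 2*x1 + 3*x2 (no noise),
# so least squares should recover these three coefficients.
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
y = 1.0 + 2.0 * x1 + 3.0 * x2

# Design matrix: a column of ones for the intercept, then one column per predictor.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Solve min ||y - X b||^2. lstsq avoids explicitly inverting X'X.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 6))  # intercept, slope for x1, slope for x2
```

With real, noisy data the estimates will not match the true coefficients exactly, which is why the standard errors and p-values reported by a fitted model matter.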

import numpy as np
import pandas as pd
import statsmodels.api as sm

np.random.seed(42)
n = 200

# Simulated house price data: size (sq ft), bedrooms, age (years)
house_size = np.random.uniform(1000, 3500, n)
bedrooms   = np.random.choice([1,2,3,4,5], n, p=[0.05,0.2,0.4,0.25,0.1])
age        = np.random.uniform(0, 50, n)

price = (50000 + 120*house_size + 8000*bedrooms - 500*age
         + np.random.normal(0, 25000, n))

df = pd.DataFrame({'price':price,'size':house_size,'bedrooms':bedrooms,'age':age})

X = sm.add_constant(df[['size','bedrooms','age']])
model = sm.OLS(df['price'], X).fit()
print(model.summary())

# Interpretation
print("\nCoefficient Interpretation:")
for name, coef, pval in zip(model.params.index, model.params, model.pvalues):
    sig = "***" if pval<0.001 else "**" if pval<0.01 else "*" if pval<0.05 else "ns"
    print(f"  {name:12s}: {coef:>10.2f}  (p={pval:.4f} {sig})")

# F-test: overall model significance
print(f"\nF({model.df_model:.0f},{model.df_resid:.0f}) = {model.fvalue:.2f}, p = {model.f_pvalue:.6f}")
print(f"R² = {model.rsquared:.4f}, Adj R² = {model.rsquared_adj:.4f}")

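# Prediction for one new house: 95% CI for the mean price and 95% prediction interval (PI) for a single sale
# Columns must match the training design matrix, including the 'const' column
new_house = pd.DataFrame({'const':[1.0],'size':[2000],'bedrooms':[3],'age':[10]})
pred = model.get_prediction(new_house)
summary = pred.summary_frame(alpha=0.05)
print("\nPrediction for a 2,000 sq ft, 3-bedroom, 10-year-old house:")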
print(f"  Predicted: ${summary['mean'].iloc[0]:,.0f}")
print(f"  95% CI: (${summary['mean_ci_lower'].iloc[0]:,.0f}, ${summary['mean_ci_upper'].iloc[0]:,.0f})")
print(f"  95% PI: (${summary['obs_ci_lower'].iloc[0]:,.0f}, ${summary['obs_ci_upper'].iloc[0]:,.0f})")

Key Takeaways

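  1. Each coefficient βᵢ is the expected change in Y for a one-unit increase in Xᵢ, holding all other predictors constant
  2. Use adjusted R² rather than R² to compare models with different numbers of predictors, since it penalizes extra predictors
  3. The overall F-test tests H₀: β₁ = β₂ = ⋯ = βₚ = 0 (all slopes are zero, so the model has no explanatory power); the intercept β₀ is not part of this hypothesis
  4. Interpretation is ceteris paribus: "all else equal" is the key phrase
  5. Check multicollinearity (for example, with variance inflation factors, VIF) before interpreting individual coefficients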
