Linear Regression: Math, Code and Assumptions

Module 7: Machine Learning Fundamentals

The Foundation of Machine Learning

Linear regression is one of the most fundamental algorithms in ML. Despite its simplicity, understanding it deeply provides insight into many other supervised learning methods.

ML Algorithm Landscape

Supervised Learning Algorithms

  • Linear: Linear Regression, Logistic Regression, Ridge/Lasso
  • Tree-Based: Decision Tree, Random Forest, XGBoost
  • Neural: Perceptron, MLP, Deep Learning
  • Support Vector: Linear SVM, Kernel SVM, SVR

Linear Regression is the foundation — understand this first!


1. Simple Linear Regression

Mathematical Formulation

Model:

$$y = \beta_0 + \beta_1 x + \epsilon$$

The fitted line gives the prediction, which omits the unobserved error term:

$$\hat{y} = \beta_0 + \beta_1 x$$

Where:

  • $\beta_0$ = intercept (bias): value of $y$ when $x = 0$
  • $\beta_1$ = slope (weight): change in $y$ for a unit change in $x$
  • $\epsilon$ = error term, with $\epsilon \sim N(0, \sigma^2)$

[Figure: Simple Linear Regression: Finding the Best Fit Line. Axes: Feature (x) vs. Target (y). Shows the actual data points, the regression line with intercept β₀ and slope β₁, and the residuals (errors) eᵢ.]

2. Cost Function (Ordinary Least Squares)

Mean Squared Error (MSE):

$$J(\beta_0, \beta_1) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - (\beta_0 + \beta_1 x_i)\right)^2$$

Goal: Find $\beta_0, \beta_1$ that minimize $J$
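
To make the cost concrete, the sketch below evaluates $J$ for two candidate lines on a tiny made-up dataset. The function name `mse_cost` and the data values are illustrative assumptions, not part of the lesson.

```python
import numpy as np

def mse_cost(beta0, beta1, x, y):
    """Mean squared error J(beta0, beta1) for the line y_hat = beta0 + beta1 * x."""
    y_hat = beta0 + beta1 * x
    return np.mean((y - y_hat) ** 2)

# Hypothetical toy data lying exactly on y = 1 + 2x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])

cost_exact = mse_cost(1.0, 2.0, x, y)  # the generating line: every residual is 0
cost_off = mse_cost(0.0, 2.0, x, y)    # intercept off by 1: every residual is 1
print(cost_exact, cost_off)
```

The second candidate has a larger cost, which is exactly the signal an optimizer uses to prefer the first.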

Closed-Form Solution (Normal Equation):

$$\beta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\text{Cov}(X,Y)}{\text{Var}(X)}$$

$$\beta_0 = \bar{y} - \beta_1 \bar{x}$$
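
The closed-form formulas above translate almost line for line into NumPy. This is a minimal sketch on synthetic data; `fit_simple_ols` is a hypothetical helper name, and the data-generating values (4 and 3) are assumptions chosen to mirror the lesson's later example. `np.polyfit` is used only as an independent cross-check.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Closed-form simple linear regression: returns (beta0, beta1)."""
    x_mean, y_mean = x.mean(), y.mean()
    beta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1

# Synthetic data (an assumption for illustration): y = 4 + 3x + noise
rng = np.random.default_rng(0)
x = 2 * rng.random(200)
y = 4 + 3 * x + rng.normal(0, 1, size=200)

beta0, beta1 = fit_simple_ols(x, y)

# The same slope, written as Cov(X, Y) / Var(X) with matching ddof
beta1_cov = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Cross-check: np.polyfit returns coefficients from highest degree down
slope_ref, intercept_ref = np.polyfit(x, y, deg=1)
print(f"beta0={beta0:.3f}, beta1={beta1:.3f}")
```

The estimates should land near, but not exactly on, the generating values because of the added noise.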
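
[Figure: Cost Function: The Bowl-Shaped Surface. The cost $J(\beta_0, \beta_1)$ plotted against $\beta_1$ (slope), with gradient descent paths leading down to the global minimum.]

The cost function is convex, so gradient descent with a suitable learning rate converges to the global minimum.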


3. Gradient Descent

Update Rule:

$$\beta_j := \beta_j - \alpha \frac{\partial J}{\partial \beta_j}$$

Partial Derivatives:

$$\frac{\partial J}{\partial \beta_0} = -\frac{2}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)$$

$$\frac{\partial J}{\partial \beta_1} = -\frac{2}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i) \cdot x_i$$

Where $\alpha$ = learning rate (step size)
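
The update rule and the two partial derivatives can be sketched as a short batch gradient descent loop. The function name `gradient_descent`, the starting point (0, 0), the iteration count, and the synthetic data are all assumptions made for illustration; the result is compared with NumPy's least-squares fit as a sanity check.

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, n_iters=2000):
    """Batch gradient descent on the MSE cost for y_hat = beta0 + beta1 * x."""
    beta0, beta1 = 0.0, 0.0
    for _ in range(n_iters):
        residual = y - (beta0 + beta1 * x)    # y_i - y_hat_i
        grad0 = -2.0 * residual.mean()        # dJ/d(beta0)
        grad1 = -2.0 * (residual * x).mean()  # dJ/d(beta1)
        # Both gradients use the old parameters, so the update is simultaneous
        beta0 -= alpha * grad0
        beta1 -= alpha * grad1
    return beta0, beta1

# Synthetic data (an assumption for illustration): y = 4 + 3x + noise
rng = np.random.default_rng(0)
x = 2 * rng.random(200)
y = 4 + 3 * x + rng.normal(0, 1, size=200)

gd_beta0, gd_beta1 = gradient_descent(x, y)
ref_beta1, ref_beta0 = np.polyfit(x, y, deg=1)  # least-squares reference
print(f"GD:  beta0={gd_beta0:.4f}, beta1={gd_beta1:.4f}")
print(f"OLS: beta0={ref_beta0:.4f}, beta1={ref_beta1:.4f}")
```

Because the cost is convex, the iterative estimate should agree with the closed-form one once enough iterations have run.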

[Figure: Gradient Descent: Learning Rate Impact. Three panels: α = 0.1 (good learning rate), α = 1.0 (too large, oscillates), α = 0.001 (too small, too slow).]
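
The effect of the learning rate can be checked numerically by running the same number of steps with each of the three values. This is a sketch under assumed synthetic data and a hypothetical helper `final_cost`; which step sizes count as "too large" depends on the scale of the features, so the thresholds here are specific to this data.

```python
import numpy as np

def final_cost(x, y, alpha, n_iters=200):
    """Run batch gradient descent from (0, 0) and return the final MSE."""
    beta0, beta1 = 0.0, 0.0
    for _ in range(n_iters):
        residual = y - (beta0 + beta1 * x)
        # Simultaneous update: both right-hand sides use the old residual
        beta0, beta1 = (beta0 + alpha * 2.0 * residual.mean(),
                        beta1 + alpha * 2.0 * (residual * x).mean())
    return np.mean((y - (beta0 + beta1 * x)) ** 2)

# Synthetic data (an assumption for illustration): y = 4 + 3x + noise
rng = np.random.default_rng(0)
x = 2 * rng.random(200)
y = 4 + 3 * x + rng.normal(0, 1, size=200)

costs = {alpha: final_cost(x, y, alpha) for alpha in (0.001, 0.1, 1.0)}
for alpha, cost in costs.items():
    print(f"alpha={alpha}: final MSE = {cost:.4g}")
```

On this data, α = 0.1 settles near the noise level, α = 0.001 is still far from the minimum after 200 steps, and α = 1.0 overshoots on every step so the cost grows instead of shrinking.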

4. Multiple Linear Regression

Model:

$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p = \beta_0 + \sum_{j=1}^{p} \beta_j x_j$$

Matrix Form:

$$\hat{\mathbf{y}} = \mathbf{X}\boldsymbol{\beta}$$

Where $\mathbf{X} \in \mathbb{R}^{n \times (p+1)}$ is the design matrix, including a column of ones for the intercept

Normal Equation (Matrix):

$$\boldsymbol{\hat{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$
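
The matrix form can be sketched in a few lines of NumPy. The data, the coefficient values in `true_beta`, and the variable names are assumptions for illustration. The sketch solves the linear system $\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{X}^T\mathbf{y}$ rather than forming the inverse explicitly, which is algebraically the same estimate but usually better behaved numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
features = rng.normal(size=(n, p))
true_beta = np.array([4.0, 3.0, -2.0, 0.5])  # [intercept, beta1, beta2, beta3]

# Design matrix: prepend a column of ones so beta[0] acts as the intercept
X = np.column_stack([np.ones(n), features])
y = X @ true_beta + rng.normal(0, 0.5, size=n)

# Normal equation, solved as a linear system instead of inverting X^T X
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Least-squares solver as a cross-check; it also copes with rank-deficient X
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.round(beta_normal, 3))
```

When features are strongly correlated, $\mathbf{X}^T\mathbf{X}$ becomes close to singular and a least-squares solver such as `np.linalg.lstsq` is generally the safer choice.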

[Figure: Multiple Regression: Multiple Features → Single Output. Inputs x₁ (Size), x₂ (Beds), x₃ (Age), and x₄ (Baths), weighted by β₁ through β₄, feed a linear model ŷ = β₀ + Σβⱼxⱼ that outputs ŷ (Price).]

5. Model Evaluation Metrics

R² Score (Coefficient of Determination):

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$

  • $R^2 = 1$: Perfect fit
  • $R^2 = 0$: Model predicts the mean
  • $R^2 < 0$: Model is worse than predicting the mean

Adjusted R²:

$$R^2_{adj} = 1 - \frac{(1-R^2)(n-1)}{n-p-1}$$
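
Both metrics follow directly from their formulas. The sketch below uses hypothetical helper names (`r2_manual`, `adjusted_r2`) and a made-up four-point target to reproduce the three reference cases listed above; the values `n=100` and `p=4` in the last line are likewise assumptions chosen only to show the adjustment.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n samples and p features (intercept not counted in p)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

y_true = np.array([3.0, 5.0, 7.0, 9.0])
mean_pred = np.full_like(y_true, y_true.mean())

r2_perfect = r2_manual(y_true, y_true)    # predictions equal the targets
r2_mean = r2_manual(y_true, mean_pred)    # always predicting the mean
r2_bad = r2_manual(y_true, y_true[::-1])  # reversed predictions: worse than the mean

adj = adjusted_r2(0.70, n=100, p=4)       # slightly below the raw 0.70
print(r2_perfect, r2_mean, r2_bad, round(adj, 3))
```

Adjusted R² is always at or below R² when $p \geq 1$, and the gap widens as features are added without a matching gain in fit.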

[Figure: R² Score: How Well Does the Model Fit? The total variance $SS_{total} = \sum (y_i - \bar{y})^2$ is split into $SS_{explained} = \sum (\hat{y}_i - \bar{y})^2$ (70%) and $SS_{residual}$ (30%).]

$R^2 = 1 - (30/100) = 0.70$ (70% of the variance explained)


6. Assumptions of Linear Regression

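5 Key Assumptions to Validate

  1. Linearity: the relationship between the features and the target is linear
  2. Independence: errors are independent of one another
  3. Homoscedasticity: errors have constant variance
  4. Normality of errors: $\epsilon \sim N(0, \sigma^2)$
  5. No multicollinearity: features are not highly correlated with each other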
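
One common way to quantify the multicollinearity assumption is the variance inflation factor (VIF), which the lesson does not cover; it is brought in here as a standard diagnostic. The sketch below computes it with plain NumPy by regressing each feature on the others. The helper name `variance_inflation_factors` and the synthetic features are assumptions for illustration.

```python
import numpy as np

def variance_inflation_factors(features):
    """VIF per column: 1 / (1 - R^2) from regressing it on the other columns."""
    n, p = features.shape
    vifs = []
    for j in range(p):
        target = features[:, j]
        others = np.delete(features, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # include an intercept
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        r2 = 1.0 - residual.var() / target.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.1, size=500)  # nearly a copy of x1
x3 = rng.normal(size=500)                  # unrelated to the others

vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(np.round(vifs, 2))
```

A VIF near 1 means a feature is nearly uncorrelated with the rest. Values above roughly 5 to 10 are a common rule of thumb for problematic collinearity, which is what the first two columns show here.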

Checking Assumptions with Residual Plots

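Residual Analysis: What to Look For

  • ✓ Good: residuals scattered randomly around zero
  • ✗ Bad: funnel shape, where the spread changes with the fitted values (non-constant variance)
  • ✗ Bad: systematic pattern such as a curve (the relationship is not linear)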
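
The three residual shapes can also be told apart with rough numeric stand-ins for eyeballing the plot. This is only a sketch: the helpers `spread_ratio` and `curvature_corr` are hypothetical names, not standard tests, and the three synthetic datasets are assumptions built to produce each shape on purpose.

```python
import numpy as np

def residuals_and_fitted(x, y):
    """Fit a straight line by least squares; return (fitted values, residuals)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    fitted = intercept + slope * x
    return fitted, y - fitted

def spread_ratio(fitted, residuals):
    """Residual std in the upper half of fitted values divided by the lower half."""
    order = np.argsort(fitted)
    half = len(order) // 2
    return residuals[order[half:]].std() / residuals[order[:half]].std()

def curvature_corr(fitted, residuals):
    """Correlation between residuals and squared, centered fitted values."""
    return np.corrcoef((fitted - fitted.mean()) ** 2, residuals)[0, 1]

rng = np.random.default_rng(0)
x = np.linspace(0.1, 2.0, 400)

good = residuals_and_fitted(x, 4 + 3 * x + rng.normal(0, 1, size=400))
funnel = residuals_and_fitted(x, 4 + 3 * x + rng.normal(0, 1, size=400) * x)
pattern = residuals_and_fitted(x, 4 + 3 * x**2 + rng.normal(0, 0.2, size=400))

print("good:    spread ratio", round(spread_ratio(*good), 2),
      "curvature", round(curvature_corr(*good), 2))
print("funnel:  spread ratio", round(spread_ratio(*funnel), 2))
print("pattern: curvature", round(curvature_corr(*pattern), 2))
```

A spread ratio far from 1 points to a funnel, and a curvature correlation far from 0 points to a missed nonlinear term. In practice a scatter plot of residuals against fitted values remains the more informative check.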

7. Implementation in Python

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Generate sample data
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit model
model = LinearRegression()
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Evaluate
print(f"Intercept (β₀): {model.intercept_[0]:.4f}")
print(f"Slope (β₁): {model.coef_[0][0]:.4f}")
print(f"R² Score: {r2_score(y_test, y_pred):.4f}")
print(f"RMSE: {np.sqrt(mean_squared_error(y_test, y_pred)):.4f}")

# Visualize
plt.scatter(X_test, y_test, color='blue', alpha=0.6, label='Actual')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Predicted')
plt.xlabel('Feature')
plt.ylabel('Target')
plt.title('Linear Regression Fit')
plt.legend()
plt.show()

Key Takeaways

  1. Linear regression finds the best-fit line through the data points
  2. The cost function (MSE) measures prediction error; fitting the model means minimizing it
  3. Gradient descent iteratively updates the weights to find the minimum
  4. The R² score tells you how much of the variance the model explains
  5. Validate the assumptions before trusting the model
  6. Regularization (Ridge/Lasso) helps reduce overfitting

Next: Logistic Regression

Extend linear regression to classification with the sigmoid function.
