Linear Regression


Introduction

Linear regression is a fundamental supervised learning algorithm that models the relationship between a continuous target variable and one or more predictor variables using a linear equation.

Simple Linear Regression

$$y = \beta_0 + \beta_1 x + \epsilon$$

Where:

  • $y$ is the target variable
  • $x$ is the predictor
  • $\beta_0$ is the intercept
  • $\beta_1$ is the slope
  • $\epsilon$ is the error term
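For a single predictor, the least-squares estimates have a closed form: the slope is the covariance of $x$ and $y$ divided by the variance of $x$, and the intercept makes the fitted line pass through the point of means $(\bar{x}, \bar{y})$. A minimal NumPy sketch, assuming ordinary least squares as the fitting criterion; `fit_simple_ols` is a hypothetical helper name, and the inputs are the square-footage and price columns of the sample data used in this lesson:

```python
import numpy as np

def fit_simple_ols(x, y):
    """Hypothetical helper: least-squares intercept and slope for y = b0 + b1*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: forces the line through (x_mean, y_mean)
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Square footage and price from the lesson's sample data
sqft = [1000, 1500, 2000, 2500, 3000]
price = [200000, 280000, 350000, 420000, 480000]
b0, b1 = fit_simple_ols(sqft, price)
print(f"Intercept: {b0:.1f}, slope: {b1:.2f}")  # Intercept: 66000.0, slope: 140.00
```

Read the slope as "each additional square foot adds about 140 to the predicted price" for this toy data only.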

Multiple Linear Regression

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \epsilon$$
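In matrix form, prepending a column of ones to the predictors gives a design matrix $X$, and the least-squares coefficients minimize $\lVert y - X\beta \rVert^2$ (equivalently, they solve the normal equations $X^\top X \beta = X^\top y$ when $X^\top X$ is invertible). The sketch below uses a hypothetical helper `fit_ols` built on `np.linalg.lstsq` instead of an explicit matrix inverse; it illustrates the idea under these assumptions and is not a reference implementation:

```python
import numpy as np

def fit_ols(X, y):
    """Hypothetical helper: return [b0, b1, ..., bn] minimizing squared residuals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Column of ones so that the first coefficient is the intercept b0
    X_design = np.column_stack([np.ones(len(X)), X])
    # lstsq avoids forming (X^T X)^-1, which is numerically safer
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return beta

# Columns: sqft, bedrooms, age (the lesson's sample data)
X_sample = np.array([[1000, 2, 10], [1500, 3, 5], [2000, 3, 15],
                     [2500, 4, 8], [3000, 4, 20]])
y_sample = np.array([200000, 280000, 350000, 420000, 480000])
beta = fit_ols(X_sample, y_sample)
print("Intercept:", beta[0])
print("Slope coefficients:", beta[1:])
```

With five rows and four coefficients the fit is nearly saturated, so these numbers only demonstrate the mechanics.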

Python Implementation

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Sample data (a toy example: real projects need far more rows than features)
X = pd.DataFrame({
    'sqft': [1000, 1500, 2000, 2500, 3000],
    'bedrooms': [2, 3, 3, 4, 4],
    'age': [10, 5, 15, 8, 20]
})
y = np.array([200000, 280000, 350000, 420000, 480000])

# Split data; with only 5 rows, test_size=0.4 keeps 2 test points,
# the minimum for which R² is defined (0.2 would leave a single point)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)

# Train model (ordinary least squares)
model = LinearRegression()
model.fit(X_train, y_train)

# Coefficients: one slope per feature, in column order
print(f"Intercept: {model.intercept_}")
print(f"Coefficients: {dict(zip(X.columns, model.coef_))}")

# Predict
y_pred = model.predict(X_test)

# Evaluate on the held-out test set
print(f"R2 Score: {r2_score(y_test, y_pred)}")
print(f"RMSE: {np.sqrt(mean_squared_error(y_test, y_pred))}")
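Both metrics are easy to compute by hand, which makes their meaning concrete: $R^2 = 1 - SS_{res}/SS_{tot}$ compares the model's squared errors with those of always predicting the mean, and RMSE is the square root of the mean squared error, in the same units as $y$. A small NumPy sketch; the helper names `r2` and `rmse` and the input values are hypothetical, chosen so the arithmetic can be checked by hand:

```python
import numpy as np

def r2(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # variation the model leaves unexplained
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Square root of the mean squared error, in the units of y."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean(diff ** 2))

# Hypothetical values: residuals are 0.5, -0.5, 0.0, 1.0
y_true = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.5, 5.5, 7.0, 8.0]
print(f"R2: {r2(y_true, y_hat):.4f}")      # 1 - 1.5/20 -> 0.9250
print(f"RMSE: {rmse(y_true, y_hat):.4f}")  # sqrt(1.5/4) -> 0.6124
```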

Assumptions of Linear Regression

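  1. Linearity: The expected value of y is a linear function of the predictors (linear in the coefficients)
  2. Independence: Observations, and therefore their errors, are independent of one another
  3. Homoscedasticity: Residuals have constant variance across all levels of the predictors
  4. Normality: Residuals are approximately normally distributed (this matters mainly for confidence intervals and hypothesis tests)
  5. No multicollinearity: Predictors are not highly correlated with one another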
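Multicollinearity can be quantified with the variance inflation factor (VIF): regress each predictor on all the others and compute $\mathrm{VIF}_j = 1/(1 - R_j^2)$. A minimal NumPy sketch, assuming an intercept in each auxiliary regression; `variance_inflation_factors` is a hypothetical name, and the cutoff of 5 to 10 often quoted for "high" VIF is a rule of thumb rather than a formal test:

```python
import numpy as np

def variance_inflation_factors(X):
    """Hypothetical helper: return VIF_j = 1 / (1 - R_j^2) for each column j of X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Auxiliary regression: column j on an intercept plus all other columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2_j = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        # A perfectly collinear column has R_j^2 = 1 and an infinite VIF
        vifs[j] = np.inf if r2_j >= 1.0 else 1.0 / (1.0 - r2_j)
    return vifs

# Sample predictors from this lesson: sqft, bedrooms, age
names = ["sqft", "bedrooms", "age"]
X_sample = np.array([[1000, 2, 10], [1500, 3, 5], [2000, 3, 15],
                     [2500, 4, 8], [3000, 4, 20]])
for name, vif in zip(names, variance_inflation_factors(X_sample)):
    print(f"{name}: VIF = {vif:.2f}")
```

On five rows these values are very unstable; in practice, compute VIFs on the full training set.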

Checking Assumptions

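import matplotlib.pyplot as plt
from scipy import stats

# Residuals on the test set (y_test and y_pred come from the model above;
# these diagnostics are only informative with many more points than the toy data)
residuals = y_test - y_pred

# Draw the two diagnostics side by side instead of on top of each other
fig, (ax_resid, ax_qq) = plt.subplots(1, 2, figsize=(10, 4))

# Residual plot: look for no pattern and a constant spread around zero
ax_resid.scatter(y_pred, residuals)
ax_resid.axhline(y=0, color='r', linestyle='--')
ax_resid.set_xlabel('Predicted value')
ax_resid.set_ylabel('Residual')

# Q-Q plot: points close to the line suggest approximately normal residuals
stats.probplot(residuals, dist="norm", plot=ax_qq)
plt.tight_layout()
plt.show()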
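The plots can be complemented with numeric checks. The Durbin-Watson statistic $\sum_t (e_t - e_{t-1})^2 / \sum_t e_t^2$ is close to 2 when successive residuals are uncorrelated (it is only meaningful when the rows have a natural order, such as time), and `scipy.stats.shapiro` runs the Shapiro-Wilk test for normality. A sketch on simulated data that satisfies the assumptions; the data, the seed, and the `durbin_watson` helper are hypothetical:

```python
import numpy as np
from scipy import stats

def durbin_watson(residuals):
    """Hypothetical helper: squared successive differences over the residual sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Hypothetical simulated data that satisfies the assumptions:
# a linear signal plus independent, constant-variance normal noise
rng = np.random.default_rng(0)
x_sim = rng.uniform(0, 10, size=200)
y_sim = 2.0 + 3.0 * x_sim + rng.normal(0, 1.0, size=200)

# Fit a straight line (np.polyfit returns the slope first, then the intercept)
slope_sim, intercept_sim = np.polyfit(x_sim, y_sim, 1)
resid_sim = y_sim - (intercept_sim + slope_sim * x_sim)

dw = durbin_watson(resid_sim)
shapiro_stat, p_value = stats.shapiro(resid_sim)
print(f"Durbin-Watson: {dw:.2f}")              # near 2: no sign of first-order autocorrelation
print(f"Shapiro-Wilk p-value: {p_value:.3f}")  # small values are evidence against normality
```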

Regularized Linear Regression

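from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The penalties act on coefficient magnitudes, which depend on each feature's
# units (sqft vs. bedrooms), so standardize the features inside a pipeline first

# L2 regularization (Ridge): shrinks all coefficients toward zero
ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
ridge.fit(X_train, y_train)

# L1 regularization (Lasso): can set some coefficients exactly to zero (feature selection)
lasso = make_pipeline(StandardScaler(), Lasso(alpha=1.0))
lasso.fit(X_train, y_train)

# ElasticNet: combines L1 and L2; l1_ratio sets the mix (1.0 = pure L1, 0.0 = pure L2)
elastic = make_pipeline(StandardScaler(), ElasticNet(alpha=1.0, l1_ratio=0.5))
elastic.fit(X_train, y_train)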
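To see how the penalties differ, the sketch below fits OLS, Ridge, and Lasso to simulated data in which only two of five features matter. The data, seed, and `alpha` values are assumptions chosen for illustration; with them, Lasso sets the coefficients of the three irrelevant features exactly to zero, while Ridge only shrinks them. The features are drawn from a standard normal distribution, so they already share a common scale:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Hypothetical simulated data: five standard-normal features,
# but only the first two actually influence the target
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 5))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(0, 1.0, size=200)

models = {
    "OLS": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),   # L2: shrinks every coefficient a little
    "Lasso": Lasso(alpha=0.3),   # L1: can zero out weak coefficients entirely
}
for name, est in models.items():
    est.fit(X_demo, y_demo)
    print(f"{name:<5} coefficients: {np.round(est.coef_, 3)}")
```

In practice `alpha` is usually tuned by cross-validation, for example with scikit-learn's `RidgeCV` or `LassoCV`.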

Key Takeaways

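  1. Linear regression models the target as a linear combination of the predictors plus an error term
  2. Check the assumptions before trusting the model's estimates and predictions
  3. Regularization (Ridge, Lasso, ElasticNet) helps reduce overfitting by penalizing large coefficients
  4. Lasso can perform feature selection by shrinking some coefficients exactly to zero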
