R Linear Regression — Modeling Relationships

Learning Objectives

By the end of this tutorial, you will be able to:

  • Fit simple and multiple linear regression models
  • Interpret regression coefficients and diagnostics
  • Check model assumptions (normality, homoscedasticity, multicollinearity)
  • Perform model selection and comparison
  • Visualize regression results

Simple Linear Regression

# Model: y = β₀ + β₁x + ε
# Using mtcars
model <- lm(mpg ~ wt, data = mtcars)

# Summary
summary(model)
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)
# (Intercept)  37.285      1.878  19.858  < 2e-16 ***
# wt           -5.344      0.559  -9.559  1.29e-10 ***

# Predictions
predict(model, newdata = data.frame(wt = 3))

# Confidence intervals
confint(model, level = 0.95)
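
To make the coefficient table concrete, the sketch below rebuilds the prediction for wt = 3 by hand from the fitted intercept and slope and checks it against predict(). The names b0, b1, by_hand, and from_predict are illustrative and not part of the lesson's code.

```r
# Refit the simple model from above
model <- lm(mpg ~ wt, data = mtcars)

# Extract the intercept and the slope for wt
b0 <- coef(model)[["(Intercept)"]]
b1 <- coef(model)[["wt"]]

# In mtcars, wt is measured in units of 1000 lbs, so wt = 3 means 3000 lbs
by_hand <- b0 + b1 * 3
from_predict <- unname(predict(model, newdata = data.frame(wt = 3)))

all.equal(by_hand, from_predict)  # TRUE
```

Read this way, the slope says that each additional 1000 lbs of weight is associated with roughly 5.3 fewer miles per gallon in this data set.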

Multiple Linear Regression

# Multiple predictors
model <- lm(mpg ~ wt + hp + cyl, data = mtcars)
summary(model)

# Variable selection
step_model <- step(model, direction = "both")
summary(step_model)

# Interactions
model_interaction <- lm(mpg ~ wt * hp, data = mtcars)
summary(model_interaction)

# Polynomial regression
model_poly <- lm(mpg ~ poly(wt, 2), data = mtcars)
summary(model_poly)
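
The formula shorthand used above can be easy to misread, so here is a small sketch of what it expands to. The model names m_star, m_long, m_orth, and m_raw are illustrative.

```r
# wt * hp is shorthand for both main effects plus their product term
m_star <- lm(mpg ~ wt * hp, data = mtcars)
m_long <- lm(mpg ~ wt + hp + wt:hp, data = mtcars)

names(coef(m_star))                    # "(Intercept)" "wt" "hp" "wt:hp"
all.equal(coef(m_star), coef(m_long))  # TRUE

# poly(wt, 2) builds orthogonal polynomial terms by default;
# raw = TRUE uses wt and wt^2 directly.
# The coefficients differ between the two, but the fitted values agree.
m_orth <- lm(mpg ~ poly(wt, 2), data = mtcars)
m_raw  <- lm(mpg ~ poly(wt, 2, raw = TRUE), data = mtcars)
all.equal(fitted(m_orth), fitted(m_raw))  # TRUE
```

The raw form is usually easier to interpret term by term, while the orthogonal form avoids the strong correlation between wt and wt^2.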

Model Diagnostics

model <- lm(mpg ~ wt + hp, data = mtcars)

# Diagnostic plots
par(mfrow = c(2, 2))
plot(model)
par(mfrow = c(1, 1))

# Residuals
residuals(model)
rstandard(model)  # Standardized residuals
rstudent(model)   # Studentized residuals

# Influential points
cooks.distance(model)
hatvalues(model)

# Normality test
shapiro.test(residuals(model))

# Homoscedasticity
library(lmtest)
bptest(model)  # Breusch-Pagan test

# Multicollinearity
library(car)
vif(model)  # Variance Inflation Factor
# Rule of thumb: VIF > 5 suggests problematic multicollinearity (some sources use 10)
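
If the car package is not installed, the same quantity can be computed in base R. This is a minimal sketch for the two-predictor model above, using the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one predictor on the others. The names r2_wt, r2_hp, and vif_by_hand are illustrative.

```r
# Regress each predictor on the other predictor in the model
r2_wt <- summary(lm(wt ~ hp, data = mtcars))$r.squared
r2_hp <- summary(lm(hp ~ wt, data = mtcars))$r.squared

# VIF = 1 / (1 - R^2); a value of 1 means no linear overlap with the other predictors
vif_by_hand <- c(wt = 1 / (1 - r2_wt), hp = 1 / (1 - r2_hp))
vif_by_hand
```

With only two predictors, both VIFs are equal, because each R² is just the squared correlation between wt and hp.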

Model Comparison

# Full vs reduced model
model1 <- lm(mpg ~ wt, data = mtcars)
model2 <- lm(mpg ~ wt + hp, data = mtcars)
model3 <- lm(mpg ~ wt + hp + cyl, data = mtcars)

# ANOVA comparison
anova(model1, model2, model3)

# Information criteria
AIC(model1, model2, model3)
BIC(model1, model2, model3)

# Adjusted R²
summary(model1)$adj.r.squared
summary(model2)$adj.r.squared
summary(model3)$adj.r.squared
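
To see what AIC() and BIC() are trading off, the sketch below recomputes both from the log-likelihood for one of the models above. Lower values are better, and both criteria penalize the number of estimated parameters. The names ll, k, n, aic_by_hand, and bic_by_hand are illustrative.

```r
model2 <- lm(mpg ~ wt + hp, data = mtcars)

ll <- logLik(model2)
k  <- attr(ll, "df")  # estimated parameters: 3 coefficients + residual variance
n  <- nobs(model2)    # number of observations used in the fit

# AIC = -2 * logLik + 2 * k;  BIC = -2 * logLik + log(n) * k
aic_by_hand <- -2 * as.numeric(ll) + 2 * k
bic_by_hand <- -2 * as.numeric(ll) + log(n) * k

all.equal(aic_by_hand, AIC(model2))  # TRUE
all.equal(bic_by_hand, BIC(model2))  # TRUE
```

Because log(n) exceeds 2 once n is 8 or more, BIC penalizes extra predictors more heavily than AIC on a data set of this size.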

Visualization

library(ggplot2)

# Scatter + regression line
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm")

# Two predictors (wt, hp) with the response (mpg) mapped to color
ggplot(mtcars, aes(x = wt, y = hp, color = mpg)) +
  geom_point(size = 3) +
  scale_color_viridis_c()

# Residual plot (model is the mpg ~ wt + hp fit from Model Diagnostics)
ggplot(data.frame(fitted = fitted(model), residuals = residuals(model)),
       aes(x = fitted, y = residuals)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed")

# QQ plot
qqnorm(residuals(model))
qqline(residuals(model))
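
The shaded band that geom_smooth(method = "lm") draws by default is a 95% confidence band for the fitted line. As a rough sketch, the same band can be computed with predict() and drawn with base graphics. The names fit, grid, and band are illustrative.

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Evaluate the fitted line and its 95% confidence band on a grid of weights
grid <- data.frame(wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 50))
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
head(band)  # columns: fit, lwr, upr

# Base-graphics version of the scatter plot with line and band
plot(mpg ~ wt, data = mtcars)
lines(grid$wt, band[, "fit"])
lines(grid$wt, band[, "lwr"], lty = 2)
lines(grid$wt, band[, "upr"], lty = 2)
```

The band describes uncertainty about the mean mpg at each weight, not the spread of individual cars, which is why many points fall outside it.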

Practical Examples

Example 1: Predict House Prices

# Simulated data
set.seed(42)
n <- 200
houses <- data.frame(
  sqft = runif(n, 1000, 3000),
  bedrooms = sample(2:5, n, replace = TRUE),
  age = runif(n, 0, 50)
)
# True relationship used to generate prices, plus Gaussian noise (sd = 20000)
houses$price <- 50000 + 150 * houses$sqft + 10000 * houses$bedrooms -
  500 * houses$age + rnorm(n, 0, 20000)

model <- lm(price ~ sqft + bedrooms + age, data = houses)
summary(model)

# Predict
new_house <- data.frame(sqft = 2000, bedrooms = 3, age = 10)
predict(model, newdata = new_house, interval = "prediction")
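
The interval argument matters here. The following sketch rebuilds the simulated data so that it runs on its own, then compares a confidence interval (uncertainty about the average price of such houses) with a prediction interval (uncertainty about one particular house). The names houses, fit, conf_int, pred_int, and width are illustrative.

```r
# Rebuild the simulated housing data
set.seed(42)
n <- 200
houses <- data.frame(
  sqft = runif(n, 1000, 3000),
  bedrooms = sample(2:5, n, replace = TRUE),
  age = runif(n, 0, 50)
)
houses$price <- 50000 + 150 * houses$sqft + 10000 * houses$bedrooms -
  500 * houses$age + rnorm(n, 0, 20000)

fit <- lm(price ~ sqft + bedrooms + age, data = houses)
new_house <- data.frame(sqft = 2000, bedrooms = 3, age = 10)

# Interval for the mean price of houses with these characteristics
conf_int <- predict(fit, newdata = new_house, interval = "confidence")
# Interval for the price of a single new house with these characteristics
pred_int <- predict(fit, newdata = new_house, interval = "prediction")

# The prediction interval is wider because it also includes the noise term
width <- function(x) unname(x[, "upr"] - x[, "lwr"])
width(pred_int) > width(conf_int)  # TRUE
```

Under the data-generating formula, the true mean price for this house is 50000 + 150 * 2000 + 10000 * 3 - 500 * 10 = 375000, which gives a reference point for judging the fitted value.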

Practice Exercises

Exercise 1: Model Building

Build a model to predict mpg from mtcars using all available predictors. Then use stepwise selection to find the best model.

Solution

# Full model
full_model <- lm(mpg ~ ., data = mtcars)
summary(full_model)

# Stepwise selection
best_model <- step(full_model, direction = "both", trace = 0)
summary(best_model)

# Compare
AIC(full_model, best_model)
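
To check what stepwise selection actually did, it helps to look at which predictors were kept and which were dropped. This is a small sketch that repeats the solution's two fits so it runs on its own; the name dropped is illustrative.

```r
full_model <- lm(mpg ~ ., data = mtcars)
best_model <- step(full_model, direction = "both", trace = 0)

# Which predictors survived selection?
formula(best_model)

# Predictors removed relative to the full model
dropped <- setdiff(names(coef(full_model)), names(coef(best_model)))
dropped

# step() only accepts moves that lower AIC, so the selected model
# should not have a higher AIC than the full model
AIC(best_model) <= AIC(full_model)  # TRUE
```

Keep in mind that the p-values in summary(best_model) do not account for the selection step, so they tend to look more convincing than they are.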

Key Takeaways

  • lm() fits linear models: y ~ x1 + x2 + x3
  • summary() shows coefficients, R², p-values
  • Check assumptions: normality, homoscedasticity, no multicollinearity
  • A VIF above 5 is a common rule-of-thumb sign of problematic multicollinearity
  • Use step() for variable selection — AIC-based
  • predict() generates predictions; add interval = "confidence" or interval = "prediction" to get intervals
  • Diagnostic plots reveal model problems

Next: Learn about R Logistic Regression — modeling binary outcomes.
