Polynomial Regression — Fitting Nonlinear Relationships


Polynomial Regression

Polynomial regression models nonlinear relationships by including powers of X as predictors, while remaining a linear model in the coefficients:

$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \cdots + \beta_d X^d + \varepsilon$$
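
To make "linear in the coefficients" concrete, here is a minimal sketch (not part of the original lesson) that fits a cubic by ordinary least squares on a design matrix whose columns are powers of X. The data are noise-free and the values in `true_beta` are illustrative assumptions, chosen only so that the recovered coefficients can be checked.

```python
import numpy as np

# Noise-free data from a known cubic (illustrative coefficients beta_0..beta_3).
x = np.linspace(-3, 3, 50)
true_beta = np.array([1.0, 2.0, -1.0, 0.5])
y = true_beta[0] + true_beta[1] * x + true_beta[2] * x**2 + true_beta[3] * x**3

# Design matrix with columns 1, x, x^2, x^3 (increasing powers).
design = np.vander(x, N=4, increasing=True)

# The model is nonlinear in x but linear in beta, so plain least squares solves it.
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
print(np.round(beta_hat, 6))
```

The only nonlinear step is building the columns of `design`; the fit itself is the same least-squares problem as in ordinary multiple regression.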

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import KFold, cross_val_score

# Simulate noisy data from a known cubic, so the true degree (3) is known.
np.random.seed(42)
n = 80
X = np.linspace(-3, 3, n)
y = 0.5*X**3 - X**2 + 2*X + np.random.normal(0, 1.5, n)

X_2d = X.reshape(-1, 1)  # scikit-learn expects a 2D feature matrix
X_plot = np.linspace(-3.2, 3.2, 300).reshape(-1, 1)

fig, axes = plt.subplots(2, 3, figsize=(15, 8))
degrees = [1, 2, 3, 5, 10, 20]
colors = ['blue','green','red','orange','purple','brown']

# X is sorted, so unshuffled folds would be contiguous blocks and the outer
# folds would test extrapolation. Shuffle so each fold tests interpolation.
cv = KFold(n_splits=5, shuffle=True, random_state=42)

cv_scores = {}
for ax, deg, col in zip(axes.flat, degrees, colors):
    model = Pipeline([('poly', PolynomialFeatures(deg)),
                      ('lin',  LinearRegression())])

    # Cross-validated R² (each fold is fitted on a clone of the pipeline)
    cv_r2 = cross_val_score(model, X_2d, y, cv=cv, scoring='r2').mean()
    cv_scores[deg] = cv_r2

    # Fit on all data for the plotted curve and the training R²
    model.fit(X_2d, y)
    y_pred = model.predict(X_plot)
    train_r2 = model.score(X_2d, y)

    ax.scatter(X, y, alpha=0.4, s=20, color='gray')
    ax.plot(X_plot, y_pred, color=col, linewidth=2)
    ax.set_ylim(-35, 20)  # wide enough to show the points near X = -3
    label = f'Degree {deg} ← TRUE DEGREE' if deg == 3 else f'Degree {deg}'
    ax.set_title(f'{label}\nTrain R²={train_r2:.3f}, CV R²={cv_r2:.3f}')

plt.suptitle('Polynomial Regression: Underfitting → Overfitting', fontsize=14)
plt.tight_layout()
plt.savefig('polynomial_regression.png', dpi=150)
plt.show()

print("Cross-Validated R² by Degree:")
for deg, cv in cv_scores.items():
    bar = '█' * max(0, int(cv*20))
    print(f"  Degree {deg:2d}: {cv:.4f} {bar}")
print("The degree with the highest CV R² generalizes best on this data")
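
The degree search can also be automated rather than read off a printout. The following is a sketch, not the lesson's own method: it uses scikit-learn's `GridSearchCV` over the `poly__degree` parameter of a pipeline. The candidate range 1 to 10, the seed, and the shuffled 5-fold split are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic cubic data with the same functional form as the example above.
rng = np.random.default_rng(42)
x = np.linspace(-3, 3, 80)
y = 0.5 * x**3 - x**2 + 2 * x + rng.normal(0, 1.5, x.size)
X = x.reshape(-1, 1)

pipe = Pipeline([('poly', PolynomialFeatures()),
                 ('lin', LinearRegression())])

# Shuffled folds, because x is sorted (assumed setup for this sketch).
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Try every candidate degree and keep the one with the best mean CV R².
search = GridSearchCV(pipe, {'poly__degree': list(range(1, 11))},
                      cv=cv, scoring='r2')
search.fit(X, y)

best_degree = search.best_params_['poly__degree']
print(f"Selected degree: {best_degree}, mean CV R²: {search.best_score_:.3f}")
```

Because neighbouring degrees often score almost identically, the selected degree can shift with the seed or the fold split; when scores are close, the lower degree is usually the safer choice.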

Key Takeaways

  1. Polynomial regression is still a linear model: it is linear in the parameters β, not in X
  2. Higher degree = more flexible but risks overfitting
  3. Use cross-validation to select the optimal polynomial degree
  4. Center and scale X before computing powers to reduce numerical instability
  5. Splines are usually better than high-degree polynomials for flexible fitting, because they join low-degree pieces locally and avoid the wild oscillation of high-degree polynomials near the edges of the data
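
Takeaway 4 can be illustrated by comparing condition numbers of the polynomial design matrix before and after standardizing X. This is a sketch under assumed inputs: the predictor values (calendar-year-like numbers from 2000 to 2020) and the degree are hypothetical, chosen to make the effect visible.

```python
import numpy as np

# Hypothetical predictor with a large offset, e.g. calendar years.
x = np.linspace(2000, 2020, 50)
degree = 4

# Raw powers 1, x, ..., x^4: the columns are huge and nearly collinear.
raw = np.vander(x, N=degree + 1, increasing=True)

# Center and scale first, then take powers.
z = (x - x.mean()) / x.std()
scaled = np.vander(z, N=degree + 1, increasing=True)

cond_raw = np.linalg.cond(raw)
cond_scaled = np.linalg.cond(scaled)
print(f"Condition number, raw X:          {cond_raw:.3e}")
print(f"Condition number, standardized X: {cond_scaled:.3e}")
```

A large condition number means small rounding errors can produce large changes in the estimated coefficients. In a scikit-learn pipeline, one way to get the same effect is to place a `StandardScaler` step before `PolynomialFeatures`.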
