Pipeline and GridSearchCV

Introduction

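A scikit-learn Pipeline chains preprocessing steps and a final estimator into a single object. Every transformer is fitted on the training data only and then reapplied unchanged at prediction time, which keeps the workflow reproducible and prevents information from the test set leaking into preprocessing.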

Pipeline Basics

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=2)),
    ("classifier", LogisticRegression())
])

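# X_train, y_train, and X_test are assumed to come from an earlier split,
# e.g. sklearn.model_selection.train_test_split(X, y).
# fit() fits each transformer in order, then trains the classifier.
pipeline.fit(X_train, y_train)
# predict() reapplies the fitted transformers before classifying.
predictions = pipeline.predict(X_test)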
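
To run the pipeline above end to end, some concrete data is needed. The sketch below assumes the bundled iris dataset and a 75/25 stratified split (neither is specified in the lesson), and also shows one way to reach the fitted steps afterwards, by name through named_steps or by position through indexing.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: iris stands in for the lesson's unspecified X and y.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=2)),
    ("classifier", LogisticRegression()),
])
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)

# Fitted steps are reachable by name or by position.
pca = pipeline.named_steps["pca"]
classifier = pipeline[-1]

print(pca.explained_variance_ratio_)    # share of variance kept per component
print(pipeline.score(X_test, y_test))   # accuracy on the held-out split
```

Because the scaler and PCA live inside the pipeline, they are fitted on X_train only; the test split is merely transformed.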

Pipeline with GridSearch

from sklearn.model_selection import GridSearchCV

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression())
])

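# Keys use the "<step name>__<parameter>" syntax; a bare step name swaps the
# whole step. "passthrough" (or None) skips scaling for that candidate.
param_grid = {
    "classifier__C": [0.1, 1, 10],
    "scaler": [StandardScaler(), "passthrough"]
}

# X and y are the feature matrix and labels. Each of the 3 x 2 = 6 candidate
# settings is evaluated with 5-fold cross-validation.
grid = GridSearchCV(pipeline, param_grid, cv=5)
grid.fit(X, y)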
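
After fitting, the search object exposes the winning configuration. The following is a minimal sketch, again assuming the iris dataset as a stand-in for X and y. It raises max_iter to 1000 (an addition, intended to avoid convergence warnings on the unscaled candidates) and writes the skip-the-scaler option as "passthrough", which scikit-learn treats the same as None.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: iris stands in for the lesson's unspecified X and y.
X, y = load_iris(return_X_y=True)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression(max_iter=1000)),
])

param_grid = {
    "classifier__C": [0.1, 1, 10],                 # parameter of one step
    "scaler": [StandardScaler(), "passthrough"],   # replace or skip a step
}

grid = GridSearchCV(pipeline, param_grid, cv=5)
grid.fit(X, y)

print(grid.best_params_)   # best combination found
print(grid.best_score_)    # its mean cross-validated accuracy

# With the default refit=True, best_estimator_ is the whole pipeline
# refitted on all of X and y using the best parameters.
best_model = grid.best_estimator_
```

Searching over the pipeline rather than the bare classifier means the scaler is refitted inside every cross-validation fold, so validation folds never influence the scaling statistics.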

Column Transformer

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

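# numerical_features and categorical_features are lists of column names
# (or indices) that you define for your own dataset.
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), numerical_features),
    # handle_unknown="ignore" encodes categories unseen during fit as all zeros
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features)
])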
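
A ColumnTransformer is typically used as the first step of a pipeline. The sketch below shows that arrangement on a small hypothetical DataFrame; the column names, values, and labels are invented for illustration and are not from the lesson.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; column names and values are made up.
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29],
    "income": [40_000, 52_000, 81_000, 90_000, 60_000, 45_000],
    "city": ["Paris", "Lyon", "Paris", "Nice", "Lyon", "Nice"],
})
y = [0, 0, 1, 1, 1, 0]

numerical_features = ["age", "income"]
categorical_features = ["city"]

preprocessor = ColumnTransformer([
    ("num", StandardScaler(), numerical_features),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
])

# The preprocessor becomes the first step of an ordinary pipeline.
model = Pipeline([
    ("preprocessor", preprocessor),
    ("classifier", LogisticRegression()),
])
model.fit(df, y)

# 2 scaled numeric columns + 3 one-hot columns (one per city).
transformed = model.named_steps["preprocessor"].transform(df)
print(transformed.shape)
```

Columns that are not listed in any transformer are dropped by default; passing remainder="passthrough" to ColumnTransformer keeps them unchanged.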

Practice Problems

  1. Build a preprocessing pipeline
  2. Combine the pipeline with GridSearchCV
  3. Use ColumnTransformer on numerical and categorical columns
  4. Access individual pipeline steps
  5. Create custom transformers
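
As a possible starting point for problem 5, a custom transformer only needs fit and transform methods. The sketch below is one way to write one; the class name Log1pTransformer and the random toy data are hypothetical. It inherits from TransformerMixin (which supplies fit_transform) and BaseEstimator (which supplies get_params and set_params, so the step also works inside GridSearchCV).

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class Log1pTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer that applies log(1 + x) to every feature."""

    def fit(self, X, y=None):
        # Nothing to learn; return self so the step can be chained.
        return self

    def transform(self, X):
        return np.log1p(np.asarray(X, dtype=float))


# Made-up, non-negative toy features with a balanced binary target.
rng = np.random.default_rng(0)
X = rng.exponential(scale=10.0, size=(40, 3))
y = (X[:, 0] > np.median(X[:, 0])).astype(int)

pipeline = Pipeline([
    ("log", Log1pTransformer()),
    ("classifier", LogisticRegression()),
])
pipeline.fit(X, y)

# The custom step can be used on its own like any other fitted step.
print(pipeline.named_steps["log"].transform([[0.0, 1.0, 9.0]]))
```

For a stateless function like this one, sklearn.preprocessing.FunctionTransformer(np.log1p) is a shorter alternative; a class becomes worthwhile once fit has to learn something from the training data.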
