Scikit-Learn Pipeline

Introduction

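A Pipeline chains one or more transformers with a final estimator, so preprocessing and model fitting behave as a single object: one fit call trains every step in order, and one predict call applies the same transformations before predicting.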

Creating Pipelines

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Manual pipeline creation
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', LogisticRegression())
])

# Fit on data
X = [[1, 2], [3, 4], [5, 6]]
y = [0, 1, 0]
pipeline.fit(X, y)
prediction = pipeline.predict([[2, 3]])
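
To make clearer what the pipeline does on fit and predict, the sketch below compares it with the same two steps written out by hand. It reuses the toy data from the example above; the names clf, X_new, and same are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X = [[1, 2], [3, 4], [5, 6]]
y = [0, 1, 0]

# Pipeline version: fit() scales X, then trains the classifier on the scaled data
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', LogisticRegression()),
])
pipeline.fit(X, y)

# Manual version of the same two steps
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
clf = LogisticRegression()
clf.fit(X_scaled, y)

# At prediction time the pipeline applies transform() (not fit_transform()) first
X_new = [[2, 3]]
pipeline_pred = pipeline.predict(X_new)
manual_pred = clf.predict(scaler.transform(X_new))

same = np.array_equal(pipeline_pred, manual_pred)
print(same)
```

The manual version needs the scaler to be carried around and applied again at prediction time; the pipeline keeps that bookkeeping in one object.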

make_pipeline

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Shorthand syntax - auto-names steps
model = make_pipeline(StandardScaler(), SVC(kernel='linear'))

# Fits and predicts exactly like an explicitly named Pipeline
X = [[1, 2], [3, 4], [5, 6]]
y = [0, 1, 0]
model.fit(X, y)
result = model.predict([[0, 1]])
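
The automatic names are the lowercased class names of each step. A short sketch that lists them (step_names is just an illustrative variable):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

model = make_pipeline(StandardScaler(), SVC(kernel='linear'))

# model.steps is a list of (name, estimator) tuples
step_names = [name for name, _ in model.steps]
print(step_names)  # ['standardscaler', 'svc']

# The generated names work as keys, just like hand-written ones
svc = model.named_steps['svc']
print(svc.kernel)  # linear
```

These names matter later, because they are the prefixes used when reading or setting step parameters.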

Pipeline with Preprocessing

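from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
import pandas as pd  # columns are selected by name, so the input should be a DataFrame

# Different transformations for different columns:
# scale the numeric columns, one-hot encode the categorical ones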
preprocessor = ColumnTransformer([
    ('num', StandardScaler(), ['age', 'income']),
    ('cat', OneHotEncoder(), ['city', 'occupation'])
])

pipeline = Pipeline([
    ('preprocess', preprocessor),
    ('classifier', LogisticRegression())
])
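
The pipeline above is only defined, not fitted. As a minimal sketch of how it might be used, the example below fits it on a small made-up DataFrame; the rows and the target values are hypothetical, and handle_unknown='ignore' is an optional addition so that unseen categories do not raise an error at prediction time.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data, invented for illustration
df = pd.DataFrame({
    'age': [25, 32, 47, 51],
    'income': [40000, 52000, 88000, 61000],
    'city': ['Paris', 'Berlin', 'Paris', 'Rome'],
    'occupation': ['engineer', 'teacher', 'engineer', 'nurse'],
})
target = [0, 1, 0, 1]

preprocessor = ColumnTransformer([
    ('num', StandardScaler(), ['age', 'income']),
    ('cat', OneHotEncoder(handle_unknown='ignore'), ['city', 'occupation']),
])

pipeline = Pipeline([
    ('preprocess', preprocessor),
    ('classifier', LogisticRegression()),
])
pipeline.fit(df, target)

# Inspect what the classifier actually sees after preprocessing
transformed = pipeline.named_steps['preprocess'].transform(df)
print(transformed.shape)  # (4, 8): 2 scaled columns + 3 cities + 3 occupations

predictions = pipeline.predict(df)
```

Columns that are not listed in any transformer are dropped by default, which is why only the four named columns reach the classifier.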

Accessing Pipeline Steps

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('regressor', LinearRegression())
])
pipeline.fit([[1], [2], [3]], [1, 2, 3])

# Access individual steps
scaler = pipeline.named_steps['scaler']
regressor = pipeline.named_steps['regressor']

# Inspect the coefficients learned by the fitted regressor
print(regressor.coef_)
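
Besides named_steps, a pipeline can be indexed by position or name, and step hyperparameters can be read and changed with the step name, two underscores, and the parameter name. The sketch below assumes the same scaler and regressor steps as above; the variable fit_intercept_before is illustrative.

```python
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('regressor', LinearRegression()),
])
pipeline.fit([[1], [2], [3]], [1, 2, 3])

# Steps can be reached by position or by name
scaler = pipeline[0]
regressor = pipeline['regressor']

# Attributes learned during fit live on the individual steps
print(scaler.mean_)  # [2.]

# Hyperparameters use the <step name>__<parameter> convention
fit_intercept_before = pipeline.get_params()['regressor__fit_intercept']

# set_params changes the hyperparameter; the pipeline must be refitted to use it
pipeline.set_params(regressor__fit_intercept=False)
pipeline.fit([[1], [2], [3]], [1, 2, 3])
```

The same double-underscore names are what tools such as GridSearchCV expect when tuning a step inside a pipeline.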

Practice Problems

  1. Create a pipeline with a scaler and a classifier
  2. Use make_pipeline for a quick setup
  3. Add a feature selection step to a pipeline
  4. Access the parameters of an individual step
  5. Use ColumnTransformer for mixed data types
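
Feature selection is not shown elsewhere in this lesson, so here is one possible starting point for problem 3. It is a sketch under assumptions: the data is synthetic (make_classification), and SelectKBest with f_classif and k=2 is just one choice of selector.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 6 features, of which 3 are informative
X, y = make_classification(
    n_samples=100, n_features=6, n_informative=3,
    n_redundant=0, random_state=0,
)

pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('select', SelectKBest(score_func=f_classif, k=2)),  # keep the 2 best-scoring features
    ('classifier', LogisticRegression()),
])
pipeline.fit(X, y)

# Boolean mask over the 6 input features showing which ones were kept
mask = pipeline.named_steps['select'].get_support()
print(mask)

# The classifier only sees the selected features
n_coefs = pipeline.named_steps['classifier'].coef_.shape[1]
print(n_coefs)  # 2
```

Because the selector is a pipeline step, it is fitted only on the data passed to fit, which keeps the selection consistent between training and prediction.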
