Introduction
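A scikit-learn Pipeline chains preprocessing steps and a final estimator into a single object, so the same transformations run in the same order at fit and predict time. This makes machine learning workflows reproducible.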
Pipeline Basics
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# Each step is a (name, estimator) pair; steps run in the order listed.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=2)),
    ("classifier", LogisticRegression()),
])

# X_train, y_train, and X_test are assumed to be defined elsewhere.
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)
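The snippet above leaves X_train, y_train, and X_test undefined. A minimal runnable sketch, assuming the iris dataset and a 75/25 split as stand-ins (both are illustrative choices, not part of the original example), could look like this:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: iris stands in for the tutorial's undefined data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=2)),
    ("classifier", LogisticRegression()),
])

# fit() fits the scaler and PCA on the training data only,
# then trains the classifier on the transformed features.
pipeline.fit(X_train, y_train)

# predict() reuses the fitted scaler and PCA before classifying.
predictions = pipeline.predict(X_test)
accuracy = pipeline.score(X_test, y_test)
print(predictions.shape, round(accuracy, 3))
```

Because the transformers are fitted inside the pipeline, statistics from the test set never leak into preprocessing.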
Pipeline with GridSearch
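from sklearn.model_selection import GridSearchCV

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression()),
])

# "<step>__<param>" tunes a parameter of a step; a bare step name swaps the whole step.
param_grid = {
    "classifier__C": [0.1, 1, 10],
    "scaler": [StandardScaler(), None],  # None skips the scaling step
}

# X and y are assumed to be the feature matrix and labels, defined elsewhere.
grid = GridSearchCV(pipeline, param_grid, cv=5)
grid.fit(X, y)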
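To see what the search returns, here is a self-contained sketch under a few assumptions: iris stands in for X and y, max_iter=1000 is added only to avoid convergence warnings on unscaled data, and "passthrough" is used as the explicit spelling for skipping a step (scikit-learn treats it the same as None for pipeline steps).

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: iris stands in for the tutorial's undefined X and y.
X, y = load_iris(return_X_y=True)

pipeline = Pipeline([
    ("scaler", StandardScaler()),
    # max_iter raised (illustrative choice) so the unscaled variant converges.
    ("classifier", LogisticRegression(max_iter=1000)),
])

param_grid = {
    "classifier__C": [0.1, 1, 10],
    "scaler": [StandardScaler(), "passthrough"],
}

# 3 values of C x 2 scaler options = 6 candidates, each scored with 5-fold CV.
grid = GridSearchCV(pipeline, param_grid, cv=5)
grid.fit(X, y)

print(grid.best_params_)
print(round(grid.best_score_, 3))

# With the default refit=True, best_estimator_ is a pipeline
# refitted on all of X, y using the best parameters.
best_pipeline = grid.best_estimator_
```

Searching over the pipeline rather than the bare classifier means the scaler is refitted inside every cross-validation fold.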
Column Transformer
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# numerical_features and categorical_features are assumed to be lists of column names.
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), numerical_features),
    ("cat", OneHotEncoder(), categorical_features),
])
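The preprocessor above is typically placed as the first step of a pipeline. The following sketch assumes a small hypothetical DataFrame (the column names age, income, and city and all values are invented for illustration) and adds handle_unknown="ignore" as one possible way to tolerate categories not seen during fitting:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; names and values are illustrative only.
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29],
    "income": [40_000, 52_000, 81_000, 90_000, 61_000, 45_000],
    "city": ["Paris", "Lyon", "Paris", "Nice", "Lyon", "Nice"],
})
y = [0, 0, 1, 1, 1, 0]

numerical_features = ["age", "income"]
categorical_features = ["city"]

preprocessor = ColumnTransformer([
    ("num", StandardScaler(), numerical_features),
    # handle_unknown="ignore" encodes unseen categories as all zeros.
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
])

model = Pipeline([
    ("preprocessor", preprocessor),
    ("classifier", LogisticRegression()),
])
model.fit(df, y)

# Two scaled numeric columns plus one indicator column per city.
transformed = model.named_steps["preprocessor"].transform(df)
print(transformed.shape)
```

Columns not listed in any transformer are dropped by default; passing remainder="passthrough" to ColumnTransformer keeps them unchanged.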
Practice Problems
- Build a preprocessing pipeline.
- Combine a pipeline with GridSearchCV.
- Use ColumnTransformer on numerical and categorical columns.
- Access individual pipeline steps.
- Create custom transformers.
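The last two problems touch features not shown earlier, so here is one possible starting point rather than a reference solution. Log1pTransformer is a hypothetical class written for this sketch, and the random data is synthetic.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class Log1pTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical custom transformer: applies log(1 + x) element-wise."""

    def fit(self, X, y=None):
        # Stateless: nothing is learned from the data.
        return self

    def transform(self, X):
        return np.log1p(np.asarray(X, dtype=float))


# Synthetic, non-negative data for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(40, 3))
y = (X[:, 0] > np.median(X[:, 0])).astype(int)

pipeline = Pipeline([
    ("log", Log1pTransformer()),
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression()),
])
pipeline.fit(X, y)

# Three ways to reach into a fitted pipeline:
scaler = pipeline.named_steps["scaler"]   # by name, via named_steps
classifier = pipeline["classifier"]       # by name, via indexing
preprocessing_only = pipeline[:-1]        # slice: every step except the last

X_preprocessed = preprocessing_only.transform(X)
print(scaler.mean_.shape, classifier.coef_.shape, X_preprocessed.shape)
```

Inheriting from TransformerMixin provides fit_transform, and BaseEstimator provides get_params and set_params, which is what lets a custom step take part in a grid search.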