Introduction
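scikit-learn provides a consistent interface for machine learning algorithms. Most objects follow one of a few patterns: estimators learn from data with fit(), predictors add predict(), transformers add transform(), and score() evaluates a fitted model.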
Estimator Interface
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier
# Every estimator has fit(); predictors such as these two also have predict()
reg = LinearRegression()
X = [[1], [2], [3], [4], [5]]
y = [1, 2, 3, 4, 5]
reg.fit(X, y)
print(reg.predict([[6]])) # [6.]
clf = DecisionTreeClassifier()
X_clf = [[0, 0], [1, 1], [0, 1], [1, 0]]
y_clf = [0, 1, 1, 0]
clf.fit(X_clf, y_clf)
print(clf.predict([[1, 1]])) # [1]
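The introduction also mentions the transformer pattern, which the examples above do not exercise. The following is a minimal sketch of it, assuming StandardScaler from sklearn.preprocessing as the illustrative transformer and a toy array X_demo; neither appears elsewhere in this section.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix (illustrative values, not one of the tutorial's datasets)
X_demo = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

scaler = StandardScaler()
scaler.fit(X_demo)                    # learns the per-column mean and standard deviation
X_scaled = scaler.transform(X_demo)   # applies the learned scaling to the data

print(scaler.mean_)             # learned mean of each column
print(X_scaled.mean(axis=0))    # approximately 0 after scaling
print(X_scaled.std(axis=0))     # approximately 1 after scaling
```

Transformers follow the same fit() convention as the predictors above, but expose transform() in place of predict().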
Working with Data
from sklearn.datasets import load_iris, make_classification
# Load built-in dataset
iris = load_iris()
X, y = iris.data, iris.target
print(f"Features: {iris.feature_names}")
# Generate synthetic data (this replaces the iris X and y; random_state makes the result reproducible)
X, y = make_classification(n_samples=100, n_features=4, n_classes=2, random_state=0)
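To see what these two helpers return, the sketch below inspects array shapes and class counts. The names X_iris, y_iris, X_syn, and y_syn, and the choice of random_state=0, are illustrative and not part of the original example.

```python
import numpy as np
from sklearn.datasets import load_iris, make_classification

# Built-in dataset: a Bunch object with data, target, and descriptive fields
iris = load_iris()
X_iris, y_iris = iris.data, iris.target
print(X_iris.shape)          # (150, 4): 150 samples, 4 features
print(iris.target_names)     # names of the three iris species
print(np.bincount(y_iris))   # [50 50 50]: samples per class

# Synthetic dataset: returns a feature matrix and a label vector
X_syn, y_syn = make_classification(
    n_samples=100, n_features=4, n_classes=2, random_state=0
)
print(X_syn.shape, y_syn.shape)  # (100, 4) (100,)
```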
Estimator Properties
from sklearn.linear_model import LinearRegression
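# Hyperparameters are passed to the constructor. The normalize parameter was removed
# in scikit-learn 1.2; scale features with StandardScaler if needed.
reg = LinearRegression(fit_intercept=True)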
reg.fit([[1], [2], [3]], [1, 2, 3])
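# Learned attributes end with an underscore and exist only after fit()
print(f"Coefficients: {reg.coef_}")
print(f"Intercept: {reg.intercept_}")
# For regressors, score() returns the coefficient of determination (R^2)
print(f"Score: {reg.score([[1], [2], [3]], [1, 2, 3])}")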
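Besides learned attributes, estimators expose their hyperparameters through get_params() and set_params(). The sketch below shows the difference between the two kinds of properties, using toy data chosen here for illustration (y = 2x).

```python
from sklearn.linear_model import LinearRegression

reg = LinearRegression(fit_intercept=True)

# Hyperparameters are available before fitting
params = reg.get_params()
print(params["fit_intercept"])   # True

# Learned attributes do not exist until fit() is called
fitted_before = hasattr(reg, "coef_")
print(fitted_before)             # False

# Hyperparameters can be changed in place, then the model is fitted
reg.set_params(fit_intercept=False)
reg.fit([[1], [2], [3]], [2, 4, 6])
print(reg.coef_)        # slope learned from the toy data
print(reg.intercept_)   # 0.0, because no intercept is fitted
```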
Model Persistence
import joblib
from sklearn.linear_model import LogisticRegression
# Save model
model = LogisticRegression()
model.fit([[1], [2], [3]], [0, 1, 1])
joblib.dump(model, 'model.joblib')
# Load model (joblib files are pickle-based, so only load files from a trusted source)
loaded_model = joblib.load('model.joblib')
print(loaded_model.predict([[1.5]]))
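A quick way to check that persistence worked is to compare predictions from the original and the reloaded model. This sketch assumes a temporary directory for the file and a few hypothetical check points (X_check); both are illustrative choices.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit([[1], [2], [3]], [0, 1, 1])

# Write to a temporary directory so no file is left behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions exactly
X_check = [[0.5], [1.5], [2.5]]
same = np.array_equal(model.predict(X_check), restored.predict(X_check))
print(same)
```

Loading a model with a different scikit-learn version than the one that saved it is not guaranteed to work, so it helps to record the version alongside the file.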
Practice Problems
- Implement linear regression on a housing dataset
- Fit a decision tree classifier
- Extract a model's coefficients and intercept
- Save and load a trained model
- Use a different estimator on the same data
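As a possible starting point for the last problem, the sketch below fits two classifiers to the same data through the shared fit() and score() methods. The choice of the iris dataset, the two models, and the settings random_state=0 and max_iter=1000 are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Any classifier can be swapped in because they share the same interface
models = [
    DecisionTreeClassifier(random_state=0),
    LogisticRegression(max_iter=1000),
]

scores = {}
for model in models:
    model.fit(X, y)
    # For classifiers, score() returns mean accuracy (here on the training data)
    scores[type(model).__name__] = model.score(X, y)

print(scores)
```

Accuracy measured on the training data is optimistic; a held-out test set gives a fairer comparison.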