Introduction
Scikit-learn is a Python library that provides simple and efficient tools for data mining and machine learning. These notes cover a basic regression workflow and model persistence.
Basic Workflow
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
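# Split data (X is the feature matrix and y the target, both assumed to be loaded already)
# random_state fixes the shuffle so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)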
# Preprocess: fit the scaler on the training set only, then reuse it on the test set
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Train model
model = LinearRegression()
model.fit(X_train_scaled, y_train)
# Predict
y_pred = model.predict(X_test_scaled)
# Evaluate
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
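The workflow above assumes X and y already exist. A minimal end-to-end sketch follows, using synthetic data from make_regression as a stand-in for a real dataset. The sample count, feature count, noise level, and random_state values are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real X and y (assumption for illustration)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Hold out 20% of the rows for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Learn the scaling statistics from the training rows only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Fit on the training set, predict on the held-out set
model = LinearRegression()
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)

# Two complementary metrics: error magnitude and explained variance
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.2f}, R^2: {r2:.3f}")
```

Calling fit_transform on the training set and only transform on the test set keeps test-set statistics out of the preprocessing step.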
Model Persistence
from joblib import dump, load
# Save model
dump(model, "model.joblib")
# Load model
model = load("model.joblib")
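One way to check that a saved model survives the round trip is to compare predictions before and after loading. The sketch below assumes synthetic data and writes to a temporary directory so it leaves no files behind; the variable names restored and same_predictions are made up for this example.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Synthetic data and a freshly fitted model (assumptions for illustration)
X, y = make_regression(n_samples=100, n_features=3, noise=5.0, random_state=0)
model = LinearRegression().fit(X, y)

# Save to a temporary directory, then load the model back
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")
    dump(model, path)
    restored = load(path)

# The restored model should reproduce the original predictions
same_predictions = np.allclose(model.predict(X), restored.predict(X))
print(same_predictions)
```

joblib files are pickle-based, so load only files from a source you trust, and ideally with the same scikit-learn version that saved them.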
Practice Problems
- Train a linear regression model
- Evaluate it with multiple metrics
- Save and load models
- Split data into training and test sets
- Compare different models
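As a possible starting point for the last problem, the sketch below compares two regressors with 5-fold cross-validation. The choice of Ridge as the second model, the alpha value, the synthetic data, and the names candidates and mean_r2 are all assumptions made for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (assumption for illustration)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Each candidate bundles scaling with the model, so the scaler is
# refitted on the training folds inside every cross-validation split
candidates = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
}

# Score every candidate with the same folds and the same metric
mean_r2 = {}
for name, estimator in candidates.items():
    scores = cross_val_score(estimator, X, y, cv=5, scoring="r2")
    mean_r2[name] = scores.mean()
    print(f"{name}: mean R^2 = {mean_r2[name]:.3f}")
```

Using the same data, folds, and metric for every candidate keeps the comparison fair.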