Introduction
Supervised learning uses labeled training data to learn a mapping from inputs to outputs. It is one of the most widely used forms of machine learning in practice.
Classification vs Regression
| Classification | Regression |
|---|---|
| Predicts categories | Predicts continuous values |
| Email spam/not spam | House price prediction |
| Image classification | Sales forecasting |
| Customer churn | Temperature prediction |
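The distinction in the table can be sketched with one input feature and two different targets. The data below is made up for illustration (hours studied, a pass/fail label, and an exam score), and the variable names are assumptions, not part of the original material.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy data (made up for illustration): hours studied by six students.
hours = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])

# Classification target: a category (0 = fail, 1 = pass).
passed = np.array([0, 0, 0, 1, 1, 1])
# Regression target: a continuous value (exam score).
score = np.array([52.0, 58.0, 61.0, 70.0, 74.0, 81.0])

clf = LogisticRegression().fit(hours, passed)
reg = LinearRegression().fit(hours, score)

new_student = np.array([[4.5]])
label = clf.predict(new_student)  # always one of the classes seen in training
value = reg.predict(new_student)  # a real number on a continuous scale
print("Predicted class:", label[0])
print("Predicted score:", round(float(value[0]), 1))
```

The classifier can only return one of the labels it was trained on, while the regressor can return any real value.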
Classification Algorithms
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
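All of the classifiers imported above share the same estimator interface (`fit`, `predict`, `score`), so they can be swapped freely. The following is a minimal sketch on the bundled iris dataset; the hyperparameters (`max_iter=1000`, `random_state=42`) are assumptions chosen for reproducibility, not tuned recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

classifiers = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(random_state=42),
    RandomForestClassifier(random_state=42),
    GradientBoostingClassifier(random_state=42),
    SVC(),
    KNeighborsClassifier(),
    GaussianNB(),
]

# Every estimator is trained and scored with the same two calls.
test_accuracy = {}
for clf in classifiers:
    clf.fit(X_train, y_train)
    test_accuracy[type(clf).__name__] = clf.score(X_test, y_test)

for name, acc in test_accuracy.items():
    print(f"{name}: {acc:.3f}")
```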
Regression Algorithms
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
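`Ridge`, `Lasso`, and `ElasticNet` are linear regression with a penalty on the coefficients: `Ridge` uses an L2 penalty, `Lasso` an L1 penalty, and `ElasticNet` a mix of both. The sketch below uses synthetic data from `make_regression` (an assumption for illustration, as are the `alpha` values) to show the practical difference: the L1 penalty can drive some coefficients to exactly zero, while the L2 penalty only shrinks them.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge

# Synthetic data (assumed for illustration): 10 features, only 3 informative.
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=10.0, random_state=0
)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: shrinks coefficients
lasso = Lasso(alpha=5.0).fit(X, y)  # L1 penalty: can zero coefficients out

n_zero_ridge = int(np.sum(ridge.coef_ == 0))
n_zero_lasso = int(np.sum(lasso.coef_ == 0))
print("Zero coefficients (OLS):  ", int(np.sum(ols.coef_ == 0)))
print("Zero coefficients (Ridge):", n_zero_ridge)
print("Zero coefficients (Lasso):", n_zero_lasso)
```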
Example: Classification
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
# Load data
iris = load_iris()
X, y = iris.data, iris.target
# Split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Train
# random_state makes the forest reproducible between runs
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
# Evaluate
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(classification_report(y_test, y_pred))
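Accuracy and the classification report summarize performance per class; a confusion matrix additionally shows which classes get mistaken for which. The following is a self-contained sketch using `sklearn.metrics.confusion_matrix`, with the same split settings as the example above and an assumed `random_state=42` on the forest.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print(iris.target_names)
print(cm)
```

Off-diagonal entries count misclassified samples, so a perfect classifier produces a purely diagonal matrix.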
Example: Regression
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error, r2_score
# Load data (load_boston was removed in scikit-learn 1.2; the bundled
# diabetes dataset is used here as a stand-in regression problem)
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
# Split and train
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression()
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
print(f"R2 Score: {r2_score(y_test, y_pred):.3f}")
print(f"RMSE: {np.sqrt(mean_squared_error(y_test, y_pred)):.2f}")
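An R2 score is easier to interpret next to a trivial baseline. The sketch below compares linear regression against `DummyRegressor(strategy="mean")`, which always predicts the training mean. It assumes scikit-learn's bundled diabetes dataset as the regression problem; any regression dataset would work the same way.

```python
from sklearn.datasets import load_diabetes
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Baseline: ignore the features and always predict the training mean.
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
model = LinearRegression().fit(X_train, y_train)

baseline_r2 = r2_score(y_test, baseline.predict(X_test))
model_r2 = r2_score(y_test, model.predict(X_test))
print(f"Baseline R2: {baseline_r2:.3f}")
print(f"Linear regression R2: {model_r2:.3f}")
```

A constant prediction cannot score above zero on R2, so any useful model should land clearly above the baseline.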
Model Comparison
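from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
# Reload the classification data; X and y currently hold the regression set
X, y = load_iris(return_X_y=True)
models = {
    'Logistic': LogisticRegression(max_iter=1000),
    'Random Forest': RandomForestClassifier(random_state=42),
    'SVM': SVC()
}
for name, model in models.items():
    # 5-fold cross-validated accuracy for each model
    scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")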
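Distance- and margin-based models such as SVM are sensitive to feature scale. When scaling is needed, one common approach is to wrap the scaler and the model in a pipeline so that the scaler is re-fit on the training folds of each split rather than on the full dataset. This is a minimal sketch of that idea, not part of the comparison above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# The scaler is fit on the training folds only, inside each CV split,
# so no statistics from the validation fold leak into preprocessing.
scaled_svm = make_pipeline(StandardScaler(), SVC())
scores = cross_val_score(scaled_svm, X, y, cv=5, scoring="accuracy")
print(f"Scaled SVM: {scores.mean():.3f} (+/- {scores.std():.3f})")
```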
Key Takeaways
- Supervised learning uses labeled data
- Classification predicts categories; regression predicts continuous values
- Many algorithms are available; choose based on the problem
- Always evaluate on held-out test data
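The last point can be illustrated with a model that memorizes its training data. The sketch below fits an unrestricted decision tree on synthetic data with 10% label noise (`make_classification` with `flip_y=0.1`, an assumption for illustration) and compares training accuracy with held-out accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 10% randomly flipped labels, assumed for illustration.
X, y = make_classification(
    n_samples=500, n_features=20, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# An unrestricted tree can memorize the training set, noise included.
tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print(f"Train accuracy: {train_acc:.3f}")
print(f"Test accuracy:  {test_acc:.3f}")
```

Training accuracy alone would suggest a perfect model; only the held-out score reflects how it behaves on unseen data.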