Logistic Regression — Complete Guide for Classification
Despite its name, logistic regression is a classification algorithm. It predicts the probability that an input belongs to a class.
From Linear to Logistic Regression
Linear Regression: y = wx + b (output: any real number)
Logistic Regression: y = σ(wx + b) (output: probability 0-1)
The Sigmoid Function:
σ(z) = 1 / (1 + e⁻ᶻ)
Output:
z = 0 → σ(z) = 0.5
z = 2 → σ(z) = 0.88
z = -2 → σ(z) = 0.12
z → ∞ → σ(z) → 1
z → -∞ → σ(z) → 0
Decision rule (default threshold 0.5):
σ(z) ≥ 0.5 → Class 1
σ(z) < 0.5 → Class 0
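The values above can be checked with a few lines of NumPy. This is a minimal sketch; the helper names `sigmoid` and `predict_class` are illustrative and not part of any library.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: maps any real number into the interval (0, 1)."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

def predict_class(z, threshold=0.5):
    """Decision rule: class 1 if sigmoid(z) >= threshold, otherwise class 0."""
    return (sigmoid(z) >= threshold).astype(int)

z_values = np.array([-2.0, 0.0, 2.0])
probs = sigmoid(z_values)
labels = predict_class(z_values)

print(np.round(probs, 2))  # approximately 0.12, 0.5, 0.88
print(labels)              # classes 0, 1, 1
```

The threshold is a parameter here because 0.5 is only the usual default; a different cutoff can be used when false positives and false negatives have different costs.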
Cost Function
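Linear regression uses MSE, but MSE is a poor fit for logistic regression.
Why?
With a sigmoid output, MSE is non-convex in the weights → gradient descent can stall
The sigmoid is flat at the extremes → if y=1 and ŷ ≈ 0 (confidently wrong), the MSE gradient is still tiny → slow learning
The squared error of a single prediction can never exceed 1 → confident mistakes are penalized too lightly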
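To put a number on the tiny-gradient problem, the sketch below differentiates the squared error (ŷ - y)² with respect to z using the chain rule, which gives 2(ŷ - y)·ŷ(1 - ŷ). The function name `mse_grad_wrt_z` and the chosen z values are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse_grad_wrt_z(z, y):
    """Derivative of (sigmoid(z) - y)**2 with respect to z (chain rule)."""
    y_hat = sigmoid(z)
    return 2.0 * (y_hat - y) * y_hat * (1.0 - y_hat)

y = 1.0
grad_uncertain = mse_grad_wrt_z(0.0, y)             # y_hat = 0.5, unsure
grad_confidently_wrong = mse_grad_wrt_z(-10.0, y)   # y_hat close to 0, badly wrong

print(f"gradient at z=0:   {grad_uncertain:.6f}")
print(f"gradient at z=-10: {grad_confidently_wrong:.6f}")
```

The worse prediction (z = -10 when the true label is 1) produces the much smaller gradient, so the model gets the weakest correction exactly where it needs the strongest one.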
Solution: Binary Cross-Entropy (Log Loss)
L = -[y log(ŷ) + (1-y) log(1-ŷ)]
If y=1: L = -log(ŷ)
ŷ=0.9 → L = 0.11 (good)
ŷ=0.1 → L = 2.30 (bad)
If y=0: L = -log(1-ŷ)
ŷ=0.1 → L = 0.11 (good)
ŷ=0.9 → L = 2.30 (bad)
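The four loss values above can be reproduced with a short NumPy sketch. The function name `binary_cross_entropy` and the clipping constant `eps` are assumptions made for this example; the logarithm is the natural log.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Per-example log loss; predictions are clipped to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    return -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

# Same cases as in the text: (y=1, ŷ=0.9), (y=1, ŷ=0.1), (y=0, ŷ=0.1), (y=0, ŷ=0.9)
losses = binary_cross_entropy([1, 1, 0, 0], [0.9, 0.1, 0.1, 0.9])
print(np.round(losses, 2))  # about 0.11, 2.30, 0.11, 2.30
```

With a sigmoid output, the derivative of this loss with respect to z simplifies to ŷ - y, so a confidently wrong prediction yields a large gradient rather than a vanishing one.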
Python Implementation
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
# Generate data
X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=5, random_state=42)
# Split (fixed random_state so the reported scores are reproducible)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
# Fit
model = LogisticRegression()
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]
# Evaluate
from sklearn.metrics import accuracy_score, roc_auc_score
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"AUC-ROC: {roc_auc_score(y_test, y_prob):.3f}")
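To show roughly what fitting involves, here is a from-scratch sketch using batch gradient descent on the mean binary cross-entropy. It is not how scikit-learn trains the model (its default solver is lbfgs, with L2 regularization); the function `fit_logistic_gd`, the learning rate, the iteration count, and the toy data are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, lr=0.1, n_iters=2000):
    """Batch gradient descent on mean binary cross-entropy (no regularization)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        y_hat = sigmoid(X @ w + b)
        error = y_hat - y                      # gradient of the loss w.r.t. z
        w -= lr * (X.T @ error) / n_samples    # average gradient for the weights
        b -= lr * error.mean()                 # average gradient for the bias
    return w, b

# Hypothetical toy data: two Gaussian blobs in 2D
rng = np.random.default_rng(0)
X_toy = np.vstack([rng.normal(-2.0, 1.0, size=(100, 2)),
                   rng.normal(2.0, 1.0, size=(100, 2))])
y_toy = np.concatenate([np.zeros(100), np.ones(100)])

w, b = fit_logistic_gd(X_toy, y_toy)
train_acc = ((sigmoid(X_toy @ w + b) >= 0.5) == y_toy).mean()
print(f"Training accuracy: {train_acc:.3f}")
```

The update uses the error term ŷ - y directly, which is the gradient of the cross-entropy loss with respect to z.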
Decision Boundary
The decision boundary is where σ(z) = 0.5, which happens exactly when z = 0. With two features:
w₁x₁ + w₂x₂ + b = 0
This is a LINE in 2D, a PLANE in 3D,
and a HYPERPLANE in higher dimensions.
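As a quick check of this equation, the sketch below fits a two-feature model, solves w₁x₁ + w₂x₂ + b = 0 for x₂, and confirms that points on that line get a predicted probability of 0.5. The dataset and the variable names are illustrative, and the example assumes the fitted w₂ is not zero.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical 2D dataset so the boundary is a line
X2, y2 = make_classification(n_samples=200, n_features=2, n_informative=2,
                             n_redundant=0, random_state=0)
clf = LogisticRegression().fit(X2, y2)

w1, w2 = clf.coef_[0]
b = clf.intercept_[0]

# Solve w1*x1 + w2*x2 + b = 0 for x2 at a few x1 values (assumes w2 != 0)
x1 = np.array([-1.0, 0.0, 1.0])
x2 = -(w1 * x1 + b) / w2
boundary_points = np.column_stack([x1, x2])

probs_on_boundary = clf.predict_proba(boundary_points)[:, 1]
print(probs_on_boundary)  # each value is 0.5 up to floating-point error
```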
For nonlinear boundaries:
- Add polynomial features
- Use kernel logistic regression
- Or use neural networks
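The first option can be sketched with scikit-learn's `PolynomialFeatures` in a pipeline. The concentric-circles dataset, the degree, and the variable names are assumptions picked to make the effect visible; the scores are training accuracy only, not a proper evaluation.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Two concentric circles: no straight line separates the classes
Xc, yc = make_circles(n_samples=500, noise=0.05, factor=0.5, random_state=0)

linear_model = LogisticRegression().fit(Xc, yc)
poly_model = make_pipeline(PolynomialFeatures(degree=2),
                           LogisticRegression()).fit(Xc, yc)

linear_acc = linear_model.score(Xc, yc)
poly_acc = poly_model.score(Xc, yc)
print(f"Linear features:   {linear_acc:.3f}")
print(f"Degree-2 features: {poly_acc:.3f}")
```

The model is still linear in the expanded features (x₁², x₁x₂, x₂², and so on), but the boundary it draws in the original two dimensions is curved.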
Multiclass Extension
Binary → Multiclass:
Method 1: One-vs-Rest (OvR)
├─ Train K binary classifiers (one per class)
├─ Each classifier: "Is it class k or not?"
└─ Class with highest probability wins
Method 2: Softmax Regression (Multinomial)
├─ Generalizes sigmoid to multiple classes
├─ softmax(z)ᵢ = exp(zᵢ) / Σⱼ exp(zⱼ)
└─ Output: probability distribution over classes
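Both methods can be sketched briefly. The `softmax` helper below is a hand-written illustration (with the usual max-subtraction for numerical stability), and the One-vs-Rest part uses scikit-learn's `OneVsRestClassifier` on the iris dataset, which is just a convenient three-class example and not something the text prescribes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

def softmax(z):
    """Softmax over the last axis, shifted by the max for numerical stability."""
    z = np.asarray(z, dtype=float)
    shifted = z - z.max(axis=-1, keepdims=True)
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum(axis=-1, keepdims=True)

# Method 2: softmax turns K raw scores into a probability distribution
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
predicted_class = int(np.argmax(probs))
print(np.round(probs, 3), probs.sum())

# With two classes and scores [z, 0], softmax reduces to the sigmoid of z
z = 2.0
two_class = softmax([z, 0.0])[0]
sigmoid_z = 1.0 / (1.0 + np.exp(-z))

# Method 1: One-vs-Rest trains one binary classifier per class
X_iris, y_iris = load_iris(return_X_y=True)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_iris, y_iris)
ovr_probs = ovr.predict_proba(X_iris[:3])
print(len(ovr.estimators_))  # one fitted binary model per class
```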
Evaluation Metrics
Confusion Matrix:
                 Predicted
                 0      1
Actual   0    [ TN     FP ]
         1    [ FN     TP ]
Accuracy: (TP + TN) / Total
Precision: TP / (TP + FP) — of predicted positives, how many correct?
Recall: TP / (TP + FN) — of actual positives, how many found?
F1 Score: 2 × (Precision × Recall) / (Precision + Recall)
AUC-ROC: Area under ROC curve (threshold-independent)
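The formulas above can be checked by hand against scikit-learn on a small example. The labels below are hypothetical and were chosen only to keep the arithmetic easy to follow.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical labels: 4 actual positives, 6 actual negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels, ravel() yields the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Metrics computed directly from the formulas
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"Accuracy={accuracy:.2f} Precision={precision:.2f} "
      f"Recall={recall:.2f} F1={f1:.2f}")

# The library functions should agree with the manual values
sk_accuracy = accuracy_score(y_true, y_pred)
sk_precision = precision_score(y_true, y_pred)
sk_recall = recall_score(y_true, y_pred)
sk_f1 = f1_score(y_true, y_pred)
```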
Key Takeaways
- Logistic regression outputs probabilities using the sigmoid function
- Binary cross-entropy is the cost function (not MSE)
- Decision boundary is linear in feature space
- Multiclass: use Softmax regression or One-vs-Rest
- AUC-ROC is more informative than accuracy on imbalanced datasets; under severe imbalance, also check precision and recall
- Logistic regression is fast, interpretable, and a great baseline
- Add polynomial features for nonlinear boundaries
- Use regularization (L1/L2) to prevent overfitting
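As a small illustration of the last point: in scikit-learn, `LogisticRegression` applies an L2 penalty by default, and `C` is the inverse of the regularization strength, so a smaller `C` shrinks the weights more. The dataset and the two `C` values below are assumptions chosen for the sketch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical dataset for comparing regularization strengths
Xr, yr = make_classification(n_samples=300, n_features=10,
                             n_informative=5, random_state=0)

strong = LogisticRegression(C=0.01, max_iter=1000).fit(Xr, yr)   # strong penalty
weak = LogisticRegression(C=100.0, max_iter=1000).fit(Xr, yr)    # weak penalty

strong_norm = np.linalg.norm(strong.coef_)
weak_norm = np.linalg.norm(weak.coef_)
print(f"Weight norm with C=0.01: {strong_norm:.3f}")
print(f"Weight norm with C=100:  {weak_norm:.3f}")
```

In practice `C` is usually chosen by cross-validation rather than fixed in advance.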