What Is Logistic Regression?
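Despite the name, logistic regression is a classification algorithm. It models the probability that an input $x$ belongs to the positive class:

$$P(y = 1 \mid x) = \sigma(z) = \frac{1}{1 + e^{-z}}$$

where $z = w^\top x + b$ is the linear combination of the features.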
The Sigmoid Function
| z | σ(z) | Interpretation |
|---|---|---|
| -5 | 0.007 | Almost certainly class 0 |
| 0 | 0.500 | Decision boundary |
| +5 | 0.993 | Almost certainly class 1 |
Decision rule: predict class 1 if $\sigma(z) \ge 0.5$, i.e. when $z \ge 0$.
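The table and the decision rule can be checked numerically. The snippet below is a minimal sketch: the weights, bias and input are hypothetical values picked only for illustration, and `sigmoid` is a standalone helper rather than part of any library.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights, bias and input, chosen only for illustration
w = np.array([0.8, -0.4])
b = 0.1
x = np.array([2.0, 1.0])

z = w @ x + b            # linear combination: 0.8*2.0 - 0.4*1.0 + 0.1 = 1.3
p = sigmoid(z)           # estimated P(y = 1 | x)
label = int(p >= 0.5)    # same decision as checking z >= 0

# Reproduce the sigmoid values from the table above
table_values = [round(float(sigmoid(v)), 3) for v in (-5.0, 0.0, 5.0)]
print(table_values)      # [0.007, 0.5, 0.993]
print(z, p, label)
```

Because the sigmoid is monotonic, thresholding the probability at 0.5 and thresholding `z` at 0 always give the same label.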
Maximum Likelihood Estimation
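We learn the weights by maximising the likelihood of the observed data. The negative mean log-likelihood, which serves as the loss, is:

$$\mathcal{L}(w, b) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log \hat{p}_i + (1 - y_i) \log(1 - \hat{p}_i) \right], \qquad \hat{p}_i = \sigma(w^\top x_i + b)$$

This is the binary cross-entropy loss. It is convex in $(w, b)$, so gradient descent with a suitable learning rate converges toward the global minimum.

Gradient:

$$\nabla_w \mathcal{L} = \frac{1}{n} X^\top (\hat{p} - y), \qquad \frac{\partial \mathcal{L}}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} (\hat{p}_i - y_i)$$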
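One way to gain confidence in the gradient of the cross-entropy loss is to compare the analytic form, X^T (p_hat - y) / n, against finite differences. The sketch below does this on random synthetic data; the helper names `bce_loss` and `bce_grad` are made up for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(w, b, X, y):
    """Mean binary cross-entropy for weights w and bias b."""
    p_hat = sigmoid(X @ w + b)
    return -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

def bce_grad(w, b, X, y):
    """Analytic gradient: X^T (p_hat - y) / n for w, mean(p_hat - y) for b."""
    p_hat = sigmoid(X @ w + b)
    return X.T @ (p_hat - y) / len(y), np.mean(p_hat - y)

# Small synthetic problem (random data, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = (rng.random(50) < 0.5).astype(float)
w = rng.normal(size=3)
b = 0.2

dw, db = bce_grad(w, b, X, y)

# Central finite differences as an independent estimate of the gradient
eps = 1e-6
dw_num = np.array([
    (bce_loss(w + eps * e, b, X, y) - bce_loss(w - eps * e, b, X, y)) / (2 * eps)
    for e in np.eye(3)
])
db_num = (bce_loss(w, b + eps, X, y) - bce_loss(w, b - eps, X, y)) / (2 * eps)

max_diff = max(np.max(np.abs(dw - dw_num)), abs(db - db_num))
print(max_diff)  # should be tiny if the analytic gradient is right
```

If the two estimates disagreed by more than numerical noise, that would point to a sign or scaling error in the gradient.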
Full Python Implementation
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer, make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score,
                             roc_curve, precision_recall_curve)
# ── 1. Implement from scratch ─────────────────────────────────────────
class LogisticRegressionScratch:
    def __init__(self, lr=0.01, n_iter=1000):
        self.lr = lr
        self.n_iter = n_iter
        self.losses = []

    def _sigmoid(self, z):
        # Clip z so np.exp cannot overflow for extreme values
        return 1 / (1 + np.exp(-np.clip(z, -500, 500)))

    def fit(self, X, y):
        n, p = X.shape
        self.w = np.zeros(p)
        self.b = 0.0
        for _ in range(self.n_iter):
            z = X @ self.w + self.b
            p_hat = self._sigmoid(z)
            # Binary cross-entropy; the small constant guards against log(0)
            loss = -np.mean(y * np.log(p_hat + 1e-9) +
                            (1 - y) * np.log(1 - p_hat + 1e-9))
            self.losses.append(loss)
            # Gradients of the mean loss with respect to w and b
            dw = X.T @ (p_hat - y) / n
            db = np.mean(p_hat - y)
            self.w -= self.lr * dw
            self.b -= self.lr * db
        return self

    def predict_proba(self, X):
        return self._sigmoid(X @ self.w + self.b)

    def predict(self, X, threshold=0.5):
        return (self.predict_proba(X) >= threshold).astype(int)
# ── 2. Train and evaluate ─────────────────────────────────────────────
cancer = load_breast_cancer()
X = pd.DataFrame(cancer.data, columns=cancer.feature_names)
y = cancer.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
scaler = StandardScaler()
Xt = scaler.fit_transform(X_train)
Xs = scaler.transform(X_test)
# Scratch model
lr_scratch = LogisticRegressionScratch(lr=0.1, n_iter=500)
lr_scratch.fit(Xt, y_train)  # y_train is already a NumPy array (no .values attribute)
# Sklearn model (with L2 regularisation)
lr_sk = LogisticRegression(C=1.0, max_iter=1000, random_state=42)
lr_sk.fit(Xt, y_train)
print("Scratch Accuracy :", accuracy_score(y_test, lr_scratch.predict(Xs)))
print("Sklearn Accuracy :", accuracy_score(y_test, lr_sk.predict(Xs)))
print("\nFull Report:\n", classification_report(y_test, lr_sk.predict(Xs),
                                                target_names=["Malignant", "Benign"]))
# ── 3. Visualisations ─────────────────────────────────────────────────
fig, axes = plt.subplots(1, 3, figsize=(16, 4))
# Loss curve
axes[0].plot(lr_scratch.losses, color="#3b82f6", linewidth=2)
axes[0].set_title("Training Loss"); axes[0].set_xlabel("Iteration")
axes[0].set_ylabel("Binary Cross-Entropy"); axes[0].grid(True, alpha=0.3)
# ROC curve
fpr, tpr, _ = roc_curve(y_test, lr_sk.predict_proba(Xs)[:,1])
auc = roc_auc_score(y_test, lr_sk.predict_proba(Xs)[:,1])
axes[1].plot(fpr, tpr, color="#10b981", linewidth=2.5, label=f"AUC={auc:.4f}")
axes[1].plot([0,1],[0,1],"k--"); axes[1].set_title("ROC Curve")
axes[1].set_xlabel("FPR"); axes[1].set_ylabel("TPR")
axes[1].legend(); axes[1].grid(True, alpha=0.3)
# Confusion matrix
cm = confusion_matrix(y_test, lr_sk.predict(Xs))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", ax=axes[2],
            xticklabels=["Malignant", "Benign"],
            yticklabels=["Malignant", "Benign"])
axes[2].set_title("Confusion Matrix")
plt.tight_layout(); plt.show()
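The script imports `cross_val_score` but evaluates on a single train/test split only. As a possible complement, the sketch below estimates AUC with 5-fold cross-validation. Wrapping the scaler and model in a pipeline is a choice made here, not something the script above does; it means scaling is refit inside each fold. The fold count and scoring metric are likewise assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Scaling lives inside the pipeline, so each fold is scaled using
# statistics from its own training portion only
pipe = make_pipeline(StandardScaler(),
                     LogisticRegression(C=1.0, max_iter=1000))

scores = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
print("AUC per fold:", scores.round(4))
print("Mean AUC    :", scores.mean().round(4))
```

The spread across folds gives a rough sense of how much the single-split numbers could vary.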
Regularisation
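Regularisation adds a penalty on the size of the weights to the loss, scaled by a strength λ. In sklearn, C = 1/λ is the inverse of that strength, so a smaller C means stronger regularisation.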
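The effect of C can be seen by fitting the same model at a few values and comparing the size of the learned weight vector. This is an illustrative sketch: the three C values are arbitrary, and the data is scaled without a train/test split because only the weights are inspected here.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

norms = {}
for C in (0.01, 1.0, 100.0):
    # Default penalty is L2; smaller C shrinks the weights harder
    model = LogisticRegression(C=C, max_iter=5000).fit(X, y)
    norms[C] = float(np.linalg.norm(model.coef_))
    print(f"C={C:<6} ||w|| = {norms[C]:.3f}")
```

The weight norm should grow as C increases, which is the inverse relationship to λ in action.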
Multiclass: Softmax Regression
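With more than two classes, the sigmoid is replaced by the softmax function, which converts one linear score per class into a probability distribution over the classes:

$$P(y = k \mid x) = \frac{e^{z_k}}{\sum_{j} e^{z_j}}, \qquad z_k = w_k^\top x + b_k$$

In recent sklearn versions, LogisticRegression fits this multinomial (softmax) model by default for multiclass targets with the default lbfgs solver. The multi_class="multinomial" argument made this explicit in older versions and has been deprecated since scikit-learn 1.5.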
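A minimal NumPy sketch of softmax shows how per-class scores become probabilities. The scores below are hypothetical, and `softmax` is a helper written for this example, not a call into sklearn.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax; subtracting the row max keeps np.exp from overflowing."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical class scores z_k = w_k . x + b_k for two inputs and three classes
scores = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
probs = softmax(scores)
pred = probs.argmax(axis=1)   # predicted class = highest probability

print(probs.round(3))
print(pred)
```

Each row sums to 1, and equal scores give equal probabilities, as in the second row.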
Key Takeaways
- Logistic regression models probability via the sigmoid; threshold at 0.5 for class labels
- Trained by minimising cross-entropy (equivalent to MLE)
- L2 regularisation (Ridge) is the default in sklearn, controlled via C=1/λ
- AUC-ROC is a key metric when classes are imbalanced, since accuracy alone can be misleading
- Despite simplicity, logistic regression is a strong baseline for binary classification