Logistic Regression — Complete Guide

What Is Logistic Regression?

Despite the name, logistic regression is a classification algorithm. It models the probability that an input belongs to a class.

$P(y=1 \mid x) = \sigma(z) = \frac{1}{1 + e^{-z}}$

Where $z = \mathbf{w}^\top \mathbf{x} + b$ is the linear combination.

The Sigmoid Function

$\sigma(z) = \frac{1}{1 + e^{-z}} \in (0, 1)$

z	σ(z)	Interpretation
-5	0.007	Almost certainly class 0
0	0.500	Decision boundary
+5	0.993	Almost certainly class 1

Decision rule: predict class 1 if $\sigma(z) \geq 0.5$ , i.e. when $z \geq 0$ .
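As a quick check of the table and the decision rule, here is a minimal NumPy sketch. The helper name `sigmoid` and the sample values of `w`, `b` and `x` are illustrative assumptions, not part of the guide's implementation.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Reproduce the table rows for z = -5, 0, +5
for z in (-5.0, 0.0, 5.0):
    print(f"z = {z:+.0f}  sigma(z) = {sigmoid(z):.3f}")

# Hypothetical weights, bias and input, chosen only for illustration
w = np.array([0.8, -0.4])
b = 0.1
x = np.array([2.0, 1.0])

z = w @ x + b            # linear combination w^T x + b
p = sigmoid(z)           # P(y = 1 | x)
label = int(p >= 0.5)    # equivalent to int(z >= 0)
print(f"z = {z:.2f}, p = {p:.3f}, predicted class = {label}")
```

Because the sigmoid is monotonic, comparing the probability with 0.5 and comparing z with 0 always give the same label.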

Maximum Likelihood Estimation

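We learn the weights by maximising the likelihood of the observed data. Equivalently, we minimise the average negative log-likelihood, which serves as the loss. With $\hat{p}_i = \sigma(\mathbf{w}^\top \mathbf{x}_i + b)$ denoting the predicted probability for sample $i$: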

$\mathcal{L}(\mathbf{w}) = -\frac{1}{n}\sum_{i=1}^n \left[y_i \log \hat{p}_i + (1-y_i)\log(1-\hat{p}_i)\right]$

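This is the binary cross-entropy loss. It is convex in $\mathbf{w}$ and $b$, so gradient descent with a suitable learning rate converges towards the global minimum instead of getting trapped in a local one.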
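To make the link between likelihood and cross-entropy concrete, the following sketch computes both on a handful of made-up labels and probabilities (the arrays `y` and `p_hat` are assumptions chosen for illustration) and shows that the negative mean log-likelihood equals the cross-entropy formula above.

```python
import numpy as np

# Hypothetical labels and predicted probabilities, for illustration only
y = np.array([1, 0, 1, 1, 0])
p_hat = np.array([0.9, 0.2, 0.7, 0.6, 0.1])

# Bernoulli likelihood of each observation: p_hat if y = 1, else 1 - p_hat
per_sample = np.where(y == 1, p_hat, 1.0 - p_hat)
likelihood = per_sample.prod()

# Negative mean log-likelihood
nll = -np.log(likelihood) / len(y)

# Binary cross-entropy written term by term as in the loss formula
bce = -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

print(f"likelihood = {likelihood:.4f}")
print(f"NLL = {nll:.6f}, BCE = {bce:.6f}")
```

Maximising the product of per-sample probabilities and minimising the averaged cross-entropy therefore pick out the same weights.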

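Gradient with respect to the weights and the bias, where $\mathbf{X}$ is the $n \times p$ matrix whose rows are the inputs $\mathbf{x}_i$:

$\frac{\partial \mathcal{L}}{\partial \mathbf{w}} = \frac{1}{n}\mathbf{X}^\top(\hat{\mathbf{p}} - \mathbf{y})$

$\frac{\partial \mathcal{L}}{\partial b} = \frac{1}{n}\sum_{i=1}^n (\hat{p}_i - y_i)$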
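One way to gain confidence in the gradient formula is to compare it with a finite-difference estimate. The sketch below does this on synthetic data; the function names (`bce_loss`, `analytic_grad`, `numeric_grad`) and the random data are assumptions made for the example, and the bias term is left out for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(w, X, y):
    """Mean binary cross-entropy for weights w (no bias term, for brevity)."""
    p_hat = sigmoid(X @ w)
    return -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

def analytic_grad(w, X, y):
    """Gradient from the closed-form expression: X^T (p_hat - y) / n."""
    return X.T @ (sigmoid(X @ w) - y) / len(y)

def numeric_grad(w, X, y, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = eps
        grad[j] = (bce_loss(w + step, X, y) - bce_loss(w - step, X, y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                 # synthetic features
y = (rng.random(50) < 0.5).astype(float)     # synthetic 0/1 labels
w = rng.normal(size=3)

max_diff = np.max(np.abs(analytic_grad(w, X, y) - numeric_grad(w, X, y)))
print(f"max |analytic - numeric| = {max_diff:.2e}")
```

A difference close to floating-point noise suggests the analytic expression and the loss agree; a large one usually points to a sign or scaling mistake.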

Full Python Implementation

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score, roc_curve)

# ── 1. Implement from scratch ─────────────────────────────────────────
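class LogisticRegressionScratch:
    """Binary logistic regression trained with batch gradient descent."""

    def __init__(self, lr=0.01, n_iter=1000):
        self.lr = lr            # learning rate
        self.n_iter = n_iter    # number of gradient steps
        self.losses = []        # cross-entropy recorded at every step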

    def _sigmoid(self, z):
        return 1 / (1 + np.exp(-np.clip(z, -500, 500)))

    def fit(self, X, y):
        n, p = X.shape
        self.w = np.zeros(p)
        self.b = 0.0
        for _ in range(self.n_iter):
            z = X @ self.w + self.b
            p_hat = self._sigmoid(z)
            loss = -np.mean(y * np.log(p_hat + 1e-9) +
                           (1 - y) * np.log(1 - p_hat + 1e-9))
            self.losses.append(loss)
            dw = X.T @ (p_hat - y) / n
            db = np.mean(p_hat - y)
            self.w -= self.lr * dw
            self.b -= self.lr * db
        return self

    def predict_proba(self, X):
        return self._sigmoid(X @ self.w + self.b)

    def predict(self, X, threshold=0.5):
        return (self.predict_proba(X) >= threshold).astype(int)

# ── 2. Train and evaluate ─────────────────────────────────────────────
cancer = load_breast_cancer()
X = pd.DataFrame(cancer.data, columns=cancer.feature_names)
y = cancer.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
scaler = StandardScaler()
Xt = scaler.fit_transform(X_train)
Xs = scaler.transform(X_test)

# Scratch model
lr_scratch = LogisticRegressionScratch(lr=0.1, n_iter=500)
lr_scratch.fit(Xt, y_train)  # y_train is already a NumPy array (no .values needed)

# Sklearn model (with L2 regularisation)
lr_sk = LogisticRegression(C=1.0, max_iter=1000, random_state=42)
lr_sk.fit(Xt, y_train)

print("Scratch Accuracy :", accuracy_score(y_test, lr_scratch.predict(Xs)))
print("Sklearn Accuracy :", accuracy_score(y_test, lr_sk.predict(Xs)))
print("\nFull Report:\n", classification_report(y_test, lr_sk.predict(Xs),
      target_names=["Malignant", "Benign"]))

# ── 3. Visualisations ─────────────────────────────────────────────────
fig, axes = plt.subplots(1, 3, figsize=(16, 4))

# Loss curve
axes[0].plot(lr_scratch.losses, color="#3b82f6", linewidth=2)
axes[0].set_title("Training Loss"); axes[0].set_xlabel("Iteration")
axes[0].set_ylabel("Binary Cross-Entropy"); axes[0].grid(True, alpha=0.3)

# ROC curve
fpr, tpr, _ = roc_curve(y_test, lr_sk.predict_proba(Xs)[:,1])
auc = roc_auc_score(y_test, lr_sk.predict_proba(Xs)[:,1])
axes[1].plot(fpr, tpr, color="#10b981", linewidth=2.5, label=f"AUC={auc:.4f}")
axes[1].plot([0,1],[0,1],"k--"); axes[1].set_title("ROC Curve")
axes[1].set_xlabel("FPR"); axes[1].set_ylabel("TPR")
axes[1].legend(); axes[1].grid(True, alpha=0.3)

# Confusion matrix
cm = confusion_matrix(y_test, lr_sk.predict(Xs))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", ax=axes[2],
            xticklabels=["Malignant","Benign"],
            yticklabels=["Malignant","Benign"])
axes[2].set_title("Confusion Matrix")

plt.tight_layout(); plt.show()

Regularisation

$\mathcal{L}_{Ridge} = \mathcal{L} + \frac{\lambda}{2}\|\mathbf{w}\|^2 \quad \text{(L2, default)}$

$\mathcal{L}_{Lasso} = \mathcal{L} + \lambda\|\mathbf{w}\|_1 \quad \text{(L1, sparse)}$

In sklearn, C is the inverse of the regularisation strength (C ∝ 1/λ), so a smaller C means stronger regularisation.
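The effect of the L2 penalty can be seen in a small NumPy sketch. It is not sklearn's solver: the function `fit_l2`, the synthetic data and the chosen values of λ are assumptions made for illustration. The penalty $\frac{\lambda}{2}\|\mathbf{w}\|^2$ simply adds $\lambda\mathbf{w}$ to the weight gradient.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_l2(X, y, lam, lr=0.1, n_iter=2000):
    """Gradient descent on mean cross-entropy + (lam / 2) * ||w||^2."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        err = sigmoid(X @ w + b) - y
        w -= lr * (X.T @ err / len(y) + lam * w)   # penalty adds lam * w
        b -= lr * err.mean()                       # bias is not penalised
    return w, b

# Synthetic data generated from a hypothetical "true" weight vector
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
true_w = np.array([2.0, -1.0, 0.5, 0.0])
y = (rng.random(200) < sigmoid(X @ true_w)).astype(float)

norms = {}
for lam in (0.0, 0.1, 1.0):
    w, _ = fit_l2(X, y, lam)
    norms[lam] = np.linalg.norm(w)
    print(f"lambda = {lam:<4}  ||w|| = {norms[lam]:.3f}")
```

Larger λ pulls the weights towards zero, which is the behaviour a smaller C produces in sklearn.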

Multiclass: Softmax Regression

$P(y=k \mid x) = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}$

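In sklearn, LogisticRegression fits a multinomial (softmax) model by default for multiclass targets when used with solvers such as lbfgs. Older versions exposed this choice as multi_class="multinomial"; that parameter is deprecated as of scikit-learn 1.5.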
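The softmax formula can be sketched in a few lines of NumPy. The function name `softmax` and the score matrix `Z` are illustrative assumptions; subtracting the row maximum before exponentiating is a common stability trick that leaves the result unchanged.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    Z = Z - Z.max(axis=1, keepdims=True)
    expZ = np.exp(Z)
    return expZ / expZ.sum(axis=1, keepdims=True)

# Hypothetical scores z_k for 2 samples and K = 3 classes
Z = np.array([[2.0, 1.0, 0.1],
              [0.5, 0.5, 3.0]])

P = softmax(Z)
print(P.round(3))
print("row sums:", P.sum(axis=1))
print("predicted classes:", P.argmax(axis=1))

# With K = 2, softmax reduces to the sigmoid of the score difference
z2 = np.array([[0.0, 1.3]])
p_class1 = softmax(z2)[0, 1]
print(p_class1, 1.0 / (1.0 + np.exp(-1.3)))
```

The last two printed values match, which shows that binary logistic regression is the K = 2 special case of softmax regression.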

Key Takeaways

Logistic regression models probability via the sigmoid; threshold at 0.5 for class labels
Trained by minimising cross-entropy (equivalent to MLE)
L2 regularisation (Ridge) is the default in sklearn; its strength is controlled via C (smaller C, stronger penalty)
AUC-ROC is a threshold-independent metric and is more informative than accuracy when classes are imbalanced
Despite simplicity, logistic regression is a strong baseline for binary classification