Logistic Regression — Complete Guide

What Is Logistic Regression?

Despite the name, logistic regression is a classification algorithm. It models the probability that an input belongs to a class.

$$P(y=1 \mid \mathbf{x}) = \sigma(z) = \frac{1}{1 + e^{-z}}$$

where $z = \mathbf{w}^\top \mathbf{x} + b$ is a linear combination of the input features.


The Sigmoid Function

$$\sigma(z) = \frac{1}{1 + e^{-z}} \in (0, 1)$$

| z  | σ(z)  | Interpretation           |
|----|-------|--------------------------|
| -5 | 0.007 | Almost certainly class 0 |
| 0  | 0.500 | Decision boundary        |
| +5 | 0.993 | Almost certainly class 1 |

Decision rule: predict class 1 if $\sigma(z) \geq 0.5$, i.e. when $z \geq 0$.
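
The table values and the decision rule can be checked with a few lines of NumPy. This is a minimal sketch rather than the lesson's own implementation: the function name `sigmoid` and the variables `z_values`, `probs` and `labels` are illustrative, and the stable formulation via `exp(-|z|)` is one common choice among several.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, written so that exp() never overflows."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))          # always in (0, 1], so no overflow
    # For z >= 0 use 1 / (1 + e^-z); for z < 0 use e^z / (1 + e^z).
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

z_values = np.array([-5.0, 0.0, 5.0])
probs = sigmoid(z_values)

# Decision rule: class 1 when sigma(z) >= 0.5, which is the same as z >= 0.
labels = (probs >= 0.5).astype(int)

print(np.round(probs, 3))
print(labels)
```

Thresholding the probability at 0.5 and thresholding `z` at 0 give the same labels, which is why the decision boundary of logistic regression is linear in the features.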


Maximum Likelihood Estimation

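We learn the weights by maximising the likelihood of the observed data. Equivalently, we minimise the negative mean log-likelihood, which serves as the loss:

$$\mathcal{L}(\mathbf{w}) = -\frac{1}{n}\sum_{i=1}^n \left[y_i \log \hat{p}_i + (1-y_i)\log(1-\hat{p}_i)\right]$$

where $\hat{p}_i = \sigma(\mathbf{w}^\top \mathbf{x}_i + b)$. This is the binary cross-entropy loss. It is convex in $\mathbf{w}$ and $b$, so it has no spurious local minima, and gradient descent with a suitable learning rate converges towards the global minimum.

Gradient:

$$\frac{\partial \mathcal{L}}{\partial \mathbf{w}} = \frac{1}{n}\mathbf{X}^\top(\hat{\mathbf{p}} - \mathbf{y}), \qquad \frac{\partial \mathcal{L}}{\partial b} = \frac{1}{n}\sum_{i=1}^n (\hat{p}_i - y_i)$$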
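
One way to gain confidence in the gradient formula is to compare it with a finite-difference approximation of the loss. The sketch below does that on synthetic data. It assumes the bias is omitted (or absorbed into the weights) to keep the check short, and the names `bce_loss`, `bce_grad`, `X_chk`, `y_chk` and `w_chk` are hypothetical helpers, not part of the lesson's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_loss(w, X, y):
    """Mean binary cross-entropy for weights w (no bias term in this sketch)."""
    p_hat = sigmoid(X @ w)
    return -np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))

def bce_grad(w, X, y):
    """Analytic gradient: X^T (p_hat - y) / n."""
    return X.T @ (sigmoid(X @ w) - y) / len(y)

# Small synthetic problem; the data are random and purely illustrative.
rng = np.random.default_rng(0)
X_chk = rng.normal(size=(50, 3))
y_chk = (rng.random(50) < 0.5).astype(float)
w_chk = 0.1 * rng.normal(size=3)

# Central finite differences, one coordinate direction at a time.
eps = 1e-6
numeric = np.array([
    (bce_loss(w_chk + eps * e, X_chk, y_chk)
     - bce_loss(w_chk - eps * e, X_chk, y_chk)) / (2 * eps)
    for e in np.eye(3)
])
analytic = bce_grad(w_chk, X_chk, y_chk)

max_abs_diff = np.max(np.abs(numeric - analytic))
print("largest difference:", max_abs_diff)
```

If the two gradients disagree by more than a small tolerance, the usual suspects are a sign error or a missing `1/n` factor.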


Full Python Implementation

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score, roc_curve)

# ── 1. Implement from scratch ─────────────────────────────────────────
class LogisticRegressionScratch:
    def __init__(self, lr=0.01, n_iter=1000):
        self.lr = lr; self.n_iter = n_iter; self.losses = []

    def _sigmoid(self, z):
        return 1 / (1 + np.exp(-np.clip(z, -500, 500)))

    def fit(self, X, y):
        n, p = X.shape
        self.w = np.zeros(p); self.b = 0.0
        for _ in range(self.n_iter):
            z = X @ self.w + self.b
            p_hat = self._sigmoid(z)
            loss = -np.mean(y * np.log(p_hat + 1e-9) +
                           (1 - y) * np.log(1 - p_hat + 1e-9))
            self.losses.append(loss)
            dw = X.T @ (p_hat - y) / n
            db = np.mean(p_hat - y)
            self.w -= self.lr * dw
            self.b -= self.lr * db
        return self

    def predict_proba(self, X):
        return self._sigmoid(X @ self.w + self.b)

    def predict(self, X, threshold=0.5):
        return (self.predict_proba(X) >= threshold).astype(int)

# ── 2. Train and evaluate ─────────────────────────────────────────────
cancer = load_breast_cancer()
X = pd.DataFrame(cancer.data, columns=cancer.feature_names)
y = cancer.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
scaler = StandardScaler()
Xt = scaler.fit_transform(X_train)
Xs = scaler.transform(X_test)

# Scratch model
lr_scratch = LogisticRegressionScratch(lr=0.1, n_iter=500)
lr_scratch.fit(Xt, y_train)  # y_train is already a NumPy array (cancer.target), so no .values

# Sklearn model (with L2 regularisation)
lr_sk = LogisticRegression(C=1.0, max_iter=1000, random_state=42)
lr_sk.fit(Xt, y_train)

print("Scratch Accuracy :", accuracy_score(y_test, lr_scratch.predict(Xs)))
print("Sklearn Accuracy :", accuracy_score(y_test, lr_sk.predict(Xs)))
print("\nFull Report:\n", classification_report(y_test, lr_sk.predict(Xs),
      target_names=["Malignant", "Benign"]))

# ── 3. Visualisations ─────────────────────────────────────────────────
fig, axes = plt.subplots(1, 3, figsize=(16, 4))

# Loss curve
axes[0].plot(lr_scratch.losses, color="#3b82f6", linewidth=2)
axes[0].set_title("Training Loss"); axes[0].set_xlabel("Iteration")
axes[0].set_ylabel("Binary Cross-Entropy"); axes[0].grid(True, alpha=0.3)

# ROC curve
fpr, tpr, _ = roc_curve(y_test, lr_sk.predict_proba(Xs)[:,1])
auc = roc_auc_score(y_test, lr_sk.predict_proba(Xs)[:,1])
axes[1].plot(fpr, tpr, color="#10b981", linewidth=2.5, label=f"AUC={auc:.4f}")
axes[1].plot([0,1],[0,1],"k--"); axes[1].set_title("ROC Curve")
axes[1].set_xlabel("FPR"); axes[1].set_ylabel("TPR")
axes[1].legend(); axes[1].grid(True, alpha=0.3)

# Confusion matrix
cm = confusion_matrix(y_test, lr_sk.predict(Xs))
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues", ax=axes[2],
            xticklabels=["Malignant","Benign"],
            yticklabels=["Malignant","Benign"])
axes[2].set_title("Confusion Matrix")

plt.tight_layout(); plt.show()
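
A single train/test split gives only one estimate of performance. As a hedged complement to the listing above, the sketch below runs 5-fold cross-validation with the scaler wrapped in a pipeline, so that the scaling statistics are re-estimated on each training fold and never see the validation fold. The variable names (`X_cv`, `y_cv`, `pipe`, `cv_scores`) and the choice of 5 folds and ROC AUC as the score are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_cv, y_cv = load_breast_cancer(return_X_y=True)

# Scaling lives inside the pipeline, so each fold fits its own scaler.
pipe = make_pipeline(StandardScaler(),
                     LogisticRegression(C=1.0, max_iter=1000))

# For classifiers, an integer cv uses stratified folds.
cv_scores = cross_val_score(pipe, X_cv, y_cv, cv=5, scoring="roc_auc")

print("Per-fold AUC:", cv_scores.round(4))
print("Mean and std:", cv_scores.mean().round(4), cv_scores.std().round(4))
```

The spread across folds gives a rough sense of how much the single-split numbers could vary with a different random split.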

Regularisation

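$$\mathcal{L}_{Ridge} = \mathcal{L} + \frac{\lambda}{2}\|\mathbf{w}\|_2^2 \quad \text{(L2, default)}$$

$$\mathcal{L}_{Lasso} = \mathcal{L} + \lambda\|\mathbf{w}\|_1 \quad \text{(L1, sparse)}$$

In sklearn, C is the inverse of the regularisation strength (C = 1/λ), so a smaller C means stronger regularisation. Note that sklearn applies C to the summed loss rather than the averaged one, so with the 1/n convention used above the correspondence is λ = 1/(nC).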
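
The effect of C can be seen by fitting the same model at several values and comparing the size of the learned weight vector. This is an illustrative sketch, assuming scikit-learn's default L2 penalty and the breast cancer data used in this lesson. The grid of C values and the names `X_std`, `y_reg` and `norms` are arbitrary choices made for the example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X_raw, y_reg = load_breast_cancer(return_X_y=True)
X_std = StandardScaler().fit_transform(X_raw)

norms = {}
for C in (0.01, 1.0, 100.0):
    # Default penalty is L2; smaller C shrinks the weights harder.
    model = LogisticRegression(C=C, max_iter=5000).fit(X_std, y_reg)
    norms[C] = float(np.linalg.norm(model.coef_))
    print(f"C={C:<6} ||w||_2 = {norms[C]:.3f}")
```

The norm of the weight vector should grow as C increases, since a larger C corresponds to a weaker penalty.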


Multiclass: Softmax Regression

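$$P(y=k \mid \mathbf{x}) = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}$$

where $z_k = \mathbf{w}_k^\top \mathbf{x} + b_k$ is the linear score for class $k$. In older scikit-learn releases, pass multi_class="multinomial" to get true softmax regression. That parameter was deprecated in version 1.5, and recent releases use the multinomial formulation for problems with three or more classes.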
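
The softmax formula can be sketched directly in NumPy. The version below subtracts the row-wise maximum before exponentiating, a standard trick that leaves the result unchanged but avoids overflow. The function name `softmax` and the example scores are assumptions made for illustration. The last lines check that with two classes and scores `[z, 0]`, softmax reduces to the sigmoid.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax over the last axis, shifted for numerical stability."""
    Z = np.asarray(Z, dtype=float)
    shifted = Z - Z.max(axis=-1, keepdims=True)   # largest entry becomes 0
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum(axis=-1, keepdims=True)

# Two example rows of class scores z_k (illustrative values only).
class_scores = np.array([[2.0, 1.0, 0.1],
                         [1000.0, 1000.0, 1000.0]])  # would overflow without the shift
P = softmax(class_scores)
print(P.round(3))
print("predicted classes:", P.argmax(axis=1))

# Binary special case: softmax([z, 0])[0] equals sigmoid(z).
z = 1.5
two_class = softmax([z, 0.0])
sigmoid_z = 1.0 / (1.0 + np.exp(-z))
print(two_class[0], sigmoid_z)
```

Each row of `P` sums to 1, and the predicted class is the one with the largest score, since softmax preserves the ordering of the scores.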


Key Takeaways

  1. Logistic regression models the class probability via the sigmoid; threshold at 0.5 to obtain class labels.
  2. It is trained by minimising cross-entropy, which is equivalent to maximum likelihood estimation.
  3. L2 regularisation (Ridge) is the default in sklearn and is controlled via C = 1/λ.
  4. AUC-ROC is threshold-independent, which makes it more informative than accuracy when classes are imbalanced.
  5. Despite its simplicity, logistic regression is a strong baseline for binary classification.
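
To illustrate the point about imbalanced classes, the sketch below scores a deliberately useless model that assigns the same low score to every sample. The 95/5 class split and the names `y_true`, `const_scores`, `acc` and `auc` are made up for the example.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# 95 negatives and 5 positives: a heavily imbalanced toy label vector.
y_true = np.array([0] * 95 + [1] * 5)

# A "model" that gives every sample the same score and so never predicts class 1.
const_scores = np.zeros(100)

acc = accuracy_score(y_true, (const_scores >= 0.5).astype(int))
auc = roc_auc_score(y_true, const_scores)

print("accuracy:", acc)
print("ROC AUC :", auc)
```

Accuracy looks strong only because the majority class dominates, while the AUC sits at chance level because the scores carry no ranking information.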
