Logistic Regression

Master logistic regression, odds ratios, and classification evaluation.

ℹ️ Why It Matters

Logistic regression is the standard baseline classifier and the foundation for neural network classification. It models binary outcomes using the sigmoid function, producing interpretable odds ratios. Understanding its coefficients, evaluation metrics, and relationship to more complex models is core background for data science work, and it is a sensible first model to try on a binary classification task.


Overview

Logistic regression models the probability that a binary outcome $Y = 1$ given features $X$. Unlike linear regression, it uses the sigmoid function $\sigma(z) = 1/(1 + e^{-z})$ to map the linear predictor to a probability between 0 and 1. The model is linear in the log-odds space: $\log(p/(1-p)) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$. Coefficients exponentiate to odds ratios ($e^{\beta_j}$), providing intuitive effect size estimates. A classification threshold (typically 0.5) converts probabilities to binary predictions. The model is fitted via maximum likelihood estimation (MLE), not ordinary least squares (OLS).


Key Concepts

Logistic Regression Model

$$P(Y=1 \mid X) = \sigma(\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p)$$

Here,

  • $\sigma(z)$ = Sigmoid function: $1 / (1 + e^{-z})$
  • $\beta_0, \beta_1, \ldots$ = Model coefficients
  • $x_1, \ldots, x_p$ = Input features
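
The model formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the function names `sigmoid` and `predict_proba` and the coefficient values are assumptions made up for the example.

```python
import math

def sigmoid(z):
    """Map a real-valued linear predictor z to a probability."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(intercept, coefs, x):
    """Return P(Y=1 | x) for one observation x (a list of feature values)."""
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return sigmoid(z)

# Hypothetical coefficients and features, chosen only for illustration.
p = predict_proba(intercept=-1.0, coefs=[0.5, 2.0], x=[2.0, 0.0])
print(p)  # linear predictor is -1.0 + 0.5*2.0 + 2.0*0.0 = 0.0, so p = 0.5
```

A linear predictor of 0 always maps to a probability of 0.5, which is why the default threshold of 0.5 corresponds to the sign of the linear predictor.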

Log-Odds (Logit) Link

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$$

Here,

  • $p$ = $P(Y=1 \mid X)$, the probability of class 1
  • $\frac{p}{1-p}$ = Odds of the outcome
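
The logit link is the sigmoid model solved for the linear predictor. The short derivation below makes that step explicit; $z$ is shorthand introduced here for $\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$.

```latex
\begin{align*}
p &= \sigma(z) = \frac{1}{1 + e^{-z}} \\
1 - p &= \frac{e^{-z}}{1 + e^{-z}} \\
\frac{p}{1-p} &= \frac{1}{e^{-z}} = e^{z} \\
\log\left(\frac{p}{1-p}\right) &= z = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p
\end{align*}
```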

Odds Ratio

$$\text{OR} = e^{\beta_j}$$

Here,

  • $\beta_j$ = Coefficient for feature $j$
  • $e^{\beta_j}$ = Multiplicative effect on odds for a one-unit increase in $x_j$
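
To see why $e^{\beta_j}$ is a multiplicative effect, compare the odds before and after raising $x_j$ by one unit while holding the other features fixed. In the sketch below, $\text{odds}(\cdot)$ and the constant $c$ (the part of the linear predictor that does not involve $x_j$) are notation introduced for this derivation.

```latex
\begin{align*}
\text{odds}(x_j) &= \frac{p}{1-p} = e^{c + \beta_j x_j} \\
\text{odds}(x_j + 1) &= e^{c + \beta_j (x_j + 1)} = e^{c + \beta_j x_j} \cdot e^{\beta_j} \\
\frac{\text{odds}(x_j + 1)}{\text{odds}(x_j)} &= e^{\beta_j}
\end{align*}
```

The ratio does not depend on the starting value of $x_j$ or on $c$, which is what makes the odds ratio a single summary number per feature.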

Log-Likelihood

$$\ell(\beta) = \sum_{i=1}^{n} \left[ y_i \log(\hat{p}_i) + (1-y_i) \log(1-\hat{p}_i) \right]$$

Here,

  • $y_i$ = Observed outcome (0 or 1)
  • $\hat{p}_i$ = Predicted probability for observation $i$
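
As a rough sketch of how the log-likelihood is evaluated, the function below sums the per-observation terms of the formula above. The labels and predicted probabilities are made up for illustration, and the `eps` clipping is an added safeguard against `log(0)`, not part of the formula.

```python
import math

def log_likelihood(y, p_hat, eps=1e-12):
    """Bernoulli log-likelihood of 0/1 labels y given predicted probabilities p_hat."""
    total = 0.0
    for yi, pi in zip(y, p_hat):
        pi = min(max(pi, eps), 1.0 - eps)  # clip to avoid log(0)
        total += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return total

# Made-up labels and predictions, for illustration only.
y = [1, 0, 1, 1]
confident = [0.9, 0.1, 0.8, 0.7]      # probabilities that agree with the labels
uninformative = [0.5, 0.5, 0.5, 0.5]  # a model that always predicts 0.5

ll_confident = log_likelihood(y, confident)
ll_uninformative = log_likelihood(y, uninformative)  # equals 4 * log(0.5)
print(ll_confident, ll_uninformative)
```

Predictions that put more probability on the observed outcomes give a higher (less negative) log-likelihood, which is the quantity MLE maximizes.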

Classification Metrics

| Metric | Formula | Use Case |
| --- | --- | --- |
| Accuracy | $\frac{TP+TN}{TP+TN+FP+FN}$ | Balanced classes |
| Precision | $\frac{TP}{TP+FP}$ | Cost of false positive high |
| Recall | $\frac{TP}{TP+FN}$ | Cost of false negative high |
| F1 Score | $\frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$ | Balance precision and recall |
| AUC-ROC | Area under ROC curve | Threshold-independent evaluation |

Odds Ratio Interpretation

| OR Value | Interpretation |
| --- | --- |
| OR = 1 | No association |
| OR > 1 | Positive association (increases odds) |
| OR < 1 | Negative association (decreases odds) |
| OR = 2 | Doubles the odds |
| OR = 0.5 | Halves the odds |

Quick Example

📝Interpreting Logistic Coefficient

$\hat{\beta}_1 = 0.5$ for age in a logistic regression predicting disease.

Odds ratio = $e^{0.5} \approx 1.65$. For each one-year increase in age, the odds of disease multiply by 1.65 (a 65% increase in odds), holding the other features constant. This is a change in odds, not in probability. If the 95% CI for the OR were, say, [1.2, 2.3], the interval would exclude 1, so the effect would be statistically significant at the 5% level.
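
The arithmetic in this example can be checked with a short sketch. The coefficient 0.5 comes from the example; the helper names, the 10-year change, and the baseline probabilities are hypothetical values chosen for illustration.

```python
import math

beta_age = 0.5  # coefficient for age from the example

odds_ratio = math.exp(beta_age)
print(round(odds_ratio, 2))  # 1.65

def odds_multiplier(beta, delta):
    """Multiplicative change in odds for a delta-unit change in the feature."""
    # A k-unit change multiplies the odds by exp(k * beta), not by k * exp(beta).
    return math.exp(beta * delta)

def apply_odds_ratio(p, ratio):
    """Convert probability p to odds, scale by the odds ratio, convert back."""
    odds = p / (1.0 - p) * ratio
    return odds / (1.0 + odds)

# Hypothetical 10-year age difference.
print(odds_multiplier(beta_age, 10))

# Hypothetical baseline probabilities: the same odds ratio shifts
# the probability by different amounts depending on where it starts.
for baseline in (0.10, 0.50, 0.90):
    print(baseline, apply_odds_ratio(baseline, odds_ratio))
```

The loop illustrates why a 65% increase in odds should not be read as a 65% increase in probability.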

📝Threshold Trade-Off

A model predicts $P(\text{spam}) = 0.45$ for an email. With a threshold of 0.5, the email is classified as not spam. If missing spam (a false negative) is costly, lowering the threshold to 0.3 would classify this email as spam: the model catches more spam but also flags more legitimate emails. The choice of threshold depends on the relative costs of false positives and false negatives.
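
A small sketch of the trade-off: the scores and labels below are invented for illustration, and treating a probability equal to the threshold as positive (`>=`) is a convention assumed here.

```python
def classify(prob, threshold=0.5):
    """Return 1 (spam) if prob meets the threshold, else 0 (not spam)."""
    return 1 if prob >= threshold else 0

def count_errors(scores, labels, threshold):
    """Return (false_positives, false_negatives) at the given threshold."""
    fp = fn = 0
    for score, label in zip(scores, labels):
        pred = classify(score, threshold)
        if pred == 1 and label == 0:
            fp += 1
        elif pred == 0 and label == 1:
            fn += 1
    return fp, fn

# The email from the example above.
print(classify(0.45, threshold=0.5))  # 0: not spam
print(classify(0.45, threshold=0.3))  # 1: spam

# Hypothetical predicted probabilities and true labels (1 = spam).
scores = [0.10, 0.35, 0.45, 0.60, 0.80, 0.20, 0.40, 0.55]
labels = [0, 1, 1, 1, 1, 0, 0, 0]

print(count_errors(scores, labels, 0.5))  # (1, 2)
print(count_errors(scores, labels, 0.3))  # (2, 0)
```

On this toy data, lowering the threshold removes both false negatives at the price of one extra false positive.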

📝Confusion Matrix

A classifier predicts 80 correct and 20 incorrect out of 100 samples:

| | Predicted Positive | Predicted Negative |
| --- | --- | --- |
| Actual Positive | TP = 45 | FN = 5 |
| Actual Negative | FP = 15 | TN = 35 |

Accuracy = (45+35)/100 = 80%. Precision = 45/(45+15) = 75%. Recall = 45/(45+5) = 90%. F1 = 2(0.75)(0.9)/(0.75+0.9) = 81.8%.
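
The same numbers can be reproduced with a short helper. This is a sketch: the function name is made up, and returning 0.0 when a denominator is zero is a convention assumed here, not something the formulas define.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Assumed convention: report 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Counts from the confusion matrix above.
metrics = classification_metrics(tp=45, fn=5, fp=15, tn=35)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
# accuracy: 80.0%
# precision: 75.0%
# recall: 90.0%
# f1: 81.8%
```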


Key Takeaways

📋Summary: Logistic Regression

  • Sigmoid Function: Maps any real number to $(0, 1)$, making it ideal for probability estimation.
  • Log-Odds Linear: The model is linear in log-odds space, not probability space. This avoids probabilities outside $[0, 1]$.
  • Odds Ratio: $e^{\beta_j}$ gives the multiplicative effect on odds for a one-unit increase in $x_j$. The most interpretable output.
  • MLE Estimation: Fitted via maximum likelihood, not OLS. The log-likelihood is maximized.
  • Threshold: Default 0.5, but can be adjusted to trade off precision vs. recall based on business costs.
  • Evaluation: Use accuracy for balanced classes, precision when false positives are costly, recall when false negatives are costly, and F1 for balance. AUC-ROC for threshold-independent evaluation.
  • Foundation: Logistic regression is the baseline for binary classification. Neural networks generalize it with hidden layers and non-linear activations.
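
To make the MLE point concrete, here is a minimal sketch that fits a one-feature model by gradient ascent on the log-likelihood. It is an illustration under stated assumptions, not how statistical packages fit the model (they typically use Newton-type solvers): the toy data, learning rate, and iteration count are all made up, and the classes overlap on purpose so that the maximum likelihood estimate is finite.

```python
import math

def sigmoid(z):
    """Numerically stable sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log_likelihood(b0, b1, xs, ys):
    """Log-likelihood of intercept b0 and slope b1 on the data."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_logistic(xs, ys, lr=0.1, n_iter=5000):
    """Fit b0 and b1 by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # gradient term: observed minus predicted
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Made-up toy data: one feature, overlapping classes.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(xs, ys)
print("intercept:", b0, "slope:", b1)
print("odds ratio per unit of x:", math.exp(b1))
print("log-likelihood at start:", log_likelihood(0.0, 0.0, xs, ys))
print("log-likelihood at fit:  ", log_likelihood(b0, b1, xs, ys))
```

The update uses only the difference between observed labels and predicted probabilities, which is the gradient of the log-likelihood. No squared-error term appears, in contrast to OLS.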

Deep Dive

For detailed explanations, worked examples, and Python implementations, explore the dedicated statistics lessons:

Logistic Regression

Odds Ratios

  • Odds Ratios — Interpreting coefficients as odds ratios, confidence intervals, and practical examples
