Logistic Regression


Statistical Foundations of Binary Classification

Logistic regression models the probability of a binary outcome using the log-odds link function. Maximum likelihood estimation, Wald tests, and likelihood ratio tests provide rigorous statistical inference for classification problems.

Medical Diagnosis — Predict disease presence from patient characteristics
Credit Scoring — Estimate default probability for loan applications
Customer Analytics — Model churn likelihood from behavioral features
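The maximum likelihood fit and Wald test named in the opening paragraph can be sketched in pure Python for a single predictor. This is a minimal illustration, not a production implementation. It assumes one predictor plus an intercept and data without perfect separation (under separation the estimates diverge), and it uses Newton-Raphson iterations. The function names and the toy data are hypothetical. Standard errors come from inverting the information matrix at the estimate, and the Wald statistic is the estimate divided by its standard error.

```python
import math

def sigmoid(z):
    """Logistic function, branching on sign so math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def score_and_information(b0, b1, x, y):
    """Score vector and information matrix for logit(p) = b0 + b1*x (hypothetical helper)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        w = p * (1.0 - p)  # Bernoulli variance p(1 - p) acts as the weight
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return (g0, g1), (h00, h01, h11)

def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson MLE; returns estimates, standard errors, Wald z and p for b1."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        (g0, g1), (h00, h01, h11) = score_and_information(b0, b1, x, y)
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system  information * step = score
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + s0, b1 + s1
        if abs(s0) + abs(s1) < tol:
            break
    # Invert the information matrix at the final estimate to get standard errors
    _, (h00, h01, h11) = score_and_information(b0, b1, x, y)
    det = h00 * h11 - h01 * h01
    se0, se1 = math.sqrt(h11 / det), math.sqrt(h00 / det)
    z1 = b1 / se1                            # Wald statistic for H0: b1 = 0
    p1 = math.erfc(abs(z1) / math.sqrt(2))   # two-sided standard normal p-value
    return (b0, b1), (se0, se1), z1, p1

# Made-up toy data (not separable, so the MLE exists)
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
(b0, b1), (se0, se1), z, p = fit_logistic(x, y)
print(f"b1 = {b1:.3f}, SE = {se1:.3f}, Wald z = {z:.2f}, p = {p:.3f}")
```

In practice a statistics package would handle multiple predictors, convergence diagnostics, and separation warnings; the sketch only shows the mechanics.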

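The sigmoid (logistic) function $\sigma(z) = 1/(1 + e^{-z})$ maps any real-valued linear combination of the predictors, $z = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$, to a value strictly between 0 and 1, and therefore to a valid probability.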
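The sigmoid and its inverse, the logit, translate directly into Python. This is a small sketch with illustrative function names. Evaluating $1/(1 + e^{-z})$ naively raises `OverflowError` from `math.exp` for large negative $z$ (for example $z = -1000$), so the sketch branches on the sign of $z$.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^(-z)); mathematically always in (0, 1)."""
    # Branch on the sign of z so math.exp never receives a large positive argument
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def logit(p: float) -> float:
    """Inverse of the sigmoid: the log-odds ln(p / (1 - p)) for 0 < p < 1."""
    return math.log(p / (1.0 - p))

for z in (-30.0, -2.0, 0.0, 2.0, 30.0):
    print(z, sigmoid(z))
```

The symmetry $\sigma(-z) = 1 - \sigma(z)$ and the identity $\operatorname{logit}(\sigma(z)) = z$ are easy spot checks for any implementation.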

Why Not Linear Regression for Binary Data?

Linear regression is inappropriate for binary responses because:

The errors are not normally distributed (they are Bernoulli).
The predicted values can fall outside $[0, 1]$, which is impossible for probabilities.
The variance is not constant: $\operatorname{Var}(Y \mid X) = p(1 - p)$, which depends on $X$ through $p = P(Y = 1 \mid X)$.
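The second and third points can be checked numerically. The sketch below fits ordinary least squares (the linear probability model) to a small made-up binary data set and evaluates the Bernoulli variance at a few probabilities; the helper names are hypothetical.

```python
def ols_fit(x, y):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def bernoulli_variance(p):
    """Var(Y) for a Bernoulli(p) outcome."""
    return p * (1.0 - p)

# Made-up binary data for illustration only
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

a, b = ols_fit(x, y)
for xi in (1, 10):
    print(xi, a + b * xi)  # about -0.018 at x = 1 and 1.018 at x = 10

for p in (0.1, 0.5, 0.9):
    print(p, bernoulli_variance(p))  # largest (0.25) at p = 0.5
```

Even within the range of the observed data, the fitted line leaves $[0, 1]$ at both ends, and the error variance changes with the fitted probability.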

Logistic regression solves these problems by modeling the probability through the logistic (sigmoid) function.

The Logistic Model
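A compact statement of the model and its likelihood, in conventional notation introduced here for illustration: $k$ predictors $x_1, \dots, x_k$, coefficients $\beta_0, \dots, \beta_k$, and $n$ observations. Because $dp/d\eta = p(1 - p)$, the derivative of each log-likelihood term with respect to $\eta$ simplifies to $y_i - p_i$, which gives the score equations on the last line. They have no closed-form solution and are solved iteratively, for example by Newton-Raphson or iteratively reweighted least squares.

```latex
P(Y = 1 \mid \mathbf{x}) = p(\mathbf{x}) = \frac{1}{1 + e^{-\eta}},
\qquad \eta = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k

\ln\frac{p(\mathbf{x})}{1 - p(\mathbf{x})} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k

\ell(\boldsymbol{\beta}) = \sum_{i=1}^{n} \bigl[\, y_i \ln p_i + (1 - y_i) \ln(1 - p_i) \,\bigr],
\qquad p_i = p(\mathbf{x}_i)

\frac{\partial \ell}{\partial \beta_j} = \sum_{i=1}^{n} (y_i - p_i)\, x_{ij} = 0,
\qquad j = 0, \dots, k, \quad x_{i0} = 1
```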

Odds and Odds Ratios
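The odds of an event with probability $p$ are $p/(1 - p)$, and under the logistic model a one-unit increase in $x_j$, holding the other predictors fixed, multiplies the odds by $e^{\beta_j}$. The sketch below checks this arithmetic with hypothetical coefficients ($\beta_0 = -2.0$, $\beta_1 = 0.05$) that are made up purely for illustration.

```python
import math

def odds(p):
    """Odds of an event with probability p (0 <= p < 1): p / (1 - p)."""
    return p / (1.0 - p)

def prob(x, b0, b1):
    """P(Y = 1 | x) under a one-predictor logistic model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Hypothetical coefficients, chosen only to illustrate the arithmetic
b0, b1 = -2.0, 0.05

print(odds(0.75))  # 3.0, i.e. 3-to-1 odds

# Odds ratio for a one-unit increase in x, computed two ways
ratio = odds(prob(41, b0, b1)) / odds(prob(40, b0, b1))
print(ratio, math.exp(b1))  # both about 1.0513

# For a c-unit increase the odds are multiplied by exp(c * b1)
print(math.exp(10 * b1))  # about 1.6487
```

Note that the odds ratio is constant in $x$, while the change in probability it implies is not; that is why logistic coefficients are usually reported on the odds-ratio scale.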

Goodness of Fit

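| Metric | Definition | Interpretation |
|--------|-----------|----------------|
| McFadden's $R^2$ | $1 - \ln \hat{L}_{\text{model}} / \ln \hat{L}_{\text{null}}$ | Values of 0.2–0.4 are generally considered a good fit |
| AIC | $-2 \ln \hat{L} + 2k$ | Lower is better; penalizes complexity |
| BIC | $-2 \ln \hat{L} + k \ln n$ | Lower is better; stronger penalty than AIC whenever $n \ge 8$ |
| Hosmer–Lemeshow | Compares observed and expected event counts across deciles of predicted probability | A non-significant result indicates no evidence of lack of fit |

Here $\hat{L}$ is the maximized likelihood, $\hat{L}_{\text{null}}$ that of the intercept-only model, $k$ the number of estimated parameters (including the intercept), and $n$ the number of observations.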
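The table's formulas translate directly into code. This is a minimal sketch assuming the maximized log-likelihood of a fitted model is already available; the value `-4.5` below is a placeholder, not a result from real data, and the function names are hypothetical. The likelihood ratio test is restricted to models differing by one parameter, where the chi-square(1) upper tail equals $\operatorname{erfc}(\sqrt{G/2})$. Hosmer–Lemeshow is left out because its p-value needs a chi-square distribution with $g - 2$ degrees of freedom, which Python's standard library does not provide.

```python
import math

def null_log_likelihood(y):
    """Log-likelihood of the intercept-only model (assumes both classes occur)."""
    n, n1 = len(y), sum(y)
    p = n1 / n  # MLE of the constant probability is the sample proportion
    return n1 * math.log(p) + (n - n1) * math.log(1.0 - p)

def fit_statistics(ll_model, ll_null, k, n):
    """McFadden's pseudo-R^2, AIC and BIC; k counts all parameters incl. intercept."""
    return {
        "mcfadden_r2": 1.0 - ll_model / ll_null,
        "aic": -2.0 * ll_model + 2.0 * k,
        "bic": -2.0 * ll_model + k * math.log(n),
    }

def lr_test_1df(ll_full, ll_reduced):
    """Likelihood ratio test for nested models that differ by one parameter."""
    g = 2.0 * (ll_full - ll_reduced)
    p_value = math.erfc(math.sqrt(g / 2.0))  # chi-square(1) upper-tail probability
    return g, p_value

y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]  # made-up outcomes
ll0 = null_log_likelihood(y)
ll1 = -4.5  # placeholder log-likelihood for a one-predictor model (k = 2)
print(fit_statistics(ll1, ll0, k=2, n=len(y)))
print(lr_test_1df(ll1, ll0))
```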
