Logistic Regression
Why It Matters
Logistic regression is the baseline classifier and the foundation for neural network classification. It models binary outcomes using the sigmoid function, producing interpretable odds ratios. Every data scientist must understand its coefficients, evaluation metrics, and relationship to more complex models. It is the starting point for any classification task.
Overview
Logistic regression models the probability $P(Y=1|X)$ that a binary outcome $Y$ equals 1 given features $X$. Unlike linear regression, it uses the sigmoid function to map the linear predictor to a probability between 0 and 1. The model is linear in the log-odds space: $\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$. Coefficients exponentiate to odds ratios ($e^{\beta_j}$), providing intuitive effect size estimates. A classification threshold (typically 0.5) converts probabilities to binary predictions. The model is fitted via maximum likelihood estimation (MLE), not OLS.
Key Concepts
Logistic Regression Model
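$$P(Y=1|X) = \sigma(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)$$
Here,
- $\sigma$ = Sigmoid function: $\sigma(z) = 1 / (1 + e^{-z})$
- $\beta_0, \beta_1, \dots, \beta_k$ = Model coefficients
- $x_1, \dots, x_k$ = Input features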
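As a minimal sketch of the model above, the snippet below computes a predicted probability from a linear predictor. The function names, coefficients, and feature values are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued linear predictor z to a probability."""
    # Branch on the sign of z so that math.exp never receives a large
    # positive argument (which would overflow).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(intercept: float, coefs: list[float], x: list[float]) -> float:
    """P(Y=1 | x) = sigmoid(intercept + sum_j coefs[j] * x[j])."""
    z = intercept + sum(b * xj for b, xj in zip(coefs, x))
    return sigmoid(z)


# Hypothetical coefficients and features (not taken from any fitted model).
p = predict_proba(intercept=-1.0, coefs=[0.5, 2.0], x=[2.0, 0.0])
print(p)  # linear predictor is -1.0 + 0.5*2.0 + 2.0*0.0 = 0.0, so p = 0.5
```

A linear predictor of exactly 0 corresponds to a probability of 0.5, which is why a 0.5 threshold is equivalent to checking the sign of the linear predictor.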
Log-Odds (Logit) Link
$$\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$$
Here,
- $p$ = $P(Y=1|X)$, the probability of class 1
- $\frac{p}{1-p}$ = Odds of the outcome
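The relationship between probability, odds, and log-odds can be checked numerically. This is an illustrative sketch; the helper names `logit` and `inverse_logit` and the probability 0.8 are assumptions made for the example.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p; only defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inverse_logit(z: float) -> float:
    """Sigmoid: convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


p = 0.8                               # hypothetical probability of class 1
odds = p / (1.0 - p)                  # about 4, i.e. 4-to-1 in favour
log_odds = logit(p)                   # log(4)
recovered = inverse_logit(log_odds)   # round trip back to about 0.8
print(odds, log_odds, recovered)
```

The round trip shows that the logit and the sigmoid are inverses of each other, so a model that is linear in log-odds always yields a valid probability.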
Odds Ratio
$$\text{OR}_j = e^{\beta_j}$$
Here,
- $\beta_j$ = Coefficient for feature $j$
- $e^{\beta_j}$ = Multiplicative effect on odds for a one-unit increase in $x_j$, holding the other features fixed
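A quick way to see why the exponentiated coefficient is an odds ratio: under the logistic model the odds equal the exponential of the linear predictor, so the ratio of odds at $x+1$ and $x$ does not depend on $x$. The sketch below uses a one-feature model with hypothetical coefficient values.

```python
import math


def odds(intercept: float, beta: float, x: float) -> float:
    """Odds p/(1-p) under a one-feature logistic model: exp(intercept + beta*x)."""
    return math.exp(intercept + beta * x)


# Hypothetical intercept and coefficient, for illustration only.
intercept, beta = -2.0, 0.7

# Ratio of odds for a one-unit increase, measured at two different starting points.
ratio_low = odds(intercept, beta, 1.0) / odds(intercept, beta, 0.0)
ratio_high = odds(intercept, beta, 11.0) / odds(intercept, beta, 10.0)

# All three values agree up to floating-point rounding.
print(ratio_low, ratio_high, math.exp(beta))
```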
Log-Likelihood
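$$\ell(\beta) = \sum_{i=1}^{n} \left[ y_i \log \hat{p}_i + (1 - y_i) \log(1 - \hat{p}_i) \right]$$
Here,
- $y_i$ = Observed outcome (0 or 1)
- $\hat{p}_i$ = Predicted probability for observation $i$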
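The log-likelihood described above can be computed directly from observed outcomes and predicted probabilities. The following is a rough sketch with made-up labels and predictions; it assumes every predicted probability lies strictly between 0 and 1.

```python
import math


def log_likelihood(y: list[int], p_hat: list[float]) -> float:
    """Sum of y_i*log(p_i) + (1 - y_i)*log(1 - p_i) over all observations.

    Assumes 0 < p_i < 1 for every prediction; log(0) would raise an error.
    """
    total = 0.0
    for yi, pi in zip(y, p_hat):
        total += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return total


# Hypothetical outcomes and two hypothetical sets of predictions.
y = [1, 0, 1, 1]
confident = [0.9, 0.2, 0.8, 0.7]     # mostly agrees with the labels
uninformative = [0.5, 0.5, 0.5, 0.5]  # coin-flip predictions

print(log_likelihood(y, confident))
print(log_likelihood(y, uninformative))  # equals 4 * log(0.5)
```

Predictions that agree with the observed labels give a log-likelihood closer to zero, which is what maximum likelihood estimation searches for.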
Classification Metrics
| Metric | Formula | Use Case |
|---|---|---|
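| Accuracy | $(TP + TN) / (TP + TN + FP + FN)$ | Balanced classes |
| Precision | $TP / (TP + FP)$ | Cost of false positive high |
| Recall | $TP / (TP + FN)$ | Cost of false negative high |
| F1 Score | $2 \cdot \text{Precision} \cdot \text{Recall} / (\text{Precision} + \text{Recall})$ | Balance precision and recall |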
| AUC-ROC | Area under ROC curve | Threshold-independent evaluation |
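AUC-ROC has an equivalent ranking interpretation: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it by brute force over all pairs; the function name, labels, and scores are hypothetical, and the quadratic loop is meant for illustration rather than large datasets.

```python
def auc_roc(y_true: list[int], scores: list[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0   # positive ranked above negative
            elif sp == sn:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
print(auc_roc(y_true, scores))  # 8 of the 9 positive/negative pairs are ordered correctly
```

Because only the ordering of the scores matters, no classification threshold appears anywhere in the computation.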
Odds Ratio Interpretation
| OR Value | Interpretation |
|---|---|
| OR = 1 | No association |
| OR > 1 | Positive association (increases odds) |
| OR < 1 | Negative association (decreases odds) |
| OR = 2 | Doubles the odds |
| OR = 0.5 | Halves the odds |
Quick Example
Interpreting a Logistic Coefficient
$\beta = 0.5$ for age in a logistic regression predicting disease.
Odds ratio = $e^{0.5} \approx 1.65$. For each one-year increase in age, the odds of disease multiply by 1.65 (a 65% increase in odds). The 95% CI for the OR might be [1.2, 2.3]; because that interval excludes 1, the effect is statistically significant at the 5% level.
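A 65% increase in odds is not a 65% increase in probability. As a rough illustration, the sketch below applies the odds ratio of 1.65 from this example to a few baseline probabilities; the baselines and the helper name `apply_odds_ratio` are assumptions made for the example.

```python
def apply_odds_ratio(p: float, odds_ratio: float) -> float:
    """Probability obtained after multiplying the odds of p by odds_ratio."""
    odds = p / (1.0 - p)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Hypothetical baseline disease probabilities; 1.65 is the odds ratio from the example.
for baseline in (0.05, 0.50, 0.90):
    updated = apply_odds_ratio(baseline, 1.65)
    print(f"baseline {baseline:.2f} -> after one more year {updated:.3f}")
```

The same odds ratio moves the probability by different amounts depending on the starting point, which is why coefficients are reported on the odds scale rather than as probability changes.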
Threshold Trade-Off
A model predicts a spam probability $\hat{p}$ between 0.3 and 0.5 for an email. With a threshold of 0.5, the email is classified as not spam. But if missing spam is costly (a false negative), lower the threshold to 0.3: the model then catches more spam but also flags more legitimate emails. The choice of threshold depends on the relative costs of false positives vs. false negatives.
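The effect of moving the threshold can be sketched in a few lines. The probabilities below are hypothetical, and treating a probability equal to the threshold as positive is a convention assumed here, not something the text specifies.

```python
def classify(probabilities: list[float], threshold: float = 0.5) -> list[int]:
    """Label an email 1 (spam) when its predicted probability meets the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical predicted spam probabilities for five emails.
probs = [0.05, 0.35, 0.45, 0.60, 0.95]

print(classify(probs, threshold=0.5))  # [0, 0, 0, 1, 1]
print(classify(probs, threshold=0.3))  # [0, 1, 1, 1, 1]
```

Lowering the threshold from 0.5 to 0.3 flips the two borderline emails to spam, which raises recall at the risk of more false positives.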
Confusion Matrix
A classifier gets 80 predictions correct and 20 incorrect out of 100 samples:
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | TP = 45 | FN = 5 |
| Actual Negative | FP = 15 | TN = 35 |
Accuracy = (45+35)/100 = 80%. Precision = 45/(45+15) = 75%. Recall = 45/(45+5) = 90%. F1 = 2(0.75)(0.9)/(0.75+0.9) = 81.8%.
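The arithmetic above can be reproduced from the four confusion-matrix counts. This is a minimal sketch; the function name is hypothetical, and it does not guard against division by zero when a class is never predicted or never present.

```python
def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Counts taken from the confusion matrix above.
metrics = classification_metrics(tp=45, fn=5, fp=15, tn=35)
print({name: round(value, 3) for name, value in metrics.items()})
# {'accuracy': 0.8, 'precision': 0.75, 'recall': 0.9, 'f1': 0.818}
```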
Key Takeaways
Summary: Logistic Regression
- Sigmoid Function: Maps any real number to $(0, 1)$, making it ideal for probability estimation.
- Log-Odds Linear: The model is linear in log-odds space, not probability space. This avoids probabilities outside $[0, 1]$.
- Odds Ratio: $e^{\beta_j}$ gives the multiplicative effect on odds for a one-unit increase in $x_j$. The most interpretable output.
- MLE Estimation: Fitted via maximum likelihood, not OLS. The log-likelihood is maximized.
- Threshold: Default 0.5, but can be adjusted to trade off precision vs. recall based on business costs.
- Evaluation: Use accuracy for balanced classes, precision when false positives are costly, recall when false negatives are costly, and F1 to balance the two. Use AUC-ROC for threshold-independent evaluation.
- Foundation: Logistic regression is the baseline for binary classification. Neural networks generalize it with hidden layers and non-linear activations.
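To make the MLE point concrete, here is a rough sketch that fits a one-feature logistic regression by plain gradient ascent on the log-likelihood. Gradient ascent is only one possible optimizer (statistical packages typically use Newton-type methods), and the toy data, learning rate, and step count are all assumptions made for illustration.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0: float, b1: float, xs: list[float], ys: list[int]) -> float:
    """Bernoulli log-likelihood of a one-feature logistic model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit(xs: list[float], ys: list[int], lr: float = 0.1, steps: int = 2000):
    """Gradient ascent: the gradient is sum((y - p) * [1, x]) over observations."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n   # step uphill on the averaged gradient
        b1 += lr * g1 / n
    return b0, b1


# Hypothetical toy data; the classes overlap, so a finite MLE exists.
xs = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit(xs, ys)
print(b0, b1, math.exp(b1))             # intercept, slope, odds ratio for x
print(log_likelihood(b0, b1, xs, ys))   # higher than at the starting point (0, 0)
```

Unlike OLS, there is no closed-form solution here; the coefficients are found iteratively by climbing the log-likelihood surface.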
Deep Dive
For detailed explanations, worked examples, and Python implementations, explore the dedicated statistics lessons:
Logistic Regression
- Logistic Regression Statistics: Maximum likelihood estimation, coefficient interpretation, model diagnostics, and Python implementation
Odds Ratios
- Odds Ratios: Interpreting coefficients as odds ratios, confidence intervals, and practical examples
Related Topics
- Point Estimation: MLE theory underlying logistic regression estimation
- Simple Linear Regression: The continuous-outcome counterpart
- Chi-Square Test of Independence: Tests the same associations that logistic regression models
- Multiple Linear Regression: Extending to multiple predictors