Maximum Likelihood Estimation
ℹ️ Why It Matters
MLE is one of the most common methods for fitting statistical models. It underlies logistic regression, the training of many neural networks, and much of machine learning. Understanding MLE gives you the theoretical foundation for why loss functions work, how parameters are estimated, and what guarantees exist about estimator quality. It is the bridge between probability theory and practical model fitting.
Overview
Given data $x_1, \dots, x_n$ from a distribution $f(x \mid \theta)$, the maximum likelihood estimator $\hat{\theta}$ finds the parameter value that maximizes the probability of observing the data. The likelihood function is $L(\theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$, and we maximize it (or equivalently minimize the negative log-likelihood). For the Normal distribution, MLEs have closed forms: $\hat{\mu} = \bar{x}$ and $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ (biased: uses $n$, not $n-1$). Fisher information $I(\theta)$ measures how much each observation tells us about $\theta$, and the Cramér-Rao bound sets a floor on estimator variance: no unbiased estimator can have variance less than $1/(n\,I(\theta))$.
Key Concepts
MLE Estimator
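$$\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} \prod_{i=1}^{n} f(x_i \mid \theta)$$
Here,
- $f(x_i \mid \theta)$ = Probability density/mass function of $x_i$ given parameter $\theta$
- $\prod_{i=1}^{n}$ = Product over all $n$ observations
- $\hat{\theta}_{\text{MLE}}$ = The parameter value that maximizes the likelihood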
Log-Likelihood
$$\ell(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log f(x_i \mid \theta)$$
Here,
- $\ell(\theta)$ = Log-likelihood (converts products to sums)
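To see why the log matters in practice, the sketch below compares the raw likelihood with the log-likelihood on simulated Bernoulli data. The seed, sample size, true parameter of 0.3, and grid of candidate values are illustrative assumptions, not part of the lesson.

```python
import numpy as np

rng = np.random.default_rng(0)           # assumed seed, for reproducibility
x = rng.binomial(1, 0.3, size=5000)      # assumed sample: 5000 Bernoulli(0.3) draws

grid = np.linspace(0.01, 0.99, 99)       # candidate values of p, spacing 0.01

# Raw likelihood: a product of 5000 factors below 1 underflows to 0.0
likelihood = np.array([np.prod(p**x * (1 - p)**(1 - x)) for p in grid])

# Log-likelihood: the same information written as a sum, numerically stable
k, n = x.sum(), x.size
log_lik = k * np.log(grid) + (n - k) * np.log(1 - grid)

p_hat_grid = grid[np.argmax(log_lik)]    # maximizer of the log-likelihood on the grid
p_hat_closed = x.mean()                  # closed-form Bernoulli MLE (sample proportion)

print(likelihood.max(), p_hat_grid, p_hat_closed)
```

At this sample size the product is too small for double precision at every grid point, so the raw likelihood cannot be maximized directly, while the log-likelihood peaks at the grid point nearest the sample proportion.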
Normal Distribution MLE
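$$\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
Here,
- $\hat{\mu}$ = MLE of the mean (unbiased)
- $\hat{\sigma}^2$ = MLE of the variance (biased: uses $n$, not $n-1$)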
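A minimal sketch of the two Normal estimators in NumPy, on an assumed synthetic sample (the seed, mean, scale, and size are illustrative). It relies on `np.var` dividing by n when `ddof=0` (its default) and by n - 1 when `ddof=1`.

```python
import numpy as np

rng = np.random.default_rng(1)                     # assumed seed
x = rng.normal(loc=5.0, scale=2.0, size=50)        # assumed sample
n = x.size

mu_hat = x.mean()                                  # MLE of the mean
sigma2_mle = np.mean((x - mu_hat) ** 2)            # MLE of the variance: divides by n
sigma2_unbiased = np.var(x, ddof=1)                # sample variance: divides by n - 1

# The two variance estimates differ exactly by the factor (n - 1) / n
print(sigma2_mle, sigma2_unbiased * (n - 1) / n)
```

The MLE is always the smaller of the two, and the gap shrinks as n grows, which is why the bias rarely matters for large samples.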
Fisher Information
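$$I(\theta) = -\mathbb{E}\left[\frac{\partial^2}{\partial \theta^2} \log f(X \mid \theta)\right] = \mathbb{E}\left[\left(\frac{\partial}{\partial \theta} \log f(X \mid \theta)\right)^2\right]$$
Here,
- $I(\theta)$ = Fisher information per observation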
Cramér-Rao Lower Bound
$$\operatorname{Var}(\hat{\theta}) \ge \frac{1}{n\,I(\theta)}$$
Here,
- $I(\theta)$ = Fisher information
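As a hedged illustration of Fisher information and the Cramér-Rao bound, the sketch below uses a Bernoulli model, where the per-observation information works out to 1 / (p(1 - p)). It computes the information exactly as the expected squared score, then compares the resulting bound with a Monte Carlo estimate of the variance of the sample proportion. The true parameter, sample size, seed, and number of replications are assumptions chosen for the example.

```python
import numpy as np

p, n = 0.3, 200                        # assumed true parameter and sample size

def score(x, p):
    # derivative of log f(x | p) = x*log(p) + (1 - x)*log(1 - p) with respect to p
    return x / p - (1 - x) / (1 - p)

# Expected squared score, summed exactly over the two outcomes x = 0 and x = 1
fisher = (1 - p) * score(0, p) ** 2 + p * score(1, p) ** 2
crlb = 1.0 / (n * fisher)              # lower bound on the variance of unbiased estimators

# Monte Carlo variance of the MLE p_hat (the sample proportion)
rng = np.random.default_rng(2)         # assumed seed
p_hats = rng.binomial(n, p, size=20000) / n
empirical_var = p_hats.var()

print(fisher, crlb, empirical_var)
```

Here the simulated variance should land close to the bound, because the sample proportion is an unbiased estimator that attains the Cramér-Rao limit for this model.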
Score Function
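$$S(\theta) = \frac{\partial \ell(\theta)}{\partial \theta} = \sum_{i=1}^{n} \frac{\partial}{\partial \theta} \log f(x_i \mid \theta)$$
Here,
- $S(\theta)$ = Score function; set to 0 to find MLE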
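Setting the score to zero can also be done numerically with a root finder. The sketch below assumes an Exponential model with rate parameter `lam`, whose score is n / lam minus the sum of the data, and solves it with `scipy.optimize.brentq`. The data, seed, and bracketing interval are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(3)                 # assumed seed
x = rng.exponential(scale=2.0, size=500)       # assumed data; true rate is 1 / 2.0 = 0.5

def score(lam):
    # derivative of the log-likelihood n*log(lam) - lam*sum(x) with respect to lam
    return x.size / lam - x.sum()

# The score is positive near 0 and negative for large lam, so a root lies in between
lam_hat = brentq(score, 1e-6, 100.0)

print(lam_hat, 1.0 / x.mean())                 # root of the score vs. closed-form MLE
```

For this model the root can be found by hand (the reciprocal of the sample mean), so the closed form serves as a check; the same pattern applies when the score equation has no analytic solution.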
MLE for Common Distributions
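| Distribution | Parameter | MLE | Notes |
|---|---|---|---|
| Normal | $\mu$ | $\hat{\mu} = \bar{x}$ | Unbiased |
| Normal | $\sigma^2$ | $\hat{\sigma}^2 = \frac{1}{n}\sum (x_i - \bar{x})^2$ | Biased (uses $n$, not $n-1$) |
| Poisson | $\lambda$ | $\hat{\lambda} = \bar{x}$ | Closed-form |
| Bernoulli | $p$ | $\hat{p} = \bar{x}$ (sample proportion) | Closed-form |
| Exponential | $\lambda$ (rate) | $\hat{\lambda} = 1/\bar{x}$ | Closed-form |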
Quick Example
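📝 MLE for Poisson Distribution
Data: $x_1, \dots, x_n$ from Poisson($\lambda$). Log-likelihood:
$$\ell(\lambda) = \sum_{i=1}^{n} x_i \log \lambda - n\lambda - \sum_{i=1}^{n} \log(x_i!)$$
Setting $\frac{d\ell}{d\lambda} = 0$:
$$\frac{1}{\lambda}\sum_{i=1}^{n} x_i - n = 0 \quad\Longrightarrow\quad \hat{\lambda} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}$$
The MLE of $\lambda$ is the sample mean, which is intuitive because the Poisson mean equals $\lambda$.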
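The Poisson result can be checked numerically. This sketch minimizes the negative log-likelihood with `scipy.optimize.minimize_scalar` on a small set of made-up counts (the data and the search bounds are assumptions) and compares the optimum with the sample mean.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

x = np.array([2, 4, 3, 5, 1, 3, 4, 2])      # assumed toy counts; sample mean is 3.0

def neg_log_lik(lam):
    # negative Poisson log-likelihood of the whole sample at rate lam
    return -np.sum(stats.poisson.logpmf(x, lam))

res = minimize_scalar(neg_log_lik, bounds=(0.01, 20.0), method="bounded")
lam_hat = res.x

print(lam_hat, x.mean())                    # numerical optimum vs. closed-form MLE
```

The two values agree up to the optimizer's tolerance, as the derivation predicts.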
📝 Numerical MLE
For complex models without closed-form MLEs, minimize the negative log-likelihood numerically:
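```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize
# `data` is a 1-D array of observations; params = [mu, sigma], with sigma bounded away from 0
neg_log_lik = lambda params: -np.sum(stats.norm.logpdf(data, params[0], params[1]))
result = minimize(neg_log_lik, [0, 1], bounds=[(None, None), (0.01, None)])  # estimates in result.x
```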
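For a model with no closed-form MLE, the same pattern carries over. The sketch below fits a Gamma distribution, whose shape parameter has no closed-form estimate, to assumed synthetic data. Optimizing over the logs of the parameters is one way to keep them positive without bounds; that choice, the Nelder-Mead method, the seed, and the starting point are illustrative assumptions rather than the lesson's prescribed approach.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

rng = np.random.default_rng(4)                        # assumed seed
data = rng.gamma(shape=2.0, scale=3.0, size=2000)     # assumed synthetic data

def neg_log_lik(log_params):
    # exponentiate so that shape and scale are always positive
    shape, scale = np.exp(log_params)
    return -np.sum(stats.gamma.logpdf(data, a=shape, scale=scale))

result = minimize(neg_log_lik, x0=[0.5, 0.5], method="Nelder-Mead")
shape_hat, scale_hat = np.exp(result.x)               # back-transform to the original scale

print(result.success, shape_hat, scale_hat)
```

The estimates should fall near the values used to simulate the data (shape 2, scale 3), with sampling error that shrinks as the sample grows.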
Key Takeaways
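📋 Summary: Maximum Likelihood Estimation
- Definition: $\hat{\theta} = \arg\max_{\theta} L(\theta)$ maximizes the probability of observing the data. The most widely used estimation method.
- Log-Likelihood: Convert products to sums via $\ell(\theta) = \log L(\theta)$. Preserves argmax, simplifies optimization.
- Closed-Form MLEs: Normal ($\hat{\mu} = \bar{x}$, biased $\hat{\sigma}^2$), Poisson ($\hat{\lambda} = \bar{x}$), Bernoulli ($\hat{p} = \bar{x}$).
- Fisher Information: Measures parameter identifiability. Higher $I(\theta)$ → tighter estimates. Cramér-Rao: $\operatorname{Var}(\hat{\theta}) \ge 1/(n\,I(\theta))$.
- Score Function: Set $S(\theta) = 0$ to find the MLE analytically.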
- Numerical Optimization: When no closed form exists, minimize the negative log-likelihood with `scipy.optimize.minimize`.
- Connection to ML: Many ML loss functions are negative log-likelihoods (cross-entropy for Bernoulli or categorical outputs, MSE for Gaussian noise with fixed variance). MLE unifies statistical and machine learning estimation.
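The link between loss functions and likelihoods can be checked directly. In this sketch, binary cross-entropy is compared with the average negative Bernoulli log-likelihood, and MSE is compared with the average negative Gaussian log-likelihood at a fixed noise scale. All labels, predictions, and the value of `sigma` are made-up illustrative numbers.

```python
import numpy as np
from scipy import stats

y = np.array([1, 0, 1, 1, 0])                  # assumed binary labels
p = np.array([0.9, 0.2, 0.7, 0.6, 0.1])        # assumed predicted probabilities

cross_entropy = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
bernoulli_nll = -np.mean(stats.bernoulli.logpmf(y, p))

t = np.array([1.2, 0.4, 2.5, 3.1])             # assumed regression targets
f = np.array([1.0, 0.5, 2.0, 3.5])             # assumed model predictions
sigma = 1.0                                    # assumed fixed noise scale

mse = np.mean((t - f) ** 2)
gaussian_nll = -np.mean(stats.norm.logpdf(t, loc=f, scale=sigma))
# With sigma fixed, the Gaussian NLL equals MSE / (2 sigma^2) plus a constant
constant = 0.5 * np.log(2 * np.pi * sigma**2)

print(cross_entropy, bernoulli_nll)
print(gaussian_nll, mse / (2 * sigma**2) + constant)
```

Because the constant and the positive scaling do not depend on the predictions, minimizing MSE and minimizing the Gaussian negative log-likelihood select the same model.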
Deep Dive
For detailed explanations, worked examples, and Python implementations, explore the dedicated statistics lessons:
Point Estimation
- Point Estimation — MLE, method of moments, and properties of estimators
Properties of Estimators
- Properties of Estimators — Unbiasedness, consistency, efficiency, sufficiency, and the Cramér-Rao bound
Related Topics
- Logistic Regression Statistics — MLE applied to logistic regression
- Confidence Intervals — MLE-based Wald intervals
- Simple Linear Regression — OLS as a special case of MLE under Normal errors
- Bayesian Regression — Bayesian alternative to MLE estimation