Statistics

Bayesian Statistics

Master Bayesian inference, conjugate priors, MAP estimation, and hierarchical models.

📂 Bayesian | 📖 Lesson 58 of 100 | 🎓 Free Course


Why It Matters

Bayesian methods quantify uncertainty in parameters, enabling better decision-making under uncertainty. Rather than treating parameters as fixed unknowns (frequentist), Bayesian inference treats them as random variables with distributions. This yields full posterior distributions, credible intervals, and direct probability statements about parameters — invaluable for risk-aware decision-making in healthcare, finance, and autonomous systems.


Overview

Bayesian inference updates prior beliefs about parameters using observed data via Bayes' rule: posterior ∝ likelihood × prior. The prior $p(\theta)$ encodes beliefs before seeing data. The likelihood $p(D \mid \theta)$ is the probability of the data given the parameters. The posterior $p(\theta \mid D)$ is the updated belief after seeing data.

Conjugate priors (e.g., Beta-Binomial, Normal-Normal) yield closed-form posteriors for exact analytical updates. MAP estimation finds the mode of the posterior, equivalent to MLE with regularization. For complex models, MCMC methods (Gibbs sampling, HMC) sample from the posterior distribution numerically.


Key Concepts

Bayes' Rule

$$p(\theta \mid D) = \frac{p(D \mid \theta)\,p(\theta)}{p(D)}$$

Here,

  • $p(\theta \mid D)$ = Posterior: updated belief after seeing data
  • $p(D \mid \theta)$ = Likelihood: probability of data given $\theta$
  • $p(\theta)$ = Prior: belief before seeing data
  • $p(D)$ = Evidence (normalizing constant)
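
To make the roles of the four terms concrete, here is a minimal sketch that applies Bayes' rule on a discrete grid of $\theta$ values. The coin-flip data (7 heads, 3 tails), the flat prior, and the function names are illustrative assumptions, not part of the lesson.

```python
def grid_posterior(prior, likelihood, grid):
    """Normalize likelihood * prior over a discrete grid of theta values."""
    unnormalized = [likelihood(theta) * prior(theta) for theta in grid]
    evidence = sum(unnormalized)  # discrete stand-in for p(D)
    return [u / evidence for u in unnormalized]


def flat_prior(theta):
    # Assumed prior for illustration: every theta equally plausible.
    return 1.0


def coin_likelihood(theta, heads=7, tails=3):
    # Binomial likelihood without the binomial coefficient,
    # which cancels when the posterior is normalized.
    return theta**heads * (1 - theta) ** tails


grid = [i / 100 for i in range(1, 100)]
posterior = grid_posterior(flat_prior, coin_likelihood, grid)
posterior_mode = grid[posterior.index(max(posterior))]
print(f"posterior sums to {sum(posterior):.3f}, mode near {posterior_mode:.2f}")
```

Dividing by the summed weights plays the part of the evidence $p(D)$: it only rescales the curve, which is why the shorthand posterior ∝ likelihood × prior is usually enough.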

MAP Estimator

$$\hat{\theta}_{\text{MAP}} = \arg\max_\theta \, p(D \mid \theta)\,p(\theta)$$

Here,

  • $\hat{\theta}_{\text{MAP}}$ = Maximum a posteriori estimate
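
As a rough illustration of the arg max, the sketch below scans a grid for the $\theta$ that maximizes the unnormalized log posterior of a binomial likelihood with a Beta prior, then compares it with the known mode of a Beta distribution. The data (7 successes, 3 failures), the Beta(2, 2) prior, and the helper names are assumptions chosen for the example.

```python
import math


def log_posterior(theta, s, f, alpha, beta):
    """Unnormalized log posterior: binomial log-likelihood plus Beta log-prior."""
    log_lik = s * math.log(theta) + f * math.log(1 - theta)
    log_prior = (alpha - 1) * math.log(theta) + (beta - 1) * math.log(1 - theta)
    return log_lik + log_prior


def map_estimate_grid(s, f, alpha, beta, steps=10_000):
    """Approximate the MAP estimate by scanning an evenly spaced grid on (0, 1)."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda t: log_posterior(t, s, f, alpha, beta))


theta_map = map_estimate_grid(s=7, f=3, alpha=2, beta=2)

# Mode of Beta(a, b) is (a - 1) / (a + b - 2) when a, b > 1; here a = 9, b = 5.
closed_form = (9 - 1) / (9 + 5 - 2)
print(f"grid MAP = {theta_map:.4f}, closed form = {closed_form:.4f}")
```

A grid scan is only practical for one or two parameters; in higher dimensions the same objective is normally handed to a numerical optimizer.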

Beta-Binomial Conjugate

$$\text{Prior: } \theta \sim \text{Beta}(\alpha, \beta) \implies \text{Posterior: } \theta \mid D \sim \text{Beta}(\alpha + s, \beta + f)$$

Here,

  • $s$ = Number of successes
  • $f$ = Number of failures
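
The update rule is just addition of counts, which a short sketch can show. It also checks that updating in two batches gives the same posterior as updating once with all the data. The batch sizes and the Beta(1, 1) starting prior are made-up values for illustration.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Return the Beta posterior parameters after observing binomial data."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Sequential updating: yesterday's posterior becomes today's prior.
a, b = 1, 1  # assumed uniform Beta(1, 1) prior
for s, f in [(3, 1), (2, 4)]:
    a, b = beta_binomial_update(a, b, s, f)

# Single update with the pooled counts (5 successes, 5 failures).
batch = beta_binomial_update(1, 1, 5, 5)

print((a, b), batch, beta_mean(a, b))
```

Because the posterior stays in the Beta family, the two parameters can be read as running pseudo-counts of successes and failures.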

Normal-Normal Conjugate

$$\text{Posterior mean: } \mu_n = \frac{\sigma^2 \mu_0 + n \tau_0^2 \bar{x}}{\sigma^2 + n\tau_0^2}$$

Here,

  • $\mu_0$ = Prior mean
  • $\tau_0^2$ = Prior variance (smaller values mean a stronger prior)
  • $\sigma^2$ = Data variance (assumed known)
  • $n$ = Sample size
  • $\bar{x}$ = Sample mean

Posterior Precision

$$\frac{1}{\tau_n^2} = \frac{1}{\tau_0^2} + \frac{n}{\sigma^2}$$

Here,

  • $\tau_n^2$ = Posterior variance
  • $\tau_0^2$ = Prior variance
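
The posterior mean and posterior precision formulas can be combined into one small update function. This is a sketch under the assumption of a known data variance; the function name, argument names (`tau0_sq` for the prior variance, `sigma_sq` for the data variance), and the sample values are illustrative.

```python
def normal_normal_update(mu0, tau0_sq, sigma_sq, data):
    """Posterior mean and variance for a normal mean with known data variance."""
    n = len(data)
    xbar = sum(data) / n

    # Precisions (inverse variances) add: prior precision plus n / sigma^2.
    post_precision = 1 / tau0_sq + n / sigma_sq
    post_var = 1 / post_precision

    # Weighted average of the prior mean and the sample mean.
    post_mean = (sigma_sq * mu0 + n * tau0_sq * xbar) / (sigma_sq + n * tau0_sq)
    return post_mean, post_var


mu_n, tau_n_sq = normal_normal_update(
    mu0=0.0, tau0_sq=1.0, sigma_sq=4.0, data=[2.0, 3.0, 4.0, 3.0]
)
print(mu_n, tau_n_sq)
```

In this example the prior precision (1) equals the data precision (4 / 4 = 1), so the posterior mean lands halfway between the prior mean 0 and the sample mean 3.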

Conjugate Prior Families

| Likelihood | Prior | Posterior | Use Case |
| --- | --- | --- | --- |
| Bernoulli/Binomial | Beta | Beta | Proportions, click rates |
| Normal (known $\sigma^2$) | Normal | Normal | Mean estimation |
| Poisson | Gamma | Gamma | Count data |
| Normal (unknown $\mu$, $\sigma^2$) | Normal-Inverse-Gamma | Normal-Inverse-Gamma | Full normal model |

Prior Strength Effects

| Prior Strength | Effect on Posterior | When to Use |
| --- | --- | --- |
| Weak (large $\tau_0^2$) | Posterior dominated by data | Large samples, little prior knowledge |
| Strong (small $\tau_0^2$) | Posterior dominated by prior | Small samples, strong prior knowledge |
| Flat (uniform) | Posterior ∝ likelihood (equal up to a constant) | Non-informative analysis |
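
The effect of prior strength can be seen by holding the data fixed and varying only the prior variance in the Normal-Normal posterior mean. The numbers below (prior mean 0, sample mean 2, five observations, unit data variance) are assumed purely for illustration.

```python
def posterior_mean(mu0, tau0_sq, sigma_sq, xbar, n):
    """Normal-Normal posterior mean for a known data variance sigma_sq."""
    return (sigma_sq * mu0 + n * tau0_sq * xbar) / (sigma_sq + n * tau0_sq)


mu0, sigma_sq, xbar, n = 0.0, 1.0, 2.0, 5

# Strong, moderate, and weak priors (small to large prior variance).
means = {
    tau0_sq: posterior_mean(mu0, tau0_sq, sigma_sq, xbar, n)
    for tau0_sq in (0.01, 1.0, 100.0)
}

for tau0_sq, mean in means.items():
    print(f"prior variance {tau0_sq:>6}: posterior mean {mean:.3f}")
```

With a small prior variance the posterior mean stays near the prior mean, and with a large one it moves almost all the way to the sample mean.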

Quick Example

📝Beta-Binomial Conjugate

Prior: $\theta \sim \text{Beta}(2, 2)$ (centered at 0.5, moderate strength). Data: 7 successes in 10 trials.

Posterior: $\text{Beta}(2+7, 2+3) = \text{Beta}(9, 5)$.

Posterior mean = $9/14 \approx 0.643$. The posterior mean lies between the prior mean (0.5) and the observed proportion (0.7): the data pull the estimate away from the prior, and the prior's strength moderates how far. With more data, the prior's influence diminishes.
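
A credible interval for this example can be approximated by drawing samples from the Beta(9, 5) posterior and reading off empirical quantiles. The sketch below uses only Python's standard library. The sample size and seed are arbitrary choices, and a Monte Carlo estimate like this is approximate rather than exact.

```python
import random

random.seed(0)  # fixed seed so repeated runs give the same draws

alpha_post, beta_post = 2 + 7, 2 + 3  # Beta(9, 5) posterior from the example

# Draw from the posterior and sort so quantiles can be read by index.
draws = sorted(random.betavariate(alpha_post, beta_post) for _ in range(100_000))

exact_mean = alpha_post / (alpha_post + beta_post)
mc_mean = sum(draws) / len(draws)

# Equal-tailed 95% credible interval from the 2.5% and 97.5% sample quantiles.
lower = draws[int(0.025 * len(draws))]
upper = draws[int(0.975 * len(draws))]

print(f"exact mean {exact_mean:.3f}, sampled mean {mc_mean:.3f}")
print(f"approximate 95% credible interval: ({lower:.3f}, {upper:.3f})")
```

The interval is wide because ten trials carry limited information; repeating the calculation with larger counts should narrow it.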

📝MAP = MLE + Regularization

With a Gaussian prior $\theta \sim N(0, \tau^2)$ and log-likelihood $\ell(\theta)$, the MAP estimate is:

$$\hat{\theta}_{\text{MAP}} = \arg\max_\theta \left[\ell(\theta) - \frac{\theta^2}{2\tau^2}\right]$$

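This is equivalent to MLE with L2 regularization (ridge regression, when the likelihood is a Gaussian linear model). The prior variance $\tau^2$ controls the regularization strength: a smaller $\tau^2$ gives a larger penalty and stronger shrinkage toward zero.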
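
The equivalence can be checked numerically in the simplest case: estimating a single mean from Gaussian data with known variance. In this sketch the MAP estimate is found by a grid search over the penalized negative log-likelihood and compared with the closed-form ridge solution using $\lambda = \sigma^2 / \tau^2$. The data values, variances, and function names are assumptions made for the example.

```python
def neg_log_posterior(theta, data, sigma_sq, tau_sq):
    """Negative of l(theta) - theta^2 / (2 tau^2), dropping additive constants."""
    nll = sum((x - theta) ** 2 for x in data) / (2 * sigma_sq)
    penalty = theta**2 / (2 * tau_sq)
    return nll + penalty


def ridge_closed_form(data, lam):
    """Minimizer of sum((x - theta)^2) + lam * theta^2 for a single mean."""
    return sum(data) / (len(data) + lam)


data = [1.2, 0.8, 1.5, 1.1]
sigma_sq, tau_sq = 1.0, 0.5
lam = sigma_sq / tau_sq  # ridge penalty implied by the Gaussian prior

# Grid search on [-2, 2] with step 1e-4 for the MAP estimate.
grid = [i / 10_000 for i in range(-20_000, 20_001)]
theta_map = min(grid, key=lambda t: neg_log_posterior(t, data, sigma_sq, tau_sq))
theta_ridge = ridge_closed_form(data, lam)
theta_mle = sum(data) / len(data)

print(f"MAP {theta_map:.4f}, ridge {theta_ridge:.4f}, MLE {theta_mle:.4f}")
```

Both penalized estimates agree and sit closer to zero than the unpenalized sample mean, which is the shrinkage the Gaussian prior introduces.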


Key Takeaways

📋Summary: Bayesian Statistics

  • Bayes' Rule: Posterior ∝ Likelihood × Prior. Updates beliefs systematically as data accumulates.
  • Conjugate Priors: Beta-Binomial, Normal-Normal, Gamma-Poisson yield closed-form posteriors. Convenient for exact inference.
  • MAP = MLE + Regularization: MAP estimation with a Gaussian prior is equivalent to L2-regularized MLE.
  • Prior Choice: With little data, the prior dominates. Use weakly informative priors to regularize without strongly biasing the estimate.
  • Posterior Mean: Under squared-error loss, $E[\theta \mid D]$ is the Bayes-optimal point estimate.
  • Credible Intervals: A 95% credible interval means "there is a 95% posterior probability that $\theta$ lies in this interval," given the model and prior. A confidence interval does not support this direct reading.
  • MCMC: For complex models without conjugate priors, use Markov Chain Monte Carlo (Gibbs, HMC) to sample from the posterior.
  • Prior Sensitivity: Always check how sensitive results are to prior choice — especially with small samples.
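
To give a feel for how MCMC draws from a posterior, here is a minimal random-walk Metropolis sampler. Note that this is a simpler algorithm than the Gibbs and HMC methods named above, swapped in because it fits in a few lines. It targets the Beta(9, 5) posterior from the quick example so the result can be checked against the exact mean. The step size, chain length, burn-in, and seed are arbitrary assumptions.

```python
import math
import random


def log_post(theta, s=7, f=3, alpha=2.0, beta=2.0):
    """Unnormalized log posterior for a Beta(alpha, beta) prior and binomial data."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero posterior density outside (0, 1)
    return (alpha + s - 1) * math.log(theta) + (beta + f - 1) * math.log(1 - theta)


def metropolis(log_target, start, n_samples, step, rng):
    """Random-walk Metropolis with a symmetric Gaussian proposal."""
    samples = []
    current, current_lp = start, log_target(start)
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(current)).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


rng = random.Random(42)
chain = metropolis(log_post, start=0.5, n_samples=20_000, step=0.2, rng=rng)
kept = chain[2_000:]  # discard early draws as burn-in
mcmc_mean = sum(kept) / len(kept)

print(f"MCMC mean {mcmc_mean:.3f} vs exact {9 / 14:.3f}")
```

Only the unnormalized posterior is needed because the evidence cancels in the acceptance ratio. For real models, a tested library sampler plus convergence diagnostics is the safer route.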

Deep Dive

For detailed explanations, worked examples, and Python implementations, explore the dedicated statistics lessons:

Bayesian Regression

  • Bayesian Regression — Full Bayesian treatment of regression with posterior distributions over coefficients

Hierarchical Models

MCMC Diagnostics

  • MCMC Diagnostics: Convergence checks, trace plots, effective sample size, the $\hat{R}$ statistic, and autocorrelation
