
Probability Distributions

Master all major probability distributions: Normal, Exponential, Gamma, Beta, Chi-Square, t, and F distributions with formulas, relationships, and practical applications.

Why Probability Distributions Matter

💡 Why It Matters

Many real-world quantities are usefully modeled by a probability distribution. Machine learning models build in specific distributional assumptions: classical inference for linear regression assumes normally distributed errors, Naive Bayes assumes features are conditionally independent given the class (each with its own per-class distribution), and Bayesian methods often rely on conjugate priors. Choosing a distribution that does not match the data can lead to invalid inferences and poor predictions, so mastering distributions has direct practical consequences for whether a model works.

A probability distribution is a mathematical function that describes the likelihood of all possible outcomes of a random variable. Distributions come in two families:

  • Discrete: Countable outcomes (Bernoulli, Binomial, Poisson, Geometric)
  • Continuous: Uncountable outcomes over an interval (Normal, Exponential, Gamma, Beta, Chi-Square, t, F)

Each distribution is defined by its probability density function (PDF) or probability mass function (PMF), its cumulative distribution function (CDF), and a set of parameters that shape its behavior.
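To make the PMF/PDF/CDF distinction concrete, here is a minimal sketch using scipy.stats. The specific distributions (a Binomial(10, 0.5) and a narrow Normal with standard deviation 0.1) are illustrative choices made here, not values from the lesson.

```python
from scipy import stats

# Discrete case: the PMF returns an actual probability, P(X = 5).
coin_flips = stats.binom(n=10, p=0.5)
p_exactly_five = coin_flips.pmf(5)

# Continuous case: the PDF returns a density, which can exceed 1.
narrow_normal = stats.norm(loc=0, scale=0.1)
density_at_zero = narrow_normal.pdf(0)

# Probabilities for continuous variables come from differences of the CDF.
p_within_one_sigma = narrow_normal.cdf(0.1) - narrow_normal.cdf(-0.1)

print(f"P(X = 5)            : {p_exactly_five:.4f}")
print(f"density at 0        : {density_at_zero:.4f}")
print(f"P(-0.1 <= X <= 0.1) : {p_within_one_sigma:.4f}")
```

The density at 0 is larger than 1 here, which is a useful reminder that a PDF value is not itself a probability.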


Normal Distribution

Normal (Gaussian) Distribution PDF

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

Here,

  • $\mu$ = Mean: controls the location of the center
  • $\sigma^2$ = Variance: controls the spread
  • $\sigma$ = Standard deviation
  • $x$ = Value at which density is evaluated

The Normal distribution is the cornerstone of statistics. Its importance stems from the Central Limit Theorem: the suitably standardized sum of many independent random variables with finite variance tends toward a Normal distribution, regardless of the original distribution.

ℹ️ The Empirical Rule (68-95-99.7 Rule)

For a Normal distribution:

  • 68.27% of data falls within $\mu \pm 1\sigma$
  • 95.45% of data falls within $\mu \pm 2\sigma$
  • 99.73% of data falls within $\mu \pm 3\sigma$

This rule provides quick estimates without consulting tables.
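The three percentages can be recovered from the standard Normal CDF. The sketch below assumes scipy is available; the variable names are chosen here for illustration.

```python
from scipy import stats

standard_normal = stats.norm(loc=0, scale=1)

# P(mu - k*sigma <= X <= mu + k*sigma) is the same for every Normal,
# so it is enough to compute it for the standard Normal.
coverage = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, prob in coverage.items():
    print(f"within {k} sigma: {prob:.2%}")
```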

Standard Normal Distribution

Any Normal random variable $X \sim N(\mu, \sigma^2)$ can be transformed to the standard Normal $Z \sim N(0, 1)$ via:

$$Z = \frac{X - \mu}{\sigma}$$

The CDF of the standard Normal is denoted $\Phi(z)$, and $P(a \leq X \leq b) = \Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right)$.
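A small numerical check of the standardization formula, with hypothetical parameters ($\mu = 100$, $\sigma = 15$, interval $[85, 130]$) picked here purely for illustration:

```python
from scipy import stats

mu, sigma = 100.0, 15.0   # hypothetical parameters for illustration
a, b = 85.0, 130.0        # hypothetical interval

# Route 1: standardize the endpoints, then use the standard Normal CDF (Phi).
z_a = (a - mu) / sigma    # -1.0
z_b = (b - mu) / sigma    #  2.0
p_via_phi = stats.norm.cdf(z_b) - stats.norm.cdf(z_a)

# Route 2: let scipy handle location and scale directly.
p_direct = (stats.norm.cdf(b, loc=mu, scale=sigma)
            - stats.norm.cdf(a, loc=mu, scale=sigma))

print(f"P({a} <= X <= {b}) via Phi: {p_via_phi:.4f}")
print(f"P({a} <= X <= {b}) direct : {p_direct:.4f}")
```

Both routes should agree to floating-point precision, since scipy applies the same transformation internally.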

Key Properties of the Normal Distribution

  • Mean: $\mu$
  • Variance: $\sigma^2$
  • Support: $(-\infty, \infty)$


Exponential Distribution

Exponential Distribution PDF

$$f(x) = \lambda e^{-\lambda x}, \quad x \geq 0$$

Here,

  • $\lambda$ = Rate parameter: events per unit time
  • $1/\lambda$ = Mean waiting time
  • $x$ = Time or distance until the event

The Exponential distribution models the waiting time between consecutive events in a Poisson process. It is the continuous analogue of the Geometric distribution.

ℹ️ Memoryless Property

The Exponential distribution is the only continuous distribution with the memoryless property:

$$P(X > s + t \mid X > s) = P(X > t)$$

This means the probability of waiting an additional $t$ time units does not depend on how long you have already waited. This property makes it ideal for modeling random arrivals.
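The memoryless property can be verified numerically from the survival function $P(X > x)$. The rate and the values of $s$ and $t$ below are arbitrary illustrative choices.

```python
from scipy import stats

lam = 2.0                      # illustrative rate
waiting = stats.expon(scale=1 / lam)

s, t = 1.5, 0.7                # illustrative elapsed time and additional time

# P(X > s + t | X > s) = P(X > s + t) / P(X > s)
conditional = waiting.sf(s + t) / waiting.sf(s)
unconditional = waiting.sf(t)  # P(X > t)

print(f"P(X > s + t | X > s) = {conditional:.6f}")
print(f"P(X > t)             = {unconditional:.6f}")
```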

Key Properties

  • Mean: $1/\lambda$
  • Variance: $1/\lambda^2$
  • Support: $[0, \infty)$


Gamma Distribution

Gamma Distribution PDF

$$f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}, \quad x > 0$$

Here,

  • $\alpha$ = Shape parameter: controls the form of the distribution
  • $\beta$ = Rate parameter (inverse scale)
  • $\Gamma(\alpha)$ = Gamma function: $(\alpha - 1)!$ for positive integer $\alpha$
  • $x$ = Positive continuous value

The Gamma distribution generalizes the Exponential distribution. While the Exponential models the waiting time for one event, the Gamma with integer shape $\alpha$ models the waiting time for the $\alpha$-th event in a Poisson process.

💡 Relationship to Other Distributions

  • When $\alpha = 1$: Gamma reduces to the Exponential distribution
  • When $\alpha = n/2$ and $\beta = 1/2$: Gamma becomes the Chi-Square distribution with $n$ degrees of freedom
  • The Gamma distribution is the conjugate prior for the Poisson likelihood in Bayesian inference
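The first two relationships can be checked by comparing densities on a grid. This is a sketch with illustrative values ($\lambda = 2$, $n = 6$); note that scipy's gamma takes a scale, so a rate of $1/2$ becomes `scale=2`.

```python
import numpy as np
from scipy import stats

x = np.linspace(0.1, 10.0, 50)

# Gamma with shape 1 and rate lam is the Exponential with rate lam.
lam = 2.0
gamma_shape_one = stats.gamma(a=1, scale=1 / lam).pdf(x)
exponential = stats.expon(scale=1 / lam).pdf(x)

# Gamma with shape n/2 and rate 1/2 (scale 2) is Chi-Square with n degrees of freedom.
n = 6
gamma_half_n = stats.gamma(a=n / 2, scale=2).pdf(x)
chi_square = stats.chi2(df=n).pdf(x)

print("Gamma(1, lam) == Exp(lam):      ", np.allclose(gamma_shape_one, exponential))
print("Gamma(n/2, 1/2) == ChiSquare(n):", np.allclose(gamma_half_n, chi_square))
```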

Key Properties

  • Mean: $\alpha/\beta$
  • Variance: $\alpha/\beta^2$
  • Support: $(0, \infty)$


Beta Distribution

Beta Distribution PDF

$$f(x) = \frac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{B(\alpha, \beta)}, \quad 0 \leq x \leq 1$$

Here,

  • $\alpha$ = Shape parameter 1: pulls the distribution toward 1
  • $\beta$ = Shape parameter 2: pulls the distribution toward 0
  • $B(\alpha, \beta)$ = Beta function: $B(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha + \beta)$
  • $x$ = Value in $[0, 1]$

The Beta distribution is defined on the interval $[0, 1]$, making it the natural choice for modeling probabilities and proportions. It is the conjugate prior for the Bernoulli, Binomial, and Negative Binomial likelihoods.

ℹ️ Interpreting α and β as Prior Counts

Think of $\alpha - 1$ as the number of prior pseudo-successes and $\beta - 1$ as the number of prior pseudo-failures, that is, imagined observations that encode your belief before seeing any data. For example:

  • $\text{Beta}(1, 1)$: Uniform (no prior information)
  • $\text{Beta}(5, 3)$: mean $5/8 = 0.625$ (moderate confidence)
  • $\text{Beta}(50, 30)$: tightly concentrated around 0.625 (high confidence)
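The effect of larger pseudo-counts shows up as a shrinking standard deviation around the same mean. A minimal sketch comparing the three priors listed above:

```python
from scipy import stats

priors = [(1, 1), (5, 3), (50, 30)]
summary = {}

for a, b in priors:
    dist = stats.beta(a, b)
    summary[(a, b)] = (dist.mean(), dist.std())
    print(f"Beta({a}, {b}): mean = {dist.mean():.4f}, std = {dist.std():.4f}")
```

Beta(5, 3) and Beta(50, 30) share the mean 0.625, but the second has a much smaller spread, which is what "high confidence" means here.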

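Key Properties

  • Mean: $\frac{\alpha}{\alpha + \beta}$
  • Variance: $\frac{\alpha\beta}{(\alpha + \beta)^2(\alpha + \beta + 1)}$
  • Support: $[0, 1]$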


Chi-Square Distribution

Chi-Square Distribution PDF

$$f(x) = \frac{x^{k/2 - 1} e^{-x/2}}{2^{k/2} \Gamma(k/2)}, \quad x > 0$$

Here,

  • $k$ = Degrees of freedom: a positive integer
  • $\Gamma(k/2)$ = Gamma function evaluated at $k/2$

The Chi-Square distribution is a special case of the Gamma distribution with shape $\alpha = k/2$ and rate $\beta = 1/2$. It arises naturally as the distribution of sums of squared standard Normal random variables.

Origin of Chi-Square

If $Z_1, Z_2, \ldots, Z_k$ are independent standard Normal random variables, then:

$$X = Z_1^2 + Z_2^2 + \cdots + Z_k^2 \sim \chi^2(k)$$

This result is fundamental to hypothesis testing, including the chi-square test for goodness of fit and tests of independence.
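A quick simulation can make the sum-of-squares construction tangible. The sketch below assumes $k = 5$ and a fixed seed (both chosen here) and compares the simulated moments with the theoretical mean $k$ and variance $2k$.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 5
n_draws = 200_000

# Each row holds k independent standard Normal draws.
z = rng.standard_normal(size=(n_draws, k))
x = (z ** 2).sum(axis=1)      # sum of squares, one value per row

print(f"sample mean     : {x.mean():.3f}  (theory: {k})")
print(f"sample variance : {x.var():.3f}  (theory: {2 * k})")
```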

Key Properties

  • Mean: $k$
  • Variance: $2k$
  • Support: $(0, \infty)$


t-Distribution (Student's t)

t-Distribution PDF

$$f(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2}$$

Here,

  • $\nu$ = Degrees of freedom: controls tail heaviness
  • $x$ = Value on the real line
  • $\Gamma$ = Gamma function

The t-distribution arises when estimating the mean of a normally distributed population with an unknown variance estimated from a small sample. It has heavier tails than the Normal, reflecting the additional uncertainty from estimating $\sigma$.

ℹ️ Convergence to Normal

As the degrees of freedom $\nu \to \infty$, the t-distribution converges to the standard Normal $N(0, 1)$. For $\nu \geq 30$, the t-distribution is very close to Normal. For small $\nu$, the tails are substantially heavier, meaning extreme values are more likely.
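One way to see the convergence is to track the two-sided 95% critical value as the degrees of freedom grow. The list of $\nu$ values below is an arbitrary illustration.

```python
from scipy import stats

z_crit = stats.norm.ppf(0.975)          # Normal critical value, about 1.96
t_crits = {}

for nu in (2, 5, 10, 30, 100, 1000):
    t_crits[nu] = stats.t.ppf(0.975, df=nu)
    gap = t_crits[nu] - z_crit
    print(f"nu = {nu:>4}: t critical = {t_crits[nu]:.4f}, gap to Normal = {gap:.4f}")
```

The gap is always positive (heavier tails) and shrinks steadily as $\nu$ increases.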

Definition via Ratio

If $Z \sim N(0,1)$ and $V \sim \chi^2(\nu)$ are independent, then:

$$T = \frac{Z}{\sqrt{V/\nu}} \sim t(\nu)$$

This ratio form explains why the t-distribution has heavier tails — the denominator introduces additional variability.
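The ratio construction can be simulated directly. This sketch assumes $\nu = 10$ and a fixed seed, and compares the sample variance with the known value $\nu/(\nu - 2)$, which is larger than the Normal's variance of 1.

```python
import numpy as np

rng = np.random.default_rng(0)
nu = 10
n_draws = 200_000

z = rng.standard_normal(n_draws)          # Z ~ N(0, 1)
v = rng.chisquare(df=nu, size=n_draws)    # V ~ chi-square(nu), independent of Z
t_samples = z / np.sqrt(v / nu)

theoretical_var = nu / (nu - 2)           # variance of t(nu) for nu > 2
print(f"sample variance     : {t_samples.var():.4f}")
print(f"theoretical variance: {theoretical_var:.4f}")
```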

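Key Properties

  • Mean: $0$ (for $\nu > 1$)
  • Variance: $\nu/(\nu - 2)$ (for $\nu > 2$)
  • Support: $(-\infty, \infty)$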


F-Distribution

F-Distribution PDF

$$f(x) = \frac{\sqrt{\frac{(d_1 x)^{d_1} d_2^{d_2}}{(d_1 x + d_2)^{d_1+d_2}}}}{x \, B\left(\frac{d_1}{2}, \frac{d_2}{2}\right)}, \quad x > 0$$

Here,

  • $d_1$ = Numerator degrees of freedom
  • $d_2$ = Denominator degrees of freedom
  • $B$ = Beta function

The F-distribution is the ratio of two independent Chi-Square random variables, each divided by its degrees of freedom. It is the foundation of ANOVA (Analysis of Variance) and F-tests for comparing model variances.

Origin of F-Distribution

If $U \sim \chi^2(d_1)$ and $V \sim \chi^2(d_2)$ are independent, then:

$$F = \frac{U/d_1}{V/d_2} \sim F(d_1, d_2)$$

The connection to the t-distribution: if $T \sim t(\nu)$, then $T^2 \sim F(1, \nu)$.
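The $T^2 \sim F(1, \nu)$ identity implies that the squared two-sided t critical value equals the upper-tail F critical value at the same significance level. A short check, with $\nu = 12$ chosen arbitrarily:

```python
from scipy import stats

nu = 12                                        # illustrative degrees of freedom

t_crit = stats.t.ppf(0.975, df=nu)             # two-sided 5% test on T
f_crit = stats.f.ppf(0.95, dfn=1, dfd=nu)      # upper-tail 5% test on T^2

print(f"t critical squared: {t_crit ** 2:.6f}")
print(f"F(1, nu) critical : {f_crit:.6f}")
```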

Key Properties

  • Mean: $\frac{d_2}{d_2 - 2}$ (for $d_2 > 2$)
  • Support: $(0, \infty)$


Distribution Relationships

📋How Distributions Connect

| Relationship | Formula | Context |
| --- | --- | --- |
| Normal → Standard Normal | $Z = (X - \mu)/\sigma$ | Standardization |
| Exponential → Gamma | Gamma(1, λ) = Exp(λ) | Single vs. multiple waiting times |
| Gamma → Chi-Square | Gamma(k/2, 1/2) = χ²(k) | Sum of squared Normals |
| Chi-Square → t | $T = Z / \sqrt{\chi^2/k}$ | Small-sample inference |
| t → Normal | $t(\nu) \to N(0,1)$ as $\nu \to \infty$ | Large-sample approximation |
| t² → F | $t^2(\nu) = F(1, \nu)$ | Equivalence of tests |
| F → Beta | Via transformation of F | Distributional identity |
| Binomial → Normal | $np > 5$, $n(1-p) > 5$ | CLT approximation |
| Poisson → Normal | $\lambda > 20$ | CLT approximation |
| Bernoulli → Beta | Beta is conjugate prior | Bayesian updating |
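As an illustration of the Binomial → Normal row, the sketch below compares an exact Binomial probability with its Normal approximation. The values $n = 100$, $p = 0.3$, $k = 35$ and the use of a 0.5 continuity correction are choices made here, not part of the table.

```python
import numpy as np
from scipy import stats

n, p = 100, 0.3            # hypothetical values; np = 30 and n(1-p) = 70 both exceed 5
k = 35

exact = stats.binom.cdf(k, n, p)          # exact P(X <= 35)

# Normal approximation with matching mean and variance,
# plus a continuity correction of 0.5.
mu = n * p
sigma = np.sqrt(n * p * (1 - p))
approx = stats.norm.cdf(k + 0.5, loc=mu, scale=sigma)

print(f"exact Binomial      : {exact:.4f}")
print(f"Normal approximation: {approx:.4f}")
```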

💡 Hierarchy of Distributions

The Chi-Square is a special case of the Gamma. The t-distribution is built from Normal and Chi-Square. The F-distribution is built from two Chi-Squares. Understanding this hierarchy helps you remember formulas and choose the right distribution for your problem.


Python Implementation

📝Using scipy.stats

All major distributions are available in scipy.stats. Each distribution object supports methods for PDF/PMF, CDF, PPF (quantile function), random sampling, and moment calculation.

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# --- Normal Distribution ---
mu, sigma = 0, 1
normal = stats.norm(mu, sigma)

print(f"Normal PDF at 0: {normal.pdf(0):.4f}")           # 0.3989
print(f"Normal CDF at 1.96: {normal.cdf(1.96):.4f}")     # 0.9750
print(f"Normal PPF at 0.975: {normal.ppf(0.975):.4f}")   # 1.9600
print(f"Normal mean: {normal.mean():.4f}")                # 0.0000
print(f"Normal variance: {normal.var():.4f}")             # 1.0000
samples_normal = normal.rvs(size=10000, random_state=42)

# --- Exponential Distribution ---
lam = 2.0
exponential = stats.expon(scale=1/lam)

print(f"Exp PDF at 0.5: {exponential.pdf(0.5):.4f}")     # 0.7358
print(f"Exp mean: {exponential.mean():.4f}")              # 0.5000
print(f"Exp variance: {exponential.var():.4f}")           # 0.2500
samples_exp = exponential.rvs(size=10000, random_state=42)

# --- Gamma Distribution ---
alpha, beta_param = 3.0, 2.0
gamma = stats.gamma(a=alpha, scale=1/beta_param)

print(f"Gamma mean: {gamma.mean():.4f}")                  # 1.5000
print(f"Gamma variance: {gamma.var():.4f}")               # 0.7500
samples_gamma = gamma.rvs(size=10000, random_state=42)

# --- Beta Distribution ---
alpha_b, beta_b = 5.0, 3.0
beta = stats.beta(alpha_b, beta_b)

print(f"Beta mean: {beta.mean():.4f}")                    # 0.6250
print(f"Beta variance: {beta.var():.4f}")                 # 0.0260
samples_beta = beta.rvs(size=10000, random_state=42)

# --- Chi-Square Distribution ---
k = 5
chi2 = stats.chi2(df=k)

print(f"Chi2 mean: {chi2.mean():.4f}")                    # 5.0000
print(f"Chi2 variance: {chi2.var():.4f}")                 # 10.0000
samples_chi2 = chi2.rvs(size=10000, random_state=42)

# --- t-Distribution ---
nu = 5
t_dist = stats.t(df=nu)

print(f"t mean: {t_dist.mean():.4f}")                     # 0.0000
print(f"t variance: {t_dist.var():.4f}")                  # 1.6667
print(f"t 97.5th percentile: {t_dist.ppf(0.975):.4f}")   # 2.5706
samples_t = t_dist.rvs(size=10000, random_state=42)

# --- F-Distribution ---
d1, d2 = 5, 10
f_dist = stats.f(dfn=d1, dfd=d2)

print(f"F mean: {f_dist.mean():.4f}")                     # 1.2500
print(f"F variance: {f_dist.var():.4f}")                  # 1.3542
samples_f = f_dist.rvs(size=10000, random_state=42)

# --- Visualization ---
fig, axes = plt.subplots(2, 4, figsize=(16, 8))
dists = [
    (samples_normal, 'Normal(0,1)', -4, 4),
    (samples_exp, 'Exponential(2)', 0, 3),
    (samples_gamma, 'Gamma(3,2)', 0, 5),
    (samples_beta, 'Beta(5,3)', 0, 1),
    (samples_chi2, 'Chi-Square(5)', 0, 15),
    (samples_t, 't(5)', -5, 5),
    (samples_f, 'F(5,10)', 0, 5),
]

for ax, (data, title, lo, hi) in zip(axes.flat, dists):
    ax.hist(data, bins=50, density=True, alpha=0.7, edgecolor='black')
    ax.set_title(title)
    ax.set_xlim(lo, hi)

axes.flat[-1].axis('off')
plt.tight_layout()
plt.savefig('distributions.png', dpi=150)
plt.show()

# --- Hypothesis Testing Examples ---
# t-test: comparing two sample means (seeded generator for reproducibility)
rng = np.random.default_rng(42)
group_a = rng.normal(loc=100, scale=15, size=50)
group_b = rng.normal(loc=105, scale=15, size=50)
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t-statistic: {t_stat:.4f}, p-value: {p_value:.4f}")

# F-test: comparing two variances
f_stat = np.var(group_a, ddof=1) / np.var(group_b, ddof=1)
# One-sided (upper-tail) p-value; sf is the survival function, 1 - cdf
f_p_value = stats.f.sf(f_stat, len(group_a)-1, len(group_b)-1)
print(f"F-statistic: {f_stat:.4f}, p-value: {f_p_value:.4f}")

# Chi-square test: goodness of fit
observed = np.array([30, 50, 20])
expected = np.array([25, 50, 25])
chi2_stat, chi2_p = stats.chisquare(observed, f_exp=expected)
print(f"Chi2-statistic: {chi2_stat:.4f}, p-value: {chi2_p:.4f}")

Applications in AI and Machine Learning

Gaussian Processes

ℹ️ Gaussian Processes

A Gaussian Process (GP) defines a distribution over functions. Any finite set of function values follows a multivariate Normal distribution. GPs are used for regression, classification, and hyperparameter optimization (Bayesian optimization) because they provide uncertainty estimates alongside predictions.

The GP prior is $f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}'))$, where $m$ is the mean function and $k$ is a kernel (covariance) function. Predictions at new points are obtained by conditioning on observed data, and all operations remain in the Normal distribution family.
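The statement that any finite set of function values is multivariate Normal can be sketched by drawing functions from a GP prior on a grid. Everything specific below is an assumption made for illustration: a zero mean function, a squared-exponential (RBF) kernel, the helper name `rbf_kernel`, and a small diagonal jitter for numerical stability.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel k(x, x') for 1-D inputs (illustrative choice)."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale ** 2)

rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 50)

mean = np.zeros_like(x)                             # assumed m(x) = 0
cov = rbf_kernel(x, x) + 1e-6 * np.eye(x.size)      # jitter keeps the matrix positive definite

# Draw f ~ N(mean, cov) through the Cholesky factor: f = mean + L @ eps.
chol = np.linalg.cholesky(cov)
eps = rng.standard_normal((x.size, 3))
function_draws = (mean[:, None] + chol @ eps).T     # three sampled functions, shape (3, 50)

print(function_draws.shape)
```

Each row of `function_draws` is one smooth random function evaluated on the grid; conditioning on data would replace `mean` and `cov` with their posterior counterparts.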

Reparameterization Trick

Reparameterization Trick (VAEs)

In Variational Autoencoders, we need to backpropagate through a stochastic sampling step. The reparameterization trick writes:

$$z = \mu + \sigma \odot \epsilon, \quad \epsilon \sim N(0, I)$$

This converts implicit sampling into an explicit, differentiable transformation, enabling gradient-based optimization of the ELBO (Evidence Lower Bound). Without this trick, the non-differentiable sampling operation blocks gradient flow.
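A toy NumPy sketch of why this helps: once $z$ is written as a function of $\mu$, $\sigma$, and external noise, gradients of an expectation can be estimated by the chain rule. The objective $E[z^2]$ is a stand-in chosen here because its exact gradients ($2\mu$ and $2\sigma$) are known; a real VAE would differentiate the ELBO with an autodiff framework instead.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.8                 # illustrative scalar parameters
n_draws = 200_000

eps = rng.standard_normal(n_draws)   # noise drawn independently of mu and sigma
z = mu + sigma * eps                 # reparameterized sample, z ~ N(mu, sigma^2)

# Toy objective: L = E[z^2] = mu^2 + sigma^2.
# Chain rule through z: dL/dmu = E[2z * dz/dmu], dL/dsigma = E[2z * dz/dsigma],
# with dz/dmu = 1 and dz/dsigma = eps.
grad_mu = np.mean(2 * z)
grad_sigma = np.mean(2 * z * eps)

print(f"dL/dmu    estimate: {grad_mu:.3f}  (exact: {2 * mu})")
print(f"dL/dsigma estimate: {grad_sigma:.3f}  (exact: {2 * sigma})")
```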

Other Applications

📋Distribution Applications in ML

| Application | Distribution Used | Why |
| --- | --- | --- |
| Linear regression errors | Normal | CLT justification, analytical tractability |
| Bayesian linear regression | Normal prior + Normal likelihood | Conjugate pair → closed-form posterior |
| Naive Bayes (continuous) | Normal per feature | Simple, fast, surprisingly effective |
| Multinomial Naive Bayes | Multinomial / Dirichlet | Text classification (bag of words) |
| Variational inference | Normal (VAE latent space) | Reparameterization trick enables backprop |
| Gaussian Mixture Models | Normal components | Density estimation, clustering |
| Thompson Sampling | Beta posterior | Bandit problems, A/B testing |
| Survival analysis | Exponential / Weibull | Time-to-event modeling |
| Reinforcement learning | Categorical / Dirichlet | Policy distributions, posterior over rewards |
| Generative models | Normal (diffusion processes) | Score matching, denoising |

Common Mistakes

💡 Learn from Others' Errors

The following table captures frequent mistakes practitioners make when working with probability distributions. Avoiding these errors will save you debugging time and prevent incorrect conclusions.

| Mistake | Why It Is Wrong | Correct Approach |
| --- | --- | --- |
| Assuming all data is Normal | Real data is often skewed or heavy-tailed | Check with Q-Q plots, Shapiro-Wilk test |
| Using Normal for probabilities bounded in [0,1] | Normal assigns probability outside [0,1] | Use Beta distribution |
| Confusing rate and scale parameters | $f(x) = \lambda e^{-\lambda x}$ vs $f(x) = \frac{1}{\beta}e^{-x/\beta}$ | Always check your library's parameterization |
| Ignoring degrees of freedom in t-tests | Small samples need t-distribution, not Normal | Use t whenever $\sigma$ is estimated from the sample; the difference matters most for $n < 30$ |
| Using z-test when $\sigma$ is unknown | Z-test requires known population variance | Use t-test instead |
| Assuming Chi-Square approximation for small expected counts | Chi-square test requires expected counts ≥ 5 | Use Fisher's exact test for small samples |
| Forgetting that F-test is sensitive to non-normality | F-test assumes normally distributed data | Check assumptions or use robust alternatives |
| Using mean and variance for skewed distributions | Mean is misleading for skewed data | Use median and IQR |
| Not distinguishing PMF from PDF | Discrete distributions use PMF, continuous use PDF | Check whether your variable is discrete or continuous |
| Over-interpreting p-values | p < 0.05 does not mean the effect is large or important | Report effect sizes and confidence intervals |

Interview Questions

📝Interview Question 1

Q: Explain the difference between the Normal and t-distribution. When would you use each?

A: The t-distribution has heavier tails than the Normal, reflecting additional uncertainty from estimating the population variance with a sample variance. Use the Normal when the population variance is known or the sample size is large ($n \geq 30$). Use the t-distribution when the population variance is unknown and estimated from a small sample. As degrees of freedom increase, the t-distribution converges to the Normal.

📝Interview Question 2

Q: Why is the Beta distribution useful in Bayesian statistics?

A: The Beta distribution is the conjugate prior for the Binomial (and Bernoulli) likelihood. This means if the prior is $\text{Beta}(\alpha, \beta)$ and the likelihood is Binomial, the posterior is also a Beta distribution: $\text{Beta}(\alpha + s, \beta + n - s)$ where $s$ is the number of successes in $n$ trials. This makes Bayesian updating analytically tractable, with no numerical integration needed. The parameters $\alpha$ and $\beta$ can be interpreted as prior pseudo-counts of successes and failures.
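The update rule is short enough to write out. In this sketch the helper name `update_beta`, the uniform prior, and the data (14 successes in 20 trials) are all hypothetical, chosen only to show the mechanics.

```python
from scipy import stats

def update_beta(alpha, beta, successes, trials):
    """Return posterior (alpha, beta) after Binomial data under a Beta prior."""
    return alpha + successes, beta + (trials - successes)

prior = (1.0, 1.0)                                          # uniform prior
posterior = update_beta(*prior, successes=14, trials=20)    # hypothetical data

post_dist = stats.beta(*posterior)
posterior_mean = post_dist.mean()
lower, upper = post_dist.interval(0.95)                     # central 95% credible interval

print(f"posterior: Beta{posterior}")
print(f"posterior mean: {posterior_mean:.4f}")
print(f"95% credible interval: ({lower:.4f}, {upper:.4f})")
```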

📝Interview Question 3

Q: What is the Central Limit Theorem and why does it matter for the Normal distribution?

A: The CLT states that for $n$ independent, identically distributed random variables with finite mean $\mu$ and variance $\sigma^2$, the standardized sample mean $\sqrt{n}(\bar{X}_n - \mu)/\sigma$ converges in distribution to $N(0, 1)$ as $n \to \infty$, regardless of the original distribution. Equivalently, for large $n$ the sample mean is approximately $N(\mu, \sigma^2/n)$. This is why the Normal distribution appears everywhere: sums and averages of many small effects tend toward Normality. It justifies using Normal-based confidence intervals and hypothesis tests even when the underlying data is not Normal, provided the sample size is large enough.
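A simulation sketch of the CLT, using a strongly skewed Exponential(1) population. The sample size of 50, the number of repetitions, and the seed are arbitrary choices made here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_repeats = 50, 20_000

# Exponential(1) is strongly skewed, with mean 1 and variance 1.
mu, sigma = 1.0, 1.0
samples = rng.exponential(scale=1.0, size=(n_repeats, n))

sample_means = samples.mean(axis=1)
standardized = np.sqrt(n) * (sample_means - mu) / sigma

# If the Normal approximation is good, these should be close to 0, 1, and 0.95.
within_196 = np.mean(np.abs(standardized) <= 1.96)
print(f"mean of standardized means: {standardized.mean():.3f}")
print(f"std of standardized means : {standardized.std():.3f}")
print(f"fraction within +/- 1.96  : {within_196:.3f}")
```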

📝Interview Question 4

Q: You are building a model to predict click-through rates. Which distribution should you use for the target variable and why?

A: Click-through rates are proportions bounded in $[0, 1]$. The Beta distribution is the natural choice because it is defined on $[0, 1]$ and can model various shapes (uniform, skewed, concentrated). Alternatively, if modeling individual binary clicks, use Bernoulli for each impression, or Binomial for the count of clicks out of $n$ impressions. For a Bayesian approach, the Beta-Binomial model provides a principled framework with uncertainty quantification.

📝Interview Question 5

Q: Explain the reparameterization trick in Variational Autoencoders.

A: In a VAE, the latent variable $z$ is sampled from an approximate posterior $q(z \mid x) = N(\mu(x), \sigma^2(x) I)$. Direct sampling is non-differentiable, blocking gradient flow. The reparameterization trick rewrites $z = \mu(x) + \sigma(x) \odot \epsilon$ where $\epsilon \sim N(0, I)$ is an external noise source. Now $z$ is a deterministic, differentiable function of $\mu$ and $\sigma$, and gradients can flow through to the encoder. The noise enters from outside the computational graph.

📝Interview Question 6

Q: What happens if you use a z-test instead of a t-test for a small sample with unknown variance?

A: The z-test assumes the population variance $\sigma^2$ is known. When it is unknown and estimated from a small sample ($n < 30$), the sample standard deviation is itself a noisy estimate, and plugging it in as if it were the true $\sigma$ ignores that extra uncertainty, making the z-test overconfident. This leads to inflated Type I error rates: you reject the null hypothesis more often than you should. The t-distribution accounts for this extra uncertainty by having heavier tails, producing wider confidence intervals and more conservative p-values. Always use the t-test when $\sigma$ is unknown.
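The inflation of the Type I error rate can be observed in a simulation. This sketch assumes a deliberately small sample size ($n = 5$), Normal data generated under a true null, and a nominal 5% level; all of these are illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, n_sims = 5, 100_000

# Data generated under the null hypothesis: true mean is 0.
data = rng.normal(loc=0.0, scale=1.0, size=(n_sims, n))
means = data.mean(axis=1)
sds = data.std(axis=1, ddof=1)               # sample standard deviation
test_stat = means / (sds / np.sqrt(n))       # same statistic for both tests

z_crit = stats.norm.ppf(0.975)               # wrongly treats the estimated sd as known
t_crit = stats.t.ppf(0.975, df=n - 1)        # accounts for estimating the sd

z_rejection_rate = np.mean(np.abs(test_stat) > z_crit)
t_rejection_rate = np.mean(np.abs(test_stat) > t_crit)

print(f"rejection rate with z critical value: {z_rejection_rate:.3f}")
print(f"rejection rate with t critical value: {t_rejection_rate:.3f}")
```

With the t critical value the rejection rate should sit near the nominal 5%, while the z critical value rejects noticeably more often.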


Practice Problems

📝Problem 1: Exponential Waiting Times

At a help desk, calls arrive according to a Poisson process with rate $\lambda = 4$ calls per hour. What is the probability that the time between two consecutive calls exceeds 30 minutes?

💡Solution

The waiting time between calls follows an Exponential distribution with $\lambda = 4$ per hour. We need $P(X > 0.5)$ where $X \sim \text{Exp}(4)$.

$$P(X > 0.5) = e^{-4 \times 0.5} = e^{-2} \approx 0.1353$$

There is approximately a 13.5% chance that the gap between calls exceeds 30 minutes.
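The same answer can be reproduced with scipy, keeping in mind that `expon` takes a scale of $1/\lambda$ rather than the rate:

```python
import math
from scipy import stats

lam = 4.0          # calls per hour
t = 0.5            # 30 minutes, expressed in hours

p_closed_form = math.exp(-lam * t)
p_scipy = stats.expon(scale=1 / lam).sf(t)   # sf(t) = P(X > t)

print(f"closed form: {p_closed_form:.4f}")
print(f"scipy      : {p_scipy:.4f}")
```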

📝Problem 2: Beta Posterior Updating

You observe 8 successes and 2 failures in 10 Bernoulli trials. Starting with a $\text{Beta}(2, 2)$ prior, what is the posterior distribution? What is the posterior mean?

💡Solution

Since Beta is conjugate to Bernoulli, the posterior is:

$$\text{Beta}(2 + 8, 2 + 2) = \text{Beta}(10, 4)$$

The posterior mean is:

$$\frac{\alpha}{\alpha + \beta} = \frac{10}{10 + 4} = \frac{10}{14} \approx 0.7143$$

The prior was weakly informative and symmetric (Beta(2,2) has mean 0.5). After observing 80% successes, the posterior mean shifts to 0.7143, a weighted compromise between the prior belief and the observed data.

📝Problem 3: Chi-Square Test for Fairness

A die is rolled 120 times with the following results: {1: 18, 2: 25, 3: 15, 4: 23, 5: 22, 6: 17}. Test whether the die is fair at the 5% significance level.

💡Solution

If the die is fair, each face should appear with probability 1/6, giving expected count $120/6 = 20$ for each face.

$$\chi^2 = \sum_{i=1}^{6} \frac{(O_i - E_i)^2}{E_i} = \frac{(18-20)^2}{20} + \frac{(25-20)^2}{20} + \frac{(15-20)^2}{20} + \frac{(23-20)^2}{20} + \frac{(22-20)^2}{20} + \frac{(17-20)^2}{20}$$
$$= \frac{4 + 25 + 25 + 9 + 4 + 9}{20} = \frac{76}{20} = 3.8$$

Degrees of freedom: $6 - 1 = 5$. The critical value at $\alpha = 0.05$ is $\chi^2_{0.05, 5} = 11.07$.

Since $3.8 < 11.07$, we fail to reject the null hypothesis. The data provide no evidence that the die is unfair.
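This calculation can be cross-checked with `scipy.stats.chisquare`, which assumes equal expected counts when no `f_exp` is given:

```python
import numpy as np
from scipy import stats

observed = np.array([18, 25, 15, 23, 22, 17])

# With no f_exp argument, chisquare uses equal expected counts (20 each here).
chi2_stat, p_value = stats.chisquare(observed)
critical_value = stats.chi2.ppf(0.95, df=len(observed) - 1)

print(f"chi-square statistic: {chi2_stat:.4f}")
print(f"critical value (5%) : {critical_value:.4f}")
print(f"p-value             : {p_value:.4f}")
print("reject H0" if chi2_stat > critical_value else "fail to reject H0")
```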

📝Problem 4: Gamma as Sum of Exponentials

If $X_1, X_2, X_3$ are independent with $X_i \sim \text{Exp}(2)$, what is the distribution of $Y = X_1 + X_2 + X_3$? Find $E[Y]$ and $\text{Var}(Y)$.

💡Solution

The sum of $n$ independent Exponential($\lambda$) random variables follows a Gamma distribution:

$$Y = X_1 + X_2 + X_3 \sim \text{Gamma}(3, 2)$$

where $\alpha = 3$ (shape = number of exponentials) and $\beta = 2$ (rate).

$$E[Y] = \frac{\alpha}{\beta} = \frac{3}{2} = 1.5$$
$$\text{Var}(Y) = \frac{\alpha}{\beta^2} = \frac{3}{4} = 0.75$$

This result is used in reliability engineering: if three identical components with exponentially distributed lifetimes are used one after another (each replaced on failure), the total system lifetime follows a Gamma distribution.
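A simulation sketch of this result, with a fixed seed and sample count chosen here, comparing the simulated sum against the Gamma(3, 2) moments from scipy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
lam = 2.0
n_draws = 200_000

# Each row holds three independent Exp(rate=2) draws; numpy takes scale = 1/rate.
x = rng.exponential(scale=1 / lam, size=(n_draws, 3))
y = x.sum(axis=1)

gamma_dist = stats.gamma(a=3, scale=1 / lam)   # shape 3, rate 2

print(f"simulated mean    : {y.mean():.4f}  (Gamma: {gamma_dist.mean():.4f})")
print(f"simulated variance: {y.var():.4f}  (Gamma: {gamma_dist.var():.4f})")
```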

📝Problem 5: t-Distribution Confidence Interval

A sample of 16 observations from a Normal population yields $\bar{x} = 52$ and $s = 8$. Construct a 95% confidence interval for the population mean $\mu$.

💡Solution

Since $\sigma$ is unknown and $n = 16 < 30$, we use the t-distribution with $df = 15$ degrees of freedom.

The critical value is $t_{0.025, 15} = 2.131$ (from t-table).

The confidence interval is:

$$\bar{x} \pm t_{\alpha/2, df} \cdot \frac{s}{\sqrt{n}} = 52 \pm 2.131 \cdot \frac{8}{\sqrt{16}} = 52 \pm 2.131 \cdot 2 = 52 \pm 4.262$$
$$\text{95\% CI: } (47.738, 56.262)$$

If we had incorrectly used the z-distribution ($z_{0.025} = 1.96$), the interval would be $52 \pm 3.92 = (48.08, 55.92)$, which is narrower and overconfident.
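The table lookup can be replaced by `stats.t.ppf`. The sketch below recomputes both intervals from the problem's summary statistics:

```python
import math
from scipy import stats

n, x_bar, s = 16, 52.0, 8.0
df = n - 1
standard_error = s / math.sqrt(n)        # 8 / 4 = 2

t_crit = stats.t.ppf(0.975, df)          # two-sided 95% critical value
z_crit = stats.norm.ppf(0.975)

t_interval = (x_bar - t_crit * standard_error, x_bar + t_crit * standard_error)
z_interval = (x_bar - z_crit * standard_error, x_bar + z_crit * standard_error)

print(f"t critical value: {t_crit:.3f}")
print(f"t interval: ({t_interval[0]:.3f}, {t_interval[1]:.3f})")
print(f"z interval: ({z_interval[0]:.3f}, {z_interval[1]:.3f})")
```

Small differences in the last digit relative to the hand calculation come from rounding the critical value to 2.131.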


Quick Reference

📋Distribution Quick Reference Table

| Distribution | PDF/PMF | Mean | Variance | Support | Key Use Case |
| --- | --- | --- | --- | --- | --- |
| Normal | $\frac{1}{\sigma\sqrt{2\pi}}e^{-(x-\mu)^2/(2\sigma^2)}$ | $\mu$ | $\sigma^2$ | $(-\infty, \infty)$ | Continuous measurements, errors |
| Exponential | $\lambda e^{-\lambda x}$ | $1/\lambda$ | $1/\lambda^2$ | $[0, \infty)$ | Waiting times, time between events |
| Gamma | $\frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}$ | $\alpha/\beta$ | $\alpha/\beta^2$ | $(0, \infty)$ | Sum of waiting times, rainfall |
| Beta | $\frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}$ | $\frac{\alpha}{\alpha+\beta}$ | $\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$ | $[0, 1]$ | Probabilities, proportions |
| Chi-Square | $\frac{x^{k/2-1}e^{-x/2}}{2^{k/2}\Gamma(k/2)}$ | $k$ | $2k$ | $(0, \infty)$ | Goodness of fit, variance tests |
| t | $\propto (1+x^2/\nu)^{-(\nu+1)/2}$ | $0$ ($\nu>1$) | $\nu/(\nu-2)$ ($\nu>2$) | $(-\infty, \infty)$ | Small-sample mean inference |
| F | $\propto \frac{1}{x}\sqrt{\frac{(d_1 x)^{d_1} d_2^{d_2}}{(d_1 x+d_2)^{d_1+d_2}}}$ | $\frac{d_2}{d_2-2}$ ($d_2>2$) | $\frac{2 d_2^2 (d_1+d_2-2)}{d_1 (d_2-2)^2 (d_2-4)}$ ($d_2>4$) | $(0, \infty)$ | ANOVA, variance comparison |

💡 Parameterization Warning

Different libraries use different parameterizations. For the Exponential distribution: scipy.stats.expon uses scale $= 1/\lambda$, while many textbooks use rate $\lambda$. For the Gamma distribution: scipy.stats.gamma uses shape $a = \alpha$ and scale $= 1/\beta$. Always check your library's documentation.
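The cost of mixing up rate and scale is easy to demonstrate. In this sketch the rate values are illustrative, and the "wrong" objects show what happens when a textbook rate is passed where scipy expects a scale.

```python
import numpy as np
from scipy import stats

lam = 4.0                      # rate parameter, textbook convention
alpha, beta_rate = 3.0, 2.0    # Gamma shape and rate

correct_exp = stats.expon(scale=1 / lam)    # mean 1/lam = 0.25
wrong_exp = stats.expon(scale=lam)          # rate passed as scale: mean 4.0

correct_gamma = stats.gamma(a=alpha, scale=1 / beta_rate)   # mean alpha/beta = 1.5
wrong_gamma = stats.gamma(a=alpha, scale=beta_rate)         # mean alpha*beta = 6.0

# NumPy's generator methods also take scale, not rate.
rng = np.random.default_rng(0)
draws = rng.exponential(scale=1 / lam, size=100_000)

print(f"Exponential mean, correct vs wrong: {correct_exp.mean()} vs {wrong_exp.mean()}")
print(f"Gamma mean, correct vs wrong      : {correct_gamma.mean()} vs {wrong_gamma.mean()}")
print(f"NumPy sample mean (scale = 1/lam) : {draws.mean():.4f}")
```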


Cross-References

📋Related Topics

  • Descriptive Statistics → Understanding measures of center and spread before choosing distributions
  • Bayesian Inference → Conjugate priors (Beta-Binomial, Gamma-Poisson) simplify posterior computation
  • Hypothesis Testing → t-tests, F-tests, and chi-square tests all rely on specific distributions
  • Regression Analysis → Normal assumption for errors, F-test for model significance
  • Maximum Likelihood Estimation → Fitting distribution parameters from data
  • Monte Carlo Methods → Sampling from distributions to approximate complex integrals
  • Information Theory → KL divergence between distributions measures model mismatch
  • Gaussian Processes → Multivariate Normal extension for function-space modeling