Probability Distributions | ChatWhole Learn

Why Probability Distributions Matter

💡 Why It Matters

Most real-world quantities that vary randomly can be described, at least approximately, by a probability distribution, and many machine learning models build in specific distributional assumptions: classical linear regression inference assumes normally distributed errors, Naive Bayes assumes features are conditionally independent given the class, and many Bayesian methods use conjugate priors for tractability. Choosing the wrong distribution leads to invalid inferences and poor predictions, so knowing the standard distributions is a practical skill rather than an academic one: it often determines whether a model works or fails.

A probability distribution is a mathematical function that describes the likelihood of all possible outcomes of a random variable. Distributions come in two families:

Discrete: Countable outcomes (Bernoulli, Binomial, Poisson, Geometric)
Continuous: Uncountable outcomes over an interval (Normal, Exponential, Gamma, Beta, Chi-Square, t, F)

Each distribution is defined by its probability density function (PDF) or probability mass function (PMF), its cumulative distribution function (CDF), and a set of parameters that shape its behavior.

Normal Distribution

Normal (Gaussian) Distribution PDF

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)

Here,

$\mu$ = Mean, controls the location of the center
$\sigma^2$ = Variance, controls the spread
$\sigma$ = Standard deviation
$x$ = Value at which the density is evaluated

The Normal distribution is the cornerstone of statistics. Much of its importance stems from the Central Limit Theorem: the standardized sum (or mean) of many independent, identically distributed random variables with finite variance tends toward a Normal distribution, regardless of the shape of the original distribution.
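The CLT can be checked with a short simulation. This is a minimal sketch assuming NumPy; the choice of Exponential(1) as the skewed source distribution, the sample size n = 50, and the seed are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 20_000                      # observations per mean, number of means

# Exponential(1) is strongly right-skewed, with mean 1 and variance 1.
draws = rng.exponential(scale=1.0, size=(reps, n))
means = draws.mean(axis=1)

# Standardize each sample mean with the known population mean and standard deviation.
z = (means - 1.0) / (1.0 / np.sqrt(n))

print(f"mean of z: {z.mean():.3f}")                         # close to 0
print(f"std of z:  {z.std():.3f}")                          # close to 1
print(f"P(|z| < 1.96): {np.mean(np.abs(z) < 1.96):.3f}")    # close to 0.95
```

Even though each draw is far from Normal, the standardized means already behave much like $N(0, 1)$ at n = 50.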

ℹ️ The Empirical Rule (68-95-99.7 Rule)

For a Normal distribution:

68.27% of data falls within $\mu \pm 1\sigma$
95.45% of data falls within $\mu \pm 2\sigma$
99.73% of data falls within $\mu \pm 3\sigma$

This rule provides quick estimates without consulting tables.
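A quick way to reproduce these percentages, assuming only Python's standard library (`statistics.NormalDist`, Python 3.8+): because every Normal standardizes to $N(0, 1)$, the coverage of $\mu \pm k\sigma$ is $\Phi(k) - \Phi(-k)$.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# P(mu - k*sigma <= X <= mu + k*sigma) is the same for every Normal,
# so it can be computed once on the standard Normal.
coverage = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within ±{k} sigma: {p:.2%}")
# within ±1 sigma: 68.27%
# within ±2 sigma: 95.45%
# within ±3 sigma: 99.73%
```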

Standard Normal Distribution

Any Normal random variable $X \sim N(\mu, \sigma^2)$ can be transformed to the standard Normal $Z \sim N(0, 1)$ via:

Z = \frac{X - \mu}{\sigma}

The CDF of the standard Normal is denoted $\Phi(z)$, and $P(a \leq X \leq b) = \Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right)$.
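As a sketch of the standardization formula, the probability below is computed once directly on $X$ and once through $\Phi$. The parameters $\mu = 100$, $\sigma = 15$, $a = 85$, $b = 130$ are hypothetical values chosen for illustration, and only the standard library is assumed.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0                 # illustrative parameters, not from the text
a, b = 85.0, 130.0

X = NormalDist(mu, sigma)
Z = NormalDist()                        # standard Normal; Z.cdf plays the role of Phi

direct = X.cdf(b) - X.cdf(a)
standardized = Z.cdf((b - mu) / sigma) - Z.cdf((a - mu) / sigma)

print(f"direct:       {direct:.4f}")
print(f"standardized: {standardized:.4f}")   # same value: Phi(2) - Phi(-1)
```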

Key Properties of the Normal Distribution
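One way to see the key properties concretely, assuming SciPy is installed: the density formula above is coded by hand (`normal_pdf` is a hypothetical helper) and compared against `statistics.NormalDist`, and `scipy.stats.norm.stats` reports the mean $\mu$, variance $\sigma^2$, skewness 0 (symmetry about $\mu$), and excess kurtosis 0.

```python
import math
from statistics import NormalDist
from scipy import stats

def normal_pdf(x, mu, sigma):
    """Normal density, transcribed from the PDF formula above."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 2.0, 0.5                      # illustrative parameters
for x in (1.0, 2.0, 3.5):
    assert math.isclose(normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))

# Mean, variance, skewness, excess kurtosis
mean, var, skew, kurt = stats.norm.stats(loc=mu, scale=sigma, moments="mvsk")
print(f"mean={float(mean)}, var={float(var)}, skew={float(skew)}, kurtosis={float(kurt)}")
```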

Exponential Distribution

Exponential Distribution PDF

f(x) = \lambda e^{-\lambda x}, \quad x \geq 0

Here,

$\lambda$ = Rate parameter, events per unit time
$1/\lambda$ = Mean waiting time
$x$ = Time or distance until the event

The Exponential distribution models the waiting time between consecutive events in a Poisson process. It is the continuous analogue of the Geometric distribution.

ℹ️ Memoryless Property

The Exponential distribution is the only continuous distribution with the memoryless property:

P(X > s + t \mid X > s) = P(X > t)

This means the probability of waiting an additional $t$ time units does not depend on how long you have already waited. This makes it a natural model for the time between random arrivals that occur at a constant average rate, independently of the past.
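The memoryless property can be checked empirically. This minimal NumPy sketch uses an illustrative rate $\lambda = 2$ and offsets $s = 0.5$, $t = 0.3$; note that NumPy's `exponential` takes `scale` $= 1/\lambda$, not the rate.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, s, t = 2.0, 0.5, 0.3                             # rate and time offsets (illustrative)
x = rng.exponential(scale=1 / lam, size=1_000_000)    # NumPy uses scale = 1/lambda

survived_s = x[x > s]                                 # condition on having already waited s
p_conditional = np.mean(survived_s > s + t)           # estimate of P(X > s + t | X > s)
p_fresh = np.mean(x > t)                              # estimate of P(X > t)

print(f"P(X > s+t | X > s) ≈ {p_conditional:.4f}")
print(f"P(X > t)           ≈ {p_fresh:.4f}")
print(f"exact e^(-lam*t)   = {np.exp(-lam * t):.4f}")
```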

Key Properties
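A short sketch of the Exponential distribution's key properties (mean $1/\lambda$, variance $1/\lambda^2$, CDF $1 - e^{-\lambda x}$, survival function $e^{-\lambda x}$, median $\ln 2/\lambda$), checked against `scipy.stats.expon` with a hypothetical rate $\lambda = 4$.

```python
import numpy as np
from scipy import stats

lam = 4.0                                   # illustrative rate: 4 events per unit time
X = stats.expon(scale=1 / lam)              # SciPy parameterizes by scale = 1/lambda

x = 0.25
print(X.mean(), X.var())                    # 1/lambda = 0.25, 1/lambda^2 = 0.0625
print(X.sf(x), np.exp(-lam * x))            # survival P(X > x) = e^{-lambda x}
print(X.cdf(x), 1 - np.exp(-lam * x))       # CDF F(x) = 1 - e^{-lambda x}
print(X.median(), np.log(2) / lam)          # median ln(2)/lambda, below the mean
```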

Gamma Distribution

Gamma Distribution PDF

f(x) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}, \quad x > 0

Here,

$\alpha$ = Shape parameter, controls the form of the distribution
$\beta$ = Rate parameter (inverse scale)
$\Gamma(\alpha)$ = Gamma function, equal to $(\alpha-1)!$ for positive integer $\alpha$
$x$ = Positive continuous value

The Gamma distribution generalizes the Exponential distribution. While the Exponential models the waiting time for one event, the Gamma with integer shape $\alpha$ models the waiting time until the $\alpha$-th event in a Poisson process with rate $\beta$.

💡 Relationship to Other Distributions

When $\alpha = 1$ : Gamma reduces to the Exponential distribution
When $\alpha = n/2$ and $\beta = 1/2$ : Gamma becomes the Chi-Square distribution with $n$ degrees of freedom
The Gamma distribution is the conjugate prior for the Poisson likelihood in Bayesian inference
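The first two relationships can be verified numerically by comparing densities on a grid. This sketch assumes SciPy, whose `gamma` takes shape `a` and `scale` $= 1/\beta$; the values $\beta = 1.5$ and $n = 7$ are arbitrary.

```python
import numpy as np
from scipy import stats

x = np.linspace(0.05, 10, 200)

# alpha = 1: Gamma(1, rate beta) is Exponential(beta). SciPy uses scale = 1/beta.
beta = 1.5
gamma_a1 = stats.gamma.pdf(x, a=1, scale=1 / beta)
expon_b = stats.expon.pdf(x, scale=1 / beta)

# alpha = n/2, rate 1/2 (scale 2): Gamma becomes Chi-Square with n degrees of freedom.
n = 7
gamma_chi = stats.gamma.pdf(x, a=n / 2, scale=2)
chi2_n = stats.chi2.pdf(x, df=n)

print(np.allclose(gamma_a1, expon_b), np.allclose(gamma_chi, chi2_n))   # True True
```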

Key Properties
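A sketch of the Gamma distribution's key properties, assuming SciPy: mean $\alpha/\beta$, variance $\alpha/\beta^2$, and mode $(\alpha - 1)/\beta$ for $\alpha \geq 1$, using the illustrative values $\alpha = 3$, $\beta = 2$. The mode is located numerically on a fine grid.

```python
import numpy as np
from scipy import stats

alpha, beta = 3.0, 2.0                       # shape and rate (illustrative)
G = stats.gamma(a=alpha, scale=1 / beta)     # SciPy: shape a, scale = 1/rate

print(G.mean(), alpha / beta)                # 1.5
print(G.var(), alpha / beta**2)              # 0.75

# The mode (alpha - 1)/beta, valid for alpha >= 1, found as the peak of the density.
grid = np.linspace(0, 10, 100_001)
print(grid[np.argmax(G.pdf(grid))], (alpha - 1) / beta)   # both about 1.0
```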

Beta Distribution

Beta Distribution PDF

f(x) = \frac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{B(\alpha, \beta)}, \quad 0 \leq x \leq 1

Here,

$\alpha$ = Shape parameter 1, pulls the distribution toward 1
$\beta$ = Shape parameter 2, pulls the distribution toward 0
$B(\alpha, \beta)$ = Beta function, $B(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$
$x$ = Value in $[0, 1]$

The Beta distribution is defined on the interval $[0, 1]$, making it the natural choice for modeling probabilities and proportions. It is the conjugate prior for the Bernoulli, Binomial, and Negative Binomial likelihoods.

ℹ️ Interpreting α and β as Prior Counts

Think of $\alpha - 1$ as the number of observed successes and $\beta - 1$ as the number of observed failures before seeing any data. For example:

$\text{Beta}(1, 1)$ = Uniform (no prior information)
$\text{Beta}(5, 3)$ = centered around 5/8 ≈ 0.625 (moderate confidence)
$\text{Beta}(50, 30)$ = tightly concentrated around 0.625 (high confidence)
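The effect of larger pseudo-counts can be seen by comparing spreads. This sketch assumes SciPy; it prints the mean, standard deviation, and central 95% interval of each prior listed above.

```python
from scipy import stats

sds = {}
for a, b in [(1, 1), (5, 3), (50, 30)]:
    prior = stats.beta(a, b)
    lo, hi = prior.interval(0.95)              # central 95% interval
    sds[(a, b)] = prior.std()
    print(f"Beta({a},{b}): mean={prior.mean():.3f}, sd={sds[(a, b)]:.3f}, "
          f"95% interval=({lo:.3f}, {hi:.3f})")
```

Beta(5, 3) and Beta(50, 30) share the same mean, but scaling the counts by 10 shrinks the standard deviation roughly threefold.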

Key Properties
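A sketch of the Beta distribution's key properties for $\text{Beta}(5, 3)$, assuming NumPy and SciPy: mean $\alpha/(\alpha+\beta)$, variance $\alpha\beta/((\alpha+\beta)^2(\alpha+\beta+1))$, and mode $(\alpha-1)/(\alpha+\beta-2)$ for $\alpha, \beta > 1$, with the mode located numerically on a grid.

```python
import numpy as np
from scipy import stats

a, b = 5.0, 3.0
B = stats.beta(a, b)

mean_formula = a / (a + b)                              # 5/8 = 0.625
var_formula = a * b / ((a + b) ** 2 * (a + b + 1))      # 15/576, about 0.0260
mode_formula = (a - 1) / (a + b - 2)                    # 4/6, valid because a, b > 1

grid = np.linspace(0, 1, 100_001)
mode_numeric = grid[np.argmax(B.pdf(grid))]             # location of the density peak

print(f"mean {B.mean():.4f} vs {mean_formula:.4f}")
print(f"var  {B.var():.4f} vs {var_formula:.4f}")
print(f"mode {mode_numeric:.4f} vs {mode_formula:.4f}")
```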

Chi-Square Distribution

Chi-Square Distribution PDF

f(x) = \frac{x^{k/2 - 1} e^{-x/2}}{2^{k/2} \Gamma(k/2)}, \quad x > 0

Here,

$k$ = Degrees of freedom, a positive integer
$\Gamma(k/2)$ = Gamma function evaluated at $k/2$

The Chi-Square distribution is a special case of the Gamma distribution with shape $\alpha = k/2$ and rate $\beta = 1/2$. It arises naturally as the distribution of a sum of squared independent standard Normal random variables.

Origin of Chi-Square

If $Z_1, Z_2, \ldots, Z_k$ are independent standard Normal random variables, then:

X = Z_1^2 + Z_2^2 + \cdots + Z_k^2 \sim \chi^2(k)

This result is fundamental to hypothesis testing, including the chi-square test for goodness of fit and tests of independence.
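A simulation sketch of this construction, assuming NumPy and SciPy: sum $k = 5$ squared standard Normals many times and compare with $\chi^2(5)$ via its mean, variance, and a Kolmogorov-Smirnov distance (the KS comparison is an illustrative choice, not part of the theorem).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
k, reps = 5, 200_000
z = rng.standard_normal(size=(reps, k))
x = (z ** 2).sum(axis=1)                        # each row: Z_1^2 + ... + Z_k^2

print(f"sample mean {x.mean():.3f} vs k = {k}")
print(f"sample var  {x.var():.3f} vs 2k = {2 * k}")

ks = stats.kstest(x, stats.chi2(df=k).cdf)      # max gap between empirical and chi2(k) CDFs
print(f"KS statistic: {ks.statistic:.4f}")
```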

Key Properties
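The Chi-Square's key properties, mean $k$ and variance $2k$, together with the upper 5% critical values used in hypothesis tests, can be tabulated with SciPy as a quick sketch:

```python
from scipy import stats

critical = {}
for k in (1, 5, 10):
    C = stats.chi2(df=k)
    critical[k] = C.ppf(0.95)                 # upper 5% critical value
    print(f"k={k:2d}: mean={C.mean():.1f} (k), var={C.var():.1f} (2k), "
          f"95th percentile={critical[k]:.3f}")
```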

t-Distribution (Student's t)

t-Distribution PDF

f(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2}

Here,

$\nu$ = Degrees of freedom, controls tail heaviness
$x$ = Value on the real line
$\Gamma$ = Gamma function

The t-distribution arises when estimating the mean of a normally distributed population with an unknown variance estimated from a small sample. It has heavier tails than the Normal, reflecting the additional uncertainty from estimating $\sigma$.

ℹ️ Convergence to Normal

As the degrees of freedom $\nu \to \infty$, the t-distribution converges to the standard Normal $N(0, 1)$. For $\nu \geq 30$, the t-distribution is very close to Normal. For small $\nu$, the tails are substantially heavier, meaning extreme values are more likely.
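A sketch of this convergence using two-sided 95% critical values, assuming SciPy: $t_{0.975}(\nu)$ shrinks toward the Normal's 1.960 as $\nu$ grows.

```python
from scipy import stats

z_crit = stats.norm.ppf(0.975)                                  # about 1.960
t_crit = {nu: stats.t.ppf(0.975, df=nu) for nu in (2, 5, 15, 30, 100, 1000)}

for nu, q in t_crit.items():
    print(f"nu={nu:4d}: t_0.975 = {q:.3f}   (Normal: {z_crit:.3f})")
```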

Definition via Ratio

If $Z \sim N(0,1)$ and $V \sim \chi^2(\nu)$ are independent, then:

T = \frac{Z}{\sqrt{V/\nu}} \sim t(\nu)

This ratio form explains why the t-distribution has heavier tails — the denominator introduces additional variability.
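The ratio definition can be simulated directly. This sketch assumes NumPy and SciPy, with $\nu = 5$ and a fixed seed chosen for illustration; it compares the simulated draws with $t(5)$ and contrasts the tail probability $P(|T| > 3)$ with the Normal value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
nu, reps = 5, 400_000
z = rng.standard_normal(reps)
v = rng.chisquare(df=nu, size=reps)             # drawn independently of z
t_samples = z / np.sqrt(v / nu)                 # T = Z / sqrt(V / nu)

ks = stats.kstest(t_samples, stats.t(df=nu).cdf)
print(f"KS statistic vs t({nu}): {ks.statistic:.4f}")
print(f"P(|T| > 3): simulated {np.mean(np.abs(t_samples) > 3):.4f}, "
      f"t({nu}) {2 * stats.t.sf(3, df=nu):.4f}, Normal {2 * stats.norm.sf(3):.4f}")
```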

Key Properties
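A sketch of the t-distribution's key properties, assuming SciPy: mean 0 for $\nu > 1$, variance $\nu/(\nu-2)$ for $\nu > 2$, and heavier tails than $N(0, 1)$, shown here through $P(|T| > 2)$.

```python
from scipy import stats

variances = {}
for nu in (3, 5, 10, 30):
    T = stats.t(df=nu)
    variances[nu] = T.var()
    print(f"nu={nu:2d}: mean={T.mean():.1f}, var={variances[nu]:.4f} "
          f"(formula {nu / (nu - 2):.4f}), P(|T|>2)={2 * T.sf(2):.4f}")

print(f"Normal: var=1.0000, P(|Z|>2)={2 * stats.norm.sf(2):.4f}")
```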

F-Distribution

F-Distribution PDF

f(x) = \frac{\sqrt{\frac{(d_1 x)^{d_1} d_2^{d_2}}{(d_1 x + d_2)^{d_1+d_2}}}}{x \, B\left(\frac{d_1}{2}, \frac{d_2}{2}\right)}, \quad x > 0

Here,

$d_1$ = Numerator degrees of freedom
$d_2$ = Denominator degrees of freedom
$B$ = Beta function

The F-distribution is the ratio of two independent Chi-Square random variables, each divided by its degrees of freedom. It is the foundation of ANOVA (Analysis of Variance) and F-tests for comparing model variances.

Origin of F-Distribution

If $U \sim \chi^2(d_1)$ and $V \sim \chi^2(d_2)$ are independent, then:

F = \frac{U/d_1}{V/d_2} \sim F(d_1, d_2)

The connection to the t-distribution: if $T \sim t(\nu)$, then $T^2 \sim F(1, \nu)$.
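A quick numerical check of $T^2 \sim F(1, \nu)$, assuming SciPy: since $P(T^2 > c) = P(|T| > \sqrt{c})$, the squared two-sided 5% critical value of $t(\nu)$ equals the upper 5% critical value of $F(1, \nu)$. The value $\nu = 12$ is arbitrary.

```python
from scipy import stats

nu = 12
t_crit = stats.t.ppf(0.975, df=nu)             # two-sided 5% critical value of t(nu)
f_crit = stats.f.ppf(0.95, dfn=1, dfd=nu)      # upper 5% critical value of F(1, nu)

print(f"t^2 = {t_crit ** 2:.6f}, F = {f_crit:.6f}")
```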

Key Properties
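A sketch of the F-distribution's key properties for $F(5, 10)$, assuming NumPy and SciPy: mean $d_2/(d_2-2)$ for $d_2 > 2$, variance $\frac{2d_2^2(d_1+d_2-2)}{d_1(d_2-2)^2(d_2-4)}$ for $d_2 > 4$, and the identity that $X \sim F(d_1, d_2)$ implies $\frac{d_1 X}{d_1 X + d_2} \sim \text{Beta}(d_1/2, d_2/2)$, checked by comparing CDFs.

```python
import numpy as np
from scipy import stats

d1, d2 = 5, 10
F = stats.f(dfn=d1, dfd=d2)

mean_formula = d2 / (d2 - 2)                                               # needs d2 > 2
var_formula = 2 * d2**2 * (d1 + d2 - 2) / (d1 * (d2 - 2) ** 2 * (d2 - 4))  # needs d2 > 4
print(f"mean {F.mean():.4f} vs {mean_formula:.4f}")    # 1.2500
print(f"var  {F.var():.4f} vs {var_formula:.4f}")      # 1.3542

# F -> Beta: the map x -> d1*x / (d1*x + d2) is increasing,
# so the F CDF at x must equal the Beta(d1/2, d2/2) CDF at the mapped point.
x = np.linspace(0.1, 5, 50)
u = d1 * x / (d1 * x + d2)
print(np.allclose(F.cdf(x), stats.beta.cdf(u, d1 / 2, d2 / 2)))         # True
```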

Distribution Relationships

📋How Distributions Connect

Relationship	Formula	Context
Normal → Standard Normal	$Z = (X - \mu)/\sigma$	Standardization
Exponential → Gamma	Gamma(1, λ) = Exp(λ)	Single vs. multiple waiting times
Gamma → Chi-Square	Gamma(k/2, 1/2) = χ²(k)	Sum of squared Normals
Chi-Square → t	$T = Z / \sqrt{\chi^2/k}$	Small-sample inference
t → Normal	$t(\nu) \to N(0,1)$ as $\nu \to \infty$	Large-sample approximation
t² → F	$t^2(\nu) = F(1, \nu)$	Equivalence of tests
F → Beta	$\frac{d_1 X}{d_1 X + d_2} \sim \text{Beta}(d_1/2, d_2/2)$ for $X \sim F(d_1, d_2)$	Distributional identity
Binomial → Normal	$np > 5$ and $n(1-p) > 5$	CLT approximation
Poisson → Normal	$\lambda > 20$	CLT approximation
Bernoulli → Beta	Beta is conjugate prior	Bayesian updating
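As a sketch of the Binomial → Normal row, assuming SciPy: for $n = 100$, $p = 0.3$ (so $np = 30$ and $n(1-p) = 70$), compare an exact Binomial probability with its Normal approximation. The $+0.5$ continuity correction is a standard refinement that the table does not mention.

```python
from scipy import stats

n, p = 100, 0.3                              # np = 30 and n(1-p) = 70
mu, sd = n * p, (n * p * (1 - p)) ** 0.5     # matching mean and standard deviation

k = 35
exact = stats.binom.cdf(k, n, p)                        # P(X <= 35), exact
approx = stats.norm.cdf(k + 0.5, loc=mu, scale=sd)      # +0.5: continuity correction
print(f"exact P(X <= {k}) = {exact:.4f}, Normal approximation = {approx:.4f}")
```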

💡 Hierarchy of Distributions

The Chi-Square is a special case of the Gamma. The t-distribution is built from Normal and Chi-Square. The F-distribution is built from two Chi-Squares. Understanding this hierarchy helps you remember formulas and choose the right distribution for your problem.

Python Implementation

📝Using scipy.stats

All major distributions are available in scipy.stats. Each distribution object supports methods for PDF/PMF, CDF, PPF (quantile function), random sampling, and moment calculation.

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# --- Normal Distribution ---
mu, sigma = 0, 1
normal = stats.norm(mu, sigma)

print(f"Normal PDF at 0: {normal.pdf(0):.4f}")           # 0.3989
print(f"Normal CDF at 1.96: {normal.cdf(1.96):.4f}")     # 0.9750
print(f"Normal PPF at 0.975: {normal.ppf(0.975):.4f}")   # 1.9600
print(f"Normal mean: {normal.mean():.4f}")                # 0.0000
print(f"Normal variance: {normal.var():.4f}")             # 1.0000
samples_normal = normal.rvs(size=10000, random_state=42)

# --- Exponential Distribution ---
lam = 2.0
exponential = stats.expon(scale=1/lam)

print(f"Exp PDF at 0.5: {exponential.pdf(0.5):.4f}")     # 0.7358
print(f"Exp mean: {exponential.mean():.4f}")              # 0.5000
print(f"Exp variance: {exponential.var():.4f}")           # 0.2500
samples_exp = exponential.rvs(size=10000, random_state=42)

# --- Gamma Distribution ---
alpha, beta_param = 3.0, 2.0
gamma = stats.gamma(a=alpha, scale=1/beta_param)

print(f"Gamma mean: {gamma.mean():.4f}")                  # 1.5000
print(f"Gamma variance: {gamma.var():.4f}")               # 0.7500
samples_gamma = gamma.rvs(size=10000, random_state=42)

# --- Beta Distribution ---
alpha_b, beta_b = 5.0, 3.0
beta = stats.beta(alpha_b, beta_b)

print(f"Beta mean: {beta.mean():.4f}")                    # 0.6250
print(f"Beta variance: {beta.var():.4f}")                 # 0.0260
samples_beta = beta.rvs(size=10000, random_state=42)

# --- Chi-Square Distribution ---
k = 5
chi2 = stats.chi2(df=k)

print(f"Chi2 mean: {chi2.mean():.4f}")                    # 5.0000
print(f"Chi2 variance: {chi2.var():.4f}")                 # 10.0000
samples_chi2 = chi2.rvs(size=10000, random_state=42)

# --- t-Distribution ---
nu = 5
t_dist = stats.t(df=nu)

print(f"t mean: {t_dist.mean():.4f}")                     # 0.0000
print(f"t variance: {t_dist.var():.4f}")                  # 1.6667
print(f"t 97.5th percentile: {t_dist.ppf(0.975):.4f}")   # 2.5706
samples_t = t_dist.rvs(size=10000, random_state=42)

# --- F-Distribution ---
d1, d2 = 5, 10
f_dist = stats.f(dfn=d1, dfd=d2)

print(f"F mean: {f_dist.mean():.4f}")                     # 1.2500
print(f"F variance: {f_dist.var():.4f}")                  # 1.3542
samples_f = f_dist.rvs(size=10000, random_state=42)

# --- Visualization ---
fig, axes = plt.subplots(2, 4, figsize=(16, 8))
dists = [
    (samples_normal, 'Normal(0,1)', -4, 4),
    (samples_exp, 'Exponential(2)', 0, 3),
    (samples_gamma, 'Gamma(3,2)', 0, 5),
    (samples_beta, 'Beta(5,3)', 0, 1),
    (samples_chi2, 'Chi-Square(5)', 0, 15),
    (samples_t, 't(5)', -5, 5),
    (samples_f, 'F(5,10)', 0, 5),
]

for ax, (data, title, lo, hi) in zip(axes.flat, dists):
    ax.hist(data, bins=50, density=True, alpha=0.7, edgecolor='black')
    ax.set_title(title)
    ax.set_xlim(lo, hi)

axes.flat[-1].axis('off')
plt.tight_layout()
plt.savefig('distributions.png', dpi=150)
plt.show()

# --- Hypothesis Testing Examples ---
# t-test: comparing two sample means
rng = np.random.default_rng(42)                  # seeded so results are reproducible
group_a = rng.normal(loc=100, scale=15, size=50)
group_b = rng.normal(loc=105, scale=15, size=50)
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t-statistic: {t_stat:.4f}, p-value: {p_value:.4f}")

# F-test: comparing two variances (two-sided; assumes both groups are Normal)
f_stat = np.var(group_a, ddof=1) / np.var(group_b, ddof=1)
dfn, dfd = len(group_a) - 1, len(group_b) - 1
f_p_value = 2 * min(stats.f.cdf(f_stat, dfn, dfd), stats.f.sf(f_stat, dfn, dfd))
print(f"F-statistic: {f_stat:.4f}, p-value: {f_p_value:.4f}")

# Chi-square test: goodness of fit
observed = np.array([30, 50, 20])
expected = np.array([25, 50, 25])
chi2_stat, chi2_p = stats.chisquare(observed, f_exp=expected)
print(f"Chi2-statistic: {chi2_stat:.4f}, p-value: {chi2_p:.4f}")

Applications in AI and Machine Learning

Gaussian Processes

ℹ️ Gaussian Processes

A Gaussian Process (GP) defines a distribution over functions. Any finite set of function values follows a multivariate Normal distribution. GPs are used for regression, classification, and hyperparameter optimization (Bayesian optimization) because they provide uncertainty estimates alongside predictions.

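The GP prior is $f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}'))$, where $m$ is the mean function and $k$ is a kernel (covariance) function. For regression with Gaussian observation noise, predictions at new points are obtained by conditioning on observed data, and every operation stays within the Normal family, so the predictive mean and variance are available in closed form.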
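A minimal NumPy sketch of GP regression, assuming a zero prior mean $m(\mathbf{x}) = 0$, a squared-exponential (RBF) kernel with fixed hyperparameters, and Gaussian observation noise. The helper names `rbf_kernel` and `gp_posterior` are hypothetical, and practical GP libraries also fit the kernel hyperparameters, which this sketch does not.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel k(x, x') = variance * exp(-(x - x')^2 / (2 l^2))."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2, **kernel_args):
    """Posterior mean and variance of a zero-mean GP with Gaussian observation noise."""
    K = rbf_kernel(x_train, x_train, **kernel_args) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, **kernel_args)
    K_ss = rbf_kernel(x_test, x_test, **kernel_args)
    L = np.linalg.cholesky(K)                                # K = L L^T, stable solves
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)               # diag of K_ss - K_s^T K^{-1} K_s
    return mean, var

x_train = np.array([-2.0, -1.0, 0.0, 1.5, 3.0])
y_train = np.sin(x_train)
x_test = np.linspace(-3, 4, 8)

mean, var = gp_posterior(x_train, y_train, x_test)
for x, m, s2 in zip(x_test, mean, var):
    print(f"x={x:5.2f}  mean={m:6.3f}  sd={np.sqrt(max(s2, 0)):.3f}")
```

The predictive standard deviation is small near the training inputs and grows back toward the prior value of 1 away from them, which is the uncertainty estimate the text refers to.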

Reparameterization Trick

Reparameterization Trick (VAEs)

In Variational Autoencoders, we need to backpropagate through a stochastic sampling step. The reparameterization trick writes:

z = \mu + \sigma \odot \epsilon, \quad \epsilon \sim N(0, I)

This converts implicit sampling into an explicit, differentiable transformation, enabling gradient-based optimization of the ELBO (Evidence Lower Bound). Without this trick, the non-differentiable sampling operation blocks gradient flow.
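The trick can be illustrated without a neural network. In this NumPy sketch, a toy objective $E[z^2] = \mu^2 + \sigma^2$ stands in for the ELBO (an assumption for illustration); because $z = \mu + \sigma\epsilon$ is differentiable in $\mu$ and $\sigma$, Monte Carlo averages of the pathwise derivatives recover the exact gradients $2\mu$ and $2\sigma$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.8                     # illustrative variational parameters
eps = rng.standard_normal(1_000_000)     # noise drawn independently of mu and sigma
z = mu + sigma * eps                     # z ~ N(mu, sigma^2), deterministic given eps

# Pathwise (reparameterized) gradients of E[z^2] = mu^2 + sigma^2:
#   d/dmu    E[z^2] = E[2 z * dz/dmu]    = E[2 z]
#   d/dsigma E[z^2] = E[2 z * dz/dsigma] = E[2 z * eps]
grad_mu = np.mean(2 * z)
grad_sigma = np.mean(2 * z * eps)

print(f"z mean {z.mean():.3f}, z std {z.std():.3f}")
print(f"grad_mu    {grad_mu:.3f} (exact {2 * mu})")
print(f"grad_sigma {grad_sigma:.3f} (exact {2 * sigma})")
```

In an autodiff framework the same derivatives come from backpropagating through `mu + sigma * eps`, which is exactly what sampling `z` directly would not allow.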

Other Applications

📋Distribution Applications in ML

Application	Distribution Used	Why
Linear regression errors	Normal	CLT justification, analytical tractability
Bayesian linear regression	Normal prior + Normal likelihood	Conjugate pair → closed-form posterior
Naive Bayes (continuous)	Normal per feature	Simple, fast, surprisingly effective
Multinomial Naive Bayes	Multinomial / Dirichlet	Text classification (bag of words)
Variational inference	Normal (VAE latent space)	Reparameterization trick enables backprop
Gaussian Mixture Models	Normal components	Density estimation, clustering
Thompson Sampling	Beta posterior	Bandit problems, A/B testing
Survival analysis	Exponential / Weibull	Time-to-event modeling
Reinforcement learning	Categorical / Dirichlet	Policy distributions, posterior over rewards
Generative models	Normal (diffusion processes)	Score matching, denoising

Common Mistakes

💡 Learn from Others' Errors

The following table captures frequent mistakes practitioners make when working with probability distributions. Avoiding these errors will save you debugging time and prevent incorrect conclusions.

Mistake	Why It Is Wrong	Correct Approach
Assuming all data is Normal	Real data is often skewed or heavy-tailed	Check with Q-Q plots, Shapiro-Wilk test
Using Normal for probabilities bounded in [0,1]	Normal assigns probability outside [0,1]	Use Beta distribution
Confusing rate and scale parameters	$f(x) = \lambda e^{-\lambda x}$ vs $f(x) = \frac{1}{\beta}e^{-x/\beta}$	Always check your library's parameterization
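Ignoring degrees of freedom in t-tests	When $\sigma$ is estimated from a small sample, the Normal understates tail probabilities	Use the t-distribution whenever $\sigma$ is unknown; the difference matters most for $n < 30$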
Using z-test when $\sigma$ is unknown	Z-test requires known population variance	Use t-test instead
Assuming Chi-Square approximation for small expected counts	The chi-square approximation is unreliable when expected counts fall below about 5	Use an exact test for small samples (e.g., Fisher's exact test for 2×2 tables)
Forgetting that F-test is sensitive to non-normality	F-test assumes normally distributed data	Check assumptions or use robust alternatives
Using mean and variance for skewed distributions	Mean is misleading for skewed data	Use median and IQR
Not distinguishing PMF from PDF	Discrete distributions use PMF, continuous use PDF	Check whether your variable is discrete or continuous
Over-interpreting p-values	p < 0.05 does not mean the effect is large or important	Report effect sizes and confidence intervals
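For the first row, a sketch of both checks, assuming NumPy and SciPy: `stats.shapiro` gives the Shapiro-Wilk p-value, and the correlation `r` returned by `stats.probplot` summarizes how straight the Normal Q-Q plot is (values near 1 indicate near-Normal data). The data are simulated for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
samples = {
    "exponential": rng.exponential(scale=1.0, size=200),   # clearly skewed
    "normal": rng.normal(loc=0.0, scale=1.0, size=200),     # genuinely Normal
}

pvalues, qq_r = {}, {}
for name, data in samples.items():
    pvalues[name] = stats.shapiro(data).pvalue
    # probplot returns ((theoretical quantiles, ordered data), (slope, intercept, r))
    _, (_, _, r) = stats.probplot(data, dist="norm")
    qq_r[name] = r
    print(f"{name:12s} Shapiro-Wilk p = {pvalues[name]:.4g}, Q-Q correlation r = {r:.4f}")
```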

Interview Questions

📝Interview Question 1

Q: Explain the difference between the Normal and t-distribution. When would you use each?

A: The t-distribution has heavier tails than the Normal, reflecting additional uncertainty from estimating the population variance with a sample variance. Use the Normal when the population variance is known or the sample size is large ($n \geq 30$). Use the t-distribution when the population variance is unknown and estimated from a small sample. As degrees of freedom increase, the t-distribution converges to the Normal.

📝Interview Question 2

Q: Why is the Beta distribution useful in Bayesian statistics?

A: The Beta distribution is the conjugate prior for the Binomial (and Bernoulli) likelihood. This means if the prior is $\text{Beta}(\alpha, \beta)$ and the likelihood is Binomial, the posterior is also a Beta distribution: $\text{Beta}(\alpha + s, \beta + n - s)$ where $s$ is the number of successes. This makes Bayesian updating analytically tractable — no numerical integration needed. The parameters $\alpha$ and $\beta$ can be interpreted as prior pseudo-counts of successes and failures.

📝Interview Question 3

Q: What is the Central Limit Theorem and why does it matter for the Normal distribution?

A: The CLT states that if $X_1, \ldots, X_n$ are independent, identically distributed random variables with finite mean $\mu$ and variance $\sigma^2$, then the standardized sample mean $\sqrt{n}(\bar{X}_n - \mu)/\sigma$ converges in distribution to $N(0, 1)$ as $n \to \infty$, regardless of the original distribution; informally, $\bar{X}_n$ is approximately $N(\mu, \sigma^2/n)$ for large $n$. This is why the Normal distribution appears everywhere: sums and averages of many small independent effects tend toward Normality. It justifies using Normal-based confidence intervals and hypothesis tests even when the underlying data is not Normal, provided the sample size is large enough.

📝Interview Question 4

Q: You are building a model to predict click-through rates. Which distribution should you use for the target variable and why?

A: Click-through rates are proportions bounded in $[0, 1]$. The Beta distribution is the natural choice because it is defined on $[0, 1]$ and can model various shapes (uniform, skewed, concentrated). Alternatively, if modeling individual binary clicks, use Bernoulli for each impression, or Binomial for the count of clicks out of $n$ impressions. For a Bayesian approach, the Beta-Binomial model provides a principled framework with uncertainty quantification.

📝Interview Question 5

Q: Explain the reparameterization trick in Variational Autoencoders.

A: In a VAE, the latent variable $z$ is sampled from an approximate posterior $q(z|x) = N(\mu(x), \sigma^2(x)I)$. Direct sampling is non-differentiable, blocking gradient flow. The reparameterization trick rewrites $z = \mu(x) + \sigma(x) \odot \epsilon$ where $\epsilon \sim N(0, I)$ is an external noise source. For a fixed draw of $\epsilon$, $z$ is a deterministic, differentiable function of $\mu$ and $\sigma$, so gradients can flow through to the encoder. The noise enters from outside the computational graph.

📝Interview Question 6

Q: What happens if you use a z-test instead of a t-test for a small sample with unknown variance?

A: The z-test assumes the population variance $\sigma^2$ is known. When it is unknown and estimated from a small sample ($n < 30$), plugging the sample standard deviation $s$ in as if it were the exact $\sigma$ ignores the sampling variability of $s$ (and $s$ also tends to slightly underestimate $\sigma$), making the z-test overconfident. This inflates the Type I error rate: you reject a true null hypothesis more often than the nominal level allows. The t-distribution accounts for this extra uncertainty with heavier tails, producing wider confidence intervals and more conservative p-values. Use the t-test whenever $\sigma$ is unknown.
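The inflation can be measured by simulation. This sketch, assuming NumPy and SciPy, draws many samples of size $n = 10$ under a true null hypothesis and compares rejection rates when the same statistic is judged against the Normal versus the $t(9)$ critical value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n, reps, mu0 = 10, 20_000, 0.0
samples = rng.normal(loc=mu0, scale=1.0, size=(reps, n))   # H0 is true by construction

means = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)                 # sigma treated as unknown
stat = (means - mu0) / (s / np.sqrt(n))

z_crit = stats.norm.ppf(0.975)                  # wrongly uses the Normal
t_crit = stats.t.ppf(0.975, df=n - 1)           # correct reference distribution

z_rate = np.mean(np.abs(stat) > z_crit)
t_rate = np.mean(np.abs(stat) > t_crit)
print(f"Type I error, z cutoff: {z_rate:.3f}")  # noticeably above 0.05
print(f"Type I error, t cutoff: {t_rate:.3f}")  # close to the nominal 0.05
```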

Practice Problems

📝Problem 1: Exponential Waiting Times

At a help desk, calls arrive according to a Poisson process with rate $\lambda = 4$ calls per hour. What is the probability that the time between two consecutive calls exceeds 30 minutes?

💡Solution

The waiting time between calls follows an Exponential distribution with $\lambda = 4$ per hour. Measuring time in hours, 30 minutes is 0.5, so we need $P(X > 0.5)$ where $X \sim \text{Exp}(4)$.

P(X > 0.5) = e^{-4 \times 0.5} = e^{-2} \approx 0.1353

There is approximately a 13.5% chance that the gap between calls exceeds 30 minutes.
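The same probability via SciPy's survival function, as a quick sketch (SciPy's `expon` takes `scale` $= 1/\lambda$):

```python
import math
from scipy import stats

lam = 4.0                                       # calls per hour
p_exact = math.exp(-lam * 0.5)                  # closed form e^{-2}
p_scipy = stats.expon(scale=1 / lam).sf(0.5)    # survival function P(X > 0.5)
print(f"{p_exact:.4f} {p_scipy:.4f}")           # 0.1353 0.1353
```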

📝Problem 2: Beta Posterior Updating

You observe 8 successes and 2 failures in 10 Bernoulli trials. Starting with a $\text{Beta}(2, 2)$ prior, what is the posterior distribution? What is the posterior mean?

💡Solution

Since Beta is conjugate to Bernoulli, the posterior is:

\text{Beta}(2 + 8, 2 + 2) = \text{Beta}(10, 4)

The posterior mean is:

\frac{\alpha}{\alpha + \beta} = \frac{10}{10 + 4} = \frac{10}{14} \approx 0.7143

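The prior Beta(2, 2) is a mild, symmetric prior centered at 0.5 (it is not uniform, but it acts like only 4 pseudo-observations). After observing 80% successes, the posterior mean shifts to 0.7143, a weighted compromise between the prior mean (weight 4/14) and the observed success rate (weight 10/14).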
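A sketch of this update with SciPy, which also reports a central 95% credible interval and checks the weighted-average form of the posterior mean (treating $\alpha + \beta = 4$ as the prior's pseudo-observation count).

```python
from scipy import stats

prior_a, prior_b = 2, 2
successes, failures = 8, 2

post = stats.beta(prior_a + successes, prior_b + failures)   # Beta(10, 4)
lo, hi = post.interval(0.95)                                 # central 95% credible interval
print(f"posterior mean {post.mean():.4f}")                   # 0.7143
print(f"95% credible interval ({lo:.3f}, {hi:.3f})")

# Posterior mean as a weighted average of the prior mean and the observed rate
n, n0 = successes + failures, prior_a + prior_b
weighted = (n0 / (n0 + n)) * (prior_a / n0) + (n / (n0 + n)) * (successes / n)
print(f"{weighted:.4f}")                                      # 0.7143
```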

📝Problem 3: Chi-Square Test for Fairness

A die is rolled 120 times with the following results: {1: 18, 2: 25, 3: 15, 4: 23, 5: 22, 6: 17}. Test whether the die is fair at the 5% significance level.

💡Solution

If the die is fair, each face should appear with probability 1/6, giving expected count $120/6 = 20$ for each face.

\chi^2 = \sum_{i=1}^{6} \frac{(O_i - E_i)^2}{E_i} = \frac{(18-20)^2}{20} + \frac{(25-20)^2}{20} + \frac{(15-20)^2}{20} + \frac{(23-20)^2}{20} + \frac{(22-20)^2}{20} + \frac{(17-20)^2}{20}

= \frac{4 + 25 + 25 + 9 + 4 + 9}{20} = \frac{76}{20} = 3.8

Degrees of freedom: $6 - 1 = 5$. The critical value at $\alpha = 0.05$ is $\chi^2_{0.05, 5} = 11.07$.

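Since $3.8 < 11.07$, we fail to reject the null hypothesis: the observed counts are consistent with a fair die. This does not prove the die is fair, only that these 120 rolls give no evidence against fairness.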
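The same test in SciPy, as a sketch: `stats.chisquare` uses equal expected counts when none are given, and `stats.chi2.ppf` gives the critical value.

```python
from scipy import stats

observed = [18, 25, 15, 23, 22, 17]
result = stats.chisquare(observed)          # default: equal expected counts, 120/6 = 20
critical = stats.chi2.ppf(0.95, df=len(observed) - 1)

print(f"chi2 = {result.statistic:.2f}, p = {result.pvalue:.3f}, critical = {critical:.2f}")
```

The p-value is far above 0.05, matching the conclusion reached with the critical value.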

📝Problem 4: Gamma as Sum of Exponentials

If $X_1, X_2, X_3$ are independent with $X_i \sim \text{Exp}(2)$, what is the distribution of $Y = X_1 + X_2 + X_3$? Find $E[Y]$ and $\text{Var}(Y)$.

💡Solution

The sum of $n$ independent Exponential($\lambda$) random variables follows a Gamma distribution:

Y = X_1 + X_2 + X_3 \sim \text{Gamma}(3, 2)

where $\alpha = 3$ (shape = number of exponentials) and $\beta = 2$ (rate).

E[Y] = \frac{\alpha}{\beta} = \frac{3}{2} = 1.5

\text{Var}(Y) = \frac{\alpha}{\beta^2} = \frac{3}{4} = 0.75

This result is used in reliability engineering: in a standby system where three identical components with exponentially distributed lifetimes are used one after another (each switched in when the previous one fails), the total system lifetime follows a Gamma distribution.
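A simulation sketch of the result, assuming NumPy and SciPy: sums of three Exponential(2) draws should match $\text{Gamma}(3, 2)$ in mean, variance, and overall shape (compared here with a Kolmogorov-Smirnov distance).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
lam, n, reps = 2.0, 3, 200_000
y = rng.exponential(scale=1 / lam, size=(reps, n)).sum(axis=1)   # Y = X1 + X2 + X3

print(f"mean {y.mean():.3f} (exact {n / lam}), var {y.var():.3f} (exact {n / lam**2})")
ks = stats.kstest(y, stats.gamma(a=n, scale=1 / lam).cdf)        # vs Gamma(3, rate 2)
print(f"KS statistic vs Gamma(3, 2): {ks.statistic:.4f}")
```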

📝Problem 5: t-Distribution Confidence Interval

A sample of 16 observations from a Normal population yields $\bar{x} = 52$ and $s = 8$. Construct a 95% confidence interval for the population mean $\mu$.

💡Solution

Since $\sigma$ is unknown and $n = 16 < 30$, we use the t-distribution with $df = 15$ degrees of freedom.

The critical value is $t_{0.025, 15} = 2.131$ (from t-table).

The confidence interval is:

\bar{x} \pm t_{\alpha/2, df} \cdot \frac{s}{\sqrt{n}} = 52 \pm 2.131 \cdot \frac{8}{\sqrt{16}} = 52 \pm 2.131 \cdot 2 = 52 \pm 4.262

\text{95\% CI: } (47.738, 56.262)

If we had incorrectly used the z-distribution ($z_{0.025} = 1.96$), the interval would be $52 \pm 3.92 = (48.08, 55.92)$, which is narrower and overconfident.
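Both intervals can be computed with SciPy as a sketch. SciPy uses the exact $t$ quantile rather than the three-decimal table value 2.131, so the last printed digit may differ slightly from the hand calculation.

```python
import math
from scipy import stats

n, xbar, s = 16, 52.0, 8.0
se = s / math.sqrt(n)                                        # standard error = 2.0

t_lo, t_hi = stats.t.interval(0.95, n - 1, loc=xbar, scale=se)   # df = 15
z_lo, z_hi = stats.norm.interval(0.95, loc=xbar, scale=se)

print(f"t interval: ({t_lo:.3f}, {t_hi:.3f})")
print(f"z interval: ({z_lo:.3f}, {z_hi:.3f})")
```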

Quick Reference

📋Distribution Quick Reference Table

Distribution	PDF/PMF	Mean	Variance	Support	Key Use Case
Normal	$\frac{1}{\sigma\sqrt{2\pi}}e^{-(x-\mu)^2/2\sigma^2}$	$\mu$	$\sigma^2$	$(-\infty, \infty)$	Continuous measurements, errors
Exponential	$\lambda e^{-\lambda x}$	$1/\lambda$	$1/\lambda^2$	$[0, \infty)$	Waiting times, time between events
Gamma	$\frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}$	$\alpha/\beta$	$\alpha/\beta^2$	$(0, \infty)$	Sum of waiting times, rainfall
Beta	$\frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}$	$\frac{\alpha}{\alpha+\beta}$	$\frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$	$[0, 1]$	Probabilities, proportions
Chi-Square	$\frac{x^{k/2-1}e^{-x/2}}{2^{k/2}\Gamma(k/2)}$	$k$	$2k$	$(0, \infty)$	Goodness of fit, variance tests
t	$\propto (1+x^2/\nu)^{-(\nu+1)/2}$	$0$ ($\nu>1$)	$\nu/(\nu-2)$ ($\nu>2$)	$(-\infty, \infty)$	Small-sample mean inference
F	$\propto \frac{x^{d_1/2-1}}{(d_1 x+d_2)^{(d_1+d_2)/2}}$	$\frac{d_2}{d_2-2}$ ($d_2>2$)	$\frac{2d_2^2(d_1+d_2-2)}{d_1(d_2-2)^2(d_2-4)}$ ($d_2>4$)	$(0, \infty)$	ANOVA, variance comparison

💡 Parameterization Warning

Different libraries use different parameterizations. For the Exponential distribution, scipy.stats.expon uses scale $= 1/\lambda$, while many textbooks use the rate $\lambda$. For the Gamma distribution, scipy.stats.gamma uses shape $a = \alpha$ and scale $= 1/\beta$. Always check your library's documentation.
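A sketch of the pitfall with SciPy, using a hypothetical rate $\lambda = 4$: passing the rate where SciPy expects a scale silently produces a different distribution, with no error raised.

```python
from scipy import stats

lam = 4.0                                     # rate: 4 events per unit time

right = stats.expon(scale=1 / lam)            # scale = 1/lambda
wrong = stats.expon(scale=lam)                # mistake: rate passed as scale

print(right.mean(), wrong.mean())             # 0.25 vs 4.0, off by a factor of lambda^2

alpha, beta = 3.0, 2.0                        # Gamma shape and rate
g = stats.gamma(a=alpha, scale=1 / beta)      # rate converted to scale
print(g.mean())                               # alpha/beta = 1.5
```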

Cross-References

📋Related Topics

Descriptive Statistics → Understanding measures of center and spread before choosing distributions
Bayesian Inference → Conjugate priors (Beta-Binomial, Gamma-Poisson) simplify posterior computation
Hypothesis Testing → t-tests, F-tests, and chi-square tests all rely on specific distributions
Regression Analysis → Normal assumption for errors, F-test for model significance
Maximum Likelihood Estimation → Fitting distribution parameters from data
Monte Carlo Methods → Sampling from distributions to approximate complex integrals
Information Theory → KL divergence between distributions measures model mismatch
Gaussian Processes → Multivariate Normal extension for function-space modeling