
Random Variables & Probability Distributions

Master discrete and continuous random variables, PMFs, PDFs, CDFs, and key probability distributions with Python examples and AI/ML applications.

📂 Random Variables · 📖 Lesson 37 of 100 · 🎓 Free Course


â„šī¸ Why It Matters

Every machine learning model makes predictions under uncertainty. A patient might have a disease or not. A stock price tomorrow could be higher or lower. A spam filter must decide if an email is junk. Random variables are the mathematical language that lets us quantify, model, and reason about this uncertainty. Without them, there is no probability theory — and without probability theory, there is no modern AI, statistics, or data science. Every loss function, every sampling strategy, every Bayesian model, and every generative AI system is built on the foundation of random variables and their distributions.


What is a Random Variable?

Definition: Random Variable

A random variable $X$ is a function that maps outcomes from a sample space $\Omega$ to real numbers. Formally, $X: \Omega \rightarrow \mathbb{R}$. It assigns a numerical value to each possible outcome of a random experiment.

Random Variable

$$X: \Omega \rightarrow \mathbb{R}$$

Here,

  • $X$ = the random variable
  • $\Omega$ = the sample space (all possible outcomes)
  • $\mathbb{R}$ = the set of real numbers

Simple Analogy: Think of a random variable as the score in a game. The game has many possible outcomes (the roll of a die, the draw of a card), and the random variable converts each outcome into a number you can work with, such as points, dollars, or centimeters.

Real-World Examples:

  • Coin flip: $X = 1$ if heads, $X = 0$ if tails
  • Die roll: $X$ = the number shown on the die (1 through 6)
  • Height of a person: $X$ = height in centimeters (any value in a range)
  • Number of customers: $X$ = count of customers arriving in an hour (0, 1, 2, ...)

Discrete vs. Continuous

Definition: Discrete Random Variable

A random variable $X$ is discrete if it takes on a countable number of distinct values. The set of possible values is finite or countably infinite.

Examples of discrete variables:

  • Number of heads in 10 coin flips: $\{0, 1, 2, \ldots, 10\}$
  • Number of emails received per day: $\{0, 1, 2, 3, \ldots\}$
  • Customer rating: $\{1, 2, 3, 4, 5\}$

Definition: Continuous Random Variable

A random variable $X$ is continuous if it can take on any value within a real interval. The set of possible values is uncountably infinite.

Examples of continuous variables:

  • Temperature: any value in $[-40, 50]$ degrees Celsius
  • Weight: any value in $[0, \infty)$ kilograms
  • Time to complete a task: any value in $[0, \infty)$ seconds

💡 Key Distinction

For a discrete random variable, $P(X = x) > 0$ for specific values. For a continuous random variable, $P(X = x) = 0$ for any single point; probabilities are only meaningful over intervals, e.g., $P(a \leq X \leq b)$.


Probability Mass Function (PMF)

Definition: Probability Mass Function (PMF)

For a discrete random variable $X$, the probability mass function $p(x)$ gives the probability that $X$ takes on the exact value $x$:

$$p(x) = P(X = x)$$

PMF Properties

$$p(x) \geq 0 \quad \text{and} \quad \sum_{x} p(x) = 1$$

Here,

  • $p(x)$ = probability that $X$ equals $x$
  • $\sum_x p(x) = 1$: all probabilities sum to 1

📝 Example: PMF of a Fair Die

For a fair six-sided die, $X$ = the outcome:

| $x$ | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| $p(x)$ | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 |

  • $p(1) = P(X = 1) = \frac{1}{6}$
  • $\sum_{x=1}^{6} p(x) = \frac{1}{6} \times 6 = 1$ ✓

📝 Example: PMF of an Unfair Coin

Suppose a biased coin has $P(\text{heads}) = 0.7$ and $P(\text{tails}) = 0.3$:

| $x$ | 0 (tails) | 1 (heads) |
|---|---|---|
| $p(x)$ | 0.3 | 0.7 |

  • $p(0) + p(1) = 0.3 + 0.7 = 1$ ✓
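To make the two PMF properties concrete, here is a small sketch that checks them for the die and coin PMFs above. The helper name `is_valid_pmf` is our own, and storing probabilities as `fractions.Fraction` (so the sum is exact rather than floating point) is a choice of this sketch, not something the lesson requires.

```python
from fractions import Fraction

def is_valid_pmf(pmf):
    """Return True if every probability is non-negative and they sum to exactly 1."""
    return all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1

# Fair die: p(x) = 1/6 for x = 1..6 (exact fractions avoid rounding error)
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Biased coin: 0 = tails, 1 = heads
coin_pmf = {0: Fraction(3, 10), 1: Fraction(7, 10)}

# A broken "PMF" whose values sum to 11/10
bad_pmf = {0: Fraction(4, 10), 1: Fraction(7, 10)}

print(is_valid_pmf(die_pmf))   # True
print(is_valid_pmf(coin_pmf))  # True
print(is_valid_pmf(bad_pmf))   # False
print(die_pmf[1])              # 1/6
```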

Probability Density Function (PDF)

Definition: Probability Density Function (PDF)

For a continuous random variable $X$, the probability density function $f(x)$ describes the relative likelihood of $X$ taking on a value near $x$. The probability that $X$ falls in an interval $[a, b]$ is the area under $f(x)$ over that interval:

$$P(a \leq X \leq b) = \int_a^b f(x)\,dx$$

PDF Properties

$$f(x) \geq 0 \quad \text{and} \quad \int_{-\infty}^{\infty} f(x)\,dx = 1$$

Here,

  • $f(x)$ = the probability density at $x$
  • $\int_{-\infty}^{\infty} f(x)\,dx = 1$: the total area under the curve equals 1

âš ī¸ Important

Unlike a PMF, f(x)f(x) is not a probability. It is a density. The value f(x)f(x) can exceed 1 — only probabilities (areas under the curve) must be between 0 and 1. For a continuous variable, P(X=c)=0P(X = c) = 0 for any specific point cc.

📝 Example: PDF of a Uniform Distribution

For $X \sim \text{Uniform}(0, 1)$:

$$f(x) = \begin{cases} 1 & 0 \leq x \leq 1 \\ 0 & \text{otherwise} \end{cases}$$

  • $P(0.2 \leq X \leq 0.5) = \int_{0.2}^{0.5} 1\,dx = 0.3$
  • The total area: $\int_0^1 1\,dx = 1$ ✓
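The same areas can be checked numerically. This sketch assumes SciPy is installed: `scipy.integrate.quad` integrates the PDF returned by `scipy.stats.uniform` (which is parameterized as `loc=a`, `scale=b - a`), and the second half shows that a density value above 1 is still compatible with a total area of 1.

```python
from scipy import integrate, stats

# Uniform(0, 1): scipy parameterizes it as loc=a, scale=b - a
u01 = stats.uniform(loc=0, scale=1)

# P(0.2 <= X <= 0.5) as the area under the PDF
prob, _ = integrate.quad(u01.pdf, 0.2, 0.5)
print(round(prob, 6))   # 0.3

# Total area over the support is 1
total, _ = integrate.quad(u01.pdf, 0, 1)
print(round(total, 6))  # 1.0

# A narrower uniform, Uniform(0, 0.5), has density 2 > 1 ...
u_half = stats.uniform(loc=0, scale=0.5)
print(u_half.pdf(0.25))  # 2.0
# ... but its total area is still 1
area, _ = integrate.quad(u_half.pdf, 0, 0.5)
print(round(area, 6))   # 1.0
```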

Cumulative Distribution Function (CDF)

Definition: Cumulative Distribution Function (CDF)

The cumulative distribution function $F(x)$ gives the probability that the random variable $X$ takes on a value less than or equal to $x$:

$$F(x) = P(X \leq x)$$

Here,

  • $F(x)$ = cumulative probability up to $x$
  • $P(X \leq x)$ = probability that $X$ is at most $x$

CDF for Continuous Variables

$$F(x) = \int_{-\infty}^{x} f(t)\,dt$$

Here,

  • $F(x)$ = the CDF
  • $f(t)$ = the probability density function

CDF for Discrete Variables

$$F(x) = \sum_{x_i \leq x} p(x_i)$$

Here,

  • $F(x)$ = the CDF
  • $p(x_i)$ = the PMF evaluated at $x_i$

Theorem: CDF Properties

  1. Non-decreasing: if $a \leq b$, then $F(a) \leq F(b)$
  2. Limits: $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$
  3. Right-continuous: $\lim_{t \to x^{+}} F(t) = F(x)$ for every $x$ (a discrete CDF jumps at each possible value, but its value at the jump equals the limit from the right)
  4. Interval probability: $P(a < X \leq b) = F(b) - F(a)$

📝 Example: CDF of a Fair Die

For $X$ = outcome of a fair die roll:

| $x$ | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| $F(x)$ | 1/6 | 2/6 | 3/6 | 4/6 | 5/6 | 1 |

  • $F(3) = P(X \leq 3) = P(X=1) + P(X=2) + P(X=3) = \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2}$
  • $P(2 < X \leq 5) = F(5) - F(2) = \frac{5}{6} - \frac{2}{6} = \frac{3}{6} = \frac{1}{2}$
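The die CDF is just a running sum of the PMF. A minimal sketch under the assumption that the support is the integers 1 through 6; the function name `F` mirrors the lesson's notation and is not a library API. It also shows that the discrete CDF is flat between jumps, so it can be evaluated at any real $x$.

```python
from fractions import Fraction
from itertools import accumulate
import math

# PMF of a fair die, values 1..6
values = [1, 2, 3, 4, 5, 6]
pmf = [Fraction(1, 6)] * 6

# Running sums of the PMF give F at each support point
cdf_at_support = dict(zip(values, accumulate(pmf)))

def F(x):
    """Die CDF F(x) = sum of p(x_i) over x_i <= x, for any real x (integer support 1..6)."""
    if x < values[0]:
        return Fraction(0)
    k = min(math.floor(x), values[-1])  # largest support point <= x
    return cdf_at_support[k]

print(F(3))          # 1/2
print(F(5) - F(2))   # 1/2, i.e. P(2 < X <= 5)
print(F(2.7))        # 1/3 (flat between jumps)
print(F(0), F(10))   # 0 1
```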

💡 CDF vs PMF/PDF

The CDF works for both discrete and continuous random variables. It always exists and is always well-defined, making it a universal tool. The PMF only works for discrete variables; the PDF only works for continuous variables.


Bernoulli Distribution

The simplest distribution: a single yes/no trial.

Definition: Bernoulli Distribution

A random variable $X$ follows a Bernoulli distribution with parameter $p$ if it takes value 1 with probability $p$ and value 0 with probability $1 - p$.

Bernoulli PMF

$$P(X = x) = p^x (1-p)^{1-x}, \quad x \in \{0, 1\}$$

Here,

  • $p$ = probability of success ($X = 1$)
  • $1 - p$ = probability of failure ($X = 0$)
  • $x$ = outcome: 0 or 1

Bernoulli Parameters

$$\mu = p, \quad \sigma^2 = p(1-p)$$

Here,

  • $\mu$ = mean (expected value)
  • $\sigma^2$ = variance

📝 Example: Bernoulli Distribution

A drug has a 90% cure rate. Let $X = 1$ if cured, $X = 0$ if not.

  • $p = 0.9$, so $P(X = 1) = 0.9$, $P(X = 0) = 0.1$
  • Mean: $\mu = 0.9$
  • Variance: $\sigma^2 = 0.9 \times 0.1 = 0.09$
  • Standard deviation: $\sigma = \sqrt{0.09} = 0.3$
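The Bernoulli mean and variance follow directly from the PMF: $E[X] = 0 \cdot (1-p) + 1 \cdot p = p$ and, since $X^2 = X$ when $X \in \{0, 1\}$, $\text{Var}(X) = E[X^2] - (E[X])^2 = p - p^2 = p(1-p)$. The sketch below computes the same two moments from any finite PMF; `mean_and_variance` is a helper written for this example, not a library function.

```python
from fractions import Fraction

def mean_and_variance(pmf):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mu = sum(x * p for x, p in pmf.items())                # E[X] = sum of x p(x)
    second_moment = sum(x**2 * p for x, p in pmf.items())  # E[X^2]
    return mu, second_moment - mu**2                       # Var = E[X^2] - (E[X])^2

# Bernoulli(0.9) from the drug example: X = 1 (cured) with probability 9/10
p = Fraction(9, 10)
bernoulli_pmf = {0: 1 - p, 1: p}

mu, var = mean_and_variance(bernoulli_pmf)
print(mu, var)  # 9/10 9/100
```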

â„šī¸ AI/ML Connection

The Bernoulli distribution models binary decisions: spam/not spam, click/no-click, cat/dog. It is the output distribution of binary classifiers and the foundation of logistic regression.


Binomial Distribution

How many successes in $n$ independent Bernoulli trials?

Definition: Binomial Distribution

A random variable $X$ follows a binomial distribution $X \sim \text{Binomial}(n, p)$ if it counts the number of successes in $n$ independent Bernoulli trials, each with success probability $p$.

Binomial PMF

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, 2, \ldots, n$$

Here,

  • $\binom{n}{k} = \frac{n!}{k!(n-k)!}$ = binomial coefficient ("$n$ choose $k$")
  • $n$ = number of trials
  • $p$ = probability of success on each trial
  • $k$ = number of successes

Binomial Parameters

$$\mu = np, \quad \sigma^2 = np(1-p)$$

Here,

  • $n$ = number of trials
  • $p$ = success probability
  • $\mu$ = mean
  • $\sigma^2$ = variance

📝 Example: Binomial Distribution

Flip a fair coin 10 times. Let $X$ = number of heads. Then $X \sim \text{Binomial}(10, 0.5)$.

$$P(X = 5) = \binom{10}{5} (0.5)^5 (0.5)^5 = 252 \times \frac{1}{1024} \approx 0.246$$

  • Mean: $\mu = 10 \times 0.5 = 5$
  • Variance: $\sigma^2 = 10 \times 0.5 \times 0.5 = 2.5$

The most likely outcome is 5 heads, which matches the mean and makes intuitive sense. Even so, exactly 5 heads occurs only about 24.6% of the time.
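The coin-flip numbers can be reproduced straight from the PMF formula with the standard library's `math.comb` (Python 3.8+). The helper `binomial_pmf` is written for this sketch; it also confirms that the PMF sums to 1, that its mode is 5, and that its mean is $np = 5$.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), straight from the formula."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(math.comb(10, 5))                          # 252
print(round(pmf[5], 4))                          # 0.2461
print(max(range(n + 1), key=lambda k: pmf[k]))   # 5 (the mode)
print(round(sum(pmf), 10))                       # 1.0

mean = sum(k * pk for k, pk in enumerate(pmf))
print(round(mean, 10))                           # 5.0
```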

💡 Relationship to Bernoulli

A Bernoulli distribution is a special case of the binomial distribution where $n = 1$. That is, $\text{Bernoulli}(p) = \text{Binomial}(1, p)$.


Poisson Distribution

Counting rare events over a fixed interval.

Definition: Poisson Distribution

A random variable $X$ follows a Poisson distribution $X \sim \text{Poisson}(\lambda)$ if it counts the number of events occurring in a fixed interval of time or space, given a constant average rate $\lambda$ and independent events.

Poisson PMF

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots$$

Here,

  • $\lambda$ = average rate (expected number of events per interval)
  • $k$ = number of events
  • $e$ = Euler's number $\approx 2.71828$

Poisson Parameters

$$\mu = \lambda, \quad \sigma^2 = \lambda$$

Here,

  • $\lambda$ = rate parameter
  • $\mu$ = mean, equal to $\lambda$
  • $\sigma^2$ = variance, also equal to $\lambda$

📝 Example: Poisson Distribution

A server receives an average of 4 requests per minute ($\lambda = 4$). What is the probability of getting exactly 6 requests in a minute?

$$P(X = 6) = \frac{4^6 e^{-4}}{6!} = \frac{4096 \times 0.0183}{720} \approx 0.104$$

There is about a 10.4% chance of receiving exactly 6 requests.
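A direct check of the server example, computed from the PMF formula with only the standard library (`poisson_pmf` is a helper for this sketch). It also sums the PMF numerically to confirm that the mean and the variance both come out as $\lambda = 4$; truncating the infinite sum at $k = 100$ is an assumption that is harmless here because the tail beyond it is negligible.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), straight from the formula."""
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 4
print(round(poisson_pmf(6, lam), 4))  # 0.1042

# Mean and variance both equal lambda (sum truncated at k = 100)
pmf = [poisson_pmf(k, lam) for k in range(101)]
mean = sum(k * p for k, p in enumerate(pmf))
var = sum(k**2 * p for k, p in enumerate(pmf)) - mean**2
print(round(mean, 6), round(var, 6))  # 4.0 4.0
```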

â„šī¸ Poisson Limit

When nn is large and pp is small, the Binomial distribution Binomial(n,p)\text{Binomial}(n, p) is well approximated by Poisson(Îģ)\text{Poisson}(\lambda) where Îģ=np\lambda = np. This is why Poisson is used for rare events.
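The approximation can be watched numerically. This sketch assumes SciPy and keeps $np = 4$ fixed while $n$ grows; the largest pointwise gap between the two PMFs over $k = 0, \ldots, 20$ (a range chosen here because it covers almost all of the Poisson(4) mass) shrinks as $n$ increases.

```python
from scipy import stats

lam = 4
poisson = stats.poisson(mu=lam)

gaps = {}
for n in (10, 100, 1000, 10000):
    binom = stats.binom(n=n, p=lam / n)  # n * p stays equal to lambda
    # Largest absolute difference between the two PMFs over k = 0..20
    gaps[n] = max(abs(binom.pmf(k) - poisson.pmf(k)) for k in range(21))
    print(f"n={n:>5}  max |Binomial - Poisson| = {gaps[n]:.5f}")
```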


Uniform Distribution

Every outcome equally likely over an interval.

Definition: Uniform Distribution

A random variable $X$ follows a continuous uniform distribution $X \sim \text{Uniform}(a, b)$ if its density is constant on the interval $[a, b]$ (informally, every value in the interval is equally likely).

Uniform PDF

$$f(x) = \begin{cases} \frac{1}{b - a} & a \leq x \leq b \\ 0 & \text{otherwise} \end{cases}$$

Here,

  • $a$ = lower bound
  • $b$ = upper bound
  • $b - a$ = width of the interval

Uniform CDF

$$F(x) = \begin{cases} 0 & x < a \\ \frac{x - a}{b - a} & a \leq x \leq b \\ 1 & x > b \end{cases}$$

Here,

  • $F(x)$ = cumulative probability at $x$

Uniform Parameters

$$\mu = \frac{a + b}{2}, \quad \sigma^2 = \frac{(b - a)^2}{12}$$

Here,

  • $\mu$ = mean (midpoint of the interval)
  • $\sigma^2$ = variance

📝 Example: Uniform Distribution

A random number generator produces values uniformly between 0 and 1: $X \sim \text{Uniform}(0, 1)$.

  • PDF: $f(x) = 1$ for $0 \leq x \leq 1$
  • Mean: $\mu = \frac{0 + 1}{2} = 0.5$
  • Variance: $\sigma^2 = \frac{(1-0)^2}{12} = \frac{1}{12} \approx 0.0833$
  • $P(0.25 \leq X \leq 0.75) = \frac{0.75 - 0.25}{1 - 0} = 0.5$

â„šī¸ Why Uniform Matters

The Uniform(0,1) distribution is the foundation of all random number generation. Every continuous random variable can be generated from uniform random numbers using the inverse transform method.
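As a minimal sketch of the inverse transform method, assuming NumPy: solving $u = F(x) = \frac{x - a}{b - a}$ gives $F^{-1}(u) = a + (b - a)u$, so uniform draws on $[0, 1)$ map to $\text{Uniform}(a, b)$. The values $a = 2$, $b = 5$, the seed, and the helper name `uniform_inverse_cdf` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the run is reproducible

a, b = 2.0, 5.0

def uniform_inverse_cdf(u, a, b):
    """Inverse of F(x) = (x - a) / (b - a) on [a, b]: solve u = F(x) for x."""
    return a + (b - a) * u

u = rng.random(100_000)           # U ~ Uniform(0, 1)
x = uniform_inverse_cdf(u, a, b)  # X = F^{-1}(U) ~ Uniform(a, b)

print(x.min() >= a, x.max() <= b)  # True True
print(round(x.mean(), 2))          # close to (a + b) / 2 = 3.5
print(round(x.var(), 2))           # close to (b - a)**2 / 12 = 0.75
```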


Python Implementation

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# --- Bernoulli Distribution ---
# Simulate 10000 coin flips with p=0.7
bernoulli_rv = stats.bernoulli(p=0.7)
samples = bernoulli_rv.rvs(size=10000)
print(f"Bernoulli Mean: {samples.mean():.3f}")       # ~0.70
print(f"Bernoulli Var: {samples.var():.3f}")          # ~0.21
print(f"P(X=1): {bernoulli_rv.pmf(1):.3f}")          # 0.700
print(f"P(X<=0): {bernoulli_rv.cdf(0):.3f}")         # 0.300

# --- Binomial Distribution ---
# 10 trials, p=0.3, simulate 10000 experiments
binom_rv = stats.binom(n=10, p=0.3)
samples = binom_rv.rvs(size=10000)
print(f"Binomial Mean: {samples.mean():.3f}")         # ~3.0
print(f"Binomial Var: {samples.var():.3f}")           # ~2.1
print(f"P(X=5): {binom_rv.pmf(5):.4f}")              # ~0.1029
print(f"P(X<=3): {binom_rv.cdf(3):.4f}")             # ~0.6496

# --- Poisson Distribution ---
# Average 4 events per interval
poisson_rv = stats.poisson(mu=4)
samples = poisson_rv.rvs(size=10000)
print(f"Poisson Mean: {samples.mean():.3f}")          # ~4.0
print(f"Poisson Var: {samples.var():.3f}")            # ~4.0
print(f"P(X=6): {poisson_rv.pmf(6):.4f}")            # ~0.1042

# --- Uniform Distribution ---
# Continuous uniform on [0, 1]
uniform_rv = stats.uniform(loc=0, scale=1)
samples = uniform_rv.rvs(size=10000)
print(f"Uniform Mean: {samples.mean():.3f}")          # ~0.50
print(f"Uniform Var: {samples.var():.4f}")            # ~0.0833
print(f"P(0.25<=X<=0.75): {uniform_rv.cdf(0.75) - uniform_rv.cdf(0.25):.3f}")  # 0.500

# --- Visualization ---
fig, axes = plt.subplots(2, 2, figsize=(12, 8))

# Bernoulli
axes[0, 0].bar([0, 1], [0.3, 0.7], color=['steelblue', 'coral'])
axes[0, 0].set_title('Bernoulli(p=0.7)')
axes[0, 0].set_xlabel('x')
axes[0, 0].set_ylabel('P(X=x)')

# Binomial
x_binom = np.arange(0, 11)
axes[0, 1].bar(x_binom, binom_rv.pmf(x_binom), color='steelblue')
axes[0, 1].set_title('Binomial(n=10, p=0.3)')
axes[0, 1].set_xlabel('k')
axes[0, 1].set_ylabel('P(X=k)')

# Poisson
x_poisson = np.arange(0, 12)
axes[1, 0].bar(x_poisson, poisson_rv.pmf(x_poisson), color='coral')
axes[1, 0].set_title('Poisson(λ=4)')
axes[1, 0].set_xlabel('k')
axes[1, 0].set_ylabel('P(X=k)')

# Uniform
x_uniform = np.linspace(-0.2, 1.2, 1000)
axes[1, 1].fill_between(x_uniform, uniform_rv.pdf(x_uniform), alpha=0.3, color='steelblue')
axes[1, 1].plot(x_uniform, uniform_rv.pdf(x_uniform), color='steelblue')
axes[1, 1].set_title('Uniform(0, 1)')
axes[1, 1].set_xlabel('x')
axes[1, 1].set_ylabel('f(x)')

plt.tight_layout()
plt.savefig('distributions.png', dpi=150)
plt.show()
```

Applications in AI/ML

â„šī¸ Why Distributions Matter in ML

Probability distributions are not just theory — they are the engine behind every machine learning system. Here are the most important applications.

Loss Functions Derived from Distributions

Many common loss functions in ML are negative log-likelihoods of probability distributions (up to additive constants and scaling):

| Loss Function | Distribution | Use Case |
|---|---|---|
| Binary Cross-Entropy | Bernoulli | Binary classification |
| Categorical Cross-Entropy | Categorical | Multi-class classification |
| MSE (Mean Squared Error) | Gaussian (fixed variance) | Regression |
| Poisson Loss | Poisson | Count prediction |
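To see the Gaussian row concretely: with the variance fixed at $\sigma^2 = 1$, the summed Gaussian negative log-likelihood of $n$ targets equals $\frac{n}{2}\,\text{MSE} + \frac{n}{2}\log(2\pi)$, so minimizing one minimizes the other. A sketch with made-up data, assuming NumPy; `gaussian_nll` and `mse` are helper names for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
y = rng.normal(size=5)  # made-up regression targets

def gaussian_nll(y, y_hat, sigma=1.0):
    """Negative log-likelihood of y under N(y_hat, sigma^2), summed over samples."""
    return np.sum(0.5 * np.log(2 * np.pi * sigma**2) + (y - y_hat) ** 2 / (2 * sigma**2))

def mse(y, y_hat):
    """Mean squared error."""
    return np.mean((y - y_hat) ** 2)

n = len(y)
for y_hat in (np.zeros(n), y + 0.3):
    # With sigma = 1: NLL = (n / 2) * MSE + (n / 2) * log(2 * pi)
    rhs = n / 2 * mse(y, y_hat) + n / 2 * np.log(2 * np.pi)
    print(np.isclose(gaussian_nll(y, y_hat), rhs))  # True
```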

📝 Example: Cross-Entropy Loss

For a binary classifier predicting $\hat{y} = P(Y=1)$, the cross-entropy loss for a single example with true label $y \in \{0, 1\}$ is:

$$\mathcal{L} = -[y \log(\hat{y}) + (1-y) \log(1-\hat{y})]$$

This is the negative log-likelihood of a Bernoulli distribution with parameter $\hat{y}$: taking $-\log$ of the Bernoulli PMF $\hat{y}^{y}(1-\hat{y})^{1-y}$ gives exactly this expression.
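A quick standard-library check that binary cross-entropy and the Bernoulli negative log-likelihood agree for several labels and predictions. Both helper names are our own; the last two lines illustrate how much more a confident wrong prediction costs than a confident right one.

```python
import math

def bernoulli_pmf(y, p):
    """P(Y = y) for Y ~ Bernoulli(p), y in {0, 1}."""
    return p**y * (1 - p) ** (1 - y)

def binary_cross_entropy(y, y_hat):
    """Cross-entropy loss for one example with label y and predicted P(Y=1) = y_hat."""
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# BCE equals -log of the Bernoulli PMF for every label/prediction pair tried
for y in (0, 1):
    for y_hat in (0.1, 0.5, 0.9):
        nll = -math.log(bernoulli_pmf(y, y_hat))
        assert math.isclose(binary_cross_entropy(y, y_hat), nll)

# A confident wrong prediction is penalized far more than a confident right one
print(round(binary_cross_entropy(1, 0.9), 3))  # 0.105
print(round(binary_cross_entropy(1, 0.1), 3))  # 2.303
```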

Sampling and Data Augmentation

  • Monte Carlo methods: Draw samples from distributions to estimate integrals and expectations
  • Reparameterization trick: Used in VAEs (Variational Autoencoders) to backpropagate through random sampling
  • Data augmentation: Add noise sampled from known distributions to training data
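Monte Carlo estimation is easiest to see on an integral with a known answer. Because $\int_0^1 x^2\,dx = E[U^2]$ for $U \sim \text{Uniform}(0, 1)$, averaging $U^2$ over many uniform draws estimates it. A minimal NumPy sketch; the integrand $x^2$, the seed, and the sample size are arbitrary choices made so the result can be compared with the exact value $1/3$.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n_samples = 200_000

# Estimate E[U^2] = integral_0^1 x^2 dx = 1/3 with U ~ Uniform(0, 1)
u = rng.random(n_samples)
estimate = np.mean(u**2)

# Standard error of a Monte Carlo mean: sample std / sqrt(n)
std_error = np.std(u**2, ddof=1) / np.sqrt(n_samples)

print(f"estimate = {estimate:.4f}, exact = {1/3:.4f}, std error = {std_error:.5f}")
```

The standard error shrinks like $1/\sqrt{n}$, so each extra digit of accuracy costs roughly 100 times more samples.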

Generative Models

  • Gaussian Mixture Models (GMM): Model data as a mixture of Gaussians
  • Naive Bayes: Assume features follow specific distributions (Gaussian, Bernoulli, Multinomial)
  • Normalizing Flows: Transform simple distributions (Uniform, Gaussian) into complex ones

💡 The Big Picture

Choosing the right distribution for your data is one of the most important modeling decisions. If your data are counts, use Poisson or Negative Binomial. If they are binary, use Bernoulli. If they are continuous and symmetric, consider Gaussian. Mis-specifying the distribution leads to poor models and misleading conclusions.


Common Mistakes

| Mistake | Why It's Wrong | Correct Approach |
|---|---|---|
| Saying $P(X = 0.5) = 0.3$ for a continuous variable | For continuous RVs, the probability at a single point is always 0 | Use intervals: $P(0.4 \leq X \leq 0.6)$ |
| Treating PDF values as probabilities | $f(x)$ is a density, not a probability; it can exceed 1 | Probabilities are areas under the curve: $\int f(x)\,dx$ |
| Using a PMF for continuous variables | PMFs are only defined for discrete variables | Use a PDF for continuous, a PMF for discrete |
| Forgetting $\sum p(x) = 1$ or $\int f(x)\,dx = 1$ | If these don't hold, it's not a valid distribution | Always verify normalization |
| Confusing $\mu$ and $\bar{x}$ | $\mu$ is the population mean (parameter); $\bar{x}$ is the sample mean (statistic) | $\mu$ is fixed; $\bar{x}$ varies by sample |
| Assuming independence when it's not given | Independence is a strong assumption that must be justified | Check the problem statement carefully |
| Using Binomial when trials are not independent | Binomial requires independent trials | Use Hypergeometric for sampling without replacement |

Interview Questions

📝 Q1: What is the difference between a PMF and a PDF?

Answer: A PMF (Probability Mass Function) applies to discrete random variables and gives the probability at each point: $p(x) = P(X = x)$. A PDF (Probability Density Function) applies to continuous random variables and gives the density at each point. For continuous variables, $P(X = c) = 0$ for any single point; probabilities are only defined over intervals using the PDF: $P(a \leq X \leq b) = \int_a^b f(x)\,dx$. The key difference is that PMF values are actual probabilities (between 0 and 1), while PDF values are densities (which can exceed 1).

📝 Q2: Why is the variance of a Poisson distribution equal to its mean?

Answer: One way to see it is through the Poisson limit. The Poisson distribution is the limit of the binomial distribution as $n \to \infty$ and $p \to 0$ with $np = \lambda$ fixed. For the binomial, the variance is $np(1-p)$. As $p \to 0$, $(1-p) \to 1$, so the variance approaches $np = \lambda$, which is also the mean. A direct calculation confirms it: $E[X(X-1)] = \sum_{k \geq 2} k(k-1)\frac{\lambda^k e^{-\lambda}}{k!} = \lambda^2$, so $\text{Var}(X) = E[X(X-1)] + E[X] - (E[X])^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$. This equality ($\mu = \sigma^2 = \lambda$, called equidispersion) is a characteristic property of the Poisson distribution and is used as a diagnostic: if the sample mean and variance are approximately equal, the data may follow a Poisson distribution.
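The mean-versus-variance diagnostic can be sketched by comparing the variance-to-mean ratio (the index of dispersion) of simulated counts. This assumes NumPy; the two example distributions and the seed are arbitrary choices. Both have mean 4, but only the Poisson has variance equal to its mean, so its ratio sits near 1 while the binomial's sits near $1 - p = 0.6$.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

samples = {
    "Poisson(4)": rng.poisson(lam=4, size=50_000),
    "Binomial(10, 0.4)": rng.binomial(n=10, p=0.4, size=50_000),  # same mean, 4
}

ratios = {}
for name, x in samples.items():
    # Index of dispersion: sample variance / sample mean (about 1 for Poisson data)
    ratios[name] = x.var(ddof=1) / x.mean()
    print(f"{name:>17}: mean={x.mean():.2f}  var={x.var(ddof=1):.2f}  var/mean={ratios[name]:.2f}")
```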

📝 Q3: How would you choose between Bernoulli, Binomial, and Poisson distributions?

Answer:

  • Bernoulli: Single binary trial (yes/no, success/failure). Example: "Will this customer buy?"
  • Binomial: Fixed number $n$ of independent binary trials. Example: "How many of 10 customers will buy?"
  • Poisson: Count of events in a continuous interval (time/space) at a constant rate. Example: "How many customers arrive per hour?"

The key distinctions: Bernoulli is for one trial, Binomial is for a fixed number of trials, and Poisson is for events over a continuous interval. Use Binomial when you know $n$; use Poisson when $n$ is effectively infinite and $p$ is small.

📝 Q4: What is the CDF and why is it useful?

Answer: The CDF (Cumulative Distribution Function) is defined as $F(x) = P(X \leq x)$. It gives the probability that a random variable takes a value less than or equal to $x$. The CDF is useful because: (1) It works for both discrete and continuous random variables, unlike the PMF/PDF, which are type-specific. (2) It always exists and is well-defined. (3) You can compute interval probabilities: $P(a < X \leq b) = F(b) - F(a)$. (4) In hypothesis testing, p-values are computed using CDFs. (5) For continuous variables, $f(x) = F'(x)$ wherever $F$ is differentiable, connecting the CDF to the PDF.

📝 Q5: A website gets 3 visits per minute on average. What is the probability of getting exactly 5 visits in the next minute? Model this and compute.

Answer: This is a Poisson process with $\lambda = 3$.

$$P(X = 5) = \frac{3^5 e^{-3}}{5!} \approx \frac{243 \times 0.0498}{120} \approx \frac{12.1}{120} \approx 0.1008$$

There is approximately a 10.1% chance of getting exactly 5 visits in the next minute. This can be computed in Python using stats.poisson.pmf(5, mu=3).
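A small sketch, assuming SciPy is installed, that computes the answer both by the formula and with `scipy.stats.poisson.pmf` so the two can be compared side by side.

```python
import math
from scipy import stats

lam, k = 3, 5

by_formula = lam**k * math.exp(-lam) / math.factorial(k)  # 243 * e^-3 / 120
by_scipy = float(stats.poisson.pmf(k, mu=lam))            # same quantity via SciPy

print(round(by_formula, 4), round(by_scipy, 4))           # 0.1008 0.1008
```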

📝 Q6: Why can't we use a Gaussian distribution to model the number of heads in 10 coin flips?

Answer: The number of heads in 10 flips follows $\text{Binomial}(10, 0.5)$, which is a discrete distribution with support $\{0, 1, 2, \ldots, 10\}$. The Gaussian is a continuous distribution defined on $(-\infty, \infty)$. Using a Gaussian would imply that 2.5 heads is possible, which is nonsensical. Additionally, the Gaussian allows negative values (impossible for counts) and has nonzero probability density at non-integer values. While the Gaussian can approximate the Binomial for large $n$ (by the Central Limit Theorem), it is not the correct model for small $n$ with discrete outcomes.


Practice Problems

📝 Problem 1: Binomial Calculation

A quality control test has a 95% detection rate. If 20 items pass through the test, what is the probability that exactly 19 are correctly detected? Assume independence.

💡 Solution

This is a binomial distribution with $n = 20$, $p = 0.95$, and $k = 19$:

$$P(X = 19) = \binom{20}{19} (0.95)^{19} (0.05)^{1} = 20 \times (0.95)^{19} \times 0.05$$

Since $(0.95)^{19} \approx 0.3774$:

$$P(X = 19) \approx 20 \times 0.3774 \times 0.05 \approx 0.377$$

There is approximately a 37.7% chance that exactly 19 out of 20 items are correctly detected.
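The same answer can be checked from the formula with `math.comb` and, assuming SciPy is available, with `scipy.stats.binom.pmf(k, n, p)`:

```python
import math
from scipy import stats

n, p, k = 20, 0.95, 19

by_formula = math.comb(n, k) * p**k * (1 - p) ** (n - k)  # 20 * 0.95^19 * 0.05
by_scipy = float(stats.binom.pmf(k, n, p))

print(round(by_formula, 3), round(by_scipy, 3))  # 0.377 0.377
```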

📝 Problem 2: Poisson Process

A call center receives an average of 8 calls per 10-minute interval. What is the probability of receiving exactly 3 calls in a 10-minute interval? What is the probability of receiving 0 calls?

💡 Solution

With $\lambda = 8$:

Exactly 3 calls:

$$P(X = 3) = \frac{8^3 e^{-8}}{3!} \approx \frac{512 \times 0.000335}{6} \approx \frac{0.1715}{6} \approx 0.0286$$

0 calls:

$$P(X = 0) = \frac{8^0 e^{-8}}{0!} = e^{-8} \approx 0.000335$$

There is a 2.86% chance of exactly 3 calls and a 0.034% chance of no calls. The call center is very unlikely to be idle.
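Both probabilities can be confirmed with SciPy's Poisson distribution (assuming SciPy is installed); `mu` is SciPy's name for the rate $\lambda$.

```python
from scipy import stats

calls = stats.poisson(mu=8)  # lambda = 8 calls per 10-minute interval

p3 = calls.pmf(3)
p0 = calls.pmf(0)
print(f"P(X=3) = {p3:.4f}")  # 0.0286
print(f"P(X=0) = {p0:.6f}")  # 0.000335
```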

📝 Problem 3: Uniform Distribution

After you reach the stop, the bus arrives at a uniformly random time between 0 and 20 minutes later. What is the probability that you wait less than 5 minutes for the bus?

💡 Solution

Let $X$ = your waiting time, so $X \sim \text{Uniform}(0, 20)$. You wait less than 5 minutes exactly when $X < 5$, and the uniform CDF gives:

$$P(X < 5) = F(5) = \frac{5 - 0}{20 - 0} = \frac{5}{20} = 0.25 = 25\%$$

Because the density is constant, any 5-minute window inside $[0, 20]$ has the same probability, $\frac{5}{20}$.
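A quick simulation, assuming NumPy, that draws many waiting times from $\text{Uniform}(0, 20)$ and counts how often the wait is under 5 minutes; the seed and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
wait = rng.uniform(low=0, high=20, size=100_000)  # X ~ Uniform(0, 20) minutes

estimate = np.mean(wait < 5)  # fraction of simulated waits under 5 minutes
print(round(estimate, 2))     # close to 0.25
```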

📝 Problem 4: Connecting Distributions

Show that as $n \to \infty$ with $p = \lambda/n$, the Binomial PMF approaches the Poisson PMF.

💡 Solution

Starting with the Binomial PMF:

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$

Substitute $p = \lambda/n$:

$$P(X = k) = \binom{n}{k} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k}$$

Expand:

$$= \frac{n!}{k!(n-k)!} \cdot \frac{\lambda^k}{n^k} \cdot \left(1 - \frac{\lambda}{n}\right)^{n-k}$$

Regroup, splitting the last factor as $\left(1 - \frac{\lambda}{n}\right)^{n-k} = \left(1 - \frac{\lambda}{n}\right)^{n}\left(1 - \frac{\lambda}{n}\right)^{-k}$:

$$= \frac{\lambda^k}{k!} \cdot \frac{n!}{(n-k)!\, n^k} \cdot \left(1 - \frac{\lambda}{n}\right)^{n} \cdot \left(1 - \frac{\lambda}{n}\right)^{-k}$$

As $n \to \infty$ with $k$ fixed:

  • $\frac{n!}{(n-k)!\, n^k} = \frac{n}{n} \cdot \frac{n-1}{n} \cdots \frac{n-k+1}{n} \to 1$ (a product of $k$ factors, each tending to 1)
  • $\left(1 - \frac{\lambda}{n}\right)^n \to e^{-\lambda}$ (fundamental limit)
  • $\left(1 - \frac{\lambda}{n}\right)^{-k} \to 1$

Therefore:

$$P(X = k) \to \frac{\lambda^k e^{-\lambda}}{k!}$$

This is the Poisson PMF. The result is the Poisson Limit Theorem, and it explains why the Poisson distribution is used for rare events with large $n$ and small $p$.


Quick Reference

📋 Key Takeaways

| Concept | Formula | Notes |
|---|---|---|
| Random Variable | $X: \Omega \rightarrow \mathbb{R}$ | Maps outcomes to numbers |
| PMF | $p(x) = P(X = x)$ | Discrete only; $\sum p(x) = 1$ |
| PDF | $f(x) \geq 0$; $\int f(x)\,dx = 1$ | Continuous only; $f(x)$ is a density, not a probability |
| CDF | $F(x) = P(X \leq x)$ | Works for both types; non-decreasing, $F(-\infty) = 0$, $F(\infty) = 1$ |
| Bernoulli | $p^x(1-p)^{1-x}$ | Single binary trial; $\mu = p$, $\sigma^2 = p(1-p)$ |
| Binomial | $\binom{n}{k}p^k(1-p)^{n-k}$ | $n$ independent trials; $\mu = np$, $\sigma^2 = np(1-p)$ |
| Poisson | $\frac{\lambda^k e^{-\lambda}}{k!}$ | Rare events; $\mu = \sigma^2 = \lambda$ |
| Uniform | $f(x) = \frac{1}{b-a}$ | Equal likelihood; $\mu = \frac{a+b}{2}$, $\sigma^2 = \frac{(b-a)^2}{12}$ |
