R Probability Distributions — Random Number Generation
Learning Objectives
By the end of this tutorial, you will be able to:
- Generate random numbers from common distributions
- Calculate density, cumulative probability, and quantiles
- Use the normal, binomial, Poisson, and exponential distributions
- Apply the random variate generation functions
- Understand the four functions for each distribution
The Four Functions
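Each built-in distribution in R comes with four functions, named by adding a one-letter prefix to the distribution's short name (for example, norm):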
| Function | Purpose | Example |
|---|---|---|
| d*() | Density/mass function | dnorm() |
| p*() | Cumulative probability (CDF) | pnorm() |
| q*() | Quantile function (inverse CDF) | qnorm() |
| r*() | Random number generation | rnorm() |
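The four functions are closely related, and a short check makes the relationships concrete. The sketch below uses the normal distribution; the evaluation point `x`, the step size `h`, and the seed are arbitrary choices for illustration.

```r
x <- 1.5

# q*() inverts p*(): the probability maps back to the original value
p <- pnorm(x)
x_back <- qnorm(p)

# d*() is the slope of p*(): compare with a central finite difference
h <- 1e-5
slope <- (pnorm(x + h) - pnorm(x - h)) / (2 * h)

# r*() draws values whose empirical CDF tracks p*()
set.seed(1)
draws <- rnorm(1e5)
empirical <- mean(draws <= x)

c(x_back = x_back, density = dnorm(x), slope = slope,
  cdf = p, empirical = empirical)
```

The same pattern should hold for any other prefix family, with mass functions taking the place of densities for discrete distributions.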
Normal Distribution
# Random numbers
rnorm(10) # 10 standard normal values
rnorm(10, mean = 100, sd = 15) # Custom mean and SD
# Density
dnorm(0) # [1] 0.3989423 (at x=0)
dnorm(seq(-3, 3, 0.1)) # Density at multiple points
# Cumulative probability
pnorm(0) # [1] 0.5 (50% below 0)
pnorm(1.96) # [1] 0.9750021 (about 97.5% below 1.96)
pnorm(1.96) - pnorm(-1.96) # [1] 0.9500042 (about 95% within ±1.96)
# Quantile (inverse CDF)
qnorm(0.5) # [1] 0 (median)
qnorm(0.975) # [1] 1.959964 (commonly rounded to 1.96)
# Plot
x <- seq(-4, 4, 0.1)
plot(x, dnorm(x), type = "l", main = "Normal Distribution")
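The same functions answer questions on a non-standard scale once `mean` and `sd` are supplied. The following is a minimal sketch that reuses the mean of 100 and SD of 15 from the `rnorm()` call above; the cut-off values 85, 115, and 130 are picked only for illustration.

```r
mu <- 100
sigma <- 15

# P(85 < X < 115): difference of two CDF values (about 0.683)
p_between <- pnorm(115, mean = mu, sd = sigma) - pnorm(85, mean = mu, sd = sigma)

# P(X > 130): upper tail requested directly with lower.tail = FALSE (about 0.023)
p_above <- pnorm(130, mean = mu, sd = sigma, lower.tail = FALSE)

# Value below which 90% of the distribution lies (about 119.2)
cutoff_90 <- qnorm(0.90, mean = mu, sd = sigma)

c(p_between = p_between, p_above = p_above, cutoff_90 = cutoff_90)
```

Using `lower.tail = FALSE` rather than `1 - pnorm(...)` tends to be more accurate for very small tail probabilities.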
Binomial Distribution
# Random numbers
rbinom(10, size = 10, prob = 0.5) # 10 draws; each counts successes in 10 trials with p = 0.5
# Density (probability mass)
dbinom(5, size = 10, prob = 0.5) # P(X=5)
# Cumulative probability
pbinom(5, size = 10, prob = 0.5) # P(X ≤ 5)
# Quantile
qbinom(0.5, size = 10, prob = 0.5)
# Plot
x <- 0:10
barplot(dbinom(x, 10, 0.5), names.arg = x,
        main = "Binomial Distribution (n=10, p=0.5)")
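For a discrete distribution, the CDF is the running sum of the mass function, and upper-tail questions need care with the boundary value. A small sketch with the same n = 10 and p = 0.5; the threshold of 8 successes is an arbitrary choice.

```r
size <- 10
prob <- 0.5

# pbinom() accumulates dbinom(): P(X <= 5) is the sum of P(X = 0), ..., P(X = 5)
cdf_5 <- pbinom(5, size = size, prob = prob)
sum_5 <- sum(dbinom(0:5, size = size, prob = prob))

# P(X >= 8) equals P(X > 7), so the cut point passed to pbinom() is 7, not 8
p_at_least_8 <- pbinom(7, size = size, prob = prob, lower.tail = FALSE)

c(cdf_5 = cdf_5, sum_5 = sum_5, p_at_least_8 = p_at_least_8)
```

Here `p_at_least_8` works out to 56/1024, since there are 45 + 10 + 1 ways to get 8, 9, or 10 successes out of 1024 equally likely outcomes.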
Poisson Distribution
# Random numbers
rpois(10, lambda = 5) # Average rate of 5
# Density
dpois(3, lambda = 5) # P(X=3)
# Cumulative probability
ppois(3, lambda = 5) # P(X ≤ 3)
# Quantile
qpois(0.5, lambda = 5)
# Plot
x <- 0:15
barplot(dpois(x, 5), names.arg = x,
        main = "Poisson Distribution (λ=5)")
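A Poisson distribution has mean and variance both equal to lambda, which a large sample from `rpois()` should reflect. The sketch below assumes lambda = 5 as above; the sample size and seed are arbitrary.

```r
set.seed(1)
lambda <- 5
draws <- rpois(1e5, lambda = lambda)

# Sample mean and variance should both land near lambda
c(mean = mean(draws), variance = var(draws))

# P(X > 3) via the upper tail, matching 1 minus the summed mass up to 3
p_more_than_3 <- ppois(3, lambda = lambda, lower.tail = FALSE)
p_check <- 1 - sum(dpois(0:3, lambda = lambda))
c(p_more_than_3 = p_more_than_3, p_check = p_check)
```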
Exponential Distribution
# Random numbers
rexp(10, rate = 0.5) # Mean = 1/rate = 2
# Density
dexp(2, rate = 0.5)
# Cumulative probability
pexp(2, rate = 0.5)
# Quantile
qexp(0.5, rate = 0.5)
# Plot
x <- seq(0, 10, 0.1)
plot(x, dexp(x, 0.5), type = "l", main = "Exponential Distribution")
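The exponential distribution has simple closed forms, so the built-in functions can be checked by hand. A minimal sketch with rate = 0.5 as above; the point `x`, the sample size, and the seed are illustrative choices.

```r
rate <- 0.5
x <- 2

# Density: rate * exp(-rate * x)
c(dexp = dexp(x, rate = rate), closed_form = rate * exp(-rate * x))

# CDF: 1 - exp(-rate * x)
c(pexp = pexp(x, rate = rate), closed_form = 1 - exp(-rate * x))

# Median: log(2) / rate
c(qexp = qexp(0.5, rate = rate), closed_form = log(2) / rate)

# The mean of a large sample should be close to 1 / rate = 2
set.seed(1)
sample_mean <- mean(rexp(1e5, rate = rate))
sample_mean
```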
Other Distributions
| Distribution | Random | Density | CDF | Quantile |
|---|---|---|---|---|
| Uniform | runif() | dunif() | punif() | qunif() |
| Gamma | rgamma() | dgamma() | pgamma() | qgamma() |
| Beta | rbeta() | dbeta() | pbeta() | qbeta() |
| Chi-squared | rchisq() | dchisq() | pchisq() | qchisq() |
| t | rt() | dt() | pt() | qt() |
| F | rf() | df() | pf() | qf() |
| Log-normal | rlnorm() | dlnorm() | plnorm() | qlnorm() |
| Weibull | rweibull() | dweibull() | pweibull() | qweibull() |
# Uniform
runif(10, min = 0, max = 100)
# Gamma
rgamma(10, shape = 2, rate = 1)
# Beta
rbeta(10, shape1 = 2, shape2 = 5)
# Chi-squared
rchisq(10, df = 5)
# t-distribution
rt(10, df = 5)
# Log-normal
rlnorm(10, meanlog = 0, sdlog = 1)
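The p and q functions for these families follow the same pattern, each with its own parameter names. As a rough sketch, the code below finds each median with the q function and feeds it back through the matching p function, reusing the parameter values from the calls above.

```r
# Medians from the quantile functions
medians <- c(
  uniform = qunif(0.5, min = 0, max = 100),
  gamma   = qgamma(0.5, shape = 2, rate = 1),
  beta    = qbeta(0.5, shape1 = 2, shape2 = 5),
  chisq   = qchisq(0.5, df = 5),
  t       = qt(0.5, df = 5),
  lnorm   = qlnorm(0.5, meanlog = 0, sdlog = 1)
)

# Passing each median back through its CDF should return 0.5
round_trip <- c(
  uniform = punif(medians[["uniform"]], min = 0, max = 100),
  gamma   = pgamma(medians[["gamma"]], shape = 2, rate = 1),
  beta    = pbeta(medians[["beta"]], shape1 = 2, shape2 = 5),
  chisq   = pchisq(medians[["chisq"]], df = 5),
  t       = pt(medians[["t"]], df = 5),
  lnorm   = plnorm(medians[["lnorm"]], meanlog = 0, sdlog = 1)
)

rbind(medians, round_trip)
```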
Random Number Generation
# Set seed for reproducibility
set.seed(42)
rnorm(5)
# [1] 1.370958 -0.564698 0.363128 0.632863 0.404268
# Same seed = same numbers
set.seed(42)
rnorm(5)
# [1] 1.370958 -0.564698 0.363128 0.632863 0.404268
# Sampling without replacement (the default)
sample(1:10, 5)
sample(1:10, 5, replace = FALSE) # Same behavior, stated explicitly
# Sampling with replacement
sample(1:10, 20, replace = TRUE)
# Weighted sampling
sample(c("a", "b", "c"), 10, replace = TRUE, prob = c(0.5, 0.3, 0.2))
# Draw 10 distinct integers from 1 to 100
sample(1:100, 10, replace = FALSE)
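Reproducibility can be confirmed programmatically rather than by comparing printed output. In this sketch, `identical()` shows that resetting the seed repeats the draws, while continuing without a reset moves further along the random stream; the variable names are illustrative.

```r
set.seed(42)
a <- rnorm(5)

set.seed(42)
b <- rnorm(5)

# No reseed here, so these are the next five values in the stream
c_draw <- rnorm(5)

identical(a, b)       # same seed, same draws
identical(a, c_draw)  # different position in the stream

# sample() uses the same generator, so seeding makes it repeatable too
set.seed(42)
s1 <- sample(1:10, 5)
set.seed(42)
s2 <- sample(1:10, 5)
identical(s1, s2)
```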
Central Limit Theorem
# Demonstrate CLT
set.seed(42)
n_samples <- 1000
sample_size <- 30
means <- replicate(n_samples, {
  x <- rexp(sample_size, rate = 1)
  mean(x)
})
hist(means, breaks = 30, probability = TRUE,
     main = "CLT: Sample Means of Exponential Distribution")
curve(dnorm(x, mean = mean(means), sd = sd(means)),
      add = TRUE, col = "red", lwd = 2)
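The simulated means can also be compared with what theory predicts. An exponential distribution with rate 1 has mean 1 and standard deviation 1, so means of 30 observations should be centered near 1 with standard deviation near 1/sqrt(30). The sketch below repeats the simulation with the same settings.

```r
set.seed(42)
n_samples <- 1000
sample_size <- 30

means <- replicate(n_samples, mean(rexp(sample_size, rate = 1)))

# Theoretical center and spread of the sample mean
theory_mean <- 1
theory_sd <- 1 / sqrt(sample_size)

rbind(
  simulated = c(mean = mean(means), sd = sd(means)),
  theory    = c(mean = theory_mean, sd = theory_sd)
)
```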
Practical Examples
Example 1: Simulation
# Coin flip simulation
set.seed(123)
n_flips <- 1000
prob_heads <- 0.5
flips <- rbinom(1, size = n_flips, prob = prob_heads)
cat("Heads:", flips, "Tails:", n_flips - flips, "\n")
# Multiple simulations
simulations <- rbinom(1000, size = n_flips, prob = prob_heads)
hist(simulations, breaks = 30, main = "Distribution of Heads in 1000 Flips")
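Simulated counts can be checked against the exact binomial results. In this sketch the threshold of 480 heads is an arbitrary value chosen for illustration; the other settings match the example above.

```r
set.seed(123)
n_flips <- 1000
prob_heads <- 0.5
simulations <- rbinom(1000, size = n_flips, prob = prob_heads)

# Theoretical mean and standard deviation of the number of heads
theory_mean <- n_flips * prob_heads
theory_sd <- sqrt(n_flips * prob_heads * (1 - prob_heads))

# Share of simulated runs with at most 480 heads vs. the exact probability
empirical_p <- mean(simulations <= 480)
exact_p <- pbinom(480, size = n_flips, prob = prob_heads)

c(sim_mean = mean(simulations), theory_mean = theory_mean,
  sim_sd = sd(simulations), theory_sd = theory_sd,
  empirical_p = empirical_p, exact_p = exact_p)
```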
Example 2: Confidence Intervals
# 95% CI for normal mean
x <- rnorm(100, mean = 50, sd = 10)
n <- length(x)
x_bar <- mean(x)
se <- sd(x) / sqrt(n)
# Using the t-distribution: margin of error = critical value * standard error
margin <- qt(0.975, df = n - 1) * se
ci_lower <- x_bar - margin
ci_upper <- x_bar + margin
cat("95% CI:", ci_lower, "to", ci_upper, "\n")
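A hand-computed t interval can be cross-checked against `t.test()`, which reports the same interval for a one-sample test. This is a minimal sketch; the seed is an arbitrary addition so the numbers repeat between runs.

```r
set.seed(1)  # arbitrary seed, added only for repeatability
x <- rnorm(100, mean = 50, sd = 10)
n <- length(x)

# Manual interval: mean plus or minus critical value times standard error
margin <- qt(0.975, df = n - 1) * sd(x) / sqrt(n)
manual_ci <- c(mean(x) - margin, mean(x) + margin)

# Built-in interval from a one-sample t-test
builtin_ci <- as.numeric(t.test(x, conf.level = 0.95)$conf.int)

all.equal(manual_ci, builtin_ci)
```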
Practice Exercises
Exercise 1: Dice Roll Simulation
Simulate rolling two dice 10000 times and plot the distribution of sums.
Solution
set.seed(42)
n <- 10000
die1 <- sample(1:6, n, replace = TRUE)
die2 <- sample(1:6, n, replace = TRUE)
sums <- die1 + die2
table(sums)
barplot(table(sums), main = "Sum of Two Dice (10000 rolls)")
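The simulated frequencies can be compared with the exact probabilities, obtained by enumerating all 36 equally likely outcomes. The sketch below uses `outer()` for the enumeration and repeats the simulation with the same seed.

```r
set.seed(42)
n <- 10000
sums <- sample(1:6, n, replace = TRUE) + sample(1:6, n, replace = TRUE)

# Exact distribution: every (die1, die2) pair is equally likely
all_sums <- outer(1:6, 1:6, "+")
theory <- table(all_sums) / 36

# Observed relative frequencies, with every sum from 2 to 12 represented
observed <- table(factor(sums, levels = 2:12)) / n

comparison <- rbind(theory = as.numeric(theory), observed = as.numeric(observed))
colnames(comparison) <- 2:12
round(comparison, 3)
```

A sum of 7 is the most likely outcome, with probability 6/36.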
Key Takeaways
- Every distribution has four functions: d*(), p*(), q*(), r*()
- r*() generates random numbers; set a seed for reproducibility
- dnorm(), dbinom(), dpois() calculate density/mass
- pnorm(), pbinom(), ppois() calculate cumulative probability
- qnorm(), qbinom(), qpois() calculate quantiles
- CLT: sample means approach normality as the sample size grows, for any distribution with finite variance
Next: Learn about R Hypothesis Testing — statistical significance.