R Probability Distributions — Random Number Generation

Learning Objectives

By the end of this tutorial, you will be able to:

  • Generate random numbers from common distributions
  • Calculate density, cumulative probability, and quantiles
  • Use the normal, binomial, Poisson, and exponential distributions
  • Apply the random variate generation functions
  • Understand the four functions for each distribution

The Four Functions

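Each built-in distribution in R has four functions, named by adding a one-letter prefix to the distribution's short name (norm, binom, pois, exp, and so on):

Function   Purpose                            Example
d*()       Density/mass function              dnorm()
p*()       Cumulative probability (CDF)       pnorm()
q*()       Quantile function (inverse CDF)    qnorm()
r*()       Random number generation           rnorm()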
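
The relationships among the four functions can be checked numerically. The sketch below uses the standard normal as an arbitrary choice; the variable names and the cut-off 1.5 are illustrative only.

```r
# q*() inverts p*()
p <- pnorm(1.5)        # P(X <= 1.5)
x_back <- qnorm(p)     # recovers 1.5

# p*() is the area under d*() up to a point
area <- integrate(dnorm, lower = -Inf, upper = 1.5)$value

# r*() draws follow the same distribution: the share of draws
# at or below 1.5 should be close to p
set.seed(1)
draws <- rnorm(100000)
share <- mean(draws <= 1.5)

print(c(p = p, area = area, share = share))
```

The three numbers should agree closely: the first two up to integration error, the third up to simulation noise.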

Normal Distribution

# Random numbers
rnorm(10)                    # 10 standard normal values
rnorm(10, mean = 100, sd = 15)  # Custom mean and SD

# Density
dnorm(0)                     # [1] 0.3989423 (at x=0)
dnorm(seq(-3, 3, 0.1))      # Density at multiple points

# Cumulative probability
pnorm(0)                     # [1] 0.5 (50% below 0)
pnorm(1.96)                  # [1] 0.9750021 (about 97.5% below 1.96)
pnorm(1.96) - pnorm(-1.96)  # [1] 0.9500042 (about 95% within ±1.96)

# Quantile (inverse CDF)
qnorm(0.5)                   # [1] 0 (median)
qnorm(0.975)                 # [1] 1.959964 (commonly rounded to 1.96)

# Plot
x <- seq(-4, 4, 0.1)
plot(x, dnorm(x), type = "l", main = "Normal Distribution")
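
The same functions answer questions about a non-standard normal. The following is a minimal sketch for a hypothetical score scale with mean 100 and SD 15 (the values used in the rnorm() call above); the threshold 130 is an assumption chosen for illustration.

```r
# Hypothetical scale: scores with mean 100 and SD 15
mu <- 100
sigma <- 15

# P(score > 130), taken directly from the upper tail
p_above <- pnorm(130, mean = mu, sd = sigma, lower.tail = FALSE)

# Same probability after standardising: z = (130 - 100) / 15 = 2
z <- (130 - mu) / sigma
p_above_z <- pnorm(z, lower.tail = FALSE)

# Score that cuts off the top 5%
cutoff <- qnorm(0.95, mean = mu, sd = sigma)

print(c(p_above = p_above, p_above_z = p_above_z, cutoff = cutoff))
```

Passing mean and sd to pnorm() gives the same answer as standardising by hand, so either style works.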

Binomial Distribution

# Random numbers
rbinom(10, size = 10, prob = 0.5)  # 10 draws; each counts successes in 10 trials with p = 0.5

# Density (probability mass)
dbinom(5, size = 10, prob = 0.5)  # P(X=5)

# Cumulative probability
pbinom(5, size = 10, prob = 0.5)  # P(X ≤ 5)

# Quantile
qbinom(0.5, size = 10, prob = 0.5)

# Plot
x <- 0:10
barplot(dbinom(x, 10, 0.5), names.arg = x,
        main = "Binomial Distribution (n=10, p=0.5)")
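
For a discrete distribution, the CDF is the running sum of the mass function, and interval probabilities are differences of CDF values. A short sketch with the same n = 10, p = 0.5 setting (variable names are illustrative):

```r
n <- 10
p <- 0.5

# The CDF is the running sum of the mass function
mass <- dbinom(0:n, size = n, prob = p)
cdf_from_mass <- cumsum(mass)
cdf_direct <- pbinom(0:n, size = n, prob = p)
print(all.equal(cdf_from_mass, cdf_direct))

# P(X > 5): lower.tail = FALSE gives the strict upper tail
p_more_than_5 <- pbinom(5, size = n, prob = p, lower.tail = FALSE)

# P(3 <= X <= 7): subtract the CDF at 2, not at 3
p_3_to_7 <- pbinom(7, n, p) - pbinom(2, n, p)

print(c(p_more_than_5 = p_more_than_5, p_3_to_7 = p_3_to_7))
```

Because the variable is discrete, the lower endpoint needs care: P(X ≥ 3) is 1 minus P(X ≤ 2).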

Poisson Distribution

# Random numbers
rpois(10, lambda = 5)  # Average rate of 5

# Density
dpois(3, lambda = 5)   # P(X=3)

# Cumulative probability
ppois(3, lambda = 5)   # P(X ≤ 3)

# Quantile
qpois(0.5, lambda = 5)

# Plot
x <- 0:15
barplot(dpois(x, 5), names.arg = x,
        main = "Poisson Distribution (λ=5)")
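
The Poisson mass function has the closed form exp(-λ) λ^k / k!, and its mean and variance both equal λ. The sketch below checks this against dpois() and a simulation; the seed and the sample size of 100000 are arbitrary choices.

```r
lambda <- 5

# P(X = 3) from the formula exp(-lambda) * lambda^k / k!
k <- 3
by_formula <- exp(-lambda) * lambda^k / factorial(k)
by_dpois <- dpois(k, lambda)
print(all.equal(by_formula, by_dpois))

# P(X >= 4) is the complement of P(X <= 3)
p_at_least_4 <- 1 - ppois(3, lambda)

# Simulated counts: mean and variance should both be near lambda
set.seed(1)
counts <- rpois(100000, lambda)
print(c(mean = mean(counts), var = var(counts)))
```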

Exponential Distribution

# Random numbers
rexp(10, rate = 0.5)  # Mean = 1/rate = 2

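# Density
dexp(2, rate = 0.5)    # [1] 0.1839397, i.e. 0.5 * exp(-1)

# Cumulative probability
pexp(2, rate = 0.5)    # [1] 0.6321206, i.e. P(X ≤ 2) = 1 - exp(-1)

# Quantile
qexp(0.5, rate = 0.5)  # [1] 1.386294 (median = log(2) / rate)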

# Plot
x <- seq(0, 10, 0.1)
plot(x, dexp(x, 0.5), type = "l", main = "Exponential Distribution")
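
The exponential distribution has simple closed forms, which makes it a convenient check on what each function returns. A minimal sketch, with an arbitrary seed and sample size:

```r
rate <- 0.5
x <- 2

# Closed forms for the exponential distribution
dens_formula <- rate * exp(-rate * x)   # density at x
cdf_formula <- 1 - exp(-rate * x)       # P(X <= x)
median_formula <- log(2) / rate         # 50th percentile

print(all.equal(dens_formula, dexp(x, rate)))
print(all.equal(cdf_formula, pexp(x, rate)))
print(all.equal(median_formula, qexp(0.5, rate)))

# Simulated waiting times: the mean should be close to 1 / rate = 2
set.seed(1)
waits <- rexp(100000, rate = rate)
print(mean(waits))
```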

Other Distributions

Distribution   Random       Density      CDF          Quantile
Uniform        runif()      dunif()      punif()      qunif()
Gamma          rgamma()     dgamma()     pgamma()     qgamma()
Beta           rbeta()      dbeta()      pbeta()      qbeta()
Chi-squared    rchisq()     dchisq()     pchisq()     qchisq()
t              rt()         dt()         pt()         qt()
F              rf()         df()         pf()         qf()
Log-normal     rlnorm()     dlnorm()     plnorm()     qlnorm()
Weibull        rweibull()   dweibull()   pweibull()   qweibull()

# Uniform
runif(10, min = 0, max = 100)

# Gamma
rgamma(10, shape = 2, rate = 1)

# Beta
rbeta(10, shape1 = 2, shape2 = 5)

# Chi-squared
rchisq(10, df = 5)

# t-distribution
rt(10, df = 5)

# Log-normal
rlnorm(10, meanlog = 0, sdlog = 1)
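
The d/p/q prefixes work the same way for these distributions. As a hedged illustration, the sketch below applies them to a uniform on [0, 100] and compares t critical values with the normal one; the evaluation points and degrees of freedom are arbitrary.

```r
# Uniform on [0, 100]: the four functions take the same parameters
d <- dunif(25, min = 0, max = 100)     # density: 1 / (max - min) = 0.01
p <- punif(25, min = 0, max = 100)     # P(X <= 25) = 0.25
q <- qunif(0.25, min = 0, max = 100)   # 25th percentile = 25
print(c(d = d, p = p, q = q))

# t critical values shrink toward the normal value as df grows
t_crit <- qt(0.975, df = c(5, 30, 1000))
z_crit <- qnorm(0.975)
print(round(c(t_crit, z_crit), 3))
```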

Random Number Generation

# Set seed for reproducibility
set.seed(42)
rnorm(5)
# [1]  1.370958 -0.564698  0.363128  0.632863  0.404268

# Same seed = same numbers
set.seed(42)
rnorm(5)
# [1]  1.370958 -0.564698  0.363128  0.632863  0.404268

# Sampling without replacement (the default)
sample(1:10, 5)
sample(1:10, 5, replace = FALSE)  # same behaviour, stated explicitly

# Sampling with replacement
sample(1:10, 20, replace = TRUE)

# Weighted sampling
sample(c("a", "b", "c"), 10, replace = TRUE, prob = c(0.5, 0.3, 0.2))

# Draw 10 distinct integers from 1 to 100
sample(1:100, 10, replace = FALSE)
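
Two quick checks on the behaviour above: resetting the seed reproduces a draw exactly, and weighted sampling yields observed shares close to the requested weights. The sample size of 10000 and the name `weights` are assumptions made for this sketch.

```r
# Same seed, same draws
set.seed(42)
first <- rnorm(5)
set.seed(42)
second <- rnorm(5)
print(identical(first, second))

# Weighted sampling: observed shares should track the weights
set.seed(42)
weights <- c(a = 0.5, b = 0.3, c = 0.2)
picks <- sample(names(weights), 10000, replace = TRUE, prob = weights)
shares <- prop.table(table(picks))
print(round(shares, 3))
```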

Central Limit Theorem

# Demonstrate CLT
set.seed(42)
n_samples <- 1000
sample_size <- 30

means <- replicate(n_samples, {
  x <- rexp(sample_size, rate = 1)
  mean(x)
})

hist(means, breaks = 30, probability = TRUE,
     main = "CLT: Sample Means of Exponential Distribution")
curve(dnorm(x, mean = mean(means), sd = sd(means)),
      add = TRUE, col = "red", lwd = 2)
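
The histogram can be backed up with numbers. For an exponential with rate = 1, the population mean and SD are both 1, so the sample means should centre on 1 with a standard deviation near 1 / sqrt(sample_size). A minimal sketch that repeats the simulation and compares it with theory:

```r
set.seed(42)
n_samples <- 1000
sample_size <- 30

means <- replicate(n_samples, mean(rexp(sample_size, rate = 1)))

# Theory for rate = 1: population mean 1, population SD 1
theory_mean <- 1
theory_sd <- 1 / sqrt(sample_size)

print(c(observed_mean = mean(means), theory_mean = theory_mean))
print(c(observed_sd = sd(means), theory_sd = theory_sd))
```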

Practical Examples

Example 1: Simulation

# Coin flip simulation
set.seed(123)
n_flips <- 1000
prob_heads <- 0.5

flips <- rbinom(1, size = n_flips, prob = prob_heads)
cat("Heads:", flips, "Tails:", n_flips - flips, "\n")

# Multiple simulations
simulations <- rbinom(1000, size = n_flips, prob = prob_heads)
hist(simulations, breaks = 30, main = "Distribution of Heads in 1000 Flips")
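
The simulated head counts can be compared with the binomial mean n·p and standard deviation sqrt(n·p·(1 − p)). The following sketch reruns the simulation so that it stands alone; `within_2sd` is an illustrative name.

```r
set.seed(123)
n_flips <- 1000
prob_heads <- 0.5
simulations <- rbinom(1000, size = n_flips, prob = prob_heads)

# Theoretical mean and SD of the number of heads
expected_mean <- n_flips * prob_heads
expected_sd <- sqrt(n_flips * prob_heads * (1 - prob_heads))

print(c(sim_mean = mean(simulations), expected_mean = expected_mean))
print(c(sim_sd = sd(simulations), expected_sd = expected_sd))

# Share of runs within two SDs of the expected count (roughly 95%)
within_2sd <- mean(abs(simulations - expected_mean) <= 2 * expected_sd)
print(within_2sd)
```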

Example 2: Confidence Intervals

# 95% CI for a normal mean
x <- rnorm(100, mean = 50, sd = 10)
n <- length(x)
x_bar <- mean(x)
se <- sd(x) / sqrt(n)

# Margin of error from the t-distribution (upper 2.5% critical value)
margin <- qt(0.975, df = n - 1) * se
ci_lower <- x_bar - margin
ci_upper <- x_bar + margin

cat("95% CI:", ci_lower, "to", ci_upper, "\n")
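
A 95% interval for a mean has the form x̄ ± t × se, where t is the 0.975 quantile of the t-distribution with n − 1 degrees of freedom. One way to check a hand-computed interval is to compare it with the one reported by t.test(). The sketch below does this on freshly simulated data; the seed is an arbitrary choice.

```r
set.seed(1)
x <- rnorm(100, mean = 50, sd = 10)

# Interval computed by hand
n <- length(x)
se <- sd(x) / sqrt(n)
margin <- qt(0.975, df = n - 1) * se
manual_ci <- c(mean(x) - margin, mean(x) + margin)

# t.test() reports the same interval in its conf.int element
builtin_ci <- as.numeric(t.test(x, conf.level = 0.95)$conf.int)

print(manual_ci)
print(all.equal(manual_ci, builtin_ci))
```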

Practice Exercises

Exercise 1: Dice Roll Simulation

Simulate rolling two dice 10000 times and plot the distribution of sums.

Solution

set.seed(42)
n <- 10000

die1 <- sample(1:6, n, replace = TRUE)
die2 <- sample(1:6, n, replace = TRUE)
sums <- die1 + die2

table(sums)
barplot(table(sums), main = "Sum of Two Dice (10000 rolls)")
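
The simulated frequencies can be compared with the exact probabilities: a sum s from 2 to 12 can be made in 6 − |s − 7| ways out of 36. A short sketch, in which `theoretical` and `observed` are illustrative names:

```r
set.seed(42)
n <- 10000
sums <- sample(1:6, n, replace = TRUE) + sample(1:6, n, replace = TRUE)

# Exact probabilities: the sum s occurs in 6 - |s - 7| of 36 outcomes
s <- 2:12
theoretical <- (6 - abs(s - 7)) / 36

# Observed relative frequencies, keeping every sum from 2 to 12
observed <- as.numeric(table(factor(sums, levels = s))) / n

print(round(rbind(observed, theoretical), 3))
```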

Key Takeaways

  • Every distribution has 4 functions: d*, p*, q*, r*
  • r*() generates random numbers; call set.seed() first for reproducible results
  • dnorm(), dbinom(), dpois() calculate density/mass
  • pnorm(), pbinom(), ppois() calculate cumulative probability
  • qnorm(), qbinom(), qpois() calculate quantiles
  • CLT: means of sufficiently large samples are approximately normal for any distribution with finite variance

