Probability: Bayes' Theorem, PDF/CDF, CLT

Why Probability Matters

Probability is the mathematics of uncertainty. In data science, we use probability to:

Quantify uncertainty in predictions
Make inferences about populations from samples
Build probabilistic models
Test hypotheses

Certainty ←──────────────────────────→ Uncertainty
   │                                      │
   └──────── Probability Theory ──────────┘

Fundamental Concepts

Sample Space and Events

# Sample space: All possible outcomes
# Event: A subset of the sample space

# Example: Rolling two dice
sample_space = {(i, j) for i in range(1, 7) for j in range(1, 7)}
print(f"Sample space size: {len(sample_space)}")  # 36

# Event: Sum equals 7
event_sum_7 = {(i, j) for i, j in sample_space if i + j == 7}
print(f"Event sum=7: {event_sum_7}")
print(f"Probability: {len(event_sum_7)/len(sample_space):.4f}")  # 6/36 = 0.1667

Basic Probability Rules

Addition Rule

P(A \cup B) = P(A) + P(B) - P(A \cap B)

Here,

$P(A \cup B)$ = probability of A or B occurring
$P(A \cap B)$ = probability of both A and B occurring

Multiplication Rule

P(A \cap B) = P(A|B) \cdot P(B) = P(B|A) \cdot P(A)

Here,

$P(A|B)$ = conditional probability of A given B, defined when $P(B) > 0$ as:

P(A|B) = \frac{P(A \cap B)}{P(B)}

# 1. Addition Rule: P(A or B) = P(A) + P(B) - P(A and B)
# 2. Multiplication Rule: P(A and B) = P(A|B) * P(B)
# 3. Complement Rule: P(not A) = 1 - P(A)
# 4. Conditional Probability: P(A|B) = P(A and B) / P(B)

# Example: Card drawing
# P(Red) = 26/52 = 0.5
# P(King) = 4/52 = 1/13
# P(Red ∩ King) = 2/52 = 1/26
# P(Red | King) = 2/4 = 0.5

# Verify addition rule
p_red = 26/52
p_king = 4/52
p_red_and_king = 2/52
p_red_or_king = p_red + p_king - p_red_and_king

print(f"P(Red or King): {p_red_or_king:.4f}")
# Compare with a tolerance: exact == on floats can fail because of round-off
print(f"Verification: {abs(p_red_or_king - (26 + 4 - 2) / 52) < 1e-12}")

ℹ️ Independence and Conditional Probability

Two events A and B are independent if and only if P(A ∩ B) = P(A)·P(B), equivalently P(A|B) = P(A). Independence means knowing B occurred gives no information about A. Always check independence before applying simplified multiplication rules — assuming independence when it doesn't hold leads to serious errors.
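
As a quick check of the definition, the sketch below enumerates a standard 52-card deck and compares P(A ∩ B) with P(A)·P(B) using exact fractions. The helper names `prob` and `is_independent` are illustrative only and do not come from any library.

```python
from fractions import Fraction
from itertools import product

# Build a standard 52-card deck as (rank, suit) pairs.
ranks = ['A', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'J', 'Q', 'K']
suits = ['hearts', 'diamonds', 'clubs', 'spades']
deck = list(product(ranks, suits))

def prob(event):
    """Exact probability of an event (a predicate on cards) under a uniform draw."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_independent(event_a, event_b):
    """Check the definition P(A and B) == P(A) * P(B)."""
    p_both = prob(lambda c: event_a(c) and event_b(c))
    return p_both == prob(event_a) * prob(event_b)

is_red = lambda c: c[1] in ('hearts', 'diamonds')
is_king = lambda c: c[0] == 'K'
is_heart = lambda c: c[1] == 'hearts'

print(is_independent(is_red, is_king))   # True: 2/52 == (26/52) * (4/52)
print(is_independent(is_red, is_heart))  # False: 13/52 != (26/52) * (13/52)
```

Knowing a card is a king says nothing about its colour, so the first pair is independent. Knowing it is a heart guarantees it is red, so the second pair is not.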

Probability Distributions

Discrete Distributions

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# 1. Bernoulli Distribution (single trial)
p = 0.3  # Probability of success
bernoulli = stats.bernoulli(p)
print(f"Bernoulli(p={p}):")
print(f"  P(X=0): {bernoulli.pmf(0):.2f}")
print(f"  P(X=1): {bernoulli.pmf(1):.2f}")
print(f"  Mean: {bernoulli.mean():.2f}")
print(f"  Variance: {bernoulli.var():.2f}")

# 2. Binomial Distribution (n trials)
n = 10
p = 0.5
binomial = stats.binom(n, p)
x = np.arange(0, n+1)

plt.figure(figsize=(10, 5))
plt.bar(x, binomial.pmf(x), color='skyblue', edgecolor='black')
plt.title(f'Binomial Distribution (n={n}, p={p})')
plt.xlabel('Number of Successes')
plt.ylabel('Probability')
plt.show()

# 3. Poisson Distribution (events per interval)
lambda_param = 5
poisson = stats.poisson(lambda_param)
x = np.arange(0, 15)

plt.figure(figsize=(10, 5))
plt.bar(x, poisson.pmf(x), color='lightgreen', edgecolor='black')
plt.title(f'Poisson Distribution (λ={lambda_param})')
plt.xlabel('Number of Events')
plt.ylabel('Probability')
plt.show()

# 4. Geometric Distribution (trials until first success)
p = 0.3
geometric = stats.geom(p)
x = np.arange(1, 11)

plt.figure(figsize=(10, 5))
plt.bar(x, geometric.pmf(x), color='salmon', edgecolor='black')
plt.title(f'Geometric Distribution (p={p})')
plt.xlabel('Trial of First Success')
plt.ylabel('Probability')
plt.show()

Binomial Distribution PMF

P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}

Here,

$n$ = number of trials
$k$ = number of successes
$p$ = probability of success on each trial

Poisson Distribution PMF

P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}

Here,

$\lambda$ = average rate of events per interval
$k$ = number of events
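
To connect the two PMF formulas to numbers, here is a minimal sketch that evaluates them directly with the standard library. The function names `binomial_pmf` and `poisson_pmf` are illustrative; in practice `scipy.stats.binom.pmf` and `scipy.stats.poisson.pmf` give the same values.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), straight from the formula."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), straight from the formula."""
    return lam**k * math.exp(-lam) / math.factorial(k)

# Binomial(n=10, p=0.5): probability of exactly 5 successes
print(f"{binomial_pmf(5, 10, 0.5):.4f}")  # 0.2461

# Poisson(lambda=5): probability of exactly 5 events
print(f"{poisson_pmf(5, 5):.4f}")  # 0.1755

# Sanity check: each PMF sums to (essentially) 1 over its support.
# The Poisson sum is truncated at k = 99, where the tail is negligible.
binomial_total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
poisson_total = sum(poisson_pmf(k, 5) for k in range(100))
print(abs(binomial_total - 1) < 1e-9, abs(poisson_total - 1) < 1e-9)
```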

Continuous Distributions

# 1. Normal Distribution (Gaussian)
mu = 0
sigma = 1
normal = stats.norm(mu, sigma)

x = np.linspace(-4, 4, 1000)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# PDF
axes[0].plot(x, normal.pdf(x), 'b-', linewidth=2)
axes[0].fill_between(x, normal.pdf(x), alpha=0.3)
axes[0].set_title('Normal Distribution PDF')

# CDF
axes[1].plot(x, normal.cdf(x), 'r-', linewidth=2)
axes[1].set_title('Normal Distribution CDF')

plt.tight_layout()
plt.show()

# 2. Uniform Distribution
a, b = 0, 10
uniform = stats.uniform(a, b-a)

x = np.linspace(-1, 11, 1000)

plt.figure(figsize=(10, 5))
plt.plot(x, uniform.pdf(x), 'b-', linewidth=2)
plt.fill_between(x, uniform.pdf(x), alpha=0.3)
plt.title(f'Uniform Distribution [{a}, {b}]')
plt.show()

# 3. Exponential Distribution
lambda_param = 1
exponential = stats.expon(scale=1/lambda_param)

x = np.linspace(0, 5, 1000)

plt.figure(figsize=(10, 5))
plt.plot(x, exponential.pdf(x), 'r-', linewidth=2)
plt.fill_between(x, exponential.pdf(x), alpha=0.3)
plt.title(f'Exponential Distribution (λ={lambda_param})')
plt.show()

# 4. Beta Distribution
alpha, beta_param = 2, 5
beta = stats.beta(alpha, beta_param)

x = np.linspace(0, 1, 1000)

plt.figure(figsize=(10, 5))
plt.plot(x, beta.pdf(x), 'g-', linewidth=2)
plt.fill_between(x, beta.pdf(x), alpha=0.3)
plt.title(f'Beta Distribution (α={alpha}, β={beta_param})')
plt.show()

PDF vs CDF

# PDF (Probability Density Function)
# - Height represents relative likelihood
# - Area under curve = 1
# - P(X = x) = 0 for continuous distributions

# CDF (Cumulative Distribution Function)
# - P(X ≤ x)
# - Non-decreasing (it never goes down as x grows)
# - Ranges from 0 to 1

x = np.linspace(-4, 4, 1000)
pdf = stats.norm.pdf(x)
cdf = stats.norm.cdf(x)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# PDF with area
axes[0].plot(x, pdf, 'b-', linewidth=2)
axes[0].fill_between(x, pdf, where=(x >= -1) & (x <= 1), alpha=0.5, label='P(-1 ≤ X ≤ 1)')
axes[0].set_title('PDF: P(-1 ≤ X ≤ 1) ≈ 0.68')
axes[0].legend()

# CDF
axes[1].plot(x, cdf, 'r-', linewidth=2)
axes[1].axhline(y=stats.norm.cdf(1), color='g', linestyle='--', label='P(X ≤ 1) ≈ 0.84')
axes[1].axvline(x=1, color='g', linestyle='--')
axes[1].set_title('CDF: P(X ≤ 1)')
axes[1].legend()

plt.tight_layout()
plt.show()

# Calculating probabilities
# P(a ≤ X ≤ b) = CDF(b) - CDF(a)
p_between = stats.norm.cdf(1) - stats.norm.cdf(-1)
print(f"P(-1 ≤ X ≤ 1): {p_between:.4f}")

# P(X > x) = 1 - CDF(x)
p_greater = 1 - stats.norm.cdf(2)
print(f"P(X > 2): {p_greater:.4f}")

# P(X < x) = CDF(x)
p_less = stats.norm.cdf(-1)
print(f"P(X < -1): {p_less:.4f}")

💡 Computing Probabilities with CDFs

For any continuous distribution, compute interval probabilities using the CDF:

P(a ≤ X ≤ b) = F(b) - F(a)
P(X > a) = 1 - F(a)
P(X ≤ a) = F(a)

This works because the CDF encodes all probabilistic information about the distribution.
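
The three identities can be sketched with the standard library's `statistics.NormalDist`. The Normal(100, 15) distribution and the helper `interval_probability` are assumptions chosen for illustration; any object with a `cdf` method would work the same way.

```python
from statistics import NormalDist

def interval_probability(dist, a, b):
    """P(a <= X <= b) = F(b) - F(a) for a continuous distribution with a .cdf method."""
    return dist.cdf(b) - dist.cdf(a)

# Assumed example: X ~ Normal(mean=100, sd=15)
scores = NormalDist(mu=100, sigma=15)

p_within = interval_probability(scores, 85, 115)  # within one sd of the mean
p_above = 1 - scores.cdf(130)                     # more than two sd above the mean
p_below = scores.cdf(85)                          # at most one sd below the mean

print(f"P(85 <= X <= 115) = {p_within:.4f}")  # 0.6827
print(f"P(X > 130)        = {p_above:.4f}")   # 0.0228
print(f"P(X <= 85)        = {p_below:.4f}")   # 0.1587
```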

Bayes' Theorem

P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}

Intuitive Understanding

Bayes' Theorem:

P(Hypothesis | Data) = P(Data | Hypothesis) × P(Hypothesis)
                       ─────────────────────────────────────
                                    P(Data)

Posterior = Likelihood × Prior
            ──────────────────
               Evidence

Bayes' Theorem (Expanded)

P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B|A) \cdot P(A) + P(B|\neg A) \cdot P(\neg A)}

Here,

$P(A|B)$ = posterior: probability of hypothesis given evidence
$P(B|A)$ = likelihood: probability of evidence given hypothesis
$P(A)$ = prior: initial belief in hypothesis
$P(B|\neg A)$ = false positive rate
$P(\neg A)$ = prior probability of complement
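
The expanded form translates almost line for line into a small function. This is a sketch: `bayes_posterior` is a hypothetical helper name, and the likelihood of 0.9 and false positive rate of 0.1 are made-up numbers used only to show how strongly the prior drives the result.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(A|B) from the expanded form of Bayes' theorem.

    prior               -- P(A)
    likelihood          -- P(B|A)
    false_positive_rate -- P(B|not A)
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Illustrative numbers (assumed): the same evidence under different priors
for prior in (0.001, 0.01, 0.1, 0.5):
    print(f"prior={prior:<6} posterior={bayes_posterior(prior, 0.9, 0.1):.4f}")
```

With identical evidence, the posterior ranges from under 1% to 90% depending on the prior alone.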

Example: Medical Testing

# Medical test example
# Disease prevalence: 1%
# Test sensitivity (true positive rate): 99%
# Test specificity (true negative rate): 95%

# Given: Person tests positive
# Find: Probability they actually have the disease

# Prior: P(Disease) = 0.01
# Likelihood: P(Positive | Disease) = 0.99
# Evidence: P(Positive) = ?

# Using law of total probability
p_disease = 0.01
p_positive_given_disease = 0.99
p_positive_given_healthy = 0.05  # False positive rate = 1 - specificity

# P(Positive) = P(Positive|Disease)×P(Disease) + P(Positive|Healthy)×P(Healthy)
p_positive = (p_positive_given_disease * p_disease + 
              p_positive_given_healthy * (1 - p_disease))

# Apply Bayes' theorem
p_disease_given_positive = (p_positive_given_disease * p_disease) / p_positive

print("Medical Test Analysis:")
print(f"P(Disease): {p_disease:.2%}")
print(f"P(Positive|Disease): {p_positive_given_disease:.2%}")
print(f"P(Positive|Healthy): {p_positive_given_healthy:.2%}")
print(f"P(Positive): {p_positive:.4f}")
print(f"\nP(Disease|Positive): {p_disease_given_positive:.4f}")
print(f"P(Disease|Positive): {p_disease_given_positive:.2%}")

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Joint probabilities of the four test outcomes (drawn as a pie chart)
true_pos = 0.01 * 0.99
false_pos = 0.99 * 0.05
true_neg = 0.99 * 0.95
false_neg = 0.01 * 0.01

values = [true_pos, false_pos, true_neg, false_neg]
labels = ['True Pos', 'False Pos', 'True Neg', 'False Neg']
colors = ['#FF6B6B', '#FFEAA7', '#4ECDC4', '#45B7D1']

axes[0].pie(values, labels=labels, colors=colors, autopct='%1.2f%%')
axes[0].set_title('Test Results Breakdown')

# Bayesian update visualization
prior = 0.01
likelihood_ratio = 0.99 / 0.05
posterior = (likelihood_ratio * prior) / (likelihood_ratio * prior + (1 - prior))

axes[1].bar(['Prior', 'Posterior'], [prior, posterior], color=['#4ECDC4', '#FF6B6B'])
axes[1].set_title('Bayesian Update: Prior → Posterior')
axes[1].set_ylabel('Probability')
axes[1].set_ylim(0, 0.5)

plt.tight_layout()
plt.show()

📝 Worked Example: Medical Test (Bayes' Theorem)

Given: Disease prevalence = 1%, Sensitivity = 99%, Specificity = 95%

Step 1: Identify the quantities

P(Disease) = 0.01 (prior)
P(Positive | Disease) = 0.99 (sensitivity/likelihood)
P(Positive | Healthy) = 1 - 0.95 = 0.05 (false positive rate)

Step 2: Compute evidence P(Positive)

P(Positive) = 0.99 × 0.01 + 0.05 × 0.99 = 0.0099 + 0.0495 = 0.0594

Step 3: Apply Bayes' Theorem

P(Disease | Positive) = 0.0099 / 0.0594 ≈ 16.67%

Interpretation: Despite the test being "99% accurate," only ~17% of positive results are true positives. This is the base rate fallacy — ignoring the low prevalence leads to dramatically overestimating the probability of disease.
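
One way to make the base rate effect tangible is to count people instead of multiplying probabilities. The sketch below applies the same prevalence, sensitivity, and specificity to a hypothetical cohort of 10,000 (the cohort size is an assumption chosen for round numbers).

```python
# Natural-frequency view of the same test, using a hypothetical cohort of 10,000 people.
cohort = 10_000
prevalence, sensitivity, specificity = 0.01, 0.99, 0.95

diseased = cohort * prevalence                 # 100 people
healthy = cohort - diseased                    # 9,900 people
true_positives = diseased * sensitivity        # 99 people
false_positives = healthy * (1 - specificity)  # 495 people

share = true_positives / (true_positives + false_positives)
print(f"True positives:  {true_positives:.0f}")
print(f"False positives: {false_positives:.0f}")
print(f"P(Disease | Positive) = {share:.4f}")  # 0.1667
```

False positives outnumber true positives five to one simply because healthy people are 99 times more common than sick ones.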

Naive Bayes Classification

# Example: Spam classification
# Prior: P(Spam) = 0.3, P(Ham) = 0.7
# Likelihood: P(Word | Spam), P(Word | Ham)

# Vocabulary: ['free', 'money', 'meeting', 'project']
p_spam = 0.3
p_ham = 0.7

# Likelihoods
likelihoods = {
    'free': {'spam': 0.8, 'ham': 0.1},
    'money': {'spam': 0.7, 'ham': 0.05},
    'meeting': {'spam': 0.1, 'ham': 0.6},
    'project': {'spam': 0.2, 'ham': 0.5}
}

# Email contains: 'free', 'money', 'meeting'
words = ['free', 'money', 'meeting']

# Calculate likelihoods (assuming independence)
p_words_given_spam = np.prod([likelihoods[w]['spam'] for w in words])
p_words_given_ham = np.prod([likelihoods[w]['ham'] for w in words])

# Apply Bayes' theorem
p_words = p_words_given_spam * p_spam + p_words_given_ham * p_ham
p_spam_given_words = (p_words_given_spam * p_spam) / p_words
p_ham_given_words = (p_words_given_ham * p_ham) / p_words

print("Spam Classification:")
print(f"P(Spam): {p_spam}")
print(f"P(Ham): {p_ham}")
print(f"P(Words|Spam): {p_words_given_spam:.6f}")
print(f"P(Words|Ham): {p_words_given_ham:.6f}")
print(f"\nP(Spam|Words): {p_spam_given_words:.4f}")
print(f"P(Ham|Words): {p_ham_given_words:.4f}")
print(f"Classification: {'Spam' if p_spam_given_words > p_ham_given_words else 'Ham'}")

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Prior vs Posterior (grouped bars, so neither series hides the other)
classes = ['Spam', 'Ham']
pos = np.arange(len(classes))
bar_width = 0.35
axes[0].bar(pos - bar_width/2, [p_spam, p_ham], bar_width, label='Prior', color='#4ECDC4')
axes[0].bar(pos + bar_width/2, [p_spam_given_words, p_ham_given_words], bar_width,
            label='Posterior', color='#FF6B6B')
axes[0].set_xticks(pos)
axes[0].set_xticklabels(classes)
axes[0].set_title('Prior vs Posterior')
axes[0].legend()

# Likelihood comparison
word_labels = list(likelihoods.keys())
spam_likelihoods = [likelihoods[w]['spam'] for w in word_labels]
ham_likelihoods = [likelihoods[w]['ham'] for w in word_labels]

x = np.arange(len(word_labels))
width = 0.35
axes[1].bar(x - width/2, spam_likelihoods, width, label='Spam', color='#FF6B6B')
axes[1].bar(x + width/2, ham_likelihoods, width, label='Ham', color='#4ECDC4')
axes[1].set_xticks(x)
axes[1].set_xticklabels(word_labels)
axes[1].set_title('Word Likelihoods')
axes[1].legend()

plt.tight_layout()
plt.show()

ℹ️ The 'Naive' Assumption

Naive Bayes assumes features are conditionally independent given the class label: P(x₁, x₂, ..., xₙ | C) = ∏ᵢ P(xᵢ | C). This assumption is rarely true in practice, yet Naive Bayes often performs well for text classification. Two commonly cited reasons: (1) classification only needs the correct ranking of the posteriors, which often survives even when the absolute probabilities are poorly calibrated, and (2) the independence assumption keeps the number of parameters small, which limits overfitting in a way similar to regularization.
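
Because the naive assumption turns the joint likelihood into a product of many small numbers, implementations commonly work with sums of logarithms instead. The sketch below reuses the illustrative probabilities from the spam example above; `classify` is a hypothetical helper, not a library function.

```python
import math

# Same illustrative numbers as the spam example above
priors = {'spam': 0.3, 'ham': 0.7}
likelihoods = {
    'free':    {'spam': 0.8, 'ham': 0.1},
    'money':   {'spam': 0.7, 'ham': 0.05},
    'meeting': {'spam': 0.1, 'ham': 0.6},
}

def classify(words):
    """Return (best_class, posteriors) using log-space scores."""
    # log P(C) + sum_i log P(x_i | C): the conditional-independence product, in logs
    log_scores = {
        c: math.log(priors[c]) + sum(math.log(likelihoods[w][c]) for w in words)
        for c in priors
    }
    # Normalize with the log-sum-exp trick to recover posteriors without underflow
    m = max(log_scores.values())
    total = sum(math.exp(s - m) for s in log_scores.values())
    posteriors = {c: math.exp(s - m) / total for c, s in log_scores.items()}
    return max(posteriors, key=posteriors.get), posteriors

label, post = classify(['free', 'money', 'meeting'])
print(label, {c: round(p, 4) for c, p in post.items()})  # spam, with posterior about 0.889
```

With three words the direct product is perfectly safe; the log form matters once a document has hundreds of features and the product would underflow to zero.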

Central Limit Theorem (CLT)

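The CLT states that, for independent and identically distributed observations with finite mean μ and finite variance σ², the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, whatever the shape of the population distribution.

\bar{X} \approx N\left(\mu, \frac{\sigma^2}{n}\right) \text{ for large } n, \quad \text{i.e. } \sqrt{n}\,(\bar{X} - \mu) \xrightarrow{d} N(0, \sigma^2) \text{ as } n \to \infty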

# Demonstrate CLT with different distributions
np.random.seed(42)

distributions = {
    'Uniform': np.random.uniform(0, 10, 10000),
    'Exponential': np.random.exponential(2, 10000),
    'Chi-Square': np.random.chisquare(2, 10000),
    'Bimodal': np.concatenate([np.random.normal(-2, 1, 5000), 
                                np.random.normal(2, 1, 5000)])
}

sample_sizes = [1, 5, 10, 30]

fig, axes = plt.subplots(len(distributions), len(sample_sizes), 
                         figsize=(16, 12))

for i, (dist_name, data) in enumerate(distributions.items()):
    for j, n in enumerate(sample_sizes):
        # Take multiple samples and calculate means
        sample_means = [np.mean(np.random.choice(data, n)) for _ in range(1000)]
        
        axes[i, j].hist(sample_means, bins=30, density=True, 
                       alpha=0.7, edgecolor='black')
        axes[i, j].set_title(f'{dist_name}\nn={n}')
        
        if j == 0:
            axes[i, j].set_ylabel('Density')
        if i == len(distributions) - 1:
            axes[i, j].set_xlabel('Sample Mean')

plt.suptitle('Central Limit Theorem Demonstration', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

# CLT in action: the spread of the sample mean shrinks like σ/√n
population = np.random.exponential(2, 100000)
pop_mean = np.mean(population)
pop_std = np.std(population)

print(f"Population Mean: {pop_mean:.4f}")
print(f"Population Std: {pop_std:.4f}")

# Different sample sizes
sample_sizes = [10, 30, 50, 100]
n_simulations = 1000

for n in sample_sizes:
    # Simulate sampling
    sample_means = [np.mean(np.random.choice(population, n)) for _ in range(n_simulations)]
    
    # Central 95% range of the simulated sample means. This describes the
    # sampling distribution; it is not a confidence interval from one sample.
    range_lower = np.percentile(sample_means, 2.5)
    range_upper = np.percentile(sample_means, 97.5)
    
    print(f"\nn={n}:")
    print(f"  Mean of sample means: {np.mean(sample_means):.4f}")
    print(f"  Std of sample means: {np.std(sample_means):.4f}")
    print(f"  Expected Std (σ/√n): {pop_std/np.sqrt(n):.4f}")
    print(f"  Central 95% of sample means: [{range_lower:.4f}, {range_upper:.4f}]")

💡 Rules of Thumb for CLT

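n ≥ 30: A common rule of thumb; the approximation is usually reasonable when the population is not strongly skewed
n ≥ 50: Usually adequate for moderately skewed distributions
n ≥ 100: Usually adequate for most finite-variance distributions met in practice
For heavily skewed or heavy-tailed distributions, larger samples are needed, and the CLT does not apply at all when the variance is infinite (e.g. the Cauchy distribution)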
The CLT applies to means; analogous results exist for sums, proportions, and other statistics
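
These thresholds can be checked empirically by measuring how often a normal-approximation 95% confidence interval, built from a single sample, actually contains the true mean. The sketch below is one way to do that with the standard library; `ci_coverage`, the exponential population with mean 2, and the trial count are all assumptions made for illustration.

```python
import random
from statistics import NormalDist, fmean, stdev

def ci_coverage(n, n_trials=2000, true_mean=2.0, seed=0):
    """Fraction of normal-approximation 95% CIs that contain the true mean.

    Each trial draws one sample of size n from a (skewed) exponential
    distribution with the given mean and builds mean +/- z * s / sqrt(n).
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    hits = 0
    for _ in range(n_trials):
        sample = [rng.expovariate(1 / true_mean) for _ in range(n)]
        centre = fmean(sample)
        half_width = z * stdev(sample) / n ** 0.5
        if centre - half_width <= true_mean <= centre + half_width:
            hits += 1
    return hits / n_trials

for n in (5, 30, 100):
    print(f"n={n:>3}: coverage = {ci_coverage(n):.3f}")
```

For a skewed population the coverage typically falls noticeably short of 95% at small n and moves toward it as n grows, which is the practical meaning of the rules of thumb.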

Hypothesis Testing

# One-sample t-test
from scipy import stats

# Sample data
np.random.seed(42)
sample = np.random.normal(10.5, 2, 100)  # Sample from population with mean 10.5

# Null hypothesis: μ = 10
# Alternative hypothesis: μ ≠ 10
t_stat, p_value = stats.ttest_1samp(sample, 10)

print("One-Sample t-test:")
print(f"Sample Mean: {np.mean(sample):.4f}")
print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"Result: {'Reject' if p_value < 0.05 else 'Fail to reject'} H0")

# Two-sample t-test
sample1 = np.random.normal(10, 2, 100)
sample2 = np.random.normal(11, 2, 100)

t_stat, p_value = stats.ttest_ind(sample1, sample2)
print("\nTwo-Sample t-test:")
print(f"Sample 1 Mean: {np.mean(sample1):.4f}")
print(f"Sample 2 Mean: {np.mean(sample2):.4f}")
print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")

# Paired t-test
before = np.random.normal(100, 15, 50)
after = before + np.random.normal(5, 10, 50)

# Pass (after, before) so the sign of t matches the printed mean of after - before
t_stat, p_value = stats.ttest_rel(after, before)
print("\nPaired t-test:")
print(f"Mean Difference: {np.mean(after - before):.4f}")
print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")

# Chi-square test
# Test independence between two categorical variables
observed = np.array([[50, 30, 20],
                     [35, 45, 20],
                     [15, 25, 60]])

chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print("\nChi-Square Test:")
print(f"Chi-square statistic: {chi2:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"Degrees of freedom: {dof}")

ℹ️ Common Misconceptions About P-values

A p-value is NOT the probability that H₀ is true
A p-value is NOT the probability the result occurred by chance
A p-value IS the probability of observing a test statistic as extreme or more extreme than the one observed, assuming H₀ is true
A non-significant result (p > 0.05) does NOT prove H₀ — it means insufficient evidence to reject
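
The definition of a p-value implies that when H₀ is true, about 5% of experiments will still give p < 0.05. The simulation below sketches this with a two-sided z-test with known σ (a simpler test than the t-tests above, chosen so the example needs only the standard library). The function names and simulation sizes are assumptions.

```python
import random
from statistics import NormalDist, fmean

def two_sided_p_value(sample, mu0, sigma):
    """Two-sided z-test p-value for H0: mean == mu0, with known sigma."""
    z = (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(n_experiments=4000, n=30, alpha=0.05, seed=1):
    """Share of experiments with p < alpha when H0 is actually true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        sample = [rng.gauss(0, 1) for _ in range(n)]  # H0 (mean 0) holds by construction
        if two_sided_p_value(sample, mu0=0, sigma=1) < alpha:
            rejections += 1
    return rejections / n_experiments

print(f"Rejection rate under H0: {false_positive_rate():.3f}")  # expected to be near 0.05
```

Every rejection here is a false positive, so a single p < 0.05 says nothing by itself about how likely H₀ is to be true.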

Practical Example: A/B Testing with Bayesian Analysis

import numpy as np
from scipy import stats

# A/B test data
np.random.seed(42)

# Control group (A)
n_control = 1000
conversions_control = 120
conversion_rate_control = conversions_control / n_control

# Treatment group (B)
n_treatment = 1000
conversions_treatment = 150
conversion_rate_treatment = conversions_treatment / n_treatment

print("A/B Test Results:")
print(f"Control: {conversions_control}/{n_control} = {conversion_rate_control:.2%}")
print(f"Treatment: {conversions_treatment}/{n_treatment} = {conversion_rate_treatment:.2%}")
print(f"Lift: {(conversion_rate_treatment/conversion_rate_control - 1)*100:.1f}%")

# Frequentist approach
from statsmodels.stats.proportion import proportions_ztest

count = np.array([conversions_control, conversions_treatment])
nobs = np.array([n_control, n_treatment])

z_stat, p_value = proportions_ztest(count, nobs)
print(f"\nFrequentist Analysis:")
print(f"Z-statistic: {z_stat:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"Significant at α=0.05: {p_value < 0.05}")

# Bayesian approach
# Prior: Beta(1, 1) (uniform prior)
# Posterior: Beta(conversions + 1, non-conversions + 1)

prior_a = 1
prior_b = 1

posterior_control = stats.beta(
    conversions_control + prior_a,
    n_control - conversions_control + prior_b
)

posterior_treatment = stats.beta(
    conversions_treatment + prior_a,
    n_treatment - conversions_treatment + prior_b
)

# Sample from posteriors
n_samples = 10000
samples_control = posterior_control.rvs(n_samples)
samples_treatment = posterior_treatment.rvs(n_samples)

# Probability that B > A
prob_b_better = np.mean(samples_treatment > samples_control)
print(f"\nBayesian Analysis:")
print(f"P(B > A): {prob_b_better:.2%}")

# Expected lift
lift = (samples_treatment - samples_control) / samples_control
print(f"Expected Lift: {np.mean(lift)*100:.1f}% ± {np.std(lift)*100:.1f}%")

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Posterior distributions
x = np.linspace(0.08, 0.20, 1000)
axes[0].plot(x, posterior_control.pdf(x), 'b-', label='Control', linewidth=2)
axes[0].plot(x, posterior_treatment.pdf(x), 'r-', label='Treatment', linewidth=2)
axes[0].fill_between(x, posterior_control.pdf(x), alpha=0.3)
axes[0].fill_between(x, posterior_treatment.pdf(x), alpha=0.3)
axes[0].set_title('Posterior Distributions')
axes[0].set_xlabel('Conversion Rate')
axes[0].set_ylabel('Density')
axes[0].legend()

# Lift distribution
axes[1].hist(lift * 100, bins=50, density=True, alpha=0.7, edgecolor='black')
axes[1].axvline(x=0, color='r', linestyle='--', label='No Effect')
axes[1].axvline(x=np.mean(lift)*100, color='g', linestyle='-', 
                label=f'Mean: {np.mean(lift)*100:.1f}%')
axes[1].set_title('Lift Distribution (B vs A)')
axes[1].set_xlabel('Relative Improvement (%)')
axes[1].set_ylabel('Density')
axes[1].legend()

plt.tight_layout()
plt.show()

📝 Worked Example: Bayesian A/B Test

Given: Control: 120/1000 conversions, Treatment: 150/1000 conversions

Step 1: Define priors — Beta(1,1) (uniform/non-informative)

Step 2: Compute posteriors

Posterior(Control) = Beta(120 + 1, 880 + 1) = Beta(121, 881)
Posterior(Treatment) = Beta(150 + 1, 850 + 1) = Beta(151, 851)

Step 3: Sample and compare

Draw 10,000 samples from each posterior
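P(Treatment > Control) ≈ 97.5% (the exact figure varies slightly with the random draws)
Expected lift: ~26%, best reported together with a 95% credible interval computed from the sampled lift values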

Decision: Strong Bayesian evidence that Treatment outperforms Control. The Bayesian approach gives a direct probability of improvement, unlike frequentist p-values.
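
The worked example mentions a credible interval without computing one. A minimal sketch of that step, using only the standard library's `random.betavariate` instead of SciPy, is given here; the seed and the number of draws are arbitrary choices.

```python
import random
from statistics import fmean, quantiles

rng = random.Random(42)
n_draws = 20_000

# Posterior draws under a Beta(1, 1) prior: Beta(conversions + 1, non-conversions + 1)
control = [rng.betavariate(121, 881) for _ in range(n_draws)]
treatment = [rng.betavariate(151, 851) for _ in range(n_draws)]

# Probability that treatment beats control
prob_b_better = sum(t > c for t, c in zip(treatment, control)) / n_draws

# Relative lift and its central 95% credible interval
lift = [(t - c) / c for t, c in zip(treatment, control)]
cuts = quantiles(lift, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
ci_low, ci_high = cuts[0], cuts[-1]

print(f"P(B > A): {prob_b_better:.3f}")
print(f"Mean lift: {fmean(lift):.1%}")
print(f"95% credible interval for lift: [{ci_low:.1%}, {ci_high:.1%}]")
```

The lower end of the interval sits close to zero, which is the same information as P(B > A) being close to 97.5%: a small or zero effect is unlikely but not ruled out.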

Key Takeaways

📋 Summary: Probability, Bayes, PDF/CDF, CLT

Probability provides a rigorous axiomatic framework for quantifying uncertainty — P(A) ≥ 0, P(Ω) = 1, countable additivity
PDF describes density (area = probability); CDF gives cumulative probability directly — use CDFs for interval calculations
Bayes' Theorem enables principled belief updating: Posterior ∝ Likelihood × Prior. Watch for the base rate fallacy
CLT: for large n, the sampling distribution of the mean is approximately normal whatever the population shape, provided the variance is finite; this underpins confidence intervals and hypothesis tests
Hypothesis testing provides a framework for making decisions from data — understand what p-values actually mean (and don't mean)
Discrete distributions (Binomial, Poisson) model counts; continuous distributions (Normal, Exponential, Beta) model measurements
Naive Bayes assumes conditional independence — rarely true but often effective for classification

Practice Exercise

Calculate the probability of getting exactly 3 heads in 5 fair coin flips using the Binomial distribution
Apply Bayes' Theorem to a drug testing scenario: 2% prevalence, 95% sensitivity, 90% specificity — what is P(Disease | Positive)?
Demonstrate CLT by sampling from an exponential distribution with n = 2, 10, 30, 100 and plotting the sampling distribution of the mean
Perform a one-sample t-test and a two-sample t-test on generated data
Build a Bayesian A/B testing framework that computes P(B > A) for different sample sizes
For a Poisson process with λ = 3 events/hour, compute the probability of observing 0, 1, 2, 3, 4, 5+ events in an hour
Compare the PDF and CDF of the normal distribution visually and verify that P(μ - σ ≤ X ≤ μ + σ) ≈ 0.6827
