Probability: Bayes' Theorem, PDF/CDF, CLT

Module 1: Foundations

Why Probability Matters

Probability is the mathematics of uncertainty. In data science, we use probability to:

  • Quantify uncertainty in predictions
  • Make inferences about populations from samples
  • Build probabilistic models
  • Test hypotheses

Certainty ←──────────────────────────→ Uncertainty
   │                                      │
   └──────── Probability Theory ──────────┘

Fundamental Concepts

Sample Space and Events

# Sample space: All possible outcomes
# Event: A subset of the sample space

# Example: Rolling two dice
sample_space = {(i, j) for i in range(1, 7) for j in range(1, 7)}
print(f"Sample space size: {len(sample_space)}")  # 36

# Event: Sum equals 7
event_sum_7 = {(i, j) for i, j in sample_space if i + j == 7}
print(f"Event sum=7: {event_sum_7}")
print(f"Probability: {len(event_sum_7)/len(sample_space):.4f}")  # 6/36 = 0.1667

Basic Probability Rules

Addition Rule

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

Here,

  • P(A ∪ B) = Probability of A or B occurring
  • P(A ∩ B) = Probability of both A and B occurring

Multiplication Rule

$$P(A \cap B) = P(A|B) \cdot P(B) = P(B|A) \cdot P(A)$$

Here,

  • P(A|B) = Conditional probability of A given B

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

# 1. Addition Rule: P(A or B) = P(A) + P(B) - P(A and B)
# 2. Multiplication Rule: P(A and B) = P(A|B) * P(B)
# 3. Complement Rule: P(not A) = 1 - P(A)
# 4. Conditional Probability: P(A|B) = P(A and B) / P(B)

# Example: Card drawing
# P(Red) = 26/52 = 0.5
# P(King) = 4/52 = 1/13
# P(Red ∩ King) = 2/52 = 1/26
# P(Red | King) = 2/4 = 0.5

# Verify addition rule
p_red = 26/52
p_king = 4/52
p_red_and_king = 2/52
p_red_or_king = p_red + p_king - p_red_and_king

print(f"P(Red or King): {p_red_or_king:.4f}")
# Compare with a tolerance: exact == on floats can fail due to rounding
print(f"Verification: {abs(p_red_or_king - (26+4-2)/52) < 1e-12}")

ℹ️ Independence and Conditional Probability

Two events A and B are independent if and only if P(A ∩ B) = P(A)·P(B). When P(B) > 0, this is equivalent to P(A|B) = P(A): knowing that B occurred gives no information about A. Always check independence before applying the simplified multiplication rule P(A ∩ B) = P(A)·P(B); assuming independence when it does not hold can give badly wrong probabilities.
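
As a quick check of the definition, the sketch below reuses the card-drawing numbers from the example above and tests whether P(Red ∩ King) equals P(Red)·P(King). It uses exact fractions to sidestep floating-point comparisons. The second pair of events (Heart and Red) and all variable names are illustrative additions, not part of the original example.

```python
from fractions import Fraction

# Card-drawing probabilities for a standard 52-card deck
p_red = Fraction(26, 52)
p_king = Fraction(4, 52)
p_red_and_king = Fraction(2, 52)   # the two red kings

# Independence test: P(A ∩ B) == P(A) * P(B)
red_king_independent = (p_red_and_king == p_red * p_king)
print(f"Red and King independent: {red_king_independent}")

# Contrast: "Heart" and "Red" are dependent, since every heart is red
p_heart = Fraction(13, 52)
p_heart_and_red = Fraction(13, 52)
heart_red_independent = (p_heart_and_red == p_heart * p_red)
print(f"Heart and Red independent: {heart_red_independent}")

# Conditional view: P(Red | King) equals P(Red) exactly when the events are independent
p_red_given_king = p_red_and_king / p_king
print(f"P(Red | King) = {p_red_given_king}, P(Red) = {p_red}")
```

Knowing a card is a king leaves the chance of red unchanged, whereas knowing it is a heart makes red certain, which is why the second pair fails the product test.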

Probability Distributions

Discrete Distributions

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# 1. Bernoulli Distribution (single trial)
p = 0.3  # Probability of success
bernoulli = stats.bernoulli(p)
print(f"Bernoulli(p={p}):")
print(f"  P(X=0): {bernoulli.pmf(0):.2f}")
print(f"  P(X=1): {bernoulli.pmf(1):.2f}")
print(f"  Mean: {bernoulli.mean():.2f}")
print(f"  Variance: {bernoulli.var():.2f}")

# 2. Binomial Distribution (n trials)
n = 10
p = 0.5
binomial = stats.binom(n, p)
x = np.arange(0, n+1)

plt.figure(figsize=(10, 5))
plt.bar(x, binomial.pmf(x), color='skyblue', edgecolor='black')
plt.title(f'Binomial Distribution (n={n}, p={p})')
plt.xlabel('Number of Successes')
plt.ylabel('Probability')
plt.show()

# 3. Poisson Distribution (events per interval)
lambda_param = 5
poisson = stats.poisson(lambda_param)
x = np.arange(0, 15)

plt.figure(figsize=(10, 5))
plt.bar(x, poisson.pmf(x), color='lightgreen', edgecolor='black')
plt.title(f'Poisson Distribution (λ={lambda_param})')
plt.xlabel('Number of Events')
plt.ylabel('Probability')
plt.show()

# 4. Geometric Distribution (trials until first success)
p = 0.3
geometric = stats.geom(p)
x = np.arange(1, 11)

plt.figure(figsize=(10, 5))
plt.bar(x, geometric.pmf(x), color='salmon', edgecolor='black')
plt.title(f'Geometric Distribution (p={p})')
plt.xlabel('Trial of First Success')
plt.ylabel('Probability')
plt.show()

Binomial Distribution PMF

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$

Here,

  • n = Number of trials
  • k = Number of successes
  • p = Probability of success on each trial
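
The formula can be evaluated directly with the standard library, which is a useful sanity check on what `stats.binom(n, p).pmf(k)` returns. The sketch below is a minimal illustration; the helper name `binomial_pmf` is hypothetical, and n = 10, p = 0.5 simply mirror the plot parameters used earlier.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed directly from the formula."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5
pmf_values = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(f"P(X = 3) = {binomial_pmf(3, n, p):.4f}")   # C(10, 3) / 2**10
print(f"Sum over all k = {sum(pmf_values):.4f}")   # a valid PMF sums to 1
```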

Poisson Distribution PMF

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$

Here,

  • λ = Average rate of events per interval
  • k = Number of events
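
As with the binomial case, the Poisson PMF can be computed by hand to confirm two of its properties: the probabilities sum to 1, and the mean equals λ. This is a small sketch under the assumption that truncating the infinite sum at k = 59 is accurate enough for λ = 5; the helper name `poisson_pmf` is hypothetical.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed directly from the formula."""
    return lam**k * exp(-lam) / factorial(k)

lam = 5  # same rate as the Poisson plot above
probs = [poisson_pmf(k, lam) for k in range(60)]

total = sum(probs)                                  # should be very close to 1
mean = sum(k * pk for k, pk in enumerate(probs))    # should be very close to lam

print(f"P(X = 0) = {poisson_pmf(0, lam):.4f}")
print(f"Sum of PMF over k = 0..59: {total:.6f}")
print(f"Mean computed from PMF: {mean:.4f}")
```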

Continuous Distributions

# 1. Normal Distribution (Gaussian)
mu = 0
sigma = 1
normal = stats.norm(mu, sigma)

x = np.linspace(-4, 4, 1000)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# PDF
axes[0].plot(x, normal.pdf(x), 'b-', linewidth=2)
axes[0].fill_between(x, normal.pdf(x), alpha=0.3)
axes[0].set_title('Normal Distribution PDF')

# CDF
axes[1].plot(x, normal.cdf(x), 'r-', linewidth=2)
axes[1].set_title('Normal Distribution CDF')

plt.tight_layout()
plt.show()

# 2. Uniform Distribution
a, b = 0, 10
uniform = stats.uniform(a, b-a)

x = np.linspace(-1, 11, 1000)

plt.figure(figsize=(10, 5))
plt.plot(x, uniform.pdf(x), 'b-', linewidth=2)
plt.fill_between(x, uniform.pdf(x), alpha=0.3)
plt.title(f'Uniform Distribution [{a}, {b}]')
plt.show()

# 3. Exponential Distribution
lambda_param = 1
exponential = stats.expon(scale=1/lambda_param)

x = np.linspace(0, 5, 1000)

plt.figure(figsize=(10, 5))
plt.plot(x, exponential.pdf(x), 'r-', linewidth=2)
plt.fill_between(x, exponential.pdf(x), alpha=0.3)
plt.title(f'Exponential Distribution (λ={lambda_param})')
plt.show()

# 4. Beta Distribution
alpha, beta_param = 2, 5
beta = stats.beta(alpha, beta_param)

x = np.linspace(0, 1, 1000)

plt.figure(figsize=(10, 5))
plt.plot(x, beta.pdf(x), 'g-', linewidth=2)
plt.fill_between(x, beta.pdf(x), alpha=0.3)
plt.title(f'Beta Distribution (α={alpha}, β={beta_param})')
plt.show()

PDF vs CDF

# PDF (Probability Density Function)
# - Height represents relative likelihood
# - Area under curve = 1
# - P(X = x) = 0 for continuous distributions

# CDF (Cumulative Distribution Function)
# - F(x) = P(X ≤ x)
# - Monotonically non-decreasing
# - Ranges from 0 to 1

x = np.linspace(-4, 4, 1000)
pdf = stats.norm.pdf(x)
cdf = stats.norm.cdf(x)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# PDF with area
axes[0].plot(x, pdf, 'b-', linewidth=2)
axes[0].fill_between(x, pdf, where=(x >= -1) & (x <= 1), alpha=0.5, label='P(-1 ≤ X ≤ 1)')
axes[0].set_title('PDF: P(-1 ≤ X ≤ 1) ≈ 0.68')
axes[0].legend()

# CDF
axes[1].plot(x, cdf, 'r-', linewidth=2)
axes[1].axhline(y=0.84, color='g', linestyle='--', label='P(X ≤ 1)')
axes[1].axvline(x=1, color='g', linestyle='--')
axes[1].set_title('CDF: P(X ≤ 1)')
axes[1].legend()

plt.tight_layout()
plt.show()

# Calculating probabilities
# P(a ≤ X ≤ b) = CDF(b) - CDF(a)
p_between = stats.norm.cdf(1) - stats.norm.cdf(-1)
print(f"P(-1 ≤ X ≤ 1): {p_between:.4f}")

# P(X > x) = 1 - CDF(x)
p_greater = 1 - stats.norm.cdf(2)
print(f"P(X > 2): {p_greater:.4f}")

# P(X < x) = CDF(x)
p_less = stats.norm.cdf(-1)
print(f"P(X < -1): {p_less:.4f}")

💡 Computing Probabilities with CDFs

For any continuous distribution, compute interval probabilities using the CDF:

  • P(a ≤ X ≤ b) = F(b) - F(a)
  • P(X > a) = 1 - F(a)
  • P(X ≤ a) = F(a)

This works because the CDF encodes all probabilistic information about the distribution.
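
The same three identities apply to any continuous distribution, not only the normal. The sketch below applies them to an exponential distribution using its closed-form CDF, F(x) = 1 - exp(-λx). The rate λ = 1, the interval endpoints, and the helper name `expon_cdf` are assumptions chosen for illustration.

```python
from math import exp

def expon_cdf(x, rate):
    """CDF of an Exponential(rate) variable: F(x) = 1 - exp(-rate * x) for x >= 0."""
    return 1 - exp(-rate * x) if x >= 0 else 0.0

rate = 1.0  # assumed rate parameter, for illustration only

p_between = expon_cdf(2, rate) - expon_cdf(1, rate)   # P(1 <= X <= 2) = F(2) - F(1)
p_greater = 1 - expon_cdf(2, rate)                    # P(X > 2) = 1 - F(2)
p_less = expon_cdf(1, rate)                           # P(X <= 1) = F(1)

print(f"P(1 <= X <= 2) = {p_between:.4f}")
print(f"P(X > 2)       = {p_greater:.4f}")
print(f"P(X <= 1)      = {p_less:.4f}")

# The three events partition the real line, so their probabilities sum to 1
print(f"Total          = {p_less + p_between + p_greater:.4f}")
```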

Bayes' Theorem

$$P(A|B) = \frac{P(B|A) \times P(A)}{P(B)}$$

Intuitive Understanding

Bayes' Theorem:

P(Hypothesis | Data) = P(Data | Hypothesis) × P(Hypothesis)
                       ─────────────────────────────────────
                                    P(Data)

Posterior = Likelihood × Prior
            ──────────────────
               Evidence

Bayes' Theorem (Expanded)

$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B|A) \cdot P(A) + P(B|\neg A) \cdot P(\neg A)}$$

Here,

  • P(A|B) = Posterior: probability of hypothesis given evidence
  • P(B|A) = Likelihood: probability of evidence given hypothesis
  • P(A) = Prior: initial belief in hypothesis
  • P(B|¬A) = Probability of the evidence when the hypothesis is false (the false positive rate in a testing context)
  • P(¬A) = Prior probability of the complement, 1 - P(A)
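
The expanded form maps directly onto a small function. The sketch below is one possible way to write it; the name `bayes_posterior` and the input values are hypothetical. It also shows that the posterior from one observation can serve as the prior for the next, which is only valid if the two observations are conditionally independent given the hypothesis.

```python
def bayes_posterior(prior, likelihood, false_positive_rate):
    """Return P(A | B) from P(A), P(B | A), and P(B | not A)."""
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Illustrative numbers (assumptions, not taken from any real test)
posterior = bayes_posterior(prior=0.2, likelihood=0.9, false_positive_rate=0.1)
print(f"Posterior after one observation: {posterior:.4f}")

# Sequential updating: reuse the posterior as the new prior for a second,
# conditionally independent observation of the same kind
posterior_2 = bayes_posterior(prior=posterior, likelihood=0.9, false_positive_rate=0.1)
print(f"Posterior after two observations: {posterior_2:.4f}")
```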

Example: Medical Testing

# Medical test example
# Disease prevalence: 1%
# Test sensitivity (true positive rate): 99%
# Test specificity (true negative rate): 95%

# Given: Person tests positive
# Find: Probability they actually have the disease

# Prior: P(Disease) = 0.01
# Likelihood: P(Positive | Disease) = 0.99
# Evidence: P(Positive) = ?

# Using law of total probability
p_disease = 0.01
p_positive_given_disease = 0.99
p_positive_given_healthy = 0.05  # False positive rate = 1 - specificity

# P(Positive) = P(Positive|Disease)×P(Disease) + P(Positive|Healthy)×P(Healthy)
p_positive = (p_positive_given_disease * p_disease + 
              p_positive_given_healthy * (1 - p_disease))

# Apply Bayes' theorem
p_disease_given_positive = (p_positive_given_disease * p_disease) / p_positive

print("Medical Test Analysis:")
print(f"P(Disease): {p_disease:.2%}")
print(f"P(Positive|Disease): {p_positive_given_disease:.2%}")
print(f"P(Positive|Healthy): {p_positive_given_healthy:.2%}")
print(f"P(Positive): {p_positive:.4f}")
print(f"\nP(Disease|Positive): {p_disease_given_positive:.4f}")
print(f"P(Disease|Positive): {p_disease_given_positive:.2%}")

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Outcome breakdown: joint probabilities of disease status and test result
true_pos = 0.01 * 0.99
false_pos = 0.99 * 0.05
true_neg = 0.99 * 0.95
false_neg = 0.01 * 0.01

values = [true_pos, false_pos, true_neg, false_neg]
labels = ['True Pos', 'False Pos', 'True Neg', 'False Neg']
colors = ['#FF6B6B', '#FFEAA7', '#4ECDC4', '#45B7D1']

axes[0].pie(values, labels=labels, colors=colors, autopct='%1.2f%%')
axes[0].set_title('Test Results Breakdown')

# Bayesian update visualization
prior = 0.01
likelihood_ratio = 0.99 / 0.05
posterior = (likelihood_ratio * prior) / (likelihood_ratio * prior + (1 - prior))

axes[1].bar(['Prior', 'Posterior'], [prior, posterior], color=['#4ECDC4', '#FF6B6B'])
axes[1].set_title('Bayesian Update: Prior → Posterior')
axes[1].set_ylabel('Probability')
axes[1].set_ylim(0, 0.5)

plt.tight_layout()
plt.show()

📝 Worked Example: Medical Test (Bayes' Theorem)

Given: Disease prevalence = 1%, Sensitivity = 99%, Specificity = 95%

Step 1: Identify the quantities

  • P(Disease) = 0.01 (prior)
  • P(Positive | Disease) = 0.99 (sensitivity/likelihood)
  • P(Positive | Healthy) = 1 - 0.95 = 0.05 (false positive rate)

Step 2: Compute the evidence P(Positive)

P(Positive) = 0.99 × 0.01 + 0.05 × 0.99 = 0.0099 + 0.0495 = 0.0594

Step 3: Apply Bayes' Theorem

P(Disease | Positive) = 0.0099 / 0.0594 ≈ 16.67%

Interpretation: Although the test has 99% sensitivity and 95% specificity, only ~17% of positive results are true positives. Healthy people outnumber sick people 99 to 1, so even a 5% false positive rate produces five times as many false positives as true positives. Ignoring this low prevalence is the base rate fallacy, and it leads to dramatically overestimating the probability of disease.
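
One way to build intuition for this result is to restate it in natural frequencies, counting people instead of multiplying probabilities. The sketch below assumes a hypothetical cohort of 10,000 people, chosen only so that every count is a whole number; the rates are those of the worked example.

```python
population = 10_000  # hypothetical cohort size, chosen so the counts are whole numbers

diseased = population * 1 // 100            # 1% prevalence
healthy = population - diseased

true_positives = diseased * 99 // 100       # 99% sensitivity
false_positives = healthy * 5 // 100        # 5% false positive rate (1 - specificity)

all_positives = true_positives + false_positives
p_disease_given_positive = true_positives / all_positives

print(f"Diseased: {diseased}, Healthy: {healthy}")
print(f"True positives:  {true_positives}")
print(f"False positives: {false_positives}")
print(f"P(Disease | Positive) = {true_positives}/{all_positives} "
      f"= {p_disease_given_positive:.4f}")
```

Counting this way gives the same posterior as the formula, and it makes visible that the positives are dominated by healthy people.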

Naive Bayes Classification

# Example: Spam classification
# Prior: P(Spam) = 0.3, P(Ham) = 0.7
# Likelihood: P(Word | Spam), P(Word | Ham)

# Vocabulary: ['free', 'money', 'meeting', 'project']
p_spam = 0.3
p_ham = 0.7

# Likelihoods
likelihoods = {
    'free': {'spam': 0.8, 'ham': 0.1},
    'money': {'spam': 0.7, 'ham': 0.05},
    'meeting': {'spam': 0.1, 'ham': 0.6},
    'project': {'spam': 0.2, 'ham': 0.5}
}

# Email contains: 'free', 'money', 'meeting'
words = ['free', 'money', 'meeting']

# Calculate likelihoods (assuming independence)
p_words_given_spam = np.prod([likelihoods[w]['spam'] for w in words])
p_words_given_ham = np.prod([likelihoods[w]['ham'] for w in words])

# Apply Bayes' theorem
p_words = p_words_given_spam * p_spam + p_words_given_ham * p_ham
p_spam_given_words = (p_words_given_spam * p_spam) / p_words
p_ham_given_words = (p_words_given_ham * p_ham) / p_words

print("Spam Classification:")
print(f"P(Spam): {p_spam}")
print(f"P(Ham): {p_ham}")
print(f"P(Words|Spam): {p_words_given_spam:.6f}")
print(f"P(Words|Ham): {p_words_given_ham:.6f}")
print(f"\nP(Spam|Words): {p_spam_given_words:.4f}")
print(f"P(Ham|Words): {p_ham_given_words:.4f}")
print(f"Classification: {'Spam' if p_spam_given_words > p_ham_given_words else 'Ham'}")

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Prior vs Posterior (grouped bars, so the posterior does not hide the prior)
classes = ['Spam', 'Ham']
pos = np.arange(len(classes))
bar_w = 0.35
axes[0].bar(pos - bar_w/2, [p_spam, p_ham], bar_w,
            color='#4ECDC4', label='Prior')
axes[0].bar(pos + bar_w/2, [p_spam_given_words, p_ham_given_words], bar_w,
            color='#FF6B6B', label='Posterior')
axes[0].set_xticks(pos)
axes[0].set_xticklabels(classes)
axes[0].set_title('Prior vs Posterior')
axes[0].legend()

# Likelihood comparison
word_labels = list(likelihoods.keys())
spam_likelihoods = [likelihoods[w]['spam'] for w in word_labels]
ham_likelihoods = [likelihoods[w]['ham'] for w in word_labels]

x = np.arange(len(word_labels))
width = 0.35
axes[1].bar(x - width/2, spam_likelihoods, width, label='Spam', color='#FF6B6B')
axes[1].bar(x + width/2, ham_likelihoods, width, label='Ham', color='#4ECDC4')
axes[1].set_xticks(x)
axes[1].set_xticklabels(word_labels)
axes[1].set_title('Word Likelihoods')
axes[1].legend()

plt.tight_layout()
plt.show()

ℹ️ The 'Naive' Assumption

Naive Bayes assumes features are conditionally independent given the class label: P(x₁, x₂, ..., xₙ | C) = ∏ᵢ P(xᵢ | C). This assumption is rarely true in practice, yet Naive Bayes often performs well for text classification because (1) the ranking of posterior probabilities is often correct even if the absolute values are not, and (2) the independence assumption keeps the number of parameters small, which limits overfitting in much the same way regularization does.
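
Because the product ∏ᵢ P(xᵢ | C) shrinks quickly as the number of features grows, practical implementations usually work with sums of logarithms instead. The sketch below is one common way to do this, not necessarily how any particular library does it. It reuses the priors and word likelihoods from the spam example above and normalizes with the log-sum-exp trick; the dictionary layout and variable names are illustrative.

```python
from math import exp, log

# Priors and word likelihoods copied from the spam example above
priors = {'spam': 0.3, 'ham': 0.7}
likelihoods = {
    'free': {'spam': 0.8, 'ham': 0.1},
    'money': {'spam': 0.7, 'ham': 0.05},
    'meeting': {'spam': 0.1, 'ham': 0.6},
}
words = ['free', 'money', 'meeting']

# Unnormalized log-posterior per class: log P(C) + sum_i log P(x_i | C)
log_scores = {
    c: log(priors[c]) + sum(log(likelihoods[w][c]) for w in words)
    for c in priors
}

# Normalize with log-sum-exp: subtract the max before exponentiating
max_score = max(log_scores.values())
log_evidence = max_score + log(sum(exp(s - max_score) for s in log_scores.values()))
posteriors = {c: exp(s - log_evidence) for c, s in log_scores.items()}

for c, prob in posteriors.items():
    print(f"P({c} | words) = {prob:.4f}")
```

For three words the direct product is perfectly safe; the log form matters when a document has hundreds of features and the product would underflow to zero.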

Central Limit Theorem (CLT)

The CLT states that, for independent and identically distributed observations with finite mean μ and finite variance σ², the sampling distribution of the sample mean approaches a normal distribution as the sample size n increases, regardless of the shape of the population distribution.

$$\bar{X} \overset{\text{approx.}}{\sim} N\left(\mu, \frac{\sigma^2}{n}\right) \text{ for large } n$$

# Demonstrate CLT with different distributions
np.random.seed(42)

distributions = {
    'Uniform': np.random.uniform(0, 10, 10000),
    'Exponential': np.random.exponential(2, 10000),
    'Chi-Square': np.random.chisquare(2, 10000),
    'Bimodal': np.concatenate([np.random.normal(-2, 1, 5000), 
                                np.random.normal(2, 1, 5000)])
}

sample_sizes = [1, 5, 10, 30]

fig, axes = plt.subplots(len(distributions), len(sample_sizes), 
                         figsize=(16, 12))

for i, (dist_name, data) in enumerate(distributions.items()):
    for j, n in enumerate(sample_sizes):
        # Take multiple samples and calculate means
        sample_means = [np.mean(np.random.choice(data, n)) for _ in range(1000)]
        
        axes[i, j].hist(sample_means, bins=30, density=True, 
                       alpha=0.7, edgecolor='black')
        axes[i, j].set_title(f'{dist_name}\nn={n}')
        
        if j == 0:
            axes[i, j].set_ylabel('Density')
        if i == len(distributions) - 1:
            axes[i, j].set_xlabel('Sample Mean')

plt.suptitle('Central Limit Theorem Demonstration', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

# CLT in action: how the spread of the sample mean shrinks with n
population = np.random.exponential(2, 100000)
pop_mean = np.mean(population)
pop_std = np.std(population)

print(f"Population Mean: {pop_mean:.4f}")
print(f"Population Std: {pop_std:.4f}")

# Different sample sizes
sample_sizes = [10, 30, 50, 100]
n_simulations = 1000

for n in sample_sizes:
    # Simulate sampling
    sample_means = [np.mean(np.random.choice(population, n)) for _ in range(n_simulations)]
    
    # Central 95% of the simulated sample means. This is an interval of the
    # sampling distribution, not a confidence interval built from one sample.
    lower = np.percentile(sample_means, 2.5)
    upper = np.percentile(sample_means, 97.5)
    
    print(f"\nn={n}:")
    print(f"  Mean of sample means: {np.mean(sample_means):.4f}")
    print(f"  Std of sample means: {np.std(sample_means):.4f}")
    print(f"  Expected Std (σ/√n): {pop_std/np.sqrt(n):.4f}")
    print(f"  Central 95% of sample means: [{lower:.4f}, {upper:.4f}]")
    print(f"  Interval contains population mean: {lower <= pop_mean <= upper}")

💡 Rules of Thumb for CLT

  • n ≥ 30: CLT approximation is generally reasonable for most distributions
  • n ≥ 50: Good approximation even for moderately skewed distributions
  • n ≥ 100: Usually an excellent approximation, provided the variance is finite
  • For heavily skewed or heavy-tailed distributions, larger samples are needed
  • The CLT applies to means; analogous results exist for sums, proportions, and other statistics
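
To see what the approximation buys in practice, the sketch below uses the CLT to estimate a tail probability for the mean of n = 30 draws from an exponential population with mean 2, then compares it with a seeded Monte Carlo estimate. The threshold of 2.5, the number of simulations, and the use of the standard library instead of NumPy are assumptions made for illustration. Because the exponential is skewed, some gap between the two numbers is expected at this sample size.

```python
import random
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 2.0, 2.0, 30   # exponential population with mean 2 (its std is also 2)
threshold = 2.5

# CLT approximation: sample mean is roughly N(mu, sigma^2 / n)
standard_error = sigma / sqrt(n)
p_clt = 1 - NormalDist(mu, standard_error).cdf(threshold)

# Monte Carlo check: draw many samples of size n and count how often the mean exceeds the threshold
random.seed(42)
n_simulations = 20_000
exceed = 0
for _ in range(n_simulations):
    sample_mean = sum(random.expovariate(1 / mu) for _ in range(n)) / n
    if sample_mean > threshold:
        exceed += 1
p_sim = exceed / n_simulations

print(f"Standard error (σ/√n): {standard_error:.4f}")
print(f"CLT approximation of P(mean > {threshold}): {p_clt:.4f}")
print(f"Simulated P(mean > {threshold}):            {p_sim:.4f}")
```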

Hypothesis Testing

# One-sample t-test
from scipy import stats

# Sample data
np.random.seed(42)
sample = np.random.normal(10.5, 2, 100)  # Sample from population with mean 10.5

# Null hypothesis: μ = 10
# Alternative hypothesis: μ ≠ 10
t_stat, p_value = stats.ttest_1samp(sample, 10)

print("One-Sample t-test:")
print(f"Sample Mean: {np.mean(sample):.4f}")
print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"Result: {'Reject' if p_value < 0.05 else 'Fail to reject'} H0")

# Two-sample t-test
sample1 = np.random.normal(10, 2, 100)
sample2 = np.random.normal(11, 2, 100)

t_stat, p_value = stats.ttest_ind(sample1, sample2)
print("\nTwo-Sample t-test:")
print(f"Sample 1 Mean: {np.mean(sample1):.4f}")
print(f"Sample 2 Mean: {np.mean(sample2):.4f}")
print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")

# Paired t-test
before = np.random.normal(100, 15, 50)
after = before + np.random.normal(5, 10, 50)

# Order (after, before) so the sign of t matches the mean difference printed below
t_stat, p_value = stats.ttest_rel(after, before)
print("\nPaired t-test:")
print(f"Mean Difference: {np.mean(after - before):.4f}")
print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_value:.4f}")

# Chi-square test
# Test independence between two categorical variables
observed = np.array([[50, 30, 20],
                     [35, 45, 20],
                     [15, 25, 60]])

chi2, p_value, dof, expected = stats.chi2_contingency(observed)
print("\nChi-Square Test:")
print(f"Chi-square statistic: {chi2:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"Degrees of freedom: {dof}")

ℹ️ Common Misconceptions About P-values

  • A p-value is NOT the probability that H₀ is true
  • A p-value is NOT the probability the result occurred by chance
  • A p-value IS the probability of observing a test statistic as extreme or more extreme than the one observed, assuming H₀ is true
  • A non-significant result (p > 0.05) does NOT prove H₀; it only means there is insufficient evidence to reject it
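
A simulation makes the third point concrete. When H₀ is true, p-values are roughly uniform on [0, 1], so about 5% of experiments fall below 0.05 purely by construction. The sketch below uses a two-sided z-test with known σ instead of the t-tests above, to keep it dependency-free; the helper name `z_test_p_value` and all parameter values are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(0)
std_normal = NormalDist()

def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value for H0: mean == mu0, assuming a known population std sigma."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / sqrt(n))
    return 2 * (1 - std_normal.cdf(abs(z)))

# Simulate many experiments in which H0 is actually true (true mean equals mu0)
mu0, sigma, n, n_experiments = 10.0, 2.0, 50, 5000
p_values = [
    z_test_p_value([random.gauss(mu0, sigma) for _ in range(n)], mu0, sigma)
    for _ in range(n_experiments)
]

false_positive_rate = sum(p < 0.05 for p in p_values) / n_experiments
print(f"Fraction of experiments with p < 0.05 under H0: {false_positive_rate:.3f}")
```

The fraction printed is the Type I error rate of the procedure. It says nothing about the probability that H₀ is true in any single experiment.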

Practical Example: A/B Testing with Bayesian Analysis

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# A/B test data
np.random.seed(42)

# Control group (A)
n_control = 1000
conversions_control = 120
conversion_rate_control = conversions_control / n_control

# Treatment group (B)
n_treatment = 1000
conversions_treatment = 150
conversion_rate_treatment = conversions_treatment / n_treatment

print("A/B Test Results:")
print(f"Control: {conversions_control}/{n_control} = {conversion_rate_control:.2%}")
print(f"Treatment: {conversions_treatment}/{n_treatment} = {conversion_rate_treatment:.2%}")
print(f"Lift: {(conversion_rate_treatment/conversion_rate_control - 1)*100:.1f}%")

# Frequentist approach
from statsmodels.stats.proportion import proportions_ztest

count = np.array([conversions_control, conversions_treatment])
nobs = np.array([n_control, n_treatment])

z_stat, p_value = proportions_ztest(count, nobs)
print(f"\nFrequentist Analysis:")
print(f"Z-statistic: {z_stat:.4f}")
print(f"P-value: {p_value:.4f}")
print(f"Significant at α=0.05: {p_value < 0.05}")

# Bayesian approach
# Prior: Beta(1, 1) (uniform prior)
# Posterior: Beta(conversions + 1, non-conversions + 1)

prior_a = 1
prior_b = 1

posterior_control = stats.beta(
    conversions_control + prior_a,
    n_control - conversions_control + prior_b
)

posterior_treatment = stats.beta(
    conversions_treatment + prior_a,
    n_treatment - conversions_treatment + prior_b
)

# Sample from posteriors
n_samples = 10000
samples_control = posterior_control.rvs(n_samples)
samples_treatment = posterior_treatment.rvs(n_samples)

# Probability that B > A
prob_b_better = np.mean(samples_treatment > samples_control)
print(f"\nBayesian Analysis:")
print(f"P(B > A): {prob_b_better:.2%}")

# Expected lift
lift = (samples_treatment - samples_control) / samples_control
print(f"Expected Lift: {np.mean(lift)*100:.1f}% ± {np.std(lift)*100:.1f}%")

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Posterior distributions
x = np.linspace(0.08, 0.20, 1000)
axes[0].plot(x, posterior_control.pdf(x), 'b-', label='Control', linewidth=2)
axes[0].plot(x, posterior_treatment.pdf(x), 'r-', label='Treatment', linewidth=2)
axes[0].fill_between(x, posterior_control.pdf(x), alpha=0.3)
axes[0].fill_between(x, posterior_treatment.pdf(x), alpha=0.3)
axes[0].set_title('Posterior Distributions')
axes[0].set_xlabel('Conversion Rate')
axes[0].set_ylabel('Density')
axes[0].legend()

# Lift distribution
axes[1].hist(lift * 100, bins=50, density=True, alpha=0.7, edgecolor='black')
axes[1].axvline(x=0, color='r', linestyle='--', label='No Effect')
axes[1].axvline(x=np.mean(lift)*100, color='g', linestyle='-', 
                label=f'Mean: {np.mean(lift)*100:.1f}%')
axes[1].set_title('Lift Distribution (B vs A)')
axes[1].set_xlabel('Relative Improvement (%)')
axes[1].set_ylabel('Density')
axes[1].legend()

plt.tight_layout()
plt.show()

📝 Worked Example: Bayesian A/B Test

Given: Control: 120/1000 conversions, Treatment: 150/1000 conversions

Step 1: Define priors: Beta(1,1) (uniform/non-informative)

Step 2: Compute posteriors

  • Posterior(Control) = Beta(120 + 1, 880 + 1) = Beta(121, 881)
  • Posterior(Treatment) = Beta(150 + 1, 850 + 1) = Beta(151, 851)

Step 3: Sample and compare

  • Draw 10,000 samples from each posterior
  • P(Treatment > Control) ≈ 97.5% (the exact figure varies slightly with the random draws)
  • Expected relative lift: ~25%, reported by the code as mean ± standard deviation of the sampled lift

Decision: Strong Bayesian evidence that Treatment outperforms Control. The Bayesian approach gives a direct probability of improvement, unlike frequentist p-values.
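
The sampled estimate of P(Treatment > Control) can be cross-checked without random draws. With around a thousand observations per arm, each Beta posterior is close to normal, so the difference of the two conversion rates is approximately normal as well. The sketch below applies that normal approximation to the posteriors Beta(121, 881) and Beta(151, 851) from Step 2. It is a different technique from the Monte Carlo comparison in the code above, and the helper name `beta_mean_var` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

# Posteriors from Step 2 of the worked example
mean_c, var_c = beta_mean_var(121, 881)   # control
mean_t, var_t = beta_mean_var(151, 851)   # treatment

# Treat (treatment - control) as approximately normal; the arms are independent,
# so the variances add
diff_mean = mean_t - mean_c
diff_std = sqrt(var_t + var_c)
prob_b_better_approx = 1 - NormalDist(diff_mean, diff_std).cdf(0)

print(f"Posterior mean (control):   {mean_c:.4f}")
print(f"Posterior mean (treatment): {mean_t:.4f}")
print(f"Approximate P(B > A):       {prob_b_better_approx:.4f}")
```

If the Monte Carlo estimate and this approximation disagree by more than a percentage point or so, that is a hint to recheck the sampling code or increase the number of draws.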

Key Takeaways

📋 Summary: Probability, Bayes, PDF/CDF, CLT

  1. Probability provides a rigorous axiomatic framework for quantifying uncertainty: P(A) ≥ 0, P(Ω) = 1, countable additivity
  2. PDF describes density (area = probability); CDF gives cumulative probability directly, so use CDFs for interval calculations
  3. Bayes' Theorem enables principled belief updating: Posterior ∝ Likelihood × Prior. Watch for the base rate fallacy
  4. CLT says the sampling distribution of the mean is approximately normal for large n, whatever the population shape (given finite variance); this underpins confidence intervals and hypothesis tests
  5. Hypothesis testing provides a framework for making decisions from data; understand what p-values actually mean (and don't mean)
  6. Discrete distributions (Binomial, Poisson) model counts; continuous distributions (Normal, Exponential, Beta) model measurements
  7. Naive Bayes assumes conditional independence, which is rarely true but often effective for classification

Practice Exercise

  1. Calculate the probability of getting exactly 3 heads in 5 fair coin flips using the Binomial distribution
  2. Apply Bayes' Theorem to a drug testing scenario: 2% prevalence, 95% sensitivity, 90% specificity. What is P(Disease | Positive)?
  3. Demonstrate CLT by sampling from an exponential distribution with n = 2, 10, 30, 100 and plotting the sampling distribution of the mean
  4. Perform a one-sample t-test and a two-sample t-test on generated data
  5. Build a Bayesian A/B testing framework that computes P(B > A) for different sample sizes
  6. For a Poisson process with λ = 3 events/hour, compute the probability of observing 0, 1, 2, 3, 4, 5+ events in an hour
  7. Compare the PDF and CDF of the normal distribution visually and verify that P(μ - σ ≤ X ≤ μ + σ) ≈ 0.6827
