Sampling Bias and Errors
Even the most carefully designed study can be undermined by bias. Understanding bias is not pessimism — it's statistical integrity.
Types of Error in Statistics
Sampling Error
The unavoidable random difference between a sample statistic and the true population parameter. It shrinks as the sample grows (the standard error of a mean or proportion falls roughly in proportion to 1/√n) and can be quantified.
This is expected and manageable. Confidence intervals are designed to account for it.
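To make "decreases with larger samples and can be quantified" concrete, here is a minimal sketch that repeatedly draws simple random samples and compares the observed spread of the sample proportion with the textbook standard error sqrt(p(1-p)/n). The population proportion `true_p`, the helper `sampling_error_summary`, and the sample sizes are illustrative assumptions, not values from a real study.

```python
import numpy as np

rng = np.random.default_rng(0)
true_p = 0.55  # assumed population proportion, for illustration only

def sampling_error_summary(n, n_repeats=2000):
    """Draw many simple random samples of size n and summarize their error."""
    # Each entry is the sample proportion from one simple random sample
    sample_props = rng.binomial(n, true_p, size=n_repeats) / n
    empirical_se = sample_props.std(ddof=1)
    theoretical_se = np.sqrt(true_p * (1 - true_p) / n)
    return empirical_se, theoretical_se

results = {}
for n in (100, 1_000, 10_000):
    emp, theo = sampling_error_summary(n)
    results[n] = (emp, theo)
    print(f"n={n:>6}: empirical SE={emp:.4f}, theoretical SE={theo:.4f}, "
          f"approx. 95% margin={1.96 * theo:.4f}")
```

Each tenfold increase in n should cut the standard error by a factor of about √10, which is why sampling error is manageable: it can be budgeted for in advance.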
Non-Sampling Error (Bias)
Systematic errors that don't go away with larger samples. A biased survey of 1 million people is still biased.
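The claim that a larger biased sample is still biased can be checked with a small simulation. The sketch below assumes a hypothetical population in which one group is five times less likely to be sampled and also holds different opinions; all rates and sizes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population: 40% belong to a group the survey rarely reaches
pop_size = 200_000
hard_to_reach = rng.random(pop_size) < 0.40
# Assumed support rates: 70% in the hard-to-reach group, 45% elsewhere
support = rng.random(pop_size) < np.where(hard_to_reach, 0.70, 0.45)
true_support = support.mean()

# Biased sampling frame: hard-to-reach people are 5x less likely to be drawn
draw_prob = np.where(hard_to_reach, 0.2, 1.0)
draw_prob = draw_prob / draw_prob.sum()

errors = {}
for n in (100, 10_000, 100_000):
    # Both samples are drawn with replacement to keep the sketch simple
    random_sample = support[rng.integers(0, pop_size, size=n)]
    biased_sample = support[rng.choice(pop_size, size=n, p=draw_prob)]
    errors[n] = (random_sample.mean() - true_support,
                 biased_sample.mean() - true_support)
    print(f"n={n:>7}: random-sample error={errors[n][0]:+.4f}, "
          f"biased-sample error={errors[n][1]:+.4f}")
```

Under these assumptions the random-sample error shrinks toward zero as n grows, while the biased-sample error settles near a fixed negative value instead of disappearing.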
Types of Bias
Selection Bias
Certain individuals are more likely to be included in the sample due to how the sample was drawn.
Classic example: The Literary Digest 1936 US Election Poll
- Sent 10 million surveys to car owners and phone subscribers
- Got 2.4 million responses
- Predicted Landon would beat Roosevelt 57% to 43%
- Roosevelt won with about 61% of the popular vote to Landon's 37%
Problem: In 1936, car owners and phone subscribers were wealthier than average and leaned more Republican than the general population. The huge sample did not help, because the list it was drawn from was itself skewed.
import numpy as np
np.random.seed(42)
n = 10000
# Simulated population: income in USD for each person
population_income = np.random.normal(50000, 20000, n)
# High-income people support policy X less (e.g., a wealth tax)
prob_support = np.where(population_income > 60000, 0.35, 0.65)
population_support = np.random.binomial(1, prob_support)
true_support = population_support.mean() # roughly 56% in expectation
# Selection bias: online survey where higher-income people are more likely to respond
prob_respond = np.clip(0.1 + (population_income - 30000) / 200000, 0.05, 0.95)
responded = np.random.binomial(1, prob_respond)
# Only the people who responded end up in the sample
selected = population_support[responded == 1]
print(f"True support: {true_support:.3f}")
print(f"Biased sample support: {selected.mean():.3f}")
print(f"Bias: {selected.mean() - true_support:.3f}")
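One standard remedy for this kind of selection bias is weighting. The sketch below applies post-stratification to a simulation like the one above: it computes support separately within two income strata among respondents, then recombines them using each stratum's population share. It assumes those population shares are known (for example from a census) and that the 60,000 USD cut-off captures the relevant difference in opinion; both are simplifying assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50_000

income = rng.normal(50_000, 20_000, n)
high_income = income > 60_000
# Assumed support rates: 35% among high-income people, 65% otherwise
support = rng.binomial(1, np.where(high_income, 0.35, 0.65))

# Higher-income people are more likely to respond
prob_respond = np.clip(0.1 + (income - 30_000) / 200_000, 0.05, 0.95)
responded = rng.binomial(1, prob_respond) == 1

true_support = support.mean()
raw_estimate = support[responded].mean()

# Post-stratification: weight each stratum's respondent mean by its
# population share (assumed known from an external source)
strata = {"high income": high_income, "lower income": ~high_income}
weighted_estimate = 0.0
for name, in_stratum in strata.items():
    pop_share = in_stratum.mean()
    resp_mean = support[responded & in_stratum].mean()
    weighted_estimate += pop_share * resp_mean

print(f"True support:        {true_support:.3f}")
print(f"Unweighted estimate: {raw_estimate:.3f}")
print(f"Weighted estimate:   {weighted_estimate:.3f}")
```

Weighting only works here because opinion depends on the stratum and nothing else. If respondents differ from nonrespondents within a stratum, the weighted estimate stays biased.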
Nonresponse Bias
People who don't respond to a survey differ systematically from those who do.
Examples:
- Happy customers ignore feedback surveys; dissatisfied customers respond
- Busy people (who may have different characteristics) skip phone surveys
- Sensitive topics (income, drug use) get more refusals from those actually affected
Detection:
# A common check is to compare early vs. late respondents, treating late
# respondents as a rough proxy for nonrespondents.
# (Heckman's correction is a different, model-based adjustment for selection.)
# Simulate a survey with nonresponse
np.random.seed(0)
n = 1000
true_job_satisfaction = np.clip(np.random.normal(6.5, 2, n), 1, 10) # 1-10 scale
# In this simulation, dissatisfied people are less likely to respond
prob_respond = np.clip(0.3 + 0.08 * true_job_satisfaction, 0.1, 0.95)
responded = np.random.binomial(1, prob_respond)
all_responses = true_job_satisfaction[responded == 1]
nonresponse_bias = all_responses.mean() - true_job_satisfaction.mean()
print(f"True mean satisfaction: {true_job_satisfaction.mean():.3f}")
print(f"Survey mean satisfaction (biased): {all_responses.mean():.3f}")
print(f"Nonresponse bias: {nonresponse_bias:+.3f}")
print("Survey overestimates satisfaction because unhappy people were less likely to respond!")
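The early-versus-late comparison mentioned above can be sketched as a wave analysis. The simulation below assumes three contact attempts and a per-contact response probability that rises with satisfaction; if later waves score lower than earlier ones, that trend hints that people who never responded score lower still. The mechanism and all numbers are assumptions for illustration, and in a real survey the nonrespondent mean would not be observable.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 20_000

satisfaction = np.clip(rng.normal(6.5, 2, n), 1, 10)  # 1-10 scale
# Assumed mechanism: per-contact response probability rises with satisfaction
prob_respond = np.clip(0.05 + 0.05 * satisfaction, 0.05, 0.95)

n_waves = 3
wave = np.zeros(n, dtype=int)  # 0 means "never responded"
for w in range(1, n_waves + 1):
    pending = wave == 0
    answers_now = pending & (rng.random(n) < prob_respond)
    wave[answers_now] = w

wave_means = {w: satisfaction[wave == w].mean() for w in range(1, n_waves + 1)}
respondent_mean = satisfaction[wave > 0].mean()
nonrespondent_mean = satisfaction[wave == 0].mean()  # unknown in a real survey

for w, m in wave_means.items():
    print(f"Wave {w} mean satisfaction: {m:.3f}")
print(f"All respondents: {respondent_mean:.3f}")
print(f"Nonrespondents (simulation only): {nonrespondent_mean:.3f}")
```

A downward trend across waves does not measure the bias exactly, but it indicates its likely direction and suggests the respondent mean is an overestimate.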
Survivorship Bias
Analyzing only survivors (successes) while ignoring those that failed and are no longer visible.
Examples:
- Studying successful startups to find success strategies (ignoring the thousands that failed with the same strategies)
- WWII bomber armor (Abraham Wald): damage was recorded only on planes that returned, so armor belonged where those planes were not hit, since hits in those areas brought planes down
- Investment fund performance: funds that failed were removed from databases
# Survivorship bias in investment funds
np.random.seed(7)
n_funds = 1000
years = 10
# Simulate annual returns: mean 5% with 20% std dev
returns = np.random.normal(0.05, 0.20, (n_funds, years))
# Survival must depend on performance for the bias to appear.
# Assumed rule: a fund closes after any year in which it loses more than 15%.
open_at_year_end = (returns > -0.15).astype(int).cumprod(axis=1)
# A fund reports a return in year t only if it was still open after year t-1
reported = np.ones((n_funds, years), dtype=bool)
reported[:, 1:] = open_at_year_end[:, :-1] == 1
# Surviving funds (still open at year 10)
survived = open_at_year_end[:, -1] == 1
n_survived = survived.sum()
print(f"Funds surviving 10 years: {n_survived}/{n_funds} ({100*n_survived/n_funds:.1f}%)")
# Compare average annual returns
all_funds_mean = returns[reported].mean() # every fund-year that actually occurred
survivors_mean = returns[survived].mean() # only funds still in the database
print(f"All funds average return: {all_funds_mean:.3f}")
print(f"Surviving funds average return: {survivors_mean:.3f}")
print(f"Survivorship bias inflates returns by: {survivors_mean - all_funds_mean:.3f}")
Measurement Bias
Systematic errors in how variables are measured, causing values to be consistently too high or too low.
Examples:
- Self-reported weight: people tend to underreport
- Social desirability bias: answering to appear socially acceptable
- Question ordering effects
- Interviewer effects (different interviewers get different answers)
# Social desirability bias: hours of exercise per week
np.random.seed(3)
n = 500
true_hours = np.random.exponential(scale=3, size=n) # true behavior
# People exaggerate to the interviewer
exaggeration = np.random.normal(loc=1.5, scale=0.5, size=n) # add ~1.5 hours bias
reported_hours = true_hours + exaggeration
print(f"True mean exercise: {true_hours.mean():.2f} hours/week")
print(f"Reported mean exercise: {reported_hours.mean():.2f} hours/week")
print(f"Measurement bias: {reported_hours.mean() - true_hours.mean():+.2f} hours")
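In practice the true values are not available for everyone, so measurement bias is usually estimated from a validation subsample. The sketch below assumes a hypothetical validation study in which a random 10% of respondents also have an objective measurement (for example from an activity tracker) that is treated as error-free; the estimated offset is then subtracted from the full-sample mean. All names and numbers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000

true_hours = rng.exponential(scale=3, size=n)  # true weekly exercise
# Assumed reporting behavior: people overstate by about 1.5 hours on average
reported_hours = true_hours + rng.normal(loc=1.5, scale=0.5, size=n)

# Hypothetical validation study: a random 10% also get an objective measurement
validation_idx = rng.choice(n, size=n // 10, replace=False)
objective_hours = true_hours[validation_idx]  # assumed to be measured without error

# Estimate the systematic offset on the validation subsample only
estimated_bias = (reported_hours[validation_idx] - objective_hours).mean()
corrected_mean = reported_hours.mean() - estimated_bias

print(f"Reported mean:  {reported_hours.mean():.2f} hours/week")
print(f"Estimated bias: {estimated_bias:+.2f} hours")
print(f"Corrected mean: {corrected_mean:.2f} hours/week")
print(f"True mean:      {true_hours.mean():.2f} hours/week")
```

A constant subtraction is only appropriate if the bias is roughly the same for everyone. If overreporting grows with the true value, a regression of objective on reported values would be a better fit.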
Detecting and Reducing Bias
| Bias Type | Detection | Reduction |
|---|---|---|
| Selection | Compare selected vs. population on known variables | Probability sampling, weighting |
| Nonresponse | Follow-up of nonrespondents, callback analysis | Maximize response rate, imputation |
| Survivorship | Check whether units dropped out for outcome-related reasons | Include failed and delisted units, survival (censoring) analysis |
| Measurement | Validate against objective measures | Anonymize surveys, indirect questions |
Key Takeaways
- Sampling error is random and decreases with n — it's manageable
- Bias is systematic and doesn't decrease with n — bigger biased samples are still biased
- Survivorship bias is everywhere in business, medicine, and social science
- Nonresponse bias is often larger than sampling error in practice
- Probability sampling (not convenience sampling) is the best protection against selection bias
- Validate your measures — what you measure may not be what you intend to measure