Odds Ratios — Understanding and Interpreting ORs

Odds Ratios

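The odds ratio (OR) measures the association between a binary predictor and a binary outcome. It is the ratio of the odds of the outcome in one group to the odds of the outcome in the other, where p_1 and p_2 are the probabilities of the outcome in the two groups.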

$$OR = \frac{p_1/(1-p_1)}{p_2/(1-p_2)}$$
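As a quick check on the formula, the sketch below converts two probabilities to odds and takes their ratio. The helper names `odds` and `odds_ratio` are illustrative (they do not come from any library), and the probabilities 0.4 and 0.1 are assumed example values.

```python
def odds(p):
    """Convert a probability p (0 < p < 1) to odds p / (1 - p)."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return p / (1 - p)


def odds_ratio(p1, p2):
    """Odds ratio of group 1 relative to group 2."""
    return odds(p1) / odds(p2)


# Assumed example values: 40% vs 10% probability of the outcome
print(f"odds(0.4) = {odds(0.4):.3f}")
print(f"odds(0.1) = {odds(0.1):.3f}")
print(f"OR = {odds_ratio(0.4, 0.1):.3f}")
```

Note that the odds are undefined at p = 1, which is why the helper rejects probabilities outside the open interval (0, 1).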

import numpy as np
import pandas as pd
from scipy import stats

# 2×2 contingency table
# Smoking vs Heart Disease
data = np.array([[80, 120],   # Smokers: disease, no disease
                 [30, 270]])  # Non-smokers: disease, no disease

smoker_odds = data[0,0] / data[0,1]
nonsmoker_odds = data[1,0] / data[1,1]
OR = smoker_odds / nonsmoker_odds

print("Smoking and Heart Disease:")
print(f"  Smokers: {data[0,0]} disease, {data[0,1]} no disease → odds = {smoker_odds:.3f}")
print(f"  Non-smokers: {data[1,0]} disease, {data[1,1]} no disease → odds = {nonsmoker_odds:.3f}")
print(f"  Odds Ratio = {OR:.3f}")
print(f"  Smokers have {OR:.1f}× the odds of heart disease vs non-smokers")

# 95% CI for OR (log-method)
log_OR = np.log(OR)
SE_log_OR = np.sqrt(sum(1/x for x in data.flatten()))
CI_lower = np.exp(log_OR - 1.96*SE_log_OR)
CI_upper = np.exp(log_OR + 1.96*SE_log_OR)
print(f"  95% CI: ({CI_lower:.3f}, {CI_upper:.3f})")

# Fisher's exact test
oddsratio_fisher, p_fisher = stats.fisher_exact(data)
print(f"  Fisher's exact p-value: {p_fisher:.6f}")

# OR from logistic regression
import statsmodels.api as sm
np.random.seed(42)
n = 500
smoking = np.random.binomial(1, 0.4, n)
heart_disease = np.random.binomial(1, 0.1 + 0.2*smoking)

X = sm.add_constant(smoking)
logit_model = sm.Logit(heart_disease, X).fit(disp=False)
# Inputs are NumPy arrays, so params and conf_int() are arrays: index 1 is the smoking slope
or_logit = np.exp(logit_model.params[1])
ci_logit = np.exp(logit_model.conf_int()[1])
print(f"\nOR from logistic regression: {or_logit:.3f}")
print(f"95% CI: ({ci_logit[0]:.3f}, {ci_logit[1]:.3f})")

# OR vs Risk Ratio
p1 = data[0,0] / data[0].sum()
p2 = data[1,0] / data[1].sum()
RR = p1 / p2
print(f"\nOR = {OR:.3f}, Risk Ratio (RR) = {RR:.3f}")
print("The OR lies farther from 1 than the RR when the outcome is common (>10%)")
print("Cohort studies and RCTs can report the RR; case-control studies can only estimate the OR")
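
To illustrate how the OR drifts away from the RR as the outcome becomes more common, the following sketch holds the risk ratio fixed and varies the baseline risk. The fixed risk ratio of 2 and the list of baseline risks are assumptions chosen for illustration, and `or_from_risks` is a hypothetical helper name.

```python
def or_from_risks(p_exposed, p_unexposed):
    """Odds ratio computed from the two group risks."""
    return (p_exposed / (1 - p_exposed)) / (p_unexposed / (1 - p_unexposed))


fixed_rr = 2.0  # assumed risk ratio, held constant across scenarios
rows = []
for baseline in [0.01, 0.05, 0.10, 0.20, 0.40]:
    p_exposed = fixed_rr * baseline  # risk in the exposed group
    or_value = or_from_risks(p_exposed, baseline)
    rows.append((baseline, or_value))
    print(f"baseline risk = {baseline:.2f}  RR = {fixed_rr:.1f}  OR = {or_value:.2f}")
```

With a rare outcome the OR stays close to the RR; as the baseline risk rises, the same RR corresponds to an increasingly larger OR.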

Key Takeaways

  1. OR = 1: no association; OR > 1: positive association; OR < 1: negative association
  2. OR ≈ RR only when the outcome is rare (<10%)
  3. A logistic regression coefficient is log(OR); exponentiate the coefficient to get the OR
  4. A 95% CI that excludes 1 indicates a statistically significant association at the 5% level
  5. Case-control studies use OR; cohort/RCT studies can use either RR or OR
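
Takeaways 1 and 4 can be combined into a small helper. This is a minimal sketch, assuming the same log-method (Wald) interval used in the lesson code and a table with no zero cells; the function name `odds_ratio_ci` and its argument layout are illustrative, not a library API.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald CI from a 2x2 table.

    a, b: outcome / no outcome in the exposed group
    c, d: outcome / no outcome in the unexposed group
    All four counts must be nonzero.
    """
    or_hat = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log_or)
    upper = math.exp(math.log(or_hat) + z * se_log_or)
    return or_hat, lower, upper


# Counts from the smoking vs heart disease table in this lesson
or_hat, lower, upper = odds_ratio_ci(80, 120, 30, 270)
excludes_one = lower > 1 or upper < 1

print(f"OR = {or_hat:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
print("CI excludes 1: significant at the 5% level" if excludes_one
      else "CI includes 1: not significant at the 5% level")
```

The interval is built on the log scale and then exponentiated, so it is asymmetric around the OR and can never cross zero.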
