Survival Analysis — Time-to-Event Data

Survival Analysis

Survival analysis models the time until an event occurs (death, failure, relapse). It is built to handle right-censored data: subjects who have not experienced the event by the end of the study, so only a lower bound on their event time is known.
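
To see why censoring needs special handling, here is a minimal sketch using only the Python standard library. It assumes exponential event times with a 12-month median and a study that ends at 24 months; the seed, sample size, and variable names are illustrative choices, not part of the lesson's dataset. Averaging the recorded times as if they were all event times underestimates the true mean, while an estimate that counts censored follow-up time correctly recovers it.

```python
import math
import random

random.seed(0)  # arbitrary seed, for reproducibility only

TRUE_MEDIAN = 12.0                  # months (illustrative assumption)
RATE = math.log(2) / TRUE_MEDIAN    # exponential hazard rate
TRUE_MEAN = 1.0 / RATE              # about 17.3 months
STUDY_END = 24.0                    # administrative censoring time

# Simulate true event times, then right-censor them at STUDY_END
true_times = [random.expovariate(RATE) for _ in range(10_000)]
observed_times = [min(t, STUDY_END) for t in true_times]
events = [t <= STUDY_END for t in true_times]

# Naive approach: treat every recorded time as an event time
naive_mean = sum(observed_times) / len(observed_times)

# Censoring-aware estimate for exponential data:
# rate = (number of observed events) / (total follow-up time)
mle_rate = sum(events) / sum(observed_times)
mle_mean = 1.0 / mle_rate

print(f"true mean:            {TRUE_MEAN:.2f}")
print(f"naive mean:           {naive_mean:.2f}")
print(f"censoring-aware mean: {mle_mean:.2f}")
```

The censoring-aware formula used here is specific to the exponential model; the Kaplan-Meier estimator achieves the same goal without assuming a distribution.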

Key functions:

  • $S(t) = P(T > t)$: survival function (probability of surviving past time $t$)
  • $h(t)$: hazard rate (instantaneous risk at time $t$)
  • $H(t) = -\log S(t)$: cumulative hazard

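
The three functions are linked: the cumulative hazard is $H(t) = -\log S(t)$, and the hazard is its derivative, $h(t) = dH/dt$. The sketch below checks this numerically for an exponential model with an assumed 12-month median, where the hazard is constant. The function names and the choice of distribution are illustrative assumptions.

```python
import math

MEDIAN = 12.0                   # months (illustrative assumption)
RATE = math.log(2) / MEDIAN     # constant hazard of an exponential model

def survival(t):
    """S(t) = P(T > t) for an exponential distribution."""
    return math.exp(-RATE * t)

def cumulative_hazard(t):
    """H(t) = -log S(t)."""
    return -math.log(survival(t))

def hazard(t, dt=1e-6):
    """h(t) approximated as the central-difference derivative of H(t)."""
    return (cumulative_hazard(t + dt) - cumulative_hazard(t - dt)) / (2 * dt)

for t in (6.0, 12.0, 24.0):
    print(f"t={t:>4}: S={survival(t):.3f}  H={cumulative_hazard(t):.3f}  h={hazard(t):.4f}")
```

At the median, $S(t)$ equals 0.5 and $H(t)$ equals $\log 2$; the estimated hazard stays at the same value for every $t$, which is the defining property of the exponential model.
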
import numpy as np
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.statistics import logrank_test
import matplotlib.pyplot as plt

np.random.seed(42)
n = 200

# Simulate clinical trial: two treatment groups
# Group A (control): exponential survival, median = 12 months
# Group B (treatment): longer survival, median = 20 months
group = np.random.choice([0, 1], n)
true_median = np.where(group == 0, 12, 20)
duration = np.random.exponential(true_median/np.log(2))
censored_at = 24  # study ends at 24 months
observed = duration <= censored_at
duration_obs = np.minimum(duration, censored_at)

df = pd.DataFrame({
    'duration': duration_obs,
    'event': observed.astype(int),
    'group': np.where(group==0, 'Control', 'Treatment'),
    'age': np.random.uniform(40, 70, n),
    'stage': np.random.choice([1,2,3], n, p=[0.3,0.5,0.2])
})

# Kaplan-Meier estimator
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

kmf = KaplanMeierFitter()
for grp, color in [('Control','red'), ('Treatment','blue')]:
    mask = df['group'] == grp
    kmf.fit(df[mask]['duration'], df[mask]['event'], label=grp)
    kmf.plot_survival_function(ax=axes[0], color=color)

axes[0].set_title('Kaplan-Meier Survival Curves')
axes[0].set_xlabel('Time (months)')
axes[0].set_ylabel('Survival Probability')

# Log-rank test
ctrl = df[df['group'] == 'Control']
trt = df[df['group'] == 'Treatment']
result = logrank_test(ctrl['duration'], trt['duration'],
                      event_observed_A=ctrl['event'],
                      event_observed_B=trt['event'])
axes[0].text(0.05, 0.1, f'Log-rank p = {result.p_value:.4f}',
             transform=axes[0].transAxes)

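# Cox proportional hazards model
# lifelines expects numeric covariates, so encode the group as a 0/1 indicator
cox_df = df[['duration', 'event', 'age', 'stage']].copy()
cox_df['treatment'] = (df['group'] == 'Treatment').astype(int)
cph = CoxPHFitter()
cph.fit(cox_df, duration_col='duration', event_col='event')
cph.print_summary()
cph.plot(ax=axes[1])  # plots log hazard ratios (coefficients) with 95% CIs
axes[1].set_title('Cox Model: Log Hazard Ratios')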

plt.tight_layout()
plt.savefig('survival_analysis.png', dpi=150)
plt.show()

# Median survival times
for grp in ['Control','Treatment']:
    mask = df['group'] == grp
    kmf.fit(df[mask]['duration'], df[mask]['event'])
    median = kmf.median_survival_time_
    print(f"{grp}: median survival = {median:.1f} months")
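
To make the Kaplan-Meier estimator less of a black box, the following is a minimal hand-rolled sketch of the product-limit formula: at each distinct event time, multiply the running survival probability by (1 - deaths / number at risk). It is not how lifelines implements it; the function name `kaplan_meier` and the toy data are hypothetical.

```python
def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimate)."""
    # Distinct times at which at least one event was observed
    event_times = sorted({t for t, e in zip(durations, events) if e})
    surv = 1.0
    curve = []
    for t in event_times:
        # Subjects still under observation just before time t
        at_risk = sum(1 for d in durations if d >= t)
        # Events (not censorings) occurring exactly at time t
        deaths = sum(1 for d, e in zip(durations, events) if d == t and e)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical toy data (months); event = 0 marks a censored subject
durations = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 1, 0, 0]

curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"t={t:>2}: S(t) = {s:.3f}")
```

Censored subjects never trigger a drop in the curve, but they stay in the at-risk count until their censoring time, which is how their partial information is used.
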

Key Takeaways

  1. Censored observations are not missing — they carry information (survived at least until censoring)
  2. Kaplan-Meier is a nonparametric estimator of the survival function
  3. Log-rank test compares survival curves between groups
  4. Cox proportional hazards model estimates hazard ratios adjusting for covariates
  5. Hazard ratio < 1: the treatment lowers the hazard relative to control; > 1: it raises the hazard; = 1: no difference
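
As a rough check on the hazard-ratio interpretation, the sketch below computes the true hazard ratio implied by the simulation's settings (exponential survival with medians of 12 and 20 months). For exponential data the hazard is $\log 2$ divided by the median, so the ratio depends only on the two medians. The helper name `exponential_hazard` is an illustrative assumption; a fitted Cox model reports an estimate of the log of this ratio, not the exact value.

```python
import math

def exponential_hazard(median):
    """Constant hazard of an exponential distribution with the given median."""
    return math.log(2) / median

control_hazard = exponential_hazard(12.0)    # control group median: 12 months
treatment_hazard = exponential_hazard(20.0)  # treatment group median: 20 months

hazard_ratio = treatment_hazard / control_hazard  # treatment vs. control
log_hazard_ratio = math.log(hazard_ratio)         # the scale of a Cox coefficient
percent_reduction = (1.0 - hazard_ratio) * 100.0

print(f"hazard ratio:      {hazard_ratio:.2f}")
print(f"log hazard ratio:  {log_hazard_ratio:.3f}")
print(f"hazard reduction:  {percent_reduction:.0f}%")
```

A hazard ratio below 1 corresponds to a negative Cox coefficient, and exponentiating the coefficient recovers the ratio.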
