Survival Analysis
Survival analysis models the time until an event occurs (death, failure, relapse). It handles censored data: subjects who have not experienced the event by the end of the study, so only a lower bound on their event time is known.
Key functions:
- S(t): survival function (probability of surviving past time t)
- h(t): hazard rate (instantaneous risk at time t)
- H(t): cumulative hazard
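The three functions are tied together by the standard identity that the survival function is the exponential of the negative cumulative hazard. A minimal sketch of that relationship, assuming an exponential distribution with a 12-month median (the control arm of the simulation); the variable names are illustrative only:

```python
import numpy as np

# Exponential survival with a 12-month median (assumed for illustration).
median = 12.0
rate = np.log(2) / median          # constant hazard: h(t) = rate for all t

t = np.linspace(0.0, 24.0, 25)     # months 0, 1, ..., 24
hazard = np.full_like(t, rate)     # h(t)
cum_hazard = rate * t              # H(t) = integral of h from 0 to t
survival = np.exp(-cum_hazard)     # S(t) = exp(-H(t))

print(f"S(12) = {survival[12]:.3f}")   # prints S(12) = 0.500
print(f"S(24) = {survival[24]:.3f}")   # prints S(24) = 0.250
```

Under these assumptions, about a quarter of control subjects would still be event-free at a 24-month cutoff, and those are the subjects who end up censored.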
import numpy as np
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.statistics import logrank_test
import matplotlib.pyplot as plt
np.random.seed(42)
n = 200
# Simulate clinical trial: two treatment groups
# Group A (control): exponential survival, median = 12 months
# Group B (treatment): longer survival, median = 20 months
group = np.random.choice([0, 1], n)
true_median = np.where(group == 0, 12, 20)
duration = np.random.exponential(true_median/np.log(2))
censored_at = 24 # study ends at 24 months
observed = duration <= censored_at
duration_obs = np.minimum(duration, censored_at)
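df = pd.DataFrame({
    'duration': duration_obs,
    'event': observed.astype(int),  # 1 = event observed, 0 = censored
    'group': np.where(group == 0, 'Control', 'Treatment'),
    # age and stage are drawn independently of survival time (noise covariates)
    'age': np.random.uniform(40, 70, n),
    'stage': np.random.choice([1, 2, 3], n, p=[0.3, 0.5, 0.2]),
})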
# Kaplan-Meier estimator
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
kmf = KaplanMeierFitter()
for grp, color in [('Control', 'red'), ('Treatment', 'blue')]:
    mask = df['group'] == grp
    kmf.fit(df.loc[mask, 'duration'], df.loc[mask, 'event'], label=grp)
    kmf.plot_survival_function(ax=axes[0], color=color)
axes[0].set_title('Kaplan-Meier Survival Curves')
axes[0].set_xlabel('Time (months)')
axes[0].set_ylabel('Survival Probability')
# Log-rank test
ctrl = df[df['group'] == 'Control']
trt = df[df['group'] == 'Treatment']
result = logrank_test(ctrl['duration'], trt['duration'],
                      event_observed_A=ctrl['event'],
                      event_observed_B=trt['event'])
axes[0].text(0.05, 0.1, f'Log-rank p = {result.p_value:.4f}',
             transform=axes[0].transAxes)
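# Cox proportional hazards model
# Encode the group label as a 0/1 indicator so every covariate is numeric
cox_df = df[['duration', 'event', 'age', 'stage']].copy()
cox_df['treatment'] = (df['group'] == 'Treatment').astype(int)
cph = CoxPHFitter()
cph.fit(cox_df, duration_col='duration', event_col='event')
cph.print_summary()
# By default cph.plot() draws the coefficients, i.e. log hazard ratios
cph.plot(ax=axes[1])
axes[1].set_title('Cox Model: Log Hazard Ratios (95% CI)')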
plt.tight_layout()
plt.savefig('survival_analysis.png', dpi=150)
plt.show()
# Median survival times
for grp in ['Control', 'Treatment']:
    mask = df['group'] == grp
    kmf.fit(df.loc[mask, 'duration'], df.loc[mask, 'event'], label=grp)
    median = kmf.median_survival_time_
    print(f"{grp}: median survival = {median:.1f} months")
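As a sanity check on the fitted models, the ground truth of the simulation can be worked out by hand, since an exponential distribution with median m has constant hazard ln(2)/m. A minimal sketch using only the two medians stated in the simulation comments (variable names are illustrative):

```python
import numpy as np

median_control = 12.0    # months, from the simulation setup
median_treatment = 20.0  # months, from the simulation setup

# Exponential survival: S(t) = exp(-rate * t), so median = ln(2) / rate
hazard_control = np.log(2) / median_control
hazard_treatment = np.log(2) / median_treatment

true_hr = hazard_treatment / hazard_control   # treatment relative to control
true_log_hr = np.log(true_hr)

print(f"True hazard ratio: {true_hr:.2f}")          # prints 0.60
print(f"True log hazard ratio: {true_log_hr:.3f}")  # prints -0.511
```

With n = 200 the Cox estimate for the treatment group should land near these values, up to sampling noise, and the Kaplan-Meier medians should fall near 12 and 20 months.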
Key Takeaways
- Censored observations are not missing: they carry information (the subject survived at least until the censoring time)
- Kaplan-Meier is a nonparametric estimator of the survival function
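- The log-rank test checks the null hypothesis that the groups share the same survival curve
- The Cox proportional hazards model estimates hazard ratios while adjusting for covariates, assuming the ratio between hazards stays constant over time
- Hazard ratio < 1: the covariate (here, treatment) is associated with a lower hazard; > 1: a higher hazard; = 1: no difference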
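To make the first two takeaways concrete, here is a hand-rolled sketch of the Kaplan-Meier product-limit estimator on a toy dataset. The data are made up for illustration and `kaplan_meier` is a hypothetical helper, not part of lifelines; it assumes the usual convention that, at tied times, events are counted before censorings.

```python
import numpy as np

def kaplan_meier(durations, events):
    """Return (event_times, survival) for the product-limit estimator."""
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(durations[events == 1])
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(durations >= t)                   # still observed just before t
        deaths = np.sum((durations == t) & (events == 1))  # events exactly at t
        s *= 1.0 - deaths / at_risk
        survival.append(s)
    return event_times, np.array(survival)

# Toy data: months observed, 1 = event, 0 = censored
durations = [2, 3, 3, 5, 8]
events = [1, 0, 1, 1, 0]

times, surv = kaplan_meier(durations, events)
for t, s in zip(times, surv):
    print(f"t = {t:.0f}: S(t) = {s:.3f}")
# t = 2: S(t) = 0.800
# t = 3: S(t) = 0.600
# t = 5: S(t) = 0.300
```

The subject censored at month 3 still counts in the risk set at months 2 and 3. This is how a censored observation contributes information without being treated as an event.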