# Packages
library(tidyverse)
library(survival)
library(survminer)
# Data
## Get data
data(cancer, package = "survival")
## Transform and select data
data_f <- lung %>%
dplyr::select(-c(ph.ecog, ph.karno, pat.karno, meal.cal, wt.loss)) %>%
dplyr::mutate(sex = if_else(sex == 1, "Male", "Female")) %>%
dplyr::mutate_at(.vars = c("inst", "status", "sex"), .funs = as.factor)
# Model
# * status == 2: event, status == 1: censored
cox_fit <- survival::coxph(survival::Surv(time, status == 2) ~ age + sex, data_f, ties = "breslow")
## Coefficients
broom::tidy(cox_fit)
survival::cox.zph(cox_fit) # Test if covariate HR changes over time
# Plot
plot(survival::cox.zph(cox_fit)) # See much it would change over time (when y is constant, it would not change)
survminer::ggsurvplot(survival::survfit(cox_fit), conf.int = TRUE, data = data_f)13 Cox Regression
$$ % Basic sets/Variables
% Probability and statistics % % Linear algebra % Math functions % Distributions% Update symbols $$
The Cox proportional hazards model is one of the most widely used methods for analyzing time-to-event data. Unlike parametric survival models that require specifying the form of the baseline hazard function (see Chapter 9), the Cox model is a semi-parametric approach that leaves the baseline hazard unspecified while modeling the effects of covariates on the hazard function.
The Cox model does not make assumptions about the shape of the survival distribution, only requiring the proportional hazards assumption. This assumption states that the hazard ratio between any two individuals is constant over time - their hazard functions are proportional.
The Cox proportional hazards model is defined as: \[ h(t) = h_0(t) \cdot \operatorname{exp}\left(\sum_{i=1}^p \beta_i x_i\right) \tag{13.1}\] where \(h_0\) is the baseline hazard function, \(x_1, x_2, \dots, x_p\) are the covariates and \(\beta_1, \beta_2, \dots, \beta_p\) the associated effect sizes of the covariates.
The Cox model assumes that continuous covariates have a linear relationship with \(h\).
13.1 Separation
Similar to logistic regression (see Chapter 4), the Cox model can experience separation. However, as opposed to logistic regression, separation in the Cox model only occurs when all observations within a level of a covariate are censored. This can have some of the following implications:
Estimation Instability
The maximum likelihood estimation becomes unstable because there is no information about the hazard for the separated covariate level. The model may produce extremely large coefficient estimates with correspondingly large standard errors, making the results unreliable.
Convergence Issues
The iterative estimation algorithm (typically Newton-Raphson) may fail to converge or take an excessive number of iterations.
Invalid Inference
Confidence intervals, p-values, and hypothesis tests become meaningless for the separated covariate.
13.2 Stratification
When stratifying a Cox model (see Equation 13.1), it allows the model to fix different baseline hazard functions \(h_0\) for each stratum (each level of the stratification variable). Thus, if there are two stratum, \(g=1\) and \(g=2\), the Cox model is given as \[ h(t) = \begin{cases} h_{0, g = 1}(t) \cdot \operatorname{exp}\left(\sum_{i = 1}^p \beta_i x_i\right) & \textrm{if } g = 1\\ h_{0, g = 2}(t) \cdot \operatorname{exp}\left(\sum_{i = 1}^p \beta_i x_i\right) & \textrm{if } g = 2 \end{cases} \tag{13.2}\] meaning that the two stratum share the same effect sizes \(\beta_1, \beta_2, \dots, \beta_p\). On the contrary, if you were to not stratify the model, and instead fit two seperate models, one for each stratum in the provided example, then this would not neccesarrily be the case.
Example 13.1 (Proportional Cox model)
# Packages
import pandas as pd
import lifelines
# Data
# Model
# * status == 1: event, status == 0: censored
cph = CoxPHFitter()
cph.fit(data, duration_col = 'time', event_col = 'status')
## Coefficients
cph.print_summary()
# Plot
cph.plot()13.3 Mixed Effects Cox Model
The Cox model can be Section 3 \[ \begin{aligned} h(t) &= h_0(t) \cdot \operatorname{exp}\left(X\beta + Zb\right)\\ b &\thicksim\operatorname{N}\left(0, \Sigma\left(\theta\right)\right) \end{aligned} \] where \(\operatorname{N}\) is a Gaussian distribution (see Section A.9) with mean \(0\) and covariance \(\Sigma(\theta)\).
Example 13.2 (Mixed Effects Cox model)
# Packages
library(survival)
library(tidyverse)
library(coxme)
# Data
## Get data
data(cancer, package = "survival")
## Transform and select data
data_f <- lung %>%
dplyr::select(-c(ph.ecog, ph.karno, pat.karno, meal.cal, wt.loss)) %>%
dplyr::mutate(sex = if_else(sex == 1, "Male", "Female")) %>%
dplyr::mutate_at(.vars = c("inst", "status", "sex"), .funs = as.factor)
# Model
# * status == 2: event, status == 1: censored
coxme_fit <- coxme::coxme(survival::Surv(time, status == 2) ~ age + sex + (1 | inst), data_f)
## Coefficients
summary(coxme_fit)
# Plot