We will explore the relationship between survival and explanatory variables by modeling. In this class, we consider a broad class of regression models: Proportional Hazards (PH) models. \[ \log\lambda(t;{\bf Z})=\log\lambda_0(t)+{\boldsymbol\beta}{\bf Z} \] Suppose \(Z = 1\) for treated subjects and \(Z = 0\) for untreated subjects. Then this model says that the hazard is increased by a factor of \(e^{\beta}\) for treated subjects versus untreated subjects (\(e^{\beta}\) might be < 1).
This group of PH models divides further into:
library(survival) library(KMsurv)
\[ \lambda(t;{\bf Z})=\lambda_0(t)\exp\{{\boldsymbol\beta}{\bf Z}\} \]
Why do we call it proportional hazards? Think of the first example, where \(Z = 1\) for treated and \(Z = 0\) for control. Then if we think of \(\lambda_1(t)\) as the hazard rate for the treated group, and \(\lambda_0(t)\) as the hazard for control, then we can write: \[ \lambda_1(t) = \lambda(t;Z=1)=\lambda_0(t)\exp\{\beta Z\} = \lambda_0(t)\exp\{\beta\} \] This implies that the ratio of the two hazards is a constant, \(\phi\), which does NOT depend on time, t. In other words, the hazards of the two groups remain proportional over time. \[ \phi = \frac{\lambda_1(t)}{\lambda_0(t)} = e^{\beta} \] \(\phi\) is referred to as the hazard ratio.
In the example of comparing two treatment groups, \(\lambda_0(t)\) is the hazard rate for the control group.
In general, \(\lambda_0(t)\) is called the baseline hazard function, and reflects the underlying hazard for subjects with all covariates \(Z_1,\ldots,Z_p\) equal to 0 (i.e., the 'reference group').
The general form is: \[ \lambda(t;{\bf Z})=\lambda_0(t)\exp\{\beta_1Z_1+\ldots+\beta_pZ_p\} \] So when we substitute all of the \(Z_j\) equal to 0, we get \(\lambda(t;{\bf Z})=\lambda_0(t)\).
In the general case, we think of the i-th individual having a set of covariates \({\bf Z}_i = (Z_{1i}, Z_{2i},\ldots, Z_{pi})\), and we model their hazard rate as some multiple of the baseline hazard rate: \[ \lambda(t;{\bf Z}_i)=\lambda_0(t)\exp\{\beta_1Z_{1i}+\ldots+\beta_pZ_{pi}\} \] This means we can write the log of the hazard ratio for the i-th individual to the reference group as: \[ \log\frac{\lambda(t;{\bf Z}_i)}{\lambda_0(t)}=\beta_1Z_{1i}+\ldots+\beta_pZ_{pi} \] The Cox Proportional Hazards model is a linear model for the log of the hazard ratio.
One of the biggest advantages of the framework of the Cox PH model is that we can estimate the parameters \({\boldsymbol\beta}\) which reflect the effects of treatment and other covariates without having to make any assumptions about the form of \(\lambda_0(t)\).
In other words, we don???t have to assume that \(\lambda_0(t)\) follows an exponential model, or a Weibull model, or any other particular parametric model. That???s what makes the model semi-parametric.
Questions:
Parameters of interest, \({\boldsymbol\beta}\), are estimated via partial likelihood. \(\lambda_0(\cdot)\) is treated as a nuisance parameter.
leuk = read.table("http://www.karlin.mff.cuni.cz/~pesta/prednasky/NMFM404/Data/leuk.dat", sep=" ", head=T, skip=7) # read data head(leuk)
## weeks remiss trtmt ## 1 6 0 1 ## 2 6 1 1 ## 3 6 1 1 ## 4 6 1 1 ## 5 7 1 1 ## 6 9 0 1
# Cox PH Model leuk.ph = coxph(Surv(weeks,remiss) ~ trtmt, data=leuk) # Note: default = Efron method for handling ties summary(leuk.ph)
## Call: ## coxph(formula = Surv(weeks, remiss) ~ trtmt, data = leuk) ## ## n= 42, number of events= 30 ## ## coef exp(coef) se(coef) z Pr(>|z|) ## trtmt -1.5721 0.2076 0.4124 -3.812 0.000138 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## exp(coef) exp(-coef) lower .95 upper .95 ## trtmt 0.2076 4.817 0.09251 0.4659 ## ## Concordance= 0.69 (se = 0.053 ) ## Rsquare= 0.322 (max possible= 0.988 ) ## Likelihood ratio test= 16.35 on 1 df, p=5.261e-05 ## Wald test = 14.53 on 1 df, p=0.0001378 ## Score (logrank) test = 17.25 on 1 df, p=3.283e-05
# Breslow handling of ties leuk.phb = coxph(Surv(weeks, remiss) ~ trtmt, data=leuk, method="breslow") summary(leuk.phb)
## Call: ## coxph(formula = Surv(weeks, remiss) ~ trtmt, data = leuk, method = "breslow") ## ## n= 42, number of events= 30 ## ## coef exp(coef) se(coef) z Pr(>|z|) ## trtmt -1.5092 0.2211 0.4096 -3.685 0.000229 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## exp(coef) exp(-coef) lower .95 upper .95 ## trtmt 0.2211 4.523 0.09907 0.4934 ## ## Concordance= 0.69 (se = 0.053 ) ## Rsquare= 0.304 (max possible= 0.989 ) ## Likelihood ratio test= 15.21 on 1 df, p=9.615e-05 ## Wald test = 13.58 on 1 df, p=0.0002288 ## Score (logrank) test = 15.93 on 1 df, p=6.571e-05
# Exact handling of ties leuk.phe = coxph(Surv(weeks, remiss) ~ trtmt, data=leuk, method="exact") summary(leuk.phe)
## Call: ## coxph(formula = Surv(weeks, remiss) ~ trtmt, data = leuk, method = "exact") ## ## n= 42, number of events= 30 ## ## coef exp(coef) se(coef) z Pr(>|z|) ## trtmt -1.6282 0.1963 0.4331 -3.759 0.00017 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## exp(coef) exp(-coef) lower .95 upper .95 ## trtmt 0.1963 5.095 0.08398 0.4587 ## ## Rsquare= 0.321 (max possible= 0.98 ) ## Likelihood ratio test= 16.25 on 1 df, p=5.544e-05 ## Wald test = 14.13 on 1 df, p=0.0001704 ## Score (logrank) test = 16.79 on 1 df, p=4.169e-05
The proportional hazards model assumes a continuous hazard ??? ties are not possible. There are four proposed modifications to the likelihood to adjust for ties.
Bottom Line: Implications of Ties (See Allison (1995), p.127-137)
Many software packages provide estimates of \({\boldsymbol\beta}\), but the hazard ratio \(HR=\exp\{{\boldsymbol\beta}\}\) is usually the parameter of interest.
For each covariate of interest, the null hypothesis is \[ H_0:\,HR_j=1\,\Leftrightarrow\,\beta_j=0 \] Wald test is used. For nested models, a likelihood ratio test is constructed.
ACTG 196 was a randomized clinical trial to study the effects of combination regimens on prevention of MAC (mycobacterium avium complex), one of the most common opportunistic infections in AIDS patients.
The treatment regimens were:
mac = read.csv("http://www.karlin.mff.cuni.cz/~pesta/prednasky/NMFM404/Data/mac.csv", head=F, col.names = c("patid", "age", "agecat", "sex", "karnof", "ivdrug", "antiret", "cd4", "cd4cat", "ctg", "dthstat", "dthtime", "macstat", "mactime", "swdrstat", "swdrtime", "rif", "clari", "toxstat", "toxtime")) # mactime = time to mac # macstat = event indicator head(mac) # Check out a few data rows:
## patid age agecat sex karnof ivdrug antiret cd4 cd4cat ctg dthstat ## 1 1 42 1 0 90 0 1 8 0 1 0 ## 2 2 33 0 0 90 0 0 30 1 1 0 ## 3 3 39 1 0 100 0 1 80 1 1 1 ## 4 4 35 0 0 80 0 0 58 1 1 0 ## 5 5 42 1 0 90 0 0 59 1 1 0 ## 6 6 45 1 0 90 0 0 18 0 1 1 ## dthtime macstat mactime swdrstat swdrtime rif clari toxstat toxtime ## 1 623 1 560 1 106 1 0 0 623 ## 2 651 0 651 1 86 1 0 0 651 ## 3 464 0 26 0 26 1 0 0 26 ## 4 622 0 622 1 111 0 1 0 501 ## 5 643 0 643 1 119 0 1 0 643 ## 6 216 0 171 0 171 0 1 1 171
# Treatment arms: # rif + clari (N=389) # clari (N=397) # rif (N=391) table(mac$clari, mac$rif)
## ## 0 1 ## 0 389 391 ## 1 397 0
# Fit and plot KM survival curves fit.km = survfit(Surv(mactime, macstat) ~ clari + rif, data=mac)
plot(fit.km, mark.time=F, lty=1:3, xlab="Days", ylab="MAC-Free Survival") legend(lty=1:3, x=100, y=.4, legend=c("Rif + Clari", "Rifabutin", "Clarithromycin"), cex=0.8)
Model 1
# Fit Cox PH model fit.mac1 = coxph(Surv(mactime,macstat) ~ karnof + clari + rif, data=mac) summary(fit.mac1)
## Call: ## coxph(formula = Surv(mactime, macstat) ~ karnof + clari + rif, ## data = mac) ## ## n= 1177, number of events= 121 ## ## coef exp(coef) se(coef) z Pr(>|z|) ## karnof -0.04485 0.95614 0.01064 -4.217 2.48e-05 *** ## clari 0.27557 1.31728 0.25801 1.068 0.285509 ## rif 0.87197 2.39161 0.23694 3.680 0.000233 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## exp(coef) exp(-coef) lower .95 upper .95 ## karnof 0.9561 1.0459 0.9364 0.9763 ## clari 1.3173 0.7591 0.7944 2.1842 ## rif 2.3916 0.4181 1.5032 3.8051 ## ## Concordance= 0.649 (se = 0.028 ) ## Rsquare= 0.027 (max possible= 0.73 ) ## Likelihood ratio test= 32.02 on 3 df, p=5.193e-07 ## Wald test = 32.29 on 3 df, p=4.548e-07 ## Score (logrank) test = 33.16 on 3 df, p=2.977e-07
Model 2
# Fit Cox PH model adding baseline CD4 variable fit.mac2 = coxph(Surv(mactime,macstat) ~ karnof + clari + rif + cd4, data=mac) summary(fit.mac2)
## Call: ## coxph(formula = Surv(mactime, macstat) ~ karnof + clari + rif + ## cd4, data = mac) ## ## n= 1177, number of events= 121 ## ## coef exp(coef) se(coef) z Pr(>|z|) ## karnof -0.036874 0.963798 0.010665 -3.457 0.000546 *** ## clari 0.252345 1.287041 0.258337 0.977 0.328664 ## rif 0.879749 2.410294 0.237092 3.711 0.000207 *** ## cd4 -0.018360 0.981807 0.003684 -4.984 6.23e-07 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## exp(coef) exp(-coef) lower .95 upper .95 ## karnof 0.9638 1.0376 0.9439 0.9842 ## clari 1.2870 0.7770 0.7757 2.1354 ## rif 2.4103 0.4149 1.5145 3.8360 ## cd4 0.9818 1.0185 0.9747 0.9889 ## ## Concordance= 0.716 (se = 0.028 ) ## Rsquare= 0.053 (max possible= 0.73 ) ## Likelihood ratio test= 63.77 on 4 df, p=4.682e-13 ## Wald test = 55.59 on 4 df, p=2.449e-11 ## Score (logrank) test = 56.22 on 4 df, p=1.806e-11
We can also compute the hazard ratio ourselves, by exponentiating the \(\beta\) coefficients: \[ HR_{cd4} = \exp\{???0.01835\} = 0.98 \] Why is this HR so close to 1, and yet still highly significant? What is the interpretation of this HR?
The likelihood ratio test for the effect of CD4 is twice the difference in minus log-likelihoods between the two models (on 1 df): \[ ??^2{LR} = 2???(63.8???32.0)=31.8 \] How does this test statistic compare to the Wald \(\chi^2\) test?
In the mac study, there were three treatment arms (rif
, clari
, and the rif+clari
combination (for rif=clari=0
). The combination therapy is the reference group, so the coefficients of rif
and clari
are in comparison to the combination group.
How do we compare the Rifa arm against the Clari arm? That is equivalent to testing \[ H_0:\,\beta_{clari}=\beta_{rifa} \] For that, we need the matrix \(\mathsf{var}\widehat{\boldsymbol\beta}\).
names(fit.mac2) # check components of fit object
## [1] "coefficients" "var" "loglik" ## [4] "score" "iter" "linear.predictors" ## [7] "residuals" "means" "concordance" ## [10] "method" "n" "nevent" ## [13] "terms" "assign" "wald.test" ## [16] "y" "formula" "call"
beta_hat = fit.mac2$coef Var_beta = fit.mac2$var # Var(beta_hat) A = c(0,1,-1,0) # contrast vector beta_dif = t(A) %*% beta_hat # beta_clari - beta_rif var_dif = t(A) %*% Var_beta %*% A # Var(beta_clari-beta_rif) se_dif = sqrt(var_dif) # SE(beta_clari-beta_rif) t_dif = beta_dif/se_dif t_dif # Wald test t-statistic, df=n-p
## [,1] ## [1,] -2.959871
2* pt( -abs(t_dif), df=1177-4) # p-value for difference in arms
## [,1] ## [1,] 0.003139506
Testing Model 2 for overall treatment effect Two ways:
A = matrix(0, ncol=4, nrow=2) A[1,2] = A[2,3] = 1 beta_trt = A %% beta_hat var_trt = A %% Var_beta %% t(A) (Wald_trt = t(beta_trt) %% solve(var_trt) %*% beta_trt)
## [,1] ## [1,] 17.00252
1-pchisq(Wald_trt, df=2) # p-value for Wald test for treatment
## [,1] ## [1,] 0.0002032125
rif
, clari
and compare log-likelihood, against Chi-square with df=2.
fit.mac3 = update(fit.mac2, .~. -rif -clari) anova(fit.mac3, fit.mac2, test="Chisq")
## Analysis of Deviance Table ## Cox model: response is Surv(mactime, macstat) ## Model 1: ~ karnof + cd4 ## Model 2: ~ karnof + clari + rif + cd4 ## loglik Chisq Df P(>|Chi|)
## 1 -747.07
## 2 -738.62 16.914 2 0.0002125 *** ## --- ## Signif. codes: 0 '**' 0.001 '' 0.01 '' 0.05 '.' 0.1 ' ' 1
For the i-th patient with covariates \({\bf Z}_i\), we have: \[ S_i(t)=[S_0(t)]^{\exp\{{\boldsymbol\beta}{\bf Z}_i\}} \] Say we are interested in the survival pattern for single males in the nursing home study. Based on the previous formula, if we had an estimate for the survival function in the reference group, i.e., \(\widehat{S}_0(t)\), we could get estimates of the survival function for any set of covariates \({\bf Z}_i\).
How can we estimate the survival function, \(S_0(t)\)? We could use the KM estimator, but there are a few disadvantages of that approach:
Instead, we will use a baseline hazard estimator which takes advantage of the proportional hazards assumption to get a smoother estimate. \[ \widehat{S}_i(t)=[\widehat{S}_0(t)]^{\exp\{\widehat{\boldsymbol\beta}{\bf Z}_i\}} \] Using the above formula, we substitute \(\widehat{\boldsymbol\beta}\) based on fitting the Cox PH model, and calculate \(\widehat{S}_0(t)\) by one of the following approaches:
R
) ... extending the concept of the Nelson-Aalen estimator to the proportional hazards modelR
, SAS
) ... analogous to the Kaplan-Meier estimatorConsider the Nursing Home study. We fit the model
\[
\lambda_(t) = \lambda_0(t) \exp\{\beta_1[\mbox{married}] + \beta_2[\mbox{health status}]\}
\]
Let's predict the survival for single individuals with health status = 5
(worst), and for those with health status = 4
(second worst).
nurshome = read.table("http://www.karlin.mff.cuni.cz/~pesta/prednasky/NMFM404/Data/NursingHome.dat", head=F, skip=14, col.names = c("los", "age", "rx", "gender", "married", "health", "censor")) nurshome$discharged = 1 - nurshome$censor # event indicator head(nurshome)
## los age rx gender married health censor discharged ## 1 37 86 1 0 0 2 0 1 ## 2 61 77 1 0 0 4 0 1 ## 3 1084 75 1 0 0 3 1 0 ## 4 1092 77 1 0 1 2 1 0 ## 5 23 86 1 0 0 4 0 1 ## 6 1091 71 1 1 0 3 1 0
table(nurshome$health)
## ## 2 3 4 5 ## 343 576 513 169
# Fit Cox PH model fit.nurs = coxph(Surv(los,discharged) ~ married + health, data=nurshome) # Predict survival for single patients with health = 5 newdat = data.frame(married=0, health=5) newdat # data frame with same predictors as fit.nurs
## married health ## 1 0 5
ps5 = survfit(fit.nurs, newdata=newdat) # predicted survival # Compare with Kaplan-Meier nursh5 = nurshome[nurshome$health==5 & nurshome$married==0,] fit.km5 = survfit(Surv(los, discharged) ~ 1, data = nursh5) # Predict survival for single patients, health =4 newdat[1,] = c(0,4) ps4 = survfit(fit.nurs, newdata=newdat) # Compare with Kaplan-Meier nursh4 = nurshome[nurshome$health==4 & nurshome$married==0,] fit.km4 = survfit(Surv(los, discharged) ~ 1, data = nursh4)
plot(ps5, xlab= "Length of stay in nursing home (days)", ylab = "Proportion not discharged", col=2, conf.int=F) lines(fit.km5, mark.time=F, col=2) # add line to existing plot lines(ps4, xlab= "Length of stay in nursing home (days)", ylab = "Proportion not discharged", col=4, conf.int=F) lines(fit.km4, mark.time=F, col=4) # add line to existing plot
Continuing the Nursing Home example, let's get and plot the baseline survival \(S_0(t)\) and cumulative hazard \(\Lambda_0(t)\). They correspond to married=0
, health=0
.
newdat[1,] = c(0,0) ps0 = survfit(fit.nurs, newdata=newdat) # S_0 = ps0$surv bh = basehaz(fit.nurs, centered=F) Lambda0 = bh$hazard # it???s *cummulative* hazard! # same as Lambda0 = -log(ps0$surv)
plot(bh$time, Lambda0, type="l", xlab="Length of stay in nursing home (days)", ylab="Cummulative hazard", col=4)
Collett (Section 3.6) has an excellent discussion of various approaches for model selection. In practice, model selection proceeds through a combination of
Many advocate the approach of first doing a univariate analysis to 'screen' out potentially significant variables for consideration in the multivariate model (see Collett).
Moreover, typically univariate analysis is a part of a larger analysis, in order to identify the importance of each predictor taken in itself.
One conservation measure suggested for trawl fishing is a minimum size limit for halibut (32 inches). However, this size limit would only be effective if captured fish below the limit survived until the time of their release. An experiment was conducted to evaluate the survival rates of halibut caught by trawls or longlines, and to assess other factors which might contribute to survival (duration of trawling, maximum depth fished, size of fish, and handling time).
halibut = read.csv("http://www.karlin.mff.cuni.cz/~pesta/prednasky/NMFM404/Data/halibut.csv", head=T) dim(halibut)
## [1] 294 8
head(halibut)
## No Time Death Towd Deldepth Length Handtime Logcat ## 1 1 209 1 30 13 41 8 6.992 ## 2 2 209 1 30 13 44 8 6.992 ## 3 3 209 1 30 13 47 10 6.992 ## 4 4 209 1 30 13 34 10 6.992 ## 5 5 38 1 30 13 40 11 6.992 ## 6 6 209 1 30 13 42 11 6.992
# Time = Survival time, in minutes # Death = Event indicator # Length = length of fish (cm) # Handtime = handling time (min) # Logcat = log total catch
### Univariate analysis # Towing duration: table(halibut$Towd)
## ## 30 100 ## 107 187
kmtow = survfit(Surv(Time, Death) ~ Towd, data=halibut) # Length of fish: summary(halibut$Length)
## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 29.00 38.00 43.00 42.51 46.75 57.00
halibut$Lencat = (halibut$Length > 42) + 0 table(halibut$Lencat)
## ## 0 1 ## 134 160
kmlen = survfit(Surv(Time, Death) ~ Lencat, data=halibut) # Depth range of towing: table(halibut$Deldepth)
## ## 1 2 3 4 5 6 8 10 13 15 23 49 58 ## 6 17 8 6 45 46 40 68 8 12 32 3 3
summary(halibut$Deldepth)
## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 1.000 5.000 8.000 9.966 10.000 58.000
halibut$Depthcat = cut(halibut$Deldepth, breaks=c(0,5,9,11,60)) kmdep = survfit(Surv(Time, Death) ~ Depthcat, data=halibut) # Handling time: summary(halibut$Handtime)
## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 1.00 6.00 10.00 12.51 17.00 38.00
halibut$Handcat = cut(halibut$Handtime, breaks = c(0,6,10,17,40)) kmhan = survfit(Surv(Time, Death) ~ Handcat, data=halibut) # Log total catch: summary(halibut$Logcat)
## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 2.904 4.059 4.203 4.886 5.323 8.690
halibut$Catchcat = cut(halibut$Logcat, breaks=c(0,4.06,4.89,5.33,9)) kmlgc = survfit(Surv(Time, Death) ~ Catchcat, data = halibut)
par(mfrow=c(3,2)) plot(kmtow, lty=1:2, col=2:3, xlab="Minutes", ylab="Survival", main="Towing Duration") plot(kmlen, lty=1:2, col=2:3, xlab="Minutes", ylab="Survival", main="Fish Length") plot(kmdep, lty=1:4, col=1:4, xlab="Minutes", ylab="Survival", main="Depth of Towing") plot(kmhan, lty=1:4, col=1:4, xlab="Minutes", ylab="Survival", main="Handling Time") plot(kmlgc, lty=1:4, col=1:4, xlab="Minutes", ylab="Survival", main="Total Catch") par(mfrow=c(1,1))
Collett recommends using a likelihood ratio test for all variable inclusion/exclusion decisions.
Each step uses a 'greedy' approach:
library(MASS) args(stepAIC)
## function (object, scope, scale = 0, direction = c("both", "backward", ## "forward"), trace = 1, keep = NULL, steps = 1000, use.start = FALSE, ## k = 2, ...) ## NULL
# Model selection for Halibut data fit = coxph(Surv(Time, Death) ~ Towd + Length + Deldepth + Handtime + Logcat, data=halibut) # Backward selection using AIC fitb = stepAIC(fit, direction="backward", k=2)
## Start: AIC=2520.15 ## Surv(Time, Death) ~ Towd + Length + Deldepth + Handtime + Logcat ## ## Df AIC ## - Deldepth 1 2519.7 ## <none> 2520.2 ## - Towd 1 2531.7 ## - Logcat 1 2532.8 ## - Length 1 2533.0 ## - Handtime 1 2544.9 ## ## Step: AIC=2519.7 ## Surv(Time, Death) ~ Towd + Length + Handtime + Logcat ## ## Df AIC ## <none> 2519.7 ## - Logcat 1 2530.8 ## - Length 1 2531.2 ## - Towd 1 2532.7 ## - Handtime 1 2547.7
summary(fitb)
## Call: ## coxph(formula = Surv(Time, Death) ~ Towd + Length + Handtime + ## Logcat, data = halibut) ## ## n= 294, number of events= 273 ## ## coef exp(coef) se(coef) z Pr(>|z|) ## Towd 0.007744 1.007774 0.002017 3.839 0.000123 *** ## Length -0.036960 0.963715 0.010028 -3.686 0.000228 *** ## Handtime 0.054947 1.056485 0.009870 5.567 2.59e-08 *** ## Logcat -0.183373 0.832458 0.050982 -3.597 0.000322 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## exp(coef) exp(-coef) lower .95 upper .95 ## Towd 1.0078 0.9923 1.0038 1.0118 ## Length 0.9637 1.0377 0.9450 0.9828 ## Handtime 1.0565 0.9465 1.0362 1.0771 ## Logcat 0.8325 1.2013 0.7533 0.9199 ## ## Concordance= 0.683 (se = 0.02 ) ## Rsquare= 0.25 (max possible= 1 ) ## Likelihood ratio test= 84.5 on 4 df, p=0 ## Wald test = 90.71 on 4 df, p=0 ## Score (logrank) test = 94.51 on 4 df, p=0
# Forward selection using AIC fit0 = coxph(Surv(Time, Death) ~ 1, data=halibut) # (starting model) fitf = stepAIC(fit0, scope=formula(fit), direction="forward", k=2)
## Warning in is.na(fit$coefficients): is.na() applied to non-(list or ## vector) of type 'NULL'
## Start: AIC=2596.2 ## Surv(Time, Death) ~ 1
## Warning in is.na(fit$coefficients): is.na() applied to non-(list or ## vector) of type 'NULL'
## Warning in is.na(fit$coefficients): is.na() applied to non-(list or ## vector) of type 'NULL'
## Df AIC ## + Handtime 1 2556.8 ## + Towd 1 2568.0 ## + Length 1 2586.1 ## + Deldepth 1 2594.5 ## <none> 2596.2 ## + Logcat 1 2596.8 ## ## Step: AIC=2556.85 ## Surv(Time, Death) ~ Handtime ## ## Df AIC ## + Logcat 1 2540.3 ## + Towd 1 2543.6 ## + Length 1 2549.9 ## <none> 2556.8 ## + Deldepth 1 2558.8 ## ## Step: AIC=2540.3 ## Surv(Time, Death) ~ Handtime + Logcat ## ## Df AIC ## + Towd 1 2531.2 ## + Length 1 2532.7 ## <none> 2540.3 ## + Deldepth 1 2541.4 ## ## Step: AIC=2531.23 ## Surv(Time, Death) ~ Handtime + Logcat + Towd ## ## Df AIC ## + Length 1 2519.7 ## <none> 2531.2 ## + Deldepth 1 2533.0 ## ## Step: AIC=2519.7 ## Surv(Time, Death) ~ Handtime + Logcat + Towd + Length ## ## Df AIC ## <none> 2519.7 ## + Deldepth 1 2520.2
summary(fitf)
## Call: ## coxph(formula = Surv(Time, Death) ~ Handtime + Logcat + Towd + ## Length, data = halibut) ## ## n= 294, number of events= 273 ## ## coef exp(coef) se(coef) z Pr(>|z|) ## Handtime 0.054947 1.056485 0.009870 5.567 2.59e-08 *** ## Logcat -0.183373 0.832458 0.050982 -3.597 0.000322 *** ## Towd 0.007744 1.007774 0.002017 3.839 0.000123 *** ## Length -0.036960 0.963715 0.010028 -3.686 0.000228 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## exp(coef) exp(-coef) lower .95 upper .95 ## Handtime 1.0565 0.9465 1.0362 1.0771 ## Logcat 0.8325 1.2013 0.7533 0.9199 ## Towd 1.0078 0.9923 1.0038 1.0118 ## Length 0.9637 1.0377 0.9450 0.9828 ## ## Concordance= 0.683 (se = 0.02 ) ## Rsquare= 0.25 (max possible= 1 ) ## Likelihood ratio test= 84.5 on 4 df, p=0 ## Wald test = 90.71 on 4 df, p=0 ## Score (logrank) test = 94.51 on 4 df, p=0
# same final model as backward selection! # Stepwise model selection (backward and forward) fits = stepAIC(fit, direction="both", k=2)
## Start: AIC=2520.15 ## Surv(Time, Death) ~ Towd + Length + Deldepth + Handtime + Logcat ## ## Df AIC ## - Deldepth 1 2519.7 ## <none> 2520.2 ## - Towd 1 2531.7 ## - Logcat 1 2532.8 ## - Length 1 2533.0 ## - Handtime 1 2544.9 ## ## Step: AIC=2519.7 ## Surv(Time, Death) ~ Towd + Length + Handtime + Logcat ## ## Df AIC ## <none> 2519.7 ## + Deldepth 1 2520.2 ## - Logcat 1 2530.8 ## - Length 1 2531.2 ## - Towd 1 2532.7 ## - Handtime 1 2547.7
summary(fits)
## Call: ## coxph(formula = Surv(Time, Death) ~ Towd + Length + Handtime + ## Logcat, data = halibut) ## ## n= 294, number of events= 273 ## ## coef exp(coef) se(coef) z Pr(>|z|) ## Towd 0.007744 1.007774 0.002017 3.839 0.000123 *** ## Length -0.036960 0.963715 0.010028 -3.686 0.000228 *** ## Handtime 0.054947 1.056485 0.009870 5.567 2.59e-08 *** ## Logcat -0.183373 0.832458 0.050982 -3.597 0.000322 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## exp(coef) exp(-coef) lower .95 upper .95 ## Towd 1.0078 0.9923 1.0038 1.0118 ## Length 0.9637 1.0377 0.9450 0.9828 ## Handtime 1.0565 0.9465 1.0362 1.0771 ## Logcat 0.8325 1.2013 0.7533 0.9199 ## ## Concordance= 0.683 (se = 0.02 ) ## Rsquare= 0.25 (max possible= 1 ) ## Likelihood ratio test= 84.5 on 4 df, p=0 ## Wald test = 90.71 on 4 df, p=0 ## Score (logrank) test = 94.51 on 4 df, p=0
# Stepwise selection with alpha=k=3 fitk3 = stepAIC(fit, direction="both", k=3)
## Start: AIC=2525.15 ## Surv(Time, Death) ~ Towd + Length + Deldepth + Handtime + Logcat ## ## Df AIC ## - Deldepth 1 2523.7 ## <none> 2525.2 ## - Towd 1 2535.7 ## - Logcat 1 2536.8 ## - Length 1 2537.0 ## - Handtime 1 2548.9 ## ## Step: AIC=2523.7 ## Surv(Time, Death) ~ Towd + Length + Handtime + Logcat ## ## Df AIC ## <none> 2523.7 ## + Deldepth 1 2525.2 ## - Logcat 1 2533.8 ## - Length 1 2534.2 ## - Towd 1 2535.7 ## - Handtime 1 2550.7
summary(fitk3)
## Call: ## coxph(formula = Surv(Time, Death) ~ Towd + Length + Handtime + ## Logcat, data = halibut) ## ## n= 294, number of events= 273 ## ## coef exp(coef) se(coef) z Pr(>|z|) ## Towd 0.007744 1.007774 0.002017 3.839 0.000123 *** ## Length -0.036960 0.963715 0.010028 -3.686 0.000228 *** ## Handtime 0.054947 1.056485 0.009870 5.567 2.59e-08 *** ## Logcat -0.183373 0.832458 0.050982 -3.597 0.000322 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## exp(coef) exp(-coef) lower .95 upper .95 ## Towd 1.0078 0.9923 1.0038 1.0118 ## Length 0.9637 1.0377 0.9450 0.9828 ## Handtime 1.0565 0.9465 1.0362 1.0771 ## Logcat 0.8325 1.2013 0.7533 0.9199 ## ## Concordance= 0.683 (se = 0.02 ) ## Rsquare= 0.25 (max possible= 1 ) ## Likelihood ratio test= 84.5 on 4 df, p=0 ## Wald test = 90.71 on 4 df, p=0 ## Score (logrank) test = 94.51 on 4 df, p=0
Notes:
stepAIC
, these are included in the scope part.How do we know if the model fits well?
# Analysis of halibut data, continued # 1. Compute cummulative hazard Lambda_0: Lambda0 = basehaz(fitb, centered=F) Lambda0[1:3,]
## hazard time ## 1 0.009323417 1.1 ## 2 0.018739005 1.2 ## 3 0.057162782 1.3
# 2. Compute Lambda_0(X_i) for all i: predLambda0 = function(Lambda0, t) { u = length(Lambda0$time[Lambda0$time0, Lambda0$hazard[u], 0) ) } n = length(halibut$Time) L0 = rep(NA, n) # for storing Lambda_0(X_i) for(i in 1:n) L0[i] = predLambda0(Lambda0, halibut$Time[i]) # 3. Compute exp(beta Z_i) ebetaZ = exp(as.matrix(halibut[,c(4,6,7,8)]) %*% fitb$coef) # 4. Compute e_i for all i: e = L0 * ebetaZ # 5. Run KM for (e, delta) resfit = survfit(Surv(e,halibut$Death) ~ 1)
# 6.1 Plot log survival function against time plot(resfit$time, log(resfit$surv), type="l")
# 6.2 Plot log-log surv against log time plot(log(resfit$time), log(-log(resfit$surv)), type="l", xlab = "log time", ylab = "log-log Survival", main = "Halibut: Cox-Snell residuals")
# Halibut data, continued # Calculates martingale, deviance, score or Schoenfeld residuals for a Cox proportional hazards model. martres = residuals(fitb, type="martingale") # Martingale residuals martres[1:3]
## 1 2 3 ## -0.2384461 -0.1084664 -0.1073743
boxplot(martres ~ halibut$Towd, xlab="Towing duration", ylab="Martingale Residuals", boxwex=0.4) abline(h=0, lty=2)
plot(halibut$Length, martres, xlab="Halibut Length", ylab="Martingale Residuals", col=2) abline(h=0, lty=2)
plot(halibut$Handtime, martres, xlab="Handling Time", ylab="Martingale Residuals", col=3) abline(h=0, lty=2)
plot(halibut$Logcat, martres, xlab="Log Catch Size", ylab="Martingale Residuals", col=4) abline(h=0, lty=2)
plot(fitb$linear.predictors, martres, xlab="Linear Predictor", ylab="Martingale Residuals", col=6) abline(h=0, lty=2)
# Halibut data, continued devres = residuals(fitb, type="deviance") # Deviance residuals devres[1:3]
## 1 2 3 ## -0.2217595 -0.1047756 -0.1037553
boxplot(devres ~ halibut$Towd, xlab="Towing duration", ylab="Deviance Residuals", boxwex=0.4) abline(h=0, lty=2)
plot(halibut$Length, devres, xlab="Halibut Length", ylab="Deviance Residuals", col=2) abline(h=0, lty=2)
plot(halibut$Handtime, devres, xlab="Handling Time", ylab="Deviance Residuals", col=3) abline(h=0, lty=2)
plot(halibut$Logcat, devres, xlab="Log Catch Size", ylab="Deviance Residuals", col=4) abline(h=0, lty=2)
plot(fitb$linear.predictors, devres, xlab="Linear Predictor", ylab="Deviance Residuals", col=6) abline(h=0, lty=2)
# Halibut data, continued schoeres = residuals(fitb, type="schoenfeld") # Schoenfeld residuals schoeres[1:3,]
## Towd Length Handtime Logcat ## 1.1 13.59661 -10.584914 4.165367 -0.7233896 ## 1.2 13.73103 -6.689555 3.206545 -0.7305410 ## 1.3 14.00863 1.193347 2.279487 -0.7453105
evtime = sort(halibut$Time[halibut$Death==1]) # event times: 1.1,1.2,1.3 etc plot(log(evtime), schoeres[,"Towd"], xlab="Log Survival time", ylab="Schoenfeld Residuals, Towing Duration") abline(h=0, lty=2)
plot(log(evtime), schoeres[,"Length"], xlab="Log Survival time", ylab="Schoenfeld Residuals, Fish Length", col=2) abline(h=0, lty=2)
plot(log(evtime), schoeres[,"Handtime"], xlab="Log Survival time", ylab="Schoenfeld Res, Handling Time", col=3) abline(h=0, lty=2)
plot(log(evtime), schoeres[,"Logcat"], xlab="Log Survival time", ylab="Schoenfeld Res, Log Catch Size", col=4) abline(h=0, lty=2)
# Halibut data, continued scschres = residuals(fitb, type="scaledsch") # Schoenfeld residuals scschres[1:3,]
## [,1] [,2] [,3] [,4] ## 1.1 0.02160255 -0.3136602583 0.1220311 -0.8067589 ## 1.2 0.02019807 -0.2117188889 0.1166763 -0.7854618 ## 1.3 0.01532368 -0.0002137067 0.1327947 -0.7959172
evtime = sort(halibut$Time[halibut$Death==1]) # event times: 1.1,1.2,1.3 etc plot(log(evtime), scschres[,1], xlab="Log Survival time", ylab="Scaled Sch Residuals, Towing Duration") abline(h=0, lty=2)
plot(log(evtime), scschres[,2], xlab="Log Survival time", ylab="Scaled Sch Residuals, Fish Length", col=2) abline(h=0, lty=2)
plot(log(evtime), scschres[,3], xlab="Log Survival time", ylab="Scaled Sch Res, Handling Time", col=3) abline(h=0, lty=2)
plot(log(evtime), scschres[,4], xlab="Log Survival time", ylab="Scaled Sch Res, Log Catch Size", col=4) abline(h=0, lty=2)
If you calculate martingale or deviance residuals without any covariates in the model and then plot them against covariates, you obtain a graphical impression of the relationship between the covariate and the hazard.
fit = coxph(Surv(Time, Death) ~ 1, data=halibut) martres = residuals(fit, type="martingale") plot(halibut$Deldepth, martres, xlab="Depth Range", ylab="Martingale Residuals") lines(lowess(halibut$Deldepth, martres), col=2)
plot(halibut$Length, martres, xlab="Fish Length", ylab= "Martingale Residuals") lines(lowess(halibut$Length, martres), col=2)
plot(halibut$Handtime, martres, xlab="Handling Time", ylab= "Martingale Residuals") lines(lowess(halibut$Handtime, martres), col=2)
plot(halibut$Logcat, martres, xlab="Log Catch Size", ylab= "Martingale Residuals") lines(lowess(halibut$Logcat, martres), col=2)
There are several options for checking the assumption of pro- portional hazards:
R
via cox.zph
.How do we interpret the above? Kleinbaum (and other texts) suggest a strategy of assuming that PH holds unless there is very strong evidence to counter this assumption:
If PH doesn't exactly hold for a particular covariate but we fit the PH model anyway, then what we are getting is sort of an average HR, averaged over the event times.
In most cases, this is not such a bad estimate. Allison claims that too much emphasis is put on testing the PH assumption, and not enough to other important aspects of the model.
Example: Nursing Home - gender and marital status
# Fit KM curves for men and women separately fitgen = survfit(Surv(los, discharged) ~ gender, data=nurshome) names(fitgen)
## [1] "n" "time" "n.risk" "n.event" "n.censor" ## [6] "surv" "type" "strata" "std.err" "upper" ## [11] "lower" "conf.type" "conf.int" "call"
fitgen$time
## [1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 ## [15] 14 15 16 17 18 19 20 21 22 23 24 25 26 27 ## [29] 28 29 30 31 32 33 34 35 36 37 38 39 40 41 ## [43] 42 43 44 45 46 47 48 49 50 51 52 53 54 55 ## [57] 56 57 59 60 61 62 63 64 66 67 68 69 70 71 ## [71] 72 74 75 76 77 78 80 81 82 83 84 85 86 88 ## [85] 89 90 91 92 93 94 95 96 97 98 99 100 101 102 ## [99] 104 105 106 107 108 109 110 111 113 114 115 116 118 119 ## [113] 120 121 122 123 125 126 127 128 129 130 131 132 133 134 ## [127] 135 137 139 140 141 142 143 144 146 147 148 149 150 152 ## [141] 153 155 156 159 160 161 162 163 164 165 166 168 169 170 ## [155] 172 174 175 176 178 180 182 184 185 187 189 190 191 192 ## [169] 193 194 195 196 197 198 199 201 202 203 204 205 207 208 ## [183] 209 211 214 215 217 218 221 222 223 224 225 226 227 230 ## [197] 231 232 233 234 237 238 240 241 243 244 250 252 253 256 ## [211] 258 259 261 262 263 265 268 269 270 271 273 274 276 277 ## [225] 279 281 282 283 285 288 290 291 293 295 296 297 298 301 ## [239] 302 305 306 307 309 310 311 312 314 315 317 318 322 326 ## [253] 328 330 332 337 340 342 344 350 351 352 355 358 360 362 ## [267] 363 364 365 366 367 369 370 372 373 374 375 376 377 378 ## [281] 379 380 381 382 383 384 386 387 388 389 390 393 394 395 ## [295] 396 397 401 402 403 405 407 408 409 410 411 413 415 416 ## [309] 417 418 420 421 422 424 425 428 429 430 433 434 435 437 ## [323] 439 442 444 446 449 450 451 452 453 456 457 461 462 463 ## [337] 465 466 467 470 471 472 474 475 476 477 478 480 481 483 ## [351] 487 488 489 492 494 495 497 498 500 505 506 507 508 512 ## [365] 514 519 521 522 525 526 527 530 533 537 540 545 547 549 ## [379] 553 554 556 561 562 563 569 570 572 576 579 584 585 589 ## [393] 592 597 598 599 605 611 613 617 618 619 620 624 625 626 ## [407] 627 628 634 635 639 642 645 648 653 654 660 661 664 665 ## [421] 669 670 673 676 679 680 681 683 684 686 687 690 691 695 ## [435] 696 697 698 702 707 708 709 710 711 717 719 721 722 724 ## [449] 726 732 733 736 741 747 749 754 756 757 758 759 770 774 ## [463] 775 780 783 788 790 792 803 810 811 812 817 818 823 832 ## [477] 833 838 851 852 853 858 864 865 866 870 875 879 881 895 ## [491] 896 900 903 906 909 911 913 921 924 929 936 941 942 945 ## [505] 962 963 964 965 970 982 986 992 993 994 999 1005 1006 1008 ## [519] 1010 1012 1022 1026 1028 1033 1034 1041 1043 1047 1053 1054 1055 1057 ## [533] 1062 1074 1084 1088 1092 0 1 2 3 4 5 6 7 8 ## [547] 9 10 11 12 13 14 15 16 17 18 19 20 21 22 ## [561] 23 24 25 26 27 29 30 31 32 33 34 35 37 38 ## [575] 39 40 41 42 43 44 45 46 47 48 49 50 51 54 ## [589] 55 57 58 60 61 62 63 65 66 68 69 71 72 73 ## [603] 76 77 80 81 82 83 84 85 86 88 89 90 91 92 ## [617] 95 96 97 100 103 105 106 108 109 112 116 118 119 120 ## [631] 121 122 123 126 131 133 138 143 144 146 147 148 151 152 ## [645] 153 156 159 160 162 164 165 167 169 170 171 174 176 178 ## [659] 182 185 189 190 193 195 202 205 206 207 212 216 227 234 ## [673] 237 241 250 260 262 267 269 270 273 274 275 276 277 278 ## [687] 279 290 294 297 302 303 306 310 326 335 365 370 374 375 ## [701] 381 389 393 396 397 399 400 404 408 412 416 448 449 456 ## [715] 463 465 467 476 487 492 495 501 541 548 561 565 568 574 ## [729] 578 582 584 586 597 599 602 605 607 620 624 638 646 649 ## [743] 651 663 669 674 709 725 738 740 743 752 756 773 797 816 ## [757] 864 881 899 934 938 948 971 973 985 1060 1074 1091
fitgen$surv
## [1] 0.99575552 0.97962649 0.97113752 0.96859083 0.95925297 0.94906621 ## [7] 0.93972835 0.92954160 0.92359932 0.91341256 0.90747029 0.89983022 ## [13] 0.89303905 0.88285229 0.87351443 0.86332767 0.85483871 0.85059423 ## [19] 0.84295416 0.83786078 0.83276740 0.82258065 0.81578947 0.80730051 ## [25] 0.80050934 0.79286927 0.79032258 0.78438031 0.77843803 0.77079796 ## [31] 0.76740238 0.76570458 0.75976231 0.75551783 0.75127334 0.74702886 ## [37] 0.74617997 0.74023769 0.73769100 0.73344652 0.73005093 0.72495756 ## [43] 0.72241087 0.71816638 0.71222411 0.70797963 0.70373514 0.69779287 ## [49] 0.69185059 0.69100170 0.69015280 0.68675722 0.68505942 0.68251273 ## [55] 0.67741935 0.67487267 0.66977929 0.66638370 0.66298812 0.66044143 ## [61] 0.65619694 0.65025467 0.64601019 0.64346350 0.64091681 0.64006791 ## [67] 0.63837012 0.63497453 0.63327674 0.62818336 0.62563667 0.62139219 ## [73] 0.61884550 0.61460102 0.61290323 0.61035654 0.60950764 0.60696095 ## [79] 0.60441426 0.60271647 0.59847199 0.59592530 0.59422750 0.59337861 ## [85] 0.58998302 0.58913413 0.58488964 0.58064516 0.57979626 0.57640068 ## [91] 0.57470289 0.56876061 0.56791171 0.56536503 0.56451613 0.56112054 ## [97] 0.56027165 0.55772496 0.55687606 0.55432937 0.55348048 0.55263158 ## [103] 0.54838710 0.54753820 0.54584041 0.54414261 0.54244482 0.54074703 ## [109] 0.53735144 0.53650255 0.53480475 0.53395586 0.53225806 0.53140917 ## [115] 0.52886248 0.52546689 0.52376910 0.52292020 0.52122241 0.51867572 ## [121] 0.51697793 0.51443124 0.51358234 0.51188455 0.51018676 0.50764007 ## [127] 0.50679117 0.50509338 0.50254669 0.50169779 0.50084890 0.50000000 ## [133] 0.49830221 0.49405772 0.49235993 0.49151104 0.48811545 0.48556876 ## [139] 0.48471986 0.48387097 0.48217317 0.47962649 0.47707980 0.47538200 ## [145] 0.47453311 0.47198642 0.46943973 0.46859083 0.46519525 0.46434635 ## [151] 0.46349745 0.45925297 0.45840407 0.45755518 0.45415959 0.45246180 ## [157] 0.45161290 0.45076401 0.44991511 0.44821732 0.44736842 0.44651952 ## [163] 0.44567063 0.44397284 0.44312394 0.44142615 0.44057725 0.43972835 ## [169] 0.43887946 0.43803056 0.43633277 0.43548387 0.43463497 0.43208829 ## [175] 0.43123939 0.42954160 0.42784380 0.42614601 0.42359932 0.42275042 ## [181] 0.42020374 0.41850594 0.41595925 0.41341256 0.41171477 0.41001698 ## [187] 0.40916808 0.40831919 0.40747029 0.40577250 0.40322581 0.40067912 ## [193] 0.39983022 0.39898132 0.39813243 0.39728353 0.39643463 0.39473684 ## [199] 0.39388795 0.39219015 0.39134126 0.39049236 0.38879457 0.38794567 ## [205] 0.38709677 0.38624788 0.38539898 0.38370119 0.38285229 0.38200340 ## [211] 0.38115450 0.38030560 0.37775891 0.37691002 0.37606112 0.37521222 ## [217] 0.37436333 0.37351443 0.37266553 0.37181664 0.37096774 0.36842105 ## [223] 0.36757216 0.36502547 0.36417657 0.36332767 0.36247878 0.36078098 ## [229] 0.35993209 0.35908319 0.35823430 0.35568761 0.35314092 0.35229202 ## [235] 0.35059423 0.34974533 0.34804754 0.34719864 0.34634975 0.34550085 ## [241] 0.34465195 0.34380306 0.34295416 0.34210526 0.34125637 0.34040747 ## [247] 0.33955857 0.33786078 0.33531409 0.33361630 0.33276740 0.33191851 ## [253] 0.33022071 0.32937182 0.32852292 0.32767402 0.32682513 0.32597623 ## [259] 0.32427844 0.32342954 0.32258065 0.32173175 0.32088285 0.31918506 ## [265] 0.31833616 0.31748727 0.31663837 0.31578947 0.31578947 0.31493368 ## [271] 0.31493368 0.31407084 0.31320563 0.31234043 0.31234043 0.31060520 ## [277] 0.30973759 0.30886754 0.30712253 0.30712253 0.30537254 0.30449755 ## [283] 0.30449755 0.30449755 0.30449755 0.30360980 0.30272205 0.30183430 ## [289] 0.30183430 0.30183430 0.30093865 0.30093865 0.29913662 0.29913662 ## [295] 0.29823289 0.29732640 0.29732640 0.29641436 0.29549382 0.29457040 ## [301] 0.29457040 0.29457040 0.29364116 0.29270896 0.29177379 0.29083862 ## [307] 0.29083862 0.29083862 0.29083862 0.28989434 0.28895006 0.28895006 ## [313] 0.28800268 0.28800268 0.28800268 0.28703946 0.28703946 0.28703946 ## [319] 0.28606973 0.28510000 0.28413027 0.28413027 0.28023808 0.28023808 ## [325] 0.27827837 0.27827837 0.27728806 0.27629774 0.27530387 0.27430639 ## [331] 0.27430639 0.27430639 0.27430639 0.27329791 0.27329791 0.27329791 ## [337] 0.27227814 0.27125454 0.27125454 0.26919177 0.26919177 0.26814839 ## [343] 0.26814839 0.26709683 0.26604526 0.26604526 0.26393379 0.26393379 ## [349] 0.26179668 0.26072812 0.26072812 0.26072812 0.26072812 0.25963721 ## [355] 0.25854629 0.25854629 0.25631745 0.25631745 0.25631745 0.25519325 ## [361] 0.25519325 0.25519325 0.25519325 0.25519325 0.25519325 0.25402798 ## [367] 0.25402798 0.25402798 0.25285193 0.25167587 0.25049982 0.24932376 ## [373] 0.24932376 0.24932376 0.24932376 0.24812509 0.24692642 0.24452907 ## [379] 0.24333040 0.24333040 0.24333040 0.24333040 0.24333040 0.24333040 ## [385] 0.24209522 0.24086004 0.24086004 0.24086004 0.24086004 0.23959899 ## [391] 0.23833127 0.23833127 0.23833127 0.23833127 0.23833127 0.23702176 ## [397] 0.23702176 0.23702176 0.23702176 0.23702176 0.23702176 0.23702176 ## [403] 0.23702176 0.23702176 0.23563567 0.23563567 0.23563567 0.23423308 ## [409] 0.23423308 0.23423308 0.23423308 0.23423308 0.23423308 0.23423308 ## [415] 0.23275059 0.23125860 0.23125860 0.23125860 0.22974710 0.22823561 ## [421] 0.22823561 0.22823561 0.22823561 0.22823561 0.22668299 0.22668299 ## [427] 0.22668299 0.22668299 0.22668299 0.22507531 0.22507531 0.22507531 ## [433] 0.22507531 0.22507531 0.22507531 0.22338301 0.22338301 0.22338301 ## [439] 0.22338301 0.22158153 0.21978006 0.21978006 0.21978006 0.21978006 ## [445] 0.21978006 0.21978006 0.21978006 0.21978006 0.21778206 0.21778206 ## [451] 0.21778206 0.21570794 0.21363383 0.21363383 0.21363383 0.21363383 ## [457] 0.21363383 0.21136113 0.20908843 0.20908843 0.20908843 0.20908843 ## [463] 0.20908843 0.20908843 0.20908843 0.20908843 0.20908843 0.20650709 ## [469] 0.20650709 0.20650709 0.20382518 0.20382518 0.20382518 0.20382518 ## [475] 0.20382518 0.20382518 0.20382518 0.20382518 0.20382518 0.20382518 ## [481] 0.20382518 0.20382518 0.20382518 0.20382518 0.20382518 0.20042809 ## [487] 0.20042809 0.20042809 0.20042809 0.20042809 0.20042809 0.19657371 ## [493] 0.19657371 0.19657371 0.19657371 0.19239129 0.19239129 0.19239129 ## [499] 0.19239129 0.19239129 0.19239129 0.18769882 0.18288603 0.18288603 ## [505] 0.18288603 0.18288603 0.18288603 0.18288603 0.18288603 0.18288603 ## [511] 0.18288603 0.18288603 0.18288603 0.18288603 0.18288603 0.18288603 ## [517] 0.18288603 0.18288603 0.18288603 0.18288603 0.18288603 0.18288603 ## [523] 0.18288603 0.18288603 0.18288603 0.18288603 0.18288603 0.18288603 ## [529] 0.18288603 0.18288603 0.18288603 0.18288603 0.18288603 0.18288603 ## [535] 0.18288603 0.18288603 0.18288603 0.98817967 0.98108747 0.96453901 ## [541] 0.95035461 0.93617021 0.91252955 0.90307329 0.88888889 0.87234043 ## [547] 0.86288416 0.85106383 0.83451537 0.82269504 0.81087470 0.79432624 ## [553] 0.78486998 0.78014184 0.76122931 0.75177305 0.73995272 0.72813239 ## [559] 0.72576832 0.71631206 0.70921986 0.70212766 0.68321513 0.67375887 ## [565] 0.65957447 0.65011820 0.64302600 0.63593381 0.62884161 0.62174941 ## [571] 0.61938534 0.61465721 0.60992908 0.60283688 0.59338061 0.58628842 ## [577] 0.58392435 0.58156028 0.57919622 0.57210402 0.56973995 0.56501182 ## [583] 0.56028369 0.55791962 0.55082742 0.54846336 0.54137116 0.53900709 ## [589] 0.53664303 0.53427896 0.53191489 0.52955083 0.52009456 0.51773050 ## [595] 0.51536643 0.50591017 0.50118203 0.49881797 0.49408983 0.48699764 ## [601] 0.48226950 0.47517730 0.47281324 0.47044917 0.46335697 0.46099291 ## [607] 0.45862884 0.45626478 0.45153664 0.44917258 0.44444444 0.44208038 ## [613] 0.43498818 0.43262411 0.42553191 0.42316785 0.41843972 0.41607565 ## [619] 0.41371158 0.40661939 0.39952719 0.39716312 0.39479905 0.39243499 ## [625] 0.39007092 0.38770686 0.38534279 0.38297872 0.37825059 0.37352246 ## [631] 0.37115839 0.36643026 0.36406619 0.35697400 0.35224586 0.34988180 ## [637] 0.34751773 0.34278960 0.34042553 0.33806147 0.33333333 0.33096927 ## [643] 0.32624113 0.32387707 0.32151300 0.31914894 0.31678487 0.31442080 ## [649] 0.31205674 0.30969267 0.30732861 0.30260047 0.29787234 0.29550827 ## [655] 0.29314421 0.29078014 0.28841608 0.28605201 0.28368794 0.28132388 ## [661] 0.27659574 0.27423168 0.27186761 0.26950355 0.26713948 0.26477541 ## [667] 0.26241135 0.26004728 0.25768322 0.25531915 0.25295508 0.25059102 ## [673] 0.24822695 0.24586288 0.24349882 0.23877069 0.23640662 0.23404255 ## [679] 0.23167849 0.22931442 0.22695035 0.22458629 0.22222222 0.21985816 ## [685] 0.21749409 0.21513002 0.21276596 0.21040189 0.20803783 0.20567376 ## [691] 0.20330969 0.20094563 0.19621749 0.19385343 0.19148936 0.18912530 ## [697] 0.18676123 0.18439716 0.18439716 0.18200239 0.17960763 0.17721286 ## [703] 0.17481809 0.17481809 0.16996203 0.16753400 0.16510597 0.16510597 ## [709] 0.16510597 0.16260437 0.16010276 0.15760116 0.15760116 0.15760116 ## [715] 0.15760116 0.15760116 0.15760116 0.15483622 0.15483622 0.15202102 ## [721] 0.15202102 0.15202102 0.15202102 0.15202102 0.15202102 0.15202102 ## [727] 0.15202102 0.14871622 0.14541141 0.14210661 0.14210661 0.13872312 ## [733] 0.13525504 0.13178696 0.12831888 0.12831888 0.12831888 0.12831888 ## [739] 0.12831888 0.12831888 0.12831888 0.12831888 0.12404159 0.11976429 ## [745] 0.11976429 0.11532857 0.11089286 0.11089286 0.11089286 0.11089286 ## [751] 0.10534822 0.10534822 0.10534822 0.10534822 0.10534822 0.10534822 ## [757] 0.10534822 0.08779018 0.08779018 0.08779018 0.08779018 0.08779018 ## [763] 0.08779018 0.08779018 0.08779018 0.08779018 0.08779018 0.08779018
(n = fitgen$strata) # nr fail times for each level of "gender"
## gender=0 gender=1 ## 537 231
# Compute and plot log cumulative hazard for each stratum Lambda = -log(fitgen$surv) plot( log(fitgen$time), log(Lambda), type="n", xlab = "Log Length of Stay", ylab = "Log Cum Hazard") lines( log(fitgen$time)[1:n[1]], log(Lambda)[1:n[1]], col=2 ) lines( log(fitgen$time)[(n[1]+1):sum(n)], log(Lambda)[(n[1]+1):sum(n)], col=3, lty=2) legend(4,-3, legend=c("Females", "Males"), lty=1:2, col=2:3)
# Fit KM curves for single and married separately fitmar = survfit(Surv(los, discharged) ~ married, data=nurshome) names(fitmar)
## [1] "n" "time" "n.risk" "n.event" "n.censor" ## [6] "surv" "type" "strata" "std.err" "upper" ## [11] "lower" "conf.type" "conf.int" "call"
(n = fitmar$strata) # nr fail times for each level of marital status
## married=0 married=1 ## 563 174
# Compute and plot log cumulative hazard for each stratum Lambda = -log(fitmar$surv) plot( log(fitmar$time), log(Lambda), type="n", xlab = "Log Length of Stay", ylab = "Log Cum Hazard") lines( log(fitmar$time)[1:n[1]], log(Lambda)[1:n[1]], col=2 ) lines( log(fitmar$time)[(n[1]+1):sum(n)], log(Lambda)[(n[1]+1):sum(n)], col=3, lty=2) legend(4,-3, legend=c("Not Married", "Married"), lty=1:2, col=2:3)
# Check PH for model including both gender and marital status fitboth = survfit(Surv(los, discharged) ~ gender + married, data=nurshome) fitboth
## Call: survfit(formula = Surv(los, discharged) ~ gender + married, data = nurshome) ## ## records n.max n.start events median 0.95LCL 0.95UCL ## gender=0, married=0 1065 1065 1065 811 148.0 128 168 ## gender=0, married=1 113 113 113 96 96.0 63 149 ## gender=1, married=0 261 261 261 228 66.0 49 100 ## gender=1, married=1 162 162 162 144 68.5 39 89
Testing PH using cox.zph
# Nursing Home example, continued fitnh = coxph(Surv(los, discharged) ~ married + gender + age, data=nurshome) (testnh = cox.zph(fitnh))
## rho chisq p ## married 0.00476 0.0299 0.863 ## gender -0.03721 1.8414 0.175 ## age 0.00281 0.0104 0.919 ## GLOBAL NA 2.0614 0.560
par(mfrow=c(2,2)) plot(testnh) # Schoenfeld residuals - see plot
# Halibut data, revisited fithal = coxph(Surv(Time, Death) ~ Towd + Length + Deldepth + Handtime + Logcat, data=halibut) (testhal = cox.zph(fithal))
## rho chisq p ## Towd -0.12120 3.6198 0.057096 ## Length 0.00719 0.0123 0.911639 ## Deldepth -0.06451 0.7290 0.393202 ## Handtime -0.03716 0.3819 0.536596 ## Logcat -0.16300 6.4651 0.011002 ## GLOBAL NA 20.8640 0.000859
par(mfrow=c(3,2)) plot(testhal) # see plot
What if proportional hazards fails?
The second two options relate to time-dependent covariates.
We will focus on the first alternative, and then the second two options will be briefly described.
Example: PH assumption for gender (nursing home data):
married
and health
as covariates in a Cox PH model, but stratify by gender
.gender
(i.e., males and females)In the above example, we make the PH assumption for married
and health
, but not for gender
.
This is like getting a KM survival estimate for each gen- der without assuming PH, but is more flexible since we can control for other covariates.
We would repeat the stratification for each variable for which we wanted to check the PH assumption.
# Nursing Home analysis, continued fitnh = coxph(Surv(los, discharged) ~ married + health + gender, data=nurshome) fitnh
## Call: ## coxph(formula = Surv(los, discharged) ~ married + health + gender, ## data = nurshome) ## ## ## coef exp(coef) se(coef) z p ## married 0.161 1.17 0.0767 2.10 3.6e-02 ## health 0.169 1.18 0.0311 5.42 6.0e-08 ## gender 0.355 1.43 0.0660 5.37 7.7e-08 ## ## Likelihood ratio test=74.1 on 3 df, p=5.55e-16 n= 1601, number of events= 1279
fitnhs = coxph(Surv(los, discharged) ~ married + health + strata(gender), data = nurshome) fitnhs
## Call: ## coxph(formula = Surv(los, discharged) ~ married + health + strata(gender), ## data = nurshome) ## ## ## coef exp(coef) se(coef) z p ## married 0.163 1.18 0.0768 2.12 3.4e-02 ## health 0.169 1.18 0.0312 5.43 5.6e-08 ## ## Likelihood ratio test=34.5 on 2 df, p=3.3e-08 n= 1601, number of events= 1279
# Plot log cum. base. haz. for strata bh = basehaz(fitnhs) # cum. base. hazards plot(log(bh$time), log(bh$hazard), type="n", xlab = "log Time", ylab="log Cumulative Hazard") lines(log(bh$time)[bh$strata=="gender=0"], log(bh$hazard)[bh$strata=="gender=0"], lty=2, col=2) lines(log(bh$time)[bh$strata=="gender=1"], log(bh$hazard)[bh$strata=="gender=1"], lty=3, col=3) legend(4,-3, legend=c("Females", "Males"), lty=2:3, col=2:3)