From September 1967 till March 1974, men with serious heart disease were enrolled into a follow-up study. The follow-up was closed in April 1974. During the follow-up, some of the men underwent transplantation of the heart. The goal of the analysis is to estimate the effect of heart transplant on survival.
This can be done by specifying a Cox regression model with a time-varying covariate indicating whether or not the heart has already been transplanted. At the start of the follow-up, this covariate is zero for all patients. After transplantation, the covariate is switched to 1. The model formula is \[ \lambda(t\mid Z(t))=\lambda_0(t)\exp\{\beta Z(t)\} \] where \(Z(t)\) is the time-varying indicator of transplantation. The baseline hazard \(\lambda_0(t)\) is the risk of death before transplantation. The value \(\mathrm{e}^\beta\) expresses the relative change in the risk of death after the transplantation.
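The interpretation of \(\mathrm{e}^\beta\) follows from comparing, at the same time \(t\), the hazard of a patient who has already been transplanted with that of a patient who has not. Under the model above, the baseline hazard cancels: \[ \frac{\lambda(t\mid Z(t)=1)}{\lambda(t\mid Z(t)=0)}=\frac{\lambda_0(t)\exp\{\beta\cdot 1\}}{\lambda_0(t)\exp\{\beta\cdot 0\}}=\mathrm{e}^\beta \] A negative \(\beta\) (that is, \(\mathrm{e}^\beta<1\)) therefore corresponds to a lower risk of death after transplantation, and a positive \(\beta\) to a higher one.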
The original format of the dataset is this:
library(survival)
print(subset(jasa,select=c(birth.dt:fustat,transplant))[1:33,])
## birth.dt accept.dt tx.date fu.date fustat transplant
## 1 1937-01-10 1967-11-15 <NA> 1968-01-03 1 0
## 2 1916-03-02 1968-01-02 <NA> 1968-01-07 1 0
## 3 1913-09-19 1968-01-06 1968-01-06 1968-01-21 1 1
## 4 1927-12-23 1968-03-28 1968-05-02 1968-05-05 1 1
## 5 1947-07-28 1968-05-10 <NA> 1968-05-27 1 0
## 6 1913-11-08 1968-06-13 <NA> 1968-06-15 1 0
## 7 1917-08-29 1968-07-12 1968-08-31 1970-05-17 1 1
## 8 1923-03-27 1968-08-01 <NA> 1968-09-09 1 0
## 9 1921-06-11 1968-08-09 <NA> 1968-11-01 1 0
## 10 1926-02-09 1968-08-11 1968-08-22 1968-10-07 1 1
## 11 1920-08-22 1968-08-15 1968-09-09 1969-01-14 1 1
## 12 1915-07-09 1968-09-17 <NA> 1968-09-24 1 0
## 13 1914-02-22 1968-09-19 1968-10-05 1968-12-08 1 1
## 14 1914-09-16 1968-09-20 1968-10-26 1972-07-07 1 1
## 15 1914-12-04 1968-09-27 <NA> 1968-09-27 1 0
## 16 1919-05-16 1968-10-26 1968-11-22 1969-08-29 1 1
## 17 1948-06-29 1968-10-28 <NA> 1968-12-02 1 0
## 18 1911-12-27 1968-11-01 1968-11-20 1968-12-13 1 1
## 19 1909-10-04 1968-11-18 <NA> 1968-12-24 1 0
## 20 1913-10-19 1969-01-29 1969-02-15 1969-02-25 1 1
## 21 1925-09-29 1969-02-01 1969-02-08 1971-11-29 1 1
## 22 1926-06-05 1969-03-18 1969-03-29 1969-05-07 1 1
## 23 1910-12-02 1969-04-11 1969-04-13 1971-04-13 1 1
## 24 1917-07-07 1969-04-25 1969-07-16 1969-11-29 1 1
## 25 1936-02-06 1969-04-28 1969-05-22 1974-04-01 0 1
## 26 1938-10-18 1969-05-01 <NA> 1973-03-01 0 0
## 27 1960-07-21 1969-05-04 <NA> 1970-01-21 1 0
## 28 1915-05-30 1969-06-07 1969-08-16 1969-08-17 1 1
## 29 1919-02-06 1969-07-14 <NA> 1969-08-17 1 0
## 30 1924-09-20 1969-08-19 1969-09-03 1971-12-18 1 1
## 31 1914-10-04 1969-08-23 <NA> 1969-09-07 1 0
## 32 1905-04-02 1969-08-29 1969-09-14 1969-11-13 1 1
## 33 1921-01-01 1969-11-27 1970-01-16 1974-04-01 0 1
The columns are: birth date, enrollment date, date of transplantation (missing if no transplantation), date of the end of follow-up, survival status at the end of follow-up (1=dead, 0=alive), indicator of transplantation (at any time during follow-up).
In order to be analyzed, this dataset must be transformed into a different format, where the follow-up period is divided into subintervals and the subject’s data are written into several lines, one line for each subinterval. The time-varying covariate is created by changing the value of the covariate between the lines pertaining to the same subject.
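The splitting step can be illustrated on a toy example. The sketch below uses a hypothetical two-patient data frame (the names wide, futime, txtime, and split_one are made up for illustration and are not columns of jasa) and base R only; it is one possible way to do the split, not necessarily how the packaged data were prepared.

```r
# Toy wide-format data (illustrative values): one row per patient,
# times measured in days since enrollment.
wide <- data.frame(
  id     = c(1, 2),
  futime = c(50, 39),   # end of follow-up
  txtime = c(NA, 36),   # time of transplantation, NA if never transplanted
  fustat = c(1, 1)      # 1 = dead, 0 = alive at the end of follow-up
)

# Hypothetical helper: turn one patient into one or two interval rows.
split_one <- function(id, futime, txtime, fustat) {
  if (is.na(txtime)) {
    # never transplanted: a single interval (0, futime]
    data.frame(id = id, start = 0, stop = futime,
               event = fustat, transplant = 0)
  } else {
    # transplanted: (0, txtime] without and (txtime, futime] with transplant;
    # the patient is necessarily alive at the end of the first interval
    data.frame(id = id, start = c(0, txtime), stop = c(txtime, futime),
               event = c(0, fustat), transplant = c(0, 1))
  }
}

long <- do.call(rbind, Map(split_one, wide$id, wide$futime,
                           wide$txtime, wide$fustat))
print(long)
```

The first patient yields one row and the second yields two, with the covariate switching from 0 to 1 at the time of transplantation. A transplant on the day of enrollment (txtime equal to 0) would produce a zero-width first interval and needs separate handling, which this sketch omits.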
The transformed dataset looks like this:
print(subset(heart,select=c(id,start:transplant),id<=10))
## id start stop event age year surgery transplant
## 1 1 0 50 1 -17.1553730 0.1232033 0 0
## 2 2 0 6 1 3.8357290 0.2546201 0 0
## 3 3 0 1 0 6.2970568 0.2655715 0 0
## 4 3 1 16 1 6.2970568 0.2655715 0 1
## 5 4 0 36 0 -7.7371663 0.4900753 0 0
## 6 4 36 39 1 -7.7371663 0.4900753 0 1
## 7 5 0 18 1 -27.2142368 0.6078029 0 0
## 8 6 0 3 1 6.5954825 0.7008898 0 0
## 9 7 0 51 0 2.8692676 0.7802875 0 0
## 10 7 51 675 1 2.8692676 0.7802875 0 1
## 11 8 0 40 1 -2.6502396 0.8350445 0 0
## 12 9 0 85 1 -0.8377823 0.8569473 0 0
## 13 10 0 12 0 -5.4976044 0.8624230 0 0
## 14 10 12 58 1 -5.4976044 0.8624230 0 1
The variable id identifies the patient; its value corresponds to the row number in the untransformed dataset jasa. The variables start and stop define the intervals (in days after the start of the follow-up). The intervals are considered open on the left and closed on the right. The variable event shows the survival status at the end of each interval. The variable transplant is the time-varying transplantation indicator. The variables age, year, and surgery are time-invariant. The variable age expresses the age at enrollment in years (decreased by 48), the variable year is the enrollment time in years after Nov. 1, 1967, and the variable surgery is a binary indicator of bypass surgery before enrollment.
The first subject died 50 days after enrollment without having a transplant. Subject #4 lived without a transplant for 36 days, was transplanted on day 36, and died three days later, on day 39. There are two rows for subject #4; the first has transplant=0, the second has transplant=1. The death of this subject is indicated by event=1 on the second row. The first row has event=0 because subject #4 was still alive at the end of the first interval (day 36). Subject #3 was transplanted on the day of enrollment. In the transformed data, transplantation was moved to day 1 because the intervals cannot have zero width. Another possible solution would be to consider the patient transplanted at the time of enrollment.
Investigate the structure of the transformed dataset heart and think about how it might have been created from the original dataset jasa.
The survival object used on the left-hand side of the model formula must be adapted to express the interval structure of the data. Therefore, it is written with three arguments:
Surv(start,stop,delta)
Here, start is the left boundary of the time interval, stop is the right boundary, and delta is the survival status at the end of this interval (at time stop). If the subject is written over multiple lines, delta is zero in all lines except the last (because the subject is observed in subsequent intervals, it must have survived). The value of delta in the last line shows the final survival status of the subject (0=censored, 1=died).
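As a small illustration of the three-argument form, the sketch below builds the survival object for a single toy subject who was transplanted on day 36 and died on day 39. The data frame d and its values are made up for illustration.

```r
library(survival)

# Two rows for one subject: alive at the end of (0, 36], dead at the end of (36, 39]
d <- data.frame(start = c(0, 36),
                stop  = c(36, 39),
                delta = c(0, 1))

s <- with(d, Surv(start, stop, delta))
print(s)                 # one (start, stop] interval per row; "+" marks delta = 0
print(attr(s, "type"))   # "counting"
```

The printed intervals make it easy to check that consecutive rows of a subject join without gaps and that only the last row carries the event.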
The proportional hazards model is specified as usual, with the three-argument survival object as the outcome. For example, the model introduced at the beginning of this assignment would be fitted on the transformed heart data by the code
fit=coxph(Surv(start,stop,event)~transplant,data=heart)
Of course, the time-invariant covariates age, year, and surgery could also be included in the model.
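Once a model is fitted, the estimated relative change in the risk and a confidence interval for it can be read off the fit. The following is a minimal sketch of one way to do this, using Wald intervals from confint; the object names hr and fit_adj are chosen here for illustration, and the adjusted model shown is only one possible specification (it contains no interactions).

```r
library(survival)

# Unadjusted model: transplant as the time-varying covariate
fit <- coxph(Surv(start, stop, event) ~ transplant, data = heart)

# exp(beta) together with a 95% Wald confidence interval
hr <- exp(cbind(HR = coef(fit), confint(fit)))
print(hr)

# Adding the time-invariant covariates only changes the right-hand side
fit_adj <- update(fit, . ~ . + age + year + surgery)
print(summary(fit_adj)$conf.int)
```

The same numbers also appear in the output of summary(fit), alongside the Wald, likelihood ratio, and score tests.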
Estimate and test the effect of heart transplantation on the survival of the patients. Adjust the effect of transplantation for age at enrollment, time of enrollment, and previous bypass surgery. Consider the interactions of heart transplantation with these covariates.
Please provide both quantitative (effect size, confidence intervals if possible) and qualitative (does transplantation help patients survive longer?) answers.