Joint Models for Longitudinal Measurements and Competing Risks Failure Time Data

Joint analysis of longitudinal outcomes and survival times has gained a lot of attention in recent years. There have been extended to handle competing risks for both continuous and ordinal outcomes (Elashoff, Li, and Li 2008, @MR2758452). This vignette offers a brief introduction to the R package JMcmprsk, which implements the methods proposed to deal with such joint models, and two competing risks are assumed. The data sets for generating the initial values are provided.

Joint modeling framework

Continuous outcomes

Let \(Y_i(t)\) be the longitudinal outcome measured at time \(t\) for subject \(i\), \(i=1,2,\cdots,n\), and \(n\) is the total number of subjects in study. Let \(C_i=(T_i,D_i)\) denote the competing risks data on subject \(i\), where \(T_i\) is the failure time or censoring time, and \(D_i\) takes value in \(\{0,1,\cdots,g\}\), with \(D_i=0\) indicating a censored event and \(D_i=k\) showing that subject \(i\) fails from the \(k\)th type of failure, where \(k=1,\cdots,g\).

The joint model is specified in terms of the following two linked components: \[\begin{eqnarray*} Y_i(t)&=&X_i^{(1)}(t)^\top \beta+\tilde X_i^{(1)}(t)^\top b_i+\epsilon_i(t),\\ \lambda_k(t)&=&\lambda_{0k}(t)\exp(X_i^{(2)}(t)^\top \gamma_k+\nu_k u_i),~~\mbox{for}~~k=1,\cdots,g, \end{eqnarray*}\] where \(X_i^{(1)}(t)\), \(X_i^{(2)}(t)\) denote the covaraites for the fixed-effects \(\beta\) and \(\gamma_k\), \(\tilde X_i^{(1)}(t)\) denotes the covaraites for the random-effects \(b_i\), and \(\epsilon_i(t)\sim N(0,\sigma^2)\) for all \(t\geq 0\). The parameter \(\nu_1\) is set to 1 to ensure identifiability. We assume that \(b_i\) is independent of \(\epsilon_i(t)\) and that \(\epsilon_i(t_1)\) is independent of \(\epsilon_i(t_2)\) for any \(t_1\neq t_2\). We further assume the random effects \(b_i\) and \(u_i\) jointly have a multivariate normal distribution, denoted by \(\theta_i\sim N(0,\Sigma)\), where \(\Sigma=(\Sigma_{b},\Sigma_{bu}^\top;\Sigma_{bu},\sigma_u)\).

Denote \(\Psi\) as the unknown parameters from the joint models. We propose to obtain the maximum likelihood estimate of \(\Psi\) through an EM algorithm. The complete data likelihood is \[\begin{eqnarray*} &&L(\Psi;Y,C,\theta)\\ &&\propto \Pi_{i=1}^n\Big[\Pi_{j=1}^{n_i}\frac{1}{\sqrt{2\pi\sigma^2}}\exp(-\frac{1}{2\sigma^2}(Y_{ij}-X_i^{(1)}(t_{ij})^\top\beta-\tilde X_i^{(1)}(t_{ij})^\top b_i)^2)\Big]\\ &&\times \Pi_{k=1}^g\lambda_k(T_i)^{I(D_i=k)}\exp\Big\{-\int_0^{T_i}\sum_{k=1}^g\lambda_k(t)dt\Big\}\\ &&\times \frac{1}{\sqrt{(2\pi)^d|\Sigma|}}\exp(-\frac{1}{2}\theta_i^\top\Sigma^{-1}\theta_i). \end{eqnarray*}\] In the E-step, we need to calculate the expected value of all the functions of \(\theta\), The integral does not have a closed-form solution, and thus a numerical method must be employed for its evaluation. Standard option is the Gaussian quadratic rules. In the M-step, we can update \(\Psi\) through maximizing the function obtained from the E-step.

Ordinal outcomes

Let \(Y_{ij}\) denote the \(j\)th response measured on subject \(i\), where \(i=1,\cdots,n\), \(j=1,\cdots,n_i\), and \(Y_{ij}\) takes values in \(\{1,\cdots,K\}\). The competing risks failure times on subject \(i\) is \((T_i,D_i)\), and the meaning is the same as in subsection of “Continuous outcomes”.

we propose the following partial proportional odds model for \(Y_{ij}\) \[\begin{eqnarray*} P(Y_{ij}\leq k|X_{ij},\tilde X_{ij},W_{ij},b_i)=\frac{1}{1+\exp(-\theta_{k}-X_{ij}\beta-\tilde X_{ij}\alpha_{k}-W_{ij}^\top b_i)}. \end{eqnarray*}\] where \(k=1,\cdots,K-1\), \(X_{ij}\) and \(\tilde X_{ij}\) are \(p\times 1\) and \(s\times 1\) vectors of covaraites for the fixed-effect \(\beta\) and \(\alpha_{k}\), with \(\alpha_{1}=0\), and \(\tilde X_{ij}\) is a subset of \(X_{ij}\) for which the proportional odds assumption may not be satisfied. The \(q\times 1\) vector \(b_i\) represents random effects of the associated covaraites \(W_{ij}\).

The distribution of the competing risks failure times \((T_i,D_i)\) are assumed to take the form of the following cause-specific hazards frailty model: \[\begin{eqnarray*} \lambda_d(t|Z_i(t),u_i)&=&\lambda_{0d}(t)\exp(Z_i(t)^\top \gamma_d+\nu_d u_i),~~\mbox{for}~~d=1,\cdots,g, \end{eqnarray*}\] where the \(l\times 1\) vector \(\gamma_d\) and \(\nu_d\) are cause-specific coefficients for the covariates \(Z_i(t)\) and the random effects \(u_i\), respectively.

The parameter \(\nu_1\) is set to 1 to ensure identifiability. We assume the random effects \(b_i\) and \(u_i\) jointly have a multivariate normal distribution, denoted by \(a_i\sim N(0,\Sigma)\).

Denote \(\Psi\) as the unknown parameters from the joint models. We propose to obtain the maximum likelihood estimate of \(\Psi\) through an EM algorithm. The complete data likelihood is \[\begin{eqnarray*} &&L(\Psi;Y,C,a)\\ &&\propto \Pi_{i=1}^n\Big[\Pi_{j=1}^{n_i}\Pi_{k=1}^{K}\{\pi_{ij}(k)-\pi_{ij}(k-1)\}^{I(Y_{ij}=k)}\Big]\\ &&\times \Pi_{d=1}^g\lambda_d(T_i)^{I(D_i=d)}\exp\Big\{-\int_0^{T_i}\sum_{k=1}^d\lambda_d(t)dt\Big\}\\ &&\times \frac{1}{\sqrt{(2\pi)^{q+1}|\Sigma|}}\exp(-\frac{1}{2}a_i^\top\Sigma^{-1}a_i). \end{eqnarray*}\] where \(\pi_{ij}(k)\) stands for the probability that \(Y_{ij}\leq k\) given the covariates and the random effects. The implementation of EM algorithm is the same as that of subection of “Continuous outcomes”.

The R package JMcmprsk

The function that fits continuous outcomes in JMcmprsk is called jmc(). As an illustrative example of jmc(), we consider Scleroderma Lung Study (Tashkin et al. 2006), a double-blinded, randomized clinical trial to evaluate effectiveness of oral cyclophosphamide (CYC) versus placebo in the treatment of lung disease due to scleroderma. This study includes 158 patients, the primary outcome is forced vital capacity (FVC, as % predicted) determined at 3-month intervals from the baseline. The event of interest is the time-to-treatment failure or death. We consider two factors baseline %FVC (\(FVC_0\)) and baseline lung fibrosis (\(FIB_0\)) and two risks, informative and noninformative. The model setups are as follows, \[\begin{eqnarray*} \%FVC_{ij}&=&\beta_1+\beta_2t_{ij}+\beta_3FVC_{0i}+\beta_4FIB_{0i}+\beta_5CYC_i\\ &&+\beta_6FVC_{0i}\times CYC_i+\beta_7FIB_{0i}\times CYC_i+\beta_8 t_{ij}\times CYC_i+b_it_{ij}+\epsilon, \end{eqnarray*}\] and the cause-specific hazards frailty models are \[\begin{eqnarray*} \lambda_1(t)=\lambda_{01}(t)\exp(\gamma_{11}FVC_{0i}+\gamma_{12}FIB_{0i}+\gamma_{13}CYC_i+\gamma_{14}FVC_{0i}\times CYC_i+\gamma_{15}FIB_{0i}\times CYC_i+u_i)\\ \lambda_2(t)=\lambda_{02}(t)\exp(\gamma_{21}FVC_{0i}+\gamma_{22}FIB_{0i}+\gamma_{23}CYC_i+\gamma_{24}FVC_{0i}\times CYC_i+\gamma_{25}FIB_{0i}\times CYC_i+\nu_2u_i), \end{eqnarray*}\]

Examples

We first load the package and the data.

set.seed(123)
require(JMcmprsk)
yfile=system.file("extdata", "fvc621_y.txt", package = "JMcmprsk")
cfile=system.file("extdata", "fvc621_c.txt", package = "JMcmprsk")
mfile=system.file("extdata", "fvc621_m.txt", package = "JMcmprsk")

The number of rows in yfile is the total number of measurements for all subjects in the study. The columns in yfile should start with the longitudinal outcome (column 1), the covariates for the random effects, and then the covariates for the fixed effects. For cfile, the survival / censoring time is included in the first column, and the failure type coded as 0 (censored events), 1 (risk 1), or 2 (risk 2) is given in the second column. Two competing risks are assumed. The covariates are included in the third column and on. mfile is to indicate the number of longitudinal measurements per subject. The number of rows equals to the number of subjects.

Hence, the model can be specified by the function jmc():

jmcfit = jmc(p=7,yfile,cfile,mfile,point=20,do.trace = F)

with \(p\) the dimension of fixed effects (include intercept) in yfile, the option “point” is the number of points used to approximate the integral in the E-step, default is 20, and “do.trace” is used to control whether the programm prints the iteration details. If we would like to see a concise summary of the result we can simply type

#because of the long running time, we save the jmo and jmc results within the package
fitfile=system.file("extdata", "runfit.RData", package = "JMcmprsk")
load(fitfile)
jmcfit

## Model Type:                             jmc 
## 
##                   Estimate   Std. Error       95% CI                Pr(>|Z|)    
## Longitudinal:                
##  Fixed effects:                 
##   intercept       66.0415       0.7541      ( 64.5634, 67.5196)      0
##   time.1         -0.0616       0.0790      (-0.2165, 0.0932)      0.4353101
##   FVC0            0.9017       0.0365      ( 0.8302, 0.9732)      6.687478e-135
##   FIB0           -1.7780       0.5605      (-2.8767,-0.6793)      0.001514602
##   CYC             0.0150       0.9678      (-1.8819, 1.9119)      0.9876085
##   FVC0.CYC        0.1380       0.0650      ( 0.0106, 0.2654)      0.03381009
##   FIB0.CYC        1.7088       0.7643      ( 0.2109, 3.2068)      0.02535631
##   time.CYC        0.1278       0.1102      (-0.0883, 0.3438)      0.2464477
##   sigma^2         22.7366       0.6575      ( 21.4478, 24.0253)     5.157487e-262
## Survival:                
##  Fixed effects:                 
##   FVC0_1          0.0187       0.0326      (-0.0452, 0.0826)      0.5660375
##   FIB0_1          0.1803       0.3521      (-0.5098, 0.8705)      0.6085571
##   CYC_1          -0.6872       0.7653      (-2.1872, 0.8128)      0.3692041
##   FVC0.CYC_1     -0.0517       0.0746      (-0.1979, 0.0945)      0.488001
##   FIB0.CYC_1     -0.4665       1.1099      (-2.6419, 1.7089)      0.674281
##   FVC0_2         -0.0677       0.0271      (-0.1208,-0.0147)      0.01233242
##   FIB0_2          0.1965       0.3290      (-0.4484, 0.8414)      0.5503444
##   CYC_2           0.3137       0.4665      (-0.6007, 1.2280)      0.5013296
##   FVC0.CYC_2      0.1051       0.0410      ( 0.0248, 0.1854)      0.01034231
##   FIB0.CYC_2      0.1239       0.4120      (-0.6836, 0.9314)      0.7635869
## Random effects:                 
##   v2              1.9949       2.3093      (-2.5314, 6.5212)      0.3876868
##   sigma_b11       0.2215       0.0294      ( 0.1638, 0.2792)     5.337397e-14
##   sigma_u         0.0501       0.0898      (-0.1259, 0.2260)      0.5771715
## Covariance:                 
##   sigma_b1u      -0.0997       0.0797      (-0.2560, 0.0565)      0.2109131

The resulting table contains three parts, the fixed effects in longitudinal model, survival model and random effects. It gives the estimated parameters in the first column, standard error in the second column, 95% confidence interval and \(p\)-value for these parameters in the third and forth columns. In our example, there is only one random effect, if there are more than one random effect, the outputs will include \(sigma_b11, sigma_b12, sigma_b22, sigma_b1u, sigma_b2u\) and so on.

By the way, the supporting function coef() can be used to extract the coefficients of the longitudinal process:

coef(jmcfit)

##   intercept      time.1        FVC0        FIB0         CYC    FVC0.CYC 
## 66.04146267 -0.06164756  0.90166283 -1.77799172  0.01503104  0.13798885 
##    FIB0.CYC    time.CYC 
##  1.70883750  0.12776670

We proceed by checking the fit of the model using Wald test.

anova(jmcfit,coeff="beta")

##         Chisq df    Pr(>|Chi|)
## beta 1067.165  7 3.684936e-226

anova(jmcfit,coeff="gamma")

##           Chisq df Pr(>|Chi|)
## gamma1 1.519491  5  0.9108096
## gamma2 9.162137  5  0.1027692

We can see that the hypothesis that \(\beta_2=\beta_3=\cdots=\beta_8=0\) will be rejected, and the hypothesis \(\gamma_{11}=\gamma_{12}=\cdots= \gamma_{15}=0\) and \(\gamma_{21}=\gamma_{22}=\cdots=\gamma_{25}=0\) will not be rejected at the nominal level of 0.05.

The implement of jmo() is the same as that of jmc(). As an illustrative example, we use the data from NINDS rt-PA stroke trial (Group 1995). In this study, 624 patients are included, and the patients treated with rt-PA were compared with those given placebo to look for an improvement from baseline in the score on the modified Rankin scale, an ordinal measure of degree of disability with categories ranging from no symptoms, no significant disability to severe disability or death, which means in this example, \(Y_{ij}\) takes \(K=4\) ordinal values. The following covaraites are considered: treatment group (rt-PA or placebo), modified Rankin scale prior stroke onset, time since randomization (dummy variables for 3, 6 and 12 months), and the three subtypes of acute stroke (small vessel occlusive disease, large vessel atherosclerosis or cardioembolic stroke, and unknown reasons). Similarly, we also consider the informative and noninformative risks. The model setups are as follows, \[\begin{eqnarray*} P(Y_{ij}\leq k)&=&[1+\exp(-\theta_{k}-(\beta_1Group+\beta_2\mbox{Modified Rankin scale prior onset}+\beta_3time3\\ &&+\beta_4time6+\beta_5time12+\beta_6\mbox{Small vessel}+\beta_7\mbox{Large vessel or cardioembolic stroke}\\ &&+\beta_8 \mbox{Small vessel*group}+\beta_9\mbox{Large vessel or cardioembolic stroke*group})\\ &&-(\alpha_{k1}\mbox{Small vessel}+\alpha_{k2}\mbox{Large vessel or cardioembolic stroke})-b_i)]^{-1}. \end{eqnarray*}\] where \(k=1,\cdots,K-1\). \[\begin{eqnarray*} \lambda_1(t)&=&\lambda_{01}(t)\exp(\gamma_{11}Group+\gamma_{12}\mbox{Modified Rankin scale prior onset}\\ &&+\gamma_{13}\mbox{Small vessel}+\gamma_{14}\mbox{Large vessel or cardioembolic stroke}\\ &&+\gamma_{15} \mbox{Small vessel*group}+\gamma_{16}\mbox{Large vessel or cardioembolic stroke*group}+u_i)\\ \lambda_2(t)&=&\lambda_{02}(t)\exp(\gamma_{21}Group+\gamma_{22}\mbox{Modified Rankin scale prior onset}\\ &&+\gamma_{23}\mbox{Small vessel}+\gamma_{24}\mbox{Large vessel or cardioembolic stroke}\\ &&+\gamma_{25} \mbox{Small vessel*group}+\gamma_{26}\mbox{Large vessel or cardioembolic stroke*group}+\nu_2u_i), \end{eqnarray*}\]

We first load the package and the data.

set.seed(123)
require(JMcmprsk)
yfile=system.file("extdata", "ninds_nrank_y.txt", package = "JMcmprsk")
cfile=system.file("extdata", "ninds_nrank_c.txt", package = "JMcmprsk")
mfile=system.file("extdata", "ninds_nrank_m.txt", package = "JMcmprsk")

In particular, we put the non-proportional odds covariates in front of the proportional odds covariates in yfile, and the other arrangements are the same with those in jmc().

jmofit = jmo(p=9,s=2, yfile,cfile,mfile,point=20,do.trace = F)

with \(p\) the dimension of proportional odds covariates (not including intercept) in yfile and \(s\) the dimension of non-proportional odds covariates in yfile. If we would like to see a concise summary of the result we can simply type

#because of the long running time, we save the jmo and jmc results within the package
fitfile=system.file("extdata", "runfit.RData", package = "JMcmprsk")
load(fitfile)
jmofit

## Model Type:                             jmo 
## 
##                   Estimate   Std. Error       95% CI                Pr(>|Z|)    
## Longitudinal:                
##  Fixed effects:                 
##   group           1.5153       0.3326      ( 0.8634, 2.1673)      5.224967e-06
##   mrkprior       -1.6831       0.2739      (-2.2199,-1.1463)      7.978244e-10
##   time3           2.1384       0.1999      ( 1.7467, 2.5302)      1.035586e-26
##   time6           2.2898       0.1987      ( 1.9003, 2.6792)      1.002984e-30
##   time12          2.4872       0.2147      ( 2.0664, 2.9079)      4.797476e-31
##   smlves.1        3.5444       0.6748      ( 2.2219, 4.8670)      1.497489e-07
##   lvORcs.1       -1.0382       0.4381      (-1.8968,-0.1796)      0.01778312
##   smlves.group   -0.7316       1.2482      (-3.1781, 1.7149)      0.5578096
##   lvORcs.group   -2.2769       0.7533      (-3.7534,-0.8005)      0.002504966
##   smlves_2        0.2730       0.4084      (-0.5274, 1.0734)      0.5038251
##   lvORcs_2       -0.3192       0.2400      (-0.7897, 0.1513)      0.1836338
##   smlves_3        2.7083       1.2015      ( 0.3533, 5.0634)      0.02419251
##   lvORcs_3        0.3969       0.4957      (-0.5745, 1.3684)      0.4232214
##   theta1         -4.6730       0.2225      (-5.1090,-4.2369)      5.848706e-98
##   theta2         -2.8442       0.1940      (-3.2245,-2.4639)      1.191674e-48
##   theta3          3.5552       0.2496      ( 3.0661, 4.0443)      4.723012e-46
## Survival:                
##  Fixed effects:                 
##   group_1        -0.4699       0.2607      (-0.9808, 0.0410)      0.07143253
##   mrkprior_1      0.5318       0.1616      ( 0.2150, 0.8486)      0.001002607
##   smlves_1       -2.1029       0.7403      (-3.5539,-0.6518)      0.004506389
##   lvORcs_1        0.3755       0.2660      (-0.1458, 0.8968)      0.158049
##   smlves.group_1  0.3318       1.4797      (-2.5684, 3.2319)      0.8225872
##   lvORcs.group_1  0.8076       0.5403      (-0.2515, 1.8666)      0.1350328
##   group_2         0.2078       0.4812      (-0.7354, 1.1511)      0.6658155
##   mrkprior_2      0.0611       0.4241      (-0.7701, 0.8923)      0.8854034
##   smlves_2        0.7815       0.5963      (-0.3873, 1.9502)      0.1900336
##   lvORcs_2       -0.3259       0.5080      (-1.3215, 0.6698)      0.5211938
##   smlves.group_2 -0.0397       1.1382      (-2.2706, 2.1912)      0.9721772
##   lvORcs.group_2  0.0983       1.0654      (-1.9899, 2.1865)      0.9264852
## Random effects:                 
##   v2              0.0115        0.1826     (-0.3464, 0.3695)      0.9496347
##   sigma_b11       35.4448        3.9952     ( 27.6143, 43.2753)      7.188018e-19
##   sigma_u         5.0375        1.2773     ( 2.5339, 7.5411)      8.02044e-05
## Covariance:                 
##   sigma_b1u      -13.3527        0.6497     (-14.6260,-12.0793)      7.180908e-94

The usage of functions coef() and anova() are the same with those in jmc(). For example, anova(jmofit,coeff=“beta”) presents the result of hypothesis \(\beta_1=\beta_2=\cdots=\beta_9=0\) at the nominal level of 0.05.