
How to present survival data that includes time-varying covariates and fit the model in R

I want to perform a survival analysis that includes time-varying covariates, using the aalen() function from the R package timereg. However, I am still confused about how the data should be laid out in a dataframe and how the model formula should be specified.

Here's a made-up data set:

subject_id  survival_time  weight  height  outcome_indicator
1           3              65      1.8     0
1           4              68      1.8     0
1           7              70      1.8     1
2           2              55      1.6     0
2           9              53      1.6     0
3           2              62      1.7     0
3           3              65      1.7     0
3           5              64      1.7     0
3           6              66      1.7     0

And here are some interpretations:

  1. There are 3 study subjects, identified by the subject_id variable, and they were followed up 3, 2, and 4 times, respectively.
  2. weight is a time-varying covariate.
  3. height is independent of time and so for each subject, it remained the same at each follow up.
  4. Suppose survival_time is measured in years; then the event of interest happened to subject 1 at year 7.
  5. Subjects 2 and 3 are both right-censored.
  6. The follow-ups that belong to the same subject can be ordered by survival_time.

Finally, a list of my questions (please don't hesitate to leave a comment even if you don't have all the answers, or if my solution is correct):

  1. Am I right about the presentation of survival data that includes time-varying covariates?
  2. If the answer to the first question is "no", then can you please point out what the problems are and provide some explanations?
  3. Assuming the data set is alright, then how do I specify the model formula and fit the aalen model (or any other model that includes time-varying covariates)? Is it something like:

aalen(formula = Survf(survival_time, outcome_indicator) ~ const(height) + weight, data = data_set, id = data_set$subject_id)

where the Survf() function is used to combine the two outcome-related variables; const() is used to denote covariates that do not vary with time, leaving the time-varying ones as they are; data_set is the name of the dataframe; and the id parameter is used to associate different rows of the same subject?

This is likely not the right way to represent these data. Judging from the ordering of the variable survival_time, its values are the times on study at which the covariate changes. You need a lagged time variable to indicate the "start" of each observation interval, set to 0 for each subject's first record. The way you have formatted the data now inflates the person-time denominator, reduces the incidence, and attenuates the hazard ratios toward the null.
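As a rough sketch of that lagged start time, the base R code below rebuilds the made-up data from the question and adds a start column per subject. The column names start and stop are my own choice, and I am assuming each row's weight applies to the interval that ends at its survival_time.

```r
# Made-up data from the question
data_set <- data.frame(
  subject_id        = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  survival_time     = c(3, 4, 7, 2, 9, 2, 3, 5, 6),
  weight            = c(65, 68, 70, 55, 53, 62, 65, 64, 66),
  height            = c(1.8, 1.8, 1.8, 1.6, 1.6, 1.7, 1.7, 1.7, 1.7),
  outcome_indicator = c(0, 0, 1, 0, 0, 0, 0, 0, 0)
)

# Put each subject's records in time order before lagging
data_set <- data_set[order(data_set$subject_id, data_set$survival_time), ]

# Lagged time within subject: 0 for the first record,
# the previous record's survival_time for every later one
data_set$start <- ave(
  data_set$survival_time, data_set$subject_id,
  FUN = function(t) c(0, head(t, -1))
)
data_set$stop <- data_set$survival_time

print(data_set[, c("subject_id", "start", "stop", "weight", "outcome_indicator")])
```

After this, each row describes one interval from start to stop for one subject, and a subject's intervals add up to that subject's total follow-up.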

Take the first participant: they are in fact observed from 0 to 7. The first record covers 0 to 3, the next 3 to 4, and the last 4 to 7. Where have you told R this explicitly? R does not know that these records are consecutive intervals for the same individual. R now believes there are 3 people followed for a cumulative 3 + 4 + 7 = 14 years with 1 event, rather than one person followed for 7 years with 1 event (the incidence drops from 1 event per 7 person-years to 1 event per 14 person-years).
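To tell R about the intervals explicitly, one option is the counting-process form Surv(start, stop, event) from the survival package. The sketch below assumes hypothetical start and stop columns holding the interval bounds just described, and it only builds the response and the formula. The aalen() call is left commented out because I have not checked it on this toy data, which has a single event and is far too small to fit anything. As far as I know, const() in timereg marks a covariate whose effect is constant over time, which is not the same as a covariate whose value is constant.

```r
library(survival)  # provides Surv()

# The question's data in interval form; start and stop are assumed column names
long <- data.frame(
  subject_id        = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  start             = c(0, 3, 4, 0, 2, 0, 2, 3, 5),
  stop              = c(3, 4, 7, 2, 9, 2, 3, 5, 6),
  weight            = c(65, 68, 70, 55, 53, 62, 65, 64, 66),
  height            = c(1.8, 1.8, 1.8, 1.6, 1.6, 1.7, 1.7, 1.7, 1.7),
  outcome_indicator = c(0, 0, 1, 0, 0, 0, 0, 0, 0)
)

# Counting-process response: one start-to-stop interval per row
response <- with(long, Surv(start, stop, outcome_indicator))
print(response)

# Person-years per subject: stop times summed naively vs. interval lengths
naive_py   <- tapply(long$stop, long$subject_id, sum)
correct_py <- tapply(long$stop - long$start, long$subject_id, sum)
print(rbind(naive_py, correct_py))  # subject 1: 14 vs. 7

# Formula in the same form; it is not evaluated until a model function uses it
model_formula <- Surv(start, stop, outcome_indicator) ~ const(height) + weight

# Untested on this toy data (assumes timereg is installed):
# library(timereg)
# fit <- aalen(model_formula, data = long, id = long$subject_id)
```

The person-year comparison reproduces the 14 versus 7 figure for the first participant, and the same Surv(start, stop, event) response should also work with other models that accept time-varying covariates, such as coxph() in the survival package.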
