
How to present survival data that includes time-varying covariates and fit the model in R

I want to perform a survival analysis that includes time-varying covariates, using the aalen() function from the R package timereg. However, I am still confused about how the data should be laid out in a data frame and how the model formula should be specified.

Here's a made-up data set:

subject_id  survival_time  weight  height  outcome_indicator
1           3              65      1.8     0
1           4              68      1.8     0
1           7              70      1.8     1
2           2              55      1.6     0
2           9              53      1.6     0
3           2              62      1.7     0
3           3              65      1.7     0
3           5              64      1.7     0
3           6              66      1.7     0

And here are some interpretations:

  1. There are 3 study subjects, identified by the subject_id variable, and they were followed up 3, 2, and 4 times, respectively.
  2. weight is a time-varying covariate.
  3. height is independent of time, so for each subject it stays the same at every follow-up.
  4. Suppose survival_time is measured in years; then the event of interest happened to subject 1 at year 7.
  5. Subjects 2 and 3 are both right-censored cases.
  6. The follow-ups that belong to the same subject can be ordered by survival_time.

Finally, a list of my questions (please don't hesitate to leave a comment even if you don't have all the answers, or if my solution is correct):

  1. Am I right about the presentation of survival data that includes time-varying covariates?
  2. If the answer to the first question is "no", can you please point out what the problems are and provide some explanations?
  3. Assuming the data set is all right, how do I specify the model formula and fit the aalen model (or any other model that includes time-varying covariates)? Is it something like:

aalen(formula = Survf(survival_time, outcome_indicator) ~ const(height) + weight, data = data_set, id = data_set$subject_id)

where the Survf() function is used to combine the two outcome-related variables; const() is used to denote time-varying covariates, leaving other covariates as they are; data_set is the name of the dataframe; and the id parameter is used to associate different rows of the same subject?

This is likely not the right way to represent these data. Judging from the ordering of the variable survival_time, these are the cohort times at which the covariate changes. You need a lagged event time to indicate the "start" of each observation interval, set to 0 for each patient's first record. The way the data are formatted now inflates the person-time denominator, reduces the incidence, and attenuates the hazard ratios toward the null.
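A minimal sketch of adding that lagged start time in base R, using the made-up data from the question. The column names start and stop are illustrative, not required by any package, and the sketch assumes each row's covariate values apply over the interval that ends at that row's survival_time.

```r
# Made-up data from the question, one row per follow-up record.
data_set <- data.frame(
  subject_id        = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  survival_time     = c(3, 4, 7, 2, 9, 2, 3, 5, 6),
  weight            = c(65, 68, 70, 55, 53, 62, 65, 64, 66),
  height            = c(1.8, 1.8, 1.8, 1.6, 1.6, 1.7, 1.7, 1.7, 1.7),
  outcome_indicator = c(0, 0, 1, 0, 0, 0, 0, 0, 0)
)

# Sort so that each subject's records are in time order.
data_set <- data_set[order(data_set$subject_id, data_set$survival_time), ]

# Interval end is the recorded time. Interval start is the previous
# record's time within the same subject, or 0 for the first record.
# (start/stop are hypothetical column names chosen for this sketch.)
data_set$stop  <- data_set$survival_time
data_set$start <- ave(data_set$stop, data_set$subject_id,
                      FUN = function(x) c(0, head(x, -1)))

print(data_set[, c("subject_id", "start", "stop",
                   "weight", "height", "outcome_indicator")])
```

With columns like these, interval data are usually passed to a model as a counting-process response of the form Surv(start, stop, outcome_indicator) from the survival package, rather than as a single time column.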

Take the first participant: they are in fact observed from 0 to 7. The first record covers 0 to 3, the next 3 to 4, and the last 4 to 7. Where have you told R this explicitly? R does not know these records belong to the same individual. R now believes there are 3 people followed for a cumulative 3 + 4 + 7 = 14 years with 1 event, rather than one person followed for 7 years with 1 event (an incidence of 1 event per 14 person-years instead of 1 per 7).
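The person-time arithmetic for the first participant can be checked with a few lines of base R. This is only an illustration of the two readings of the rows. The variable names are made up for the sketch.

```r
# Records for subject 1 only, as laid out in the question.
times <- c(3, 4, 7)   # survival_time
event <- c(0, 0, 1)   # outcome_indicator

# Naive reading: every row is a separate person followed from time 0.
naive_person_years <- sum(times)              # 3 + 4 + 7

# Interval reading: every row covers (previous time, current time].
start <- c(0, head(times, -1))
interval_person_years <- sum(times - start)   # 3 + 1 + 3

# Crude incidence rate (events per person-year) under each reading.
naive_rate    <- sum(event) / naive_person_years
interval_rate <- sum(event) / interval_person_years

cat("naive:   ", naive_person_years, "person-years, rate", naive_rate, "\n")
cat("interval:", interval_person_years, "person-years, rate", interval_rate, "\n")
```

The naive reading doubles the person-time here, so the crude rate is halved.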


 