How to present survival data that includes time-varying covariates and fit the model in R
I want to perform a survival analysis which includes time-varying covariates, using the aalen() function from an R package called timereg. However, I am still confused as to how the data should be presented in a dataframe, and how the model formula should be specified.
Here's a made-up data set:
subject_id  survival_time  weight  height  outcome_indicator
1           3              65      1.8     0
1           4              68      1.8     0
1           7              70      1.8     1
2           2              55      1.6     0
2           9              53      1.6     0
3           2              62      1.7     0
3           3              65      1.7     0
3           5              64      1.7     0
3           6              66      1.7     0
And here are some interpretations:
The three subjects are identified by the subject_id variable, and they were followed up 3, 2, and 4 times, respectively.
weight is a time-varying covariate.
height is independent of time, so for each subject it remained the same at every follow-up.
If survival_time is in years, then the event of interest happened to subject 1 at year 7.
Within each subject, the rows are sorted by survival_time.
Finally, a list of my questions (please don't hesitate to leave a comment even if you don't have all the answers, or if my solution is correct):
Is this the right way to present the data for an aalen model (or any other model that includes time-varying covariates)?
Should the model formula be something like:
aalen(formula = Survf(survival_time, outcome_indicator) ~ const(height) + weight, data = data_set, id = data_set$subject_id)
where the Survf() function is used to combine the two outcome-related variables; const() is used to denote time-varying covariates, leaving other covariates as they are; data_set is the name of the dataframe; and the id parameter is used to associate different rows of the same subject?
This is likely not the right way to represent these data. Judging from the ordering of the variable survival_time, these are the cohort times at which the covariate changes. You need a lagged event time to indicate the "start" of each observation interval, set to 0 for each patient's first record. The way you have formatted the data inflates the person-time denominator, reduces the incidence, and attenuates the hazard ratios toward the null.
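The lagged start time can be built in base R. The following is a minimal sketch rather than the answerer's own code: the column names start and stop are introduced here for illustration, and it assumes that each row's weight is the value in force over the interval ending at that row's survival_time.

```r
# Data from the question, one row per follow-up record
data_set <- data.frame(
  subject_id        = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  survival_time     = c(3, 4, 7, 2, 9, 2, 3, 5, 6),
  weight            = c(65, 68, 70, 55, 53, 62, 65, 64, 66),
  height            = c(1.8, 1.8, 1.8, 1.6, 1.6, 1.7, 1.7, 1.7, 1.7),
  outcome_indicator = c(0, 0, 1, 0, 0, 0, 0, 0, 0)
)

# Sort by subject and time so that "previous record" is well defined
data_set <- data_set[order(data_set$subject_id, data_set$survival_time), ]

# stop  = end of the interval covered by this record (hypothetical column name)
# start = previous stop within the same subject, 0 for the first record
data_set$stop  <- data_set$survival_time
data_set$start <- ave(data_set$stop, data_set$subject_id,
                      FUN = function(x) c(0, head(x, -1)))

print(data_set[, c("subject_id", "start", "stop", "weight",
                   "height", "outcome_indicator")])
```

Each row now describes one interval (start, stop] for one subject, and outcome_indicator marks whether the event occurred at the end of that interval.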
Take the first participant: they are in fact observed from 0 to 7. The first record covers 0 to 3, the next 3 to 4, and the last 4 to 7. Where have you told R this explicitly? R does not know these records belong to the same individual. R now believes there are 3 people followed for a cumulative 3 + 4 + 7 = 14 years with 1 event, rather than 1 person followed for 7 years with 1 event (the incidence becomes 1 event per 14 person-years instead of 1 per 7 person-years).
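The person-time arithmetic for the first participant can be checked directly, and the same interval columns suggest how the model formula might look. This is a sketch under assumptions: start and stop are illustrative column names, Surv() is the response constructor from the survival package, and the formula is only constructed here, not fitted, since three subjects with one event are far too few for a meaningful fit. As far as I understand timereg, const() marks a covariate whose effect is constant over time, not a time-varying covariate.

```r
# Subject 1 from the question, with an explicit start time per record
s1 <- data.frame(
  subject_id        = c(1, 1, 1),
  start             = c(0, 3, 4),
  stop              = c(3, 4, 7),
  weight            = c(65, 68, 70),
  height            = c(1.8, 1.8, 1.8),
  outcome_indicator = c(0, 0, 1)
)

# Person-years if every row is read as a separate follow-up starting at 0
naive_py <- sum(s1$stop)                  # 3 + 4 + 7

# Person-years when each row contributes only its own interval (start, stop]
interval_py <- sum(s1$stop - s1$start)    # 3 + 1 + 3

events <- sum(s1$outcome_indicator)

# Events per person-year under each reading of the data
print(c(naive = events / naive_py, interval = events / interval_py))

# Counting-process style formula; creating the object does not evaluate
# Surv() or const(), so no package needs to be loaded for this line
model_formula <- Surv(start, stop, outcome_indicator) ~ const(height) + weight
print(model_formula)
```

On a realistically sized data set in this layout, the formula would presumably be passed to aalen() together with data = and id = (the subject identifier); that call is not run here.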