Formatting repeated date data in time-dependent survival format
I am referencing the following manual for time-dependent survival in R:
https://cran.r-project.org/web/packages/survival/vignettes/timedep.pdf
From the vignette example:
subject time1 time2 death creatinine
5 0 90 0 0.9
5 90 120 0 1.5
5 120 185 1 1.2
The data I have is in the following format (dates are dd-mm-yyyy):
subject date death creatinine
5 01-01-2022 0 0.9
5 01-04-2022 0 1.5
5 01-05-2022 0 1.2
5 05-07-2022 1 1.2
I need to reshape my data (the second table) to match the vignette format (the first table).
You can't fill in time2 in the last row without more information. In single-event data, if an individual has the event (as in your example), the time2 value in the final row would typically be the time of the event. For those who don't have the event, time2 might be the time at which observation of that individual ended.
So, excluding the final time2 value per subject, you can do something like this:
library(dplyr)
df %>%
  # convert date to Date; %Y is the four-digit year (%y would read only "20")
  mutate(date = as.Date(date, "%d-%m-%Y")) %>%
  # arrange the rows by date
  arrange(date) %>%
  # group by subject
  group_by(subject) %>%
  # for each subject, create time2 and time1 as days since the first date
  mutate(
    time2 = as.numeric(lead(date) - min(date)),
    time1 = lag(time2),
    time1 = if_else(row_number() == 1, 0, time1)
  ) %>%
  ungroup() %>%
  # move time1 next to time2
  relocate(time1, .before = time2)
Output:
  subject date       death creatinine time1 time2
    <int> <date>     <int>      <dbl> <dbl> <dbl>
1       5 2022-01-01     0        0.9     0    90
2       5 2022-04-01     0        1.5    90   120
3       5 2022-05-01     1        1.2   120    NA
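If the last row per subject can be read as the end of follow-up (in the question's four-row data, 05-07-2022 with death = 1 looks like the date of death), the final time2 can be filled in as well. The following is only a sketch under that assumption, written in base R so it needs no extra packages; the helper name to_intervals is made up for illustration. Each row's date starts an interval, the next row's date ends it, the status comes from the row that closes the interval, and the last row is dropped.

```r
# Data from the question (dates in dd-mm-yyyy)
df <- data.frame(
  subject    = c(5, 5, 5, 5),
  date       = c("01-01-2022", "01-04-2022", "01-05-2022", "05-07-2022"),
  death      = c(0, 0, 0, 1),
  creatinine = c(0.9, 1.5, 1.2, 1.2)
)
df$date <- as.Date(df$date, format = "%d-%m-%Y")
df <- df[order(df$subject, df$date), ]

# Hypothetical helper: turn one subject's dated rows into (time1, time2] intervals.
# Assumption: the subject's last row marks the end of follow-up (event or censoring).
to_intervals <- function(d) {
  n   <- nrow(d)
  day <- as.numeric(d$date - min(d$date))  # days since the subject's first date
  data.frame(
    subject    = d$subject[-n],
    time1      = day[-n],            # interval starts at this measurement
    time2      = day[-1],            # and ends at the next date
    death      = d$death[-1],        # status recorded at the end of the interval
    creatinine = d$creatinine[-n]    # value measured at the start of the interval
  )
}

result <- do.call(rbind, lapply(split(df, df$subject), to_intervals))
rownames(result) <- NULL
print(result)
```

For subject 5 this gives the intervals (0, 90], (90, 120] and (120, 185], with death = 1 only on the last one, which is the layout of the vignette example. If the last row is instead just another measurement, this assumption does not hold and the end of follow-up has to come from elsewhere.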