Adding time varying covariates to survival data using 'tmerge' in 'survival' package
tmerge function in R for time dependent covariates
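I have the tibbles df1 and df2, and I would like to create df_temp from them using dplyr operations. The application is implementing time-varying covariates in a survival model with delayed entry, where start_time is age. Does anyone have a solution using dplyr or tmerge?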
library(dplyr)
library(magrittr)
library(survival)
df1 =
tibble(id = c(1,2,3),
start_time = c(5,10,15),
stop_time = c(8,17,25),
event = c(1,1,0))
df2 = tibble(
id = c(1,2,3),
stop_time_cancer = c(6, NA, 20),
cancer_status = c(1,0,1))
df_temp <- tibble(
id = c(1,1,2,3,3),
start_time = c(5,6,10,15,20),
stop_time = c(6,8,17,20,25),
cancer_event = c(0, 1, 0, 0, 1),
event = c(0,1, 1, 0, 0)
)
Thanks!
I tried to do this with the tmerge function, but because I have delayed entry I could not get it to work.
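For reference, here is a minimal sketch of how tmerge might handle the delayed entry, assuming the entry age can be supplied through the `tstart` argument on the initial call (the argument is documented as defining each subject's valid time range on an initial call). The object names `base`, `cancer`, and `out_tm` are illustrative only, and plain data frames are used instead of tibbles as a precaution. By default tmerge names the interval columns `tstart` and `tstop` rather than `start_time` and `stop_time`.

```r
library(survival)

# Same data as in the question, as plain data frames
df1 <- data.frame(id = c(1, 2, 3),
                  start_time = c(5, 10, 15),
                  stop_time = c(8, 17, 25),
                  event = c(1, 1, 0))
df2 <- data.frame(id = c(1, 2, 3),
                  stop_time_cancer = c(6, NA, 20),
                  cancer_status = c(1, 0, 1))

# Initial call: one interval per subject, from entry age to exit age
# (assumption: tstart/tstop carry the delayed entry), plus the event
# indicator placed at the exit age.
base <- tmerge(data1 = df1["id"], data2 = df1, id = id,
               tstart = start_time, tstop = stop_time,
               event = event(stop_time, event))

# Second call: split each interval at the cancer diagnosis age.
# Subjects without a diagnosis age are dropped from the lookup table,
# so they keep a single interval with cancer_event = 0.
cancer <- df2[!is.na(df2$stop_time_cancer), ]
out_tm <- tmerge(base, cancer, id = id,
                 cancer_event = tdc(stop_time_cancer))
out_tm
```

If this behaves as intended, `out_tm` should contain the same intervals as df_temp, with `tstart` and `tstop` in place of `start_time` and `stop_time`.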
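This currently uses fuzzyjoin for the non-equi join mechanism (which, by my interpretation of the problem, is required). Once dplyr-1.1.0 is released, this can most likely be done with its join_by feature (see the "Join improvements" section of https://www.tidyverse.org/blog/2022/11/dplyr-1-1-0-is-coming-soon/).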
# library(fuzzyjoin)
out <- fuzzyjoin::fuzzy_left_join(
df1, df2,
by = c(id="id", start_time="stop_time_cancer", stop_time="stop_time_cancer"),
match_fun = list(`==`, `<=`, `>=`)
) %>%
rowwise() %>%
summarize(
id = id.x,
start_time = c(start_time, na.omit(stop_time_cancer)),
stop_time = sort(c(na.omit(stop_time_cancer), stop_time)),
event = c(if (!is.na(stop_time_cancer)) 0, event),
cancer_event = c(0, if (!is.na(stop_time_cancer)) 1)
)
out
# # A tibble: 5 × 5
#      id start_time stop_time event cancer_event
#   <dbl>      <dbl>     <dbl> <dbl>        <dbl>
# 1     1          5         6     0            0
# 2     1          6         8     1            1
# 3     2         10        17     1            0
# 4     3         15        20     0            0
# 5     3         20        25     0            1
Confirm:
all.equal(df_temp, out[,names(df_temp)])
# [1] TRUE
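With dplyr 1.1.0 or later, the same result can probably be obtained without fuzzyjoin. The following is a sketch only: it assumes at most one cancer row per id (so that grouping by id gives one row per group), swaps the rowwise summarize() for reframe(), which is the dplyr 1.1.0 verb for summaries that return several rows per group, and uses the illustrative name `out2`.

```r
library(dplyr)

df1 <- tibble(id = c(1, 2, 3),
              start_time = c(5, 10, 15),
              stop_time = c(8, 17, 25),
              event = c(1, 1, 0))
df2 <- tibble(id = c(1, 2, 3),
              stop_time_cancer = c(6, NA, 20),
              cancer_status = c(1, 0, 1))
df_temp <- tibble(id = c(1, 1, 2, 3, 3),
                  start_time = c(5, 6, 10, 15, 20),
                  stop_time = c(6, 8, 17, 20, 25),
                  cancer_event = c(0, 1, 0, 0, 1),
                  event = c(0, 1, 1, 0, 0))

# Non-equi join: keep a cancer time only if it falls inside the
# subject's [start_time, stop_time] window; others come back as NA.
joined <- left_join(
  df1, df2,
  by = join_by(id, start_time <= stop_time_cancer,
               stop_time >= stop_time_cancer)
)

# Split each subject's interval at the cancer time, if there is one.
out2 <- joined %>%
  reframe(
    start_time = c(start_time, na.omit(stop_time_cancer)),
    stop_time = sort(c(na.omit(stop_time_cancer), stop_time)),
    event = c(if (!is.na(stop_time_cancer)) 0, event),
    cancer_event = c(0, if (!is.na(stop_time_cancer)) 1),
    .by = id
  )

all.equal(df_temp, out2[, names(df_temp)])
```

Subjects with several covariate changes would need a different grouping strategy than the one-row-per-id assumption used here.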