
How to convert my data to counting process format with start/stop times for interval truncation in R?

I would like to model a recurrent event in subjects who move in and out of risk over the course of the study's observation period.

I have data on the out-of-risk periods (start and end dates), during which the subject cannot experience the event.

I would appreciate any help on how to convert my data in R to counting process format, with start/stop times that reflect both event occurrence and interval truncation. I can already convert the data to counting process format for event occurrence, but I do not know how to partition the start/stop times to reflect unobserved periods (other than creating the data set by hand, which I would very much like to avoid).

This is a very simplified example of my input data structure in wide format:

[Image: input data structure]

This is what I want to achieve:

id t0 t1 outcome
 1  0 36       0
 2  0  5       1
 2  5 15       1
 2 15 36       0
 3  0  9       0
 3 11 20       1
 3 20 36       0

In my illustration, Subject 1 never experiences the event and is right-censored at 36 months. Subject 2 experiences the event twice and stays at risk throughout the observation period. Subject 3 experiences the event once, exits the risk period at 9 months, and re-enters it at 11 months.
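As a quick sanity check on the target layout, the table can be typed in by hand and summarised per subject. This is only a sketch in base R; the object names `target`, `at_risk`, and `n_events` are made up for illustration and are not part of the original data.

```r
# Target counting-process data from the illustration, entered by hand
target <- data.frame(
  id      = c(1, 2, 2, 2, 3, 3, 3),
  t0      = c(0, 0, 5, 15, 0, 11, 20),
  t1      = c(36, 5, 15, 36, 9, 20, 36),
  outcome = c(0, 1, 1, 0, 0, 1, 0)
)

# Months at risk per subject: subject 3 contributes 34 rather than 36,
# because months 9 to 11 are left out of its rows
at_risk <- tapply(target$t1 - target$t0, target$id, sum)

# Number of events per subject
n_events <- tapply(target$outcome, target$id, sum)
```

Data in this shape is what `survival::Surv(t0, t1, outcome)` expects for a counting-process model, with the gap in subject 3's rows standing in for the out-of-risk period.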

Other useful info about my study:

  1. Subjects have a common start time of 0 months.
  2. Subjects are right-censored at 36 months if no event is experienced.
  3. Subjects are observed for 3 years.
  4. Subjects can move in and out of risk any number of times, and for varying lengths of time, during the 3-year observation period.

Thank you!

I may be missing some corner cases, and there's probably a more elegant solution, but this appears to work.

I suggest running the first two lines of the main logic, then the first three, four, etc., and inspecting the output at each stage to build up an understanding of what each step is doing.

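library(tidyr)
library(dplyr)

# Wide input: one row per subject, NA where an event or
# out-of-risk period did not occur
subjects <- data.frame(
  id = 1:3,
  event = c(0, 1, 1),
  time_to_event_1 = c(NA, 5, 20),
  time_to_event_2 = c(NA, 15, NA),
  time_to_risk_out_start_1 = c(NA, NA, 9),
  time_to_risk_out_end_1 = c(NA, NA, 11),
  time_to_risk_out_start_2 = NA,
  time_to_risk_out_end_2 = NA
)

subjects %>%
  # add the common start time and the 36-month censoring time
  mutate(start = 0,
         end = 36) %>%
  select(-event) %>%
  # long format: one row per (subject, time point)
  gather(event, t0, -id) %>%
  group_by(id) %>%
  arrange(id, t0) %>%
  filter(!is.na(t0)) %>%
  # each interval stops where the next time point starts
  mutate(t1 = lead(t0)) %>%
  # drop the final "end" row and the intervals spent out of risk
  filter(!is.na(t1),
         !grepl("time_to_risk_out_start", event)) %>%
  # an interval has outcome 1 if the next retained row starts at an event
  mutate(outcome = as.integer(lead(grepl("time_to_event", event), default = FALSE))) %>%
  select(id, t0, t1, outcome) %>%
  ungroup()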
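
For comparison, the same splitting logic can be sketched in base R for a single subject. This is not the code from the answer above but an alternative illustration: `make_intervals` and its arguments are hypothetical names, and it assumes that no event falls strictly inside an out-of-risk period and that each out-of-risk start has a matching end.

```r
# Hypothetical helper: build counting-process rows for ONE subject from
# its event times and its out-of-risk periods (NA entries are ignored).
make_intervals <- function(events, out_start, out_end,
                           t_start = 0, t_end = 36) {
  events    <- events[!is.na(events)]
  keep      <- !is.na(out_start) & !is.na(out_end)
  out_start <- out_start[keep]
  out_end   <- out_end[keep]

  # Every time point at which the subject's status can change
  cuts <- sort(unique(c(t_start, events, out_start, out_end, t_end)))
  t0 <- head(cuts, -1)
  t1 <- tail(cuts, -1)

  # An interval is at risk if its midpoint is outside every out-of-risk period
  mid <- (t0 + t1) / 2
  at_risk <- !vapply(mid,
                     function(m) any(m > out_start & m < out_end),
                     logical(1))

  data.frame(t0 = t0[at_risk],
             t1 = t1[at_risk],
             # outcome is 1 when the interval stops at an event time
             outcome = as.integer(t1[at_risk] %in% events))
}

# Subject 3 from the question: event at 20, out of risk from 9 to 11
subject3 <- make_intervals(events = 20, out_start = 9, out_end = 11)
```

Applying the helper to each row of the wide data (for example with `lapply` over subject ids) and binding the results would give the full table; the tidyr pipeline above does the same work in one pass.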

Also, for future reference, it's better to share your data using dput(subjects), which makes it easier for people to assist. In this case it was pretty easy to reproduce :)
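
As a small illustration of why dput() helps, the sketch below writes a toy data frame out as R code and rebuilds it from that text. The object names `small`, `txt`, and `rebuilt` are invented for the example.

```r
# A toy data frame standing in for the real data
small <- data.frame(id = 1:3, time_to_event_1 = c(NA, 5, 20))

# dput() prints R code that recreates the object; capture that text
txt <- capture.output(dput(small))

# Anyone can paste the text into their session to get the same object back
rebuilt <- eval(parse(text = txt))
```

The printed text can go straight into a question, so readers do not have to retype a table from an image.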
