How to find observations within a certain time frame of each other in R

Question

I have a dataset with ID, date, days-of-life, and medication variables. Each ID has multiple observations, each recording a separate administration of a drug. I want to find the number of UNIQUE meds that were administered within 365 days of each other. A sample of the data frame is as follows:

ID    date          dayoflife    meds
1     2003-11-24    16361        lasiks
1     2003-11-24    16361        vigab
1     2004-01-09    16407        lacos
1     2013-11-25    20015        pheno
1     2013-11-26    20016        vigab
1     2013-11-26    20016        lasiks
2     2008-06-05    24133        pheno
2     2008-04-07    24074        vigab
3     2014-11-25    8458         pheno
3     2014-12-22    8485         pheno

I expect the outcome to be:

This would indicate that individual 1 had a maximum of 3 different medications administered within 365 days of each other. I am not sure whether it is better to use dayoflife or date to get to this expected outcome. Any help is appreciated.
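In the sample rows, dayoflife moves in lockstep with date. For example, 2003-11-24 to 2004-01-09 is 46 days, and 16407 - 16361 = 46. Within an ID, either column therefore gives the same gaps. The quick base R check below assumes the data frame is called df1. It only covers the sample shown; if dayoflife were ever out of step with date in the full data, the two columns would disagree.

```r
# Sample data from the question (name df1 assumed)
df1 <- data.frame(
  ID        = rep(c(1L, 2L, 3L), times = c(6, 2, 2)),
  date      = c("2003-11-24", "2003-11-24", "2004-01-09", "2013-11-25",
                "2013-11-26", "2013-11-26", "2008-06-05", "2008-04-07",
                "2014-11-25", "2014-12-22"),
  dayoflife = c(16361, 16361, 16407, 20015, 20016, 20016,
                24133, 24074, 8458, 8485),
  meds      = c("lasiks", "vigab", "lacos", "pheno", "vigab", "lasiks",
                "pheno", "vigab", "pheno", "pheno"),
  stringsAsFactors = FALSE
)

by_id <- split(df1, df1$ID)
# Gaps (in days) between consecutive rows within each ID, from the calendar date...
gap_date <- unlist(lapply(by_id, function(d) diff(as.numeric(as.Date(d$date)))))
# ...and from the days-of-life counter
gap_dol  <- unlist(lapply(by_id, function(d) diff(d$dayoflife)))
identical(gap_date, gap_dol)
#> [1] TRUE
```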

Answer 1

An option would be to convert 'date' to Date class, group by 'ID', take the absolute difference (abs of difftime) between 'date' and its lag, check whether it is greater than 365, create a grouping index with cumsum, get the number of distinct elements of 'meds' in summarise, and finally take the maximum of those counts for each 'ID'.

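library(dplyr)

# df1 is the sample data frame from the question
df1 %>% 
   mutate(date = as.Date(date)) %>%
   arrange(ID, date) %>%                 # gaps only make sense in date order
   group_by(ID) %>% 
   # days since the previous administration (0 for each ID's first row)
   mutate(diffd = abs(as.numeric(difftime(date, lag(date, default = first(date)),
               units = 'days')))) %>%
   # start a new cluster whenever the gap exceeds 365 days
   # (.add = TRUE replaces the deprecated add = TRUE in dplyr >= 1.0.0)
   group_by(grp = cumsum(diffd > 365), .add = TRUE) %>%
   summarise(N = n_distinct(meds)) %>%
   # keep the largest cluster for each ID
   group_by(ID) %>%
   summarise(N = max(N))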
# A tibble: 3 x 2
#     ID     N
#  <int> <int>
#1     1     2
#2     2     2
#3     3     1
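The same gap-and-cumsum idea can be written in base R without dplyr. The helper cluster_max_distinct() below is hypothetical (not part of any package), and the data frame name df1 is assumed. One property of this approach is worth knowing: clusters are chained through consecutive gaps, so administrations on days 0, 300 and 600 would land in one cluster even though the first and last are 600 days apart.

```r
# Sample data from the question (name df1 assumed)
df1 <- data.frame(
  ID   = rep(c(1L, 2L, 3L), times = c(6, 2, 2)),
  date = c("2003-11-24", "2003-11-24", "2004-01-09", "2013-11-25",
           "2013-11-26", "2013-11-26", "2008-06-05", "2008-04-07",
           "2014-11-25", "2014-12-22"),
  meds = c("lasiks", "vigab", "lacos", "pheno", "vigab", "lasiks",
           "pheno", "vigab", "pheno", "pheno"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one ID's administrations into clusters wherever
# the gap to the previous administration exceeds `gap` days, then return the
# largest number of distinct meds found in any single cluster
cluster_max_distinct <- function(dates, meds, gap = 365) {
  dates <- as.Date(dates)
  ord   <- order(dates)
  days  <- as.numeric(dates[ord])
  meds  <- meds[ord]
  # TRUE where a new cluster starts (the first row never starts a new one)
  new_cluster <- c(FALSE, diff(days) > gap)
  cluster <- cumsum(new_cluster)
  # distinct meds per cluster, then the maximum over clusters
  max(tapply(meds, cluster, function(m) length(unique(m))))
}

res <- sapply(split(df1, df1$ID),
              function(d) cluster_max_distinct(d$date, d$meds))
res
```

On the sample data this returns 3, 2 and 1 for IDs 1, 2 and 3.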

Answer 2

You can try:

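library(dplyr)

# df is the sample data frame from the question
df %>%
  group_by(ID) %>%
  mutate(date = as.Date(date),
         # is the previous / next administration within 365 days?
         lag_date = abs(date - lag(date)) <= 365,
         lead_date = abs(date - lead(date)) <= 365) %>%
  # turn FALSE into NA so that coalesce() falls through to the other flag
  mutate_at(vars(lag_date, lead_date), ~ ifelse(., ., NA)) %>%
  # keep rows with at least one neighbouring administration within 365 days
  filter(coalesce(lag_date, lead_date)) %>%
  summarise(N = n_distinct(meds))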

Output:

# A tibble: 3 x 2
     ID     N
  <int> <int>
1     1     2
2     2     2
3     3     1
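Answer 2 keeps every row that has a neighbour within 365 days and then counts distinct meds over all kept rows of an ID, so separate bursts of treatment within one ID are pooled into a single count. If "within 365 days of each other" is read strictly as every pair of administrations being at most 365 days apart, one option is a sliding window. Any such set of administrations fits in [d, d + 365], where d is its earliest date, so it is enough to try each administration date as d, count the distinct meds in that window, and take the maximum per ID. The helper max_distinct_in_window() is hypothetical and the data frame name df1 is assumed.

```r
# Sample data from the question (name df1 assumed)
df1 <- data.frame(
  ID   = rep(c(1L, 2L, 3L), times = c(6, 2, 2)),
  date = c("2003-11-24", "2003-11-24", "2004-01-09", "2013-11-25",
           "2013-11-26", "2013-11-26", "2008-06-05", "2008-04-07",
           "2014-11-25", "2014-12-22"),
  meds = c("lasiks", "vigab", "lacos", "pheno", "vigab", "lasiks",
           "pheno", "vigab", "pheno", "pheno"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the largest number of distinct meds whose administration
# dates all fit in a window [d, d + window] anchored at some administration d
max_distinct_in_window <- function(dates, meds, window = 365) {
  days <- as.numeric(as.Date(dates))
  counts <- vapply(seq_along(days), function(i) {
    # administrations on or after day i, at most `window` days later
    in_window <- days >= days[i] & days - days[i] <= window
    length(unique(meds[in_window]))
  }, integer(1))
  max(counts)
}

res <- sapply(split(df1, df1$ID),
              function(d) max_distinct_in_window(d$date, d$meds))
res
```

On the sample data this gives 3 for ID 1, 2 for ID 2 and 1 for ID 3, which matches the maximum of 3 described in the question. The loop compares every pair of rows within an ID, which is fine for a handful of administrations per person but grows quadratically with the number of rows per ID.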

Answer 1 (score: 1), 2019-07-31 14:36:50

Answer 2 (score: 1), 2019-07-31 14:37:50