How to find observations within a certain time range of each other in R
I have a dataset with ID, date, days of life, and medication variables. Each ID has multiple observations indicating different administrations of a certain drug. I want to find the number of UNIQUE meds that were administered within 365 days of each other. A sample of the data frame is as follows:
ID date dayoflife meds
1 2003-11-24 16361 lasiks
1 2003-11-24 16361 vigab
1 2004-01-09 16407 lacos
1 2013-11-25 20015 pheno
1 2013-11-26 20016 vigab
1 2013-11-26 20016 lasiks
2 2008-06-05 24133 pheno
2 2008-04-07 24074 vigab
3 2014-11-25 8458 pheno
3 2014-12-22 8485 pheno
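To experiment with the sample, it can be read into a data frame as sketched below. This is only a convenience: the name df1 is borrowed from the first answer, and the column types (character dates, integer ID) are an assumption about how the original data was stored.

```r
# Sample data from the question, read from inline text.
# stringsAsFactors = FALSE keeps 'date' and 'meds' as character columns.
df1 <- read.table(text = "
ID date dayoflife meds
1 2003-11-24 16361 lasiks
1 2003-11-24 16361 vigab
1 2004-01-09 16407 lacos
1 2013-11-25 20015 pheno
1 2013-11-26 20016 vigab
1 2013-11-26 20016 lasiks
2 2008-06-05 24133 pheno
2 2008-04-07 24074 vigab
3 2014-11-25 8458 pheno
3 2014-12-22 8485 pheno
", header = TRUE, stringsAsFactors = FALSE)

# Inspect the column types before doing any date arithmetic
str(df1)
```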
I expect the outcome to be:
ID N
1 3
2 2
3 1
indicating that individual 1 had a max of 3 different types of medications administered within 365 days of each other. I am not sure whether it is best to use days of life or the date to get to this expected outcome. Any help is appreciated.
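On the date versus days-of-life point, a quick check on the sample suggests the two columns carry the same information: the gaps computed from date equal the gaps computed from dayoflife. The sketch below uses only the rows for ID 1, and it assumes dayoflife counts calendar days from a fixed origin per person, which the question does not state explicitly.

```r
# Rows for ID 1 from the sample, in the order given
date      <- as.Date(c("2003-11-24", "2003-11-24", "2004-01-09",
                       "2013-11-25", "2013-11-26", "2013-11-26"))
dayoflife <- c(16361, 16361, 16407, 20015, 20016, 20016)

# Gaps between consecutive administrations, in days
gap_date <- as.numeric(diff(date))
gap_dol  <- diff(dayoflife)

print(gap_date)
print(gap_dol)

# TRUE when both columns give identical gaps
print(all(gap_date == gap_dol))
```

If the same holds for the full dataset, either column can drive the 365-day comparison; dayoflife simply avoids the date parsing step.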
An option would be to convert 'date' to Date class, group by 'ID', take the absolute difference between 'date' and the lag of that column, check whether it is greater than 365, create a grouping index with cumsum, get the number of distinct elements of 'meds' in summarise, and then take the maximum per 'ID'.
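library(dplyr)
df1 %>%
  mutate(date = as.Date(date)) %>%
  group_by(ID) %>%
  # absolute gap in days to the previous row (0 for the first row of each ID)
  mutate(diffd = abs(as.numeric(difftime(date, lag(date, default = first(date)),
                                         units = 'days')))) %>%
  # start a new group whenever the gap exceeds 365 days;
  # .add replaces the deprecated add argument (dplyr >= 1.0.0)
  group_by(grp = cumsum(diffd > 365), .add = TRUE) %>%
  summarise(N = n_distinct(meds)) %>%
  group_by(ID) %>%
  summarise(N = max(N))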
# A tibble: 3 x 2
# ID N
# <int> <int>
#1 1 2
#2 2 2
#3 3 1
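To make the cumsum step concrete, here is a base R sketch of the intermediate columns for ID 1 only. diffd and grp mirror the names used in the pipeline above; the vector name dates is illustrative and not part of the answer.

```r
# Dates for ID 1 from the sample, in row order
dates <- as.Date(c("2003-11-24", "2003-11-24", "2004-01-09",
                   "2013-11-25", "2013-11-26", "2013-11-26"))

# Gap in days to the previous row; the first row is compared with itself (0)
previous <- c(dates[1], dates[-length(dates)])
diffd <- abs(as.numeric(difftime(dates, previous, units = "days")))

# A gap above 365 days bumps the running total, which starts a new group
grp <- cumsum(diffd > 365)

print(data.frame(dates, diffd, grp))
```

Because each row is compared only with the row before it, a chain of administrations spaced less than 365 days apart stays in one group even if its first and last entries are more than a year apart.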
You can try:
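library(dplyr)
df %>%
  group_by(ID) %>%
  # flag rows whose previous / next row is within 365 days
  mutate(date = as.Date(date),
         lag_date = abs(date - lag(date)) <= 365,
         lead_date = abs(date - lead(date)) <= 365) %>%
  # turn FALSE into NA so that coalesce() falls through to the other flag
  mutate_at(vars(lag_date, lead_date), ~ ifelse(., ., NA)) %>%
  # keep rows that have at least one neighbour within 365 days
  filter(coalesce(lag_date, lead_date)) %>%
  summarise(N = n_distinct(meds))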
Output:
# A tibble: 3 x 2
ID N
<int> <int>
1 1 2
2 2 2
3 3 1