How to find observations within a certain time range of each other in R

I have a dataset with ID, date, days of life, and medication variables. Each ID has multiple observations indicating different administrations of a certain drug. I want to find the unique meds that were administered within 365 days of each other. A sample of the data frame is as follows:

ID    date          dayoflife    meds
1     2003-11-24    16361        lasiks
1     2003-11-24    16361        vigab
1     2004-01-09    16407        lacos
1     2013-11-25    20015        pheno
1     2013-11-26    20016        vigab
1     2013-11-26    20016        lasiks
2     2008-06-05    24133        pheno
2     2008-04-07    24074        vigab
3     2014-11-25    8458         pheno
3     2014-12-22    8485         pheno

I expect the outcome to be:

ID    N
1     3
2     2
3     1

This indicates that individual 1 had a maximum of 3 different types of medication administered within 365 days of each other. I am not sure whether it is best to use days of life or the date to get to this expected outcome. Any help is appreciated.

One option is to convert 'date' to the Date class, group by 'ID', take the absolute difference between 'date' and its lag, check whether that difference is greater than 365, create a grouping index with cumsum, count the distinct elements of 'meds' in each group with summarise, and then take the maximum count per 'ID':

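library(dplyr)

df1 %>%
   mutate(date = as.Date(date)) %>%
   group_by(ID) %>%
   # days since the previous row within each ID (0 for the first row)
   mutate(diffd = abs(as.numeric(difftime(date, lag(date, default = first(date)),
               units = 'days')))) %>%
   # start a new group whenever the gap to the previous row exceeds 365 days
   # (.add replaces the deprecated add argument in dplyr >= 1.0.0)
   group_by(grp = cumsum(diffd > 365), .add = TRUE) %>%
   summarise(N = n_distinct(meds)) %>%
   # keep the largest count of distinct meds per ID
   group_by(ID) %>%
   summarise(N = max(N))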
# A tibble: 3 x 2
#     ID     N
#  <int> <int>
#1     1     2
#2     2     2
#3     3     1
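
To make the grouping step above easier to follow, here is a minimal base R sketch of the same idea applied only to the dates of ID 1 from the sample data. The variable names `dates`, `gap`, and `grp` are illustrative, not part of the answer's code.

```r
# Dates for ID 1, copied from the sample data in the question
dates <- as.Date(c("2003-11-24", "2003-11-24", "2004-01-09",
                   "2013-11-25", "2013-11-26", "2013-11-26"))

# Gap in days to the previous row; the first row has no predecessor, so use 0
gap <- c(0, abs(as.numeric(diff(dates))))

# A new group starts whenever the gap to the previous row exceeds 365 days
grp <- cumsum(gap > 365)

print(data.frame(dates, gap, grp))
```

Note that this compares consecutive rows only, so a long chain of observations that are each within 365 days of the next would end up in one group even if the first and last are more than 365 days apart.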

You can try:

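library(dplyr)

df %>%
  group_by(ID) %>%
  # flag rows that are within 365 days of the previous or the next row
  mutate(date = as.Date(date),
         lag_date = abs(date - lag(date)) <= 365,
         lead_date = abs(date - lead(date)) <= 365) %>%
  # turn FALSE into NA so that coalesce() falls through to the other flag
  mutate_at(vars(lag_date, lead_date), ~ ifelse(., ., NA)) %>%
  # keep rows with at least one neighbour within 365 days
  filter(coalesce(lag_date, lead_date)) %>%
  summarise(N = n_distinct(meds))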

Output:

# A tibble: 3 x 2
     ID     N
  <int> <int>
1     1     2
2     2     2
3     3     1
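
Since the question also asks whether days of life could be used instead of the date, here is a hedged alternative sketch in base R, which is not taken from either answer. It anchors a 365-day window at each observation and counts the distinct meds inside it. The helper `max_meds_in_window` and the result name `res` are hypothetical, and the data frame is rebuilt from the sample in the question.

```r
# Sample data copied from the question (only the columns needed here)
df <- data.frame(
  ID = c(1, 1, 1, 1, 1, 1, 2, 2, 3, 3),
  dayoflife = c(16361, 16361, 16407, 20015, 20016, 20016,
                24133, 24074, 8458, 8485),
  meds = c("lasiks", "vigab", "lacos", "pheno", "vigab", "lasiks",
           "pheno", "vigab", "pheno", "pheno"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each observation, look at the window
# [day, day + window] and count the distinct meds that fall inside it,
# then return the largest count over all windows.
max_meds_in_window <- function(day, med, window = 365) {
  counts <- vapply(day, function(start) {
    in_window <- day >= start & day <= start + window
    length(unique(med[in_window]))
  }, integer(1))
  max(counts)
}

# Apply the helper separately to each ID and stack the results
res <- do.call(rbind, lapply(split(df, df$ID), function(g) {
  data.frame(ID = g$ID[1], N = max_meds_in_window(g$dayoflife, g$meds))
}))

print(res)
```

On the sample data this gives N = 3, 2, 1 for IDs 1, 2, 3, which is the outcome the question expects. Because any set of observations that are all within 365 days of each other fits inside a window starting at its earliest member, checking one window per observation is enough. The row order within an ID does not matter, so the unsorted dates for ID 2 are handled as well.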
