Find observations where a date falls between 2 other dates, then add a row counting these observations. - R
Apologies for the vague question title; I was struggling with how to ask.
I have something like this:
df_A
patient_number visit_date
1 2/2/2003
1 5/4/2002
2 4/15/1999
2 4/30/1996
and then another data frame:
df_B
patient_number medication start_date end_date
1 M 1/2/2002 NA
1 N 3/7/1999 12/16/2000
1 O 4/3/2002 7/12/2004
2 N 5/8/1992 11/4/1997
I want to add a column to the first data frame that counts the number of active medications the person was on at the time of each visit, like this:
data_A
patient_number visit_date number_meds
1 2/2/2003 2
1 5/4/2002 2
2 4/15/1999 0
2 4/30/1996 1
I think I need to filter the medications whose start and end dates bracket the corresponding visit date, and then count the rows. I just can't seem to make it work. Any help would be greatly appreciated!
First, make sure your dates are of class Date. If an end_date is missing, it can be set to Inf for use with a join (this assumes the patient is still taking the medication).
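# Parse the month/day/year strings into Date objects
df_A$visit_date <- as.Date(df_A$visit_date, "%m/%d/%Y")
df_B$start_date <- as.Date(df_B$start_date, "%m/%d/%Y")
df_B$end_date <- as.Date(df_B$end_date, "%m/%d/%Y")
# Treat a missing end date as open-ended; the explicit origin keeps this working on R versions before 4.3
df_B$end_date[is.na(df_B$end_date)] <- as.Date(Inf, origin = "1970-01-01")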
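As a quick check of why the Inf end date above is useful, here is a minimal sketch; the variable names open_ended, visit and is_active are made up for illustration. The open-ended date may print as NA, but its underlying value is Inf, so it compares as later than any real visit date.

```r
# An "infinite" Date standing in for a medication with no recorded end date
open_ended <- as.Date(Inf, origin = "1970-01-01")

# A visit date from the question, parsed the same way as in the answer
visit <- as.Date("2/2/2003", "%m/%d/%Y")

# The comparison works on the underlying numeric value, which is Inf
is_active <- open_ended >= visit
is_active
```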
Using the fuzzyjoin package, you can join both data frames on patient_number and include medications that have a visit_date that falls between start_date and end_date. Using group_by and summarise, total up the unique medications for each patient and visit date combination.
library(tidyverse)
library(fuzzyjoin)

df_A %>%
  fuzzy_left_join(
    df_B,
    by = c("patient_number", "visit_date" = "start_date", "visit_date" = "end_date"),
    match_fun = c(`==`, `>=`, `<=`)
  ) %>%
  group_by(patient_number.x, visit_date) %>%
  summarise(number_meds = n_distinct(medication, na.rm = TRUE))
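To cross-check the join logic without any packages, the same count can be sketched in base R. This is only an illustration under stated assumptions: it recreates the question's example data by hand, and it treats a missing end_date as "still active" directly instead of replacing it with Inf.

```r
# Example data from the question (dates as month/day/year strings)
df_A <- data.frame(
  patient_number = c(1, 1, 2, 2),
  visit_date = as.Date(c("2/2/2003", "5/4/2002", "4/15/1999", "4/30/1996"), "%m/%d/%Y")
)
df_B <- data.frame(
  patient_number = c(1, 1, 1, 2),
  medication = c("M", "N", "O", "N"),
  start_date = as.Date(c("1/2/2002", "3/7/1999", "4/3/2002", "5/8/1992"), "%m/%d/%Y"),
  end_date = as.Date(c(NA, "12/16/2000", "7/12/2004", "11/4/1997"), "%m/%d/%Y")
)

# For each visit, count distinct medications of the same patient whose
# start/end dates bracket the visit date (a missing end date means ongoing)
df_A$number_meds <- vapply(seq_len(nrow(df_A)), function(i) {
  visit <- df_A$visit_date[i]
  not_ended <- is.na(df_B$end_date) | df_B$end_date >= visit
  active <- df_B$patient_number == df_A$patient_number[i] &
    df_B$start_date <= visit &
    not_ended
  length(unique(df_B$medication[active]))
}, integer(1))

df_A
```

On the example data this gives 2, 2, 0 and 1 for the four visits, matching the desired result in the question. The loop compares every visit against every medication row, so it is meant for checking results on small data rather than as a replacement for the join.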
Edit: As an alternative, you can use the sqldf package. In this case, your SQL statement will include a left join on patient_number.
library(sqldf)

result <- sqldf("select a.*, b.medication, b.start_date, b.end_date
                 from df_A a left join df_B b
                 on a.patient_number = b.patient_number and
                    a.visit_date between b.start_date and b.end_date")

result %>%
  group_by(patient_number, visit_date) %>%
  summarise(number_meds = n_distinct(medication, na.rm = TRUE))
Output
  patient_number.x visit_date number_meds
             <int> <date>           <int>
1                1 2002-05-04           2
2                1 2003-02-02           2
3                2 1996-04-30           1
4                2 1999-04-15           0
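A further option, not part of the original answer, is a non-equi join in dplyr itself. The sketch below assumes dplyr 1.1.0 or later, which provides join_by() and its between() helper, and recreates the example data by hand. The result name is just a placeholder.

```r
library(dplyr)

# Example data from the question, with dates already parsed
df_A <- data.frame(
  patient_number = c(1, 1, 2, 2),
  visit_date = as.Date(c("2/2/2003", "5/4/2002", "4/15/1999", "4/30/1996"), "%m/%d/%Y")
)
df_B <- data.frame(
  patient_number = c(1, 1, 1, 2),
  medication = c("M", "N", "O", "N"),
  start_date = as.Date(c("1/2/2002", "3/7/1999", "4/3/2002", "5/8/1992"), "%m/%d/%Y"),
  end_date = as.Date(c(NA, "12/16/2000", "7/12/2004", "11/4/1997"), "%m/%d/%Y")
)

# Missing end date is treated as open-ended, as in the answer
df_B$end_date[is.na(df_B$end_date)] <- as.Date(Inf, origin = "1970-01-01")

# Left join on the same patient, keeping medications whose interval contains the visit
result <- df_A %>%
  left_join(
    df_B,
    by = join_by(patient_number, between(visit_date, start_date, end_date))
  ) %>%
  group_by(patient_number, visit_date) %>%
  summarise(number_meds = n_distinct(medication, na.rm = TRUE), .groups = "drop")

result
```

Visits with no matching medication keep a row with medication set to NA, which n_distinct(..., na.rm = TRUE) counts as 0, so the left join preserves every visit.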