
Find observations where a date falls between 2 other dates, then add a row counting these observations. - R

Apologies for the vague question title; I was struggling with how to ask.

I have something like this:

df_A

patient_number     visit_date
1                   2/2/2003
1                   5/4/2002
2                   4/15/1999
2                   4/30/1996


and then another data frame:

df_B

patient_number     medication     start_date     end_date
1                    M             1/2/2002         NA
1                    N             3/7/1999        12/16/2000
1                    O             4/3/2002        7/12/2004
2                    N             5/8/1992        11/4/1997
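To make the example reproducible, the two tables can be recreated in R roughly like this. This is a sketch under stated assumptions: the dates are kept as m/d/Y character strings exactly as shown, and the empty end date is stored as NA.

```r
# Visits: one row per patient visit, dates as "m/d/Y" strings.
df_A <- data.frame(
  patient_number = c(1L, 1L, 2L, 2L),
  visit_date     = c("2/2/2003", "5/4/2002", "4/15/1999", "4/30/1996"),
  stringsAsFactors = FALSE
)

# Medications: one row per prescription; a missing end date is NA.
df_B <- data.frame(
  patient_number = c(1L, 1L, 1L, 2L),
  medication     = c("M", "N", "O", "N"),
  start_date     = c("1/2/2002", "3/7/1999", "4/3/2002", "5/8/1992"),
  end_date       = c(NA, "12/16/2000", "7/12/2004", "11/4/1997"),
  stringsAsFactors = FALSE
)
```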

I want to add a column to the first data frame that counts the number of medications the patient was actively taking at the time of each visit, like this:

data_A

patient_number     visit_date      number_meds
1                   2/2/2003            2
1                   5/4/2002            2
2                   4/15/1999           0
2                   4/30/1996           1

I think I need to filter for the medications whose start and end dates surround the corresponding visit date, and then count the rows. I just can't seem to make it work. Any help would be greatly appreciated!

First, make sure your dates are of class Date. A missing end_date can be set to Inf for use in the join (this assumes the patient is still taking the medication).

df_A$visit_date <- as.Date(df_A$visit_date, "%m/%d/%Y")
df_B$start_date <- as.Date(df_B$start_date, "%m/%d/%Y")
df_B$end_date <- as.Date(df_B$end_date, "%m/%d/%Y")
df_B$end_date[is.na(df_B$end_date)] <- as.Date(Inf, origin = "1970-01-01")  # explicit origin also works on R before 4.3.0
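A quick way to check that the Inf sentinel behaves as intended is to test one open-ended prescription against one visit. This is a minimal sketch using medication M and the 2/2/2003 visit from the question; the variable names start, end, visit and in_window are made up for illustration.

```r
# One open-ended prescription (medication M, patient 1) and one visit.
start <- as.Date("1/2/2002", "%m/%d/%Y")
end   <- as.Date(NA_character_, "%m/%d/%Y")  # missing end date -> NA Date

# Replace the missing end date with an infinite Date ("still taking it").
end[is.na(end)] <- as.Date(Inf, origin = "1970-01-01")

visit <- as.Date("2/2/2003", "%m/%d/%Y")

# Without the sentinel, visit <= NA would be NA; with it, the comparison
# is an ordinary numeric comparison against Inf.
in_window <- visit >= start & visit <= end
in_window  # TRUE
```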

Using the fuzzyjoin package, you can join both data frames on patient_number and keep only the medications for which visit_date falls between start_date and end_date. Then use group_by and summarise to count the distinct medications for each patient and visit date combination.

library(tidyverse)
library(fuzzyjoin)

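df_A %>%
  fuzzy_left_join(
    df_B,
    # same patient, visit on/after start_date, visit on/before end_date
    by = c("patient_number", "visit_date" = "start_date", "visit_date" = "end_date"),
    match_fun = c(`==`, `>=`, `<=`)
  ) %>%
  # both tables have patient_number, so the join suffixes it with .x/.y
  group_by(patient_number.x, visit_date) %>%
  # visits with no matching medication have NA here and count as 0
  summarise(number_meds = n_distinct(medication, na.rm = TRUE))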
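If you'd rather not add a package dependency, the same join-then-count idea can be sketched in base R: an ordinary left join on patient_number with merge(), a logical flag for visits inside each medication window, then a count of distinct active medications per visit. This is a swapped-in technique, not part of the answer above; it assumes the dates are already Date and that a missing end_date means the medication is ongoing. Names such as joined, key and n_meds are illustrative.

```r
# Example data with dates already converted to Date.
df_A <- data.frame(
  patient_number = c(1L, 1L, 2L, 2L),
  visit_date = as.Date(c("2003-02-02", "2002-05-04", "1999-04-15", "1996-04-30"))
)
df_B <- data.frame(
  patient_number = c(1L, 1L, 1L, 2L),
  medication = c("M", "N", "O", "N"),
  start_date = as.Date(c("2002-01-02", "1999-03-07", "2002-04-03", "1992-05-08")),
  end_date = as.Date(c(NA, "2000-12-16", "2004-07-12", "1997-11-04"))
)
# Missing end date = still taking the medication.
df_B$end_date[is.na(df_B$end_date)] <- as.Date(Inf, origin = "1970-01-01")

# 1. Ordinary left join on patient_number (each visit x that patient's meds).
joined <- merge(df_A, df_B, by = "patient_number", all.x = TRUE)

# 2. Flag medications whose [start_date, end_date] window contains the visit.
#    Visits with no medication at all have NA dates; !is.na() makes them FALSE.
joined$active <- !is.na(joined$start_date) &
  joined$visit_date >= joined$start_date &
  joined$visit_date <= joined$end_date

# 3. Count distinct active medications per patient/visit key.
key <- paste(joined$patient_number, joined$visit_date)
act <- joined$active
n_meds <- tapply(joined$medication[act], key[act], function(m) length(unique(m)))

# Look up each visit's count; visits with no active medication get 0.
df_A$number_meds <- as.integer(n_meds[paste(df_A$patient_number, df_A$visit_date)])
df_A$number_meds[is.na(df_A$number_meds)] <- 0L
df_A
```

The merge() step pairs every visit with every medication of the same patient before filtering, which is fine for tables of this size.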

Edit: As an alternative, you can use the sqldf package. In this case, your SQL statement will include a left join on patient_number, with visit_date between start_date and end_date as an additional join condition.

library(sqldf)

# SQL "between" is inclusive on both ends, like the >= / <= conditions above
result <- sqldf("select a.*, b.medication, b.start_date, b.end_date
                 from df_A a left join df_B b
                 on a.patient_number = b.patient_number and
                   a.visit_date between b.start_date and b.end_date")

# count distinct medications per visit (NA from unmatched visits counts as 0)
result %>%
  group_by(patient_number, visit_date) %>%
  summarise(number_meds = n_distinct(medication, na.rm = TRUE))
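Another option, closer to the filter-then-count approach described in the question, is to skip the join entirely: for each visit, filter the same patient's medications by date and count them. This is a base-R sketch; count_active_meds is a hypothetical helper name, and it assumes Date columns with Inf standing in for missing end dates.

```r
# Example data with dates already converted to Date.
df_A <- data.frame(
  patient_number = c(1L, 1L, 2L, 2L),
  visit_date = as.Date(c("2003-02-02", "2002-05-04", "1999-04-15", "1996-04-30"))
)
df_B <- data.frame(
  patient_number = c(1L, 1L, 1L, 2L),
  medication = c("M", "N", "O", "N"),
  start_date = as.Date(c("2002-01-02", "1999-03-07", "2002-04-03", "1992-05-08")),
  end_date = as.Date(c(NA, "2000-12-16", "2004-07-12", "1997-11-04"))
)
df_B$end_date[is.na(df_B$end_date)] <- as.Date(Inf, origin = "1970-01-01")

# Hypothetical helper: for visit i, keep the same patient's medications
# whose window contains the visit date, then count distinct medication names.
count_active_meds <- function(visits, meds) {
  vapply(seq_len(nrow(visits)), function(i) {
    hit <- meds$patient_number == visits$patient_number[i] &
      meds$start_date <= visits$visit_date[i] &
      meds$end_date >= visits$visit_date[i]
    # which() drops NA comparisons, so a leftover NA end date counts as inactive
    length(unique(meds$medication[which(hit)]))
  }, integer(1))
}

df_A$number_meds <- count_active_meds(df_A, df_B)
df_A
```

This compares every visit with every medication row, so it is best suited to small tables; it keeps df_A's original row order.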

Output (from the fuzzyjoin version; the sqldf version labels the first column patient_number):

  patient_number.x visit_date number_meds
             <int> <date>           <int>
1                1 2002-05-04           2
2                1 2003-02-02           2
3                2 1996-04-30           1
4                2 1999-04-15           0

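Disclaimer: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.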

© 2020-2024 STACKOOM.COM