
Select the first date on or after a reference date from two data frames in R

I'm using R and I have two data sets: one contains the reference date (date of cancer diagnosis) and the other contains the dates of the scans. Some patients have had multiple scans, both before and after the date of diagnosis. I need to get the first scan on or after the date of diagnosis for each patient. I then plan to merge the data sets so that we can analyse the additional data (not described) that is in the data frames.

I am using lubridate, tidyverse, and dplyr.

The structure of the first data set "a1" is:

patient_id      diagnosis_date
1               2018-06-26
2               2014-10-15
3               2016-02-19
4               2018-06-30

The structure of the second data set "a2" is:

patient_id      mri_date
1               2018-04-19
1               2018-07-12
1               2018-08-11
2               2014-11-01
3               2016-02-25
3               2018-10-07

For each patient_id, I want to select the first scan on or after the date of diagnosis (mri_date >= diagnosis_date). For example, for patient 1 that is mri_date 2018-07-12.

I tried merging the data sets with combined <- merge(a1, a2, by = "patient_id", all.x = TRUE) and was then planning to filter and slice. However, this dropped the multiple mri_date values for each patient and kept only the first one.

I've tried searching for an answer but can't seem to find one.

I would be very grateful for your help.

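One way with dplyr is to join a1 and a2 by "patient_id", keep only the scans where mri_date is on or after diagnosis_date, sort by mri_date, and take the first remaining row for each patient. Filtering before slicing means that a patient with only pre-diagnosis scans is dropped rather than matched to an earlier scan.

library(dplyr)

inner_join(a1, a2, by = 'patient_id') %>%
  # keep only scans on or after the diagnosis date
  filter(mri_date >= diagnosis_date) %>%
  arrange(patient_id, mri_date) %>%
  group_by(patient_id) %>%
  # first remaining scan for each patient
  slice(1)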

#  patient_id diagnosis_date mri_date  
#       <int> <date>         <date>    
#1          1 2018-06-26     2018-07-12
#2          2 2014-10-15     2014-11-01
#3          3 2016-02-19     2016-02-25
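
The result above has no row for patient 4, who has no scans in a2. If every patient from a1 should be kept for the later analysis, one option is to compute the first qualifying scan per patient and then left-join it back onto a1. The following is a minimal sketch rather than a definitive solution; it assumes dplyr 1.0 or later (for the .groups argument), and the names first_scan, first_mri_date and result are illustrative and not part of the original post.

```r
library(dplyr)

# Sample data, retyped from the tables in the question
a1 <- data.frame(
  patient_id = 1:4,
  diagnosis_date = as.Date(c("2018-06-26", "2014-10-15",
                             "2016-02-19", "2018-06-30"))
)
a2 <- data.frame(
  patient_id = c(1L, 1L, 1L, 2L, 3L, 3L),
  mri_date = as.Date(c("2018-04-19", "2018-07-12", "2018-08-11",
                       "2014-11-01", "2016-02-25", "2018-10-07"))
)

# Earliest scan on or after diagnosis, one row per patient that has one
first_scan <- inner_join(a1, a2, by = "patient_id") %>%
  filter(mri_date >= diagnosis_date) %>%
  group_by(patient_id) %>%
  summarise(first_mri_date = min(mri_date), .groups = "drop")

# Left join back onto a1 so that every patient is kept;
# first_mri_date is NA where no qualifying scan exists
result <- left_join(a1, first_scan, by = "patient_id")
result
```

With the sample data, result has four rows, and first_mri_date is NA for patient 4.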

data

a1 <- structure(list(patient_id = 1:4, diagnosis_date = structure(c(17708, 
16358, 16850,17712), class = "Date")), row.names = c(NA, -4L), class = "data.frame")

a2 <- structure(list(patient_id = c(1L, 1L, 1L, 2L, 3L, 3L), mri_date = 
structure(c(17640, 17724, 17754, 16375, 16856, 17811), class = "Date")), 
row.names = c(NA,-6L), class = "data.frame")
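
As a quick check on the merge() call from the question, the sketch below runs it on the sample data only. On this data merge() keeps every scan row, which suggests the rows may have been lost in a later step, but this cannot be confirmed without the asker's real data. The data frames are retyped from the tables in the question.

```r
# Sample data, retyped from the tables in the question
a1 <- data.frame(
  patient_id = 1:4,
  diagnosis_date = as.Date(c("2018-06-26", "2014-10-15",
                             "2016-02-19", "2018-06-30"))
)
a2 <- data.frame(
  patient_id = c(1L, 1L, 1L, 2L, 3L, 3L),
  mri_date = as.Date(c("2018-04-19", "2018-07-12", "2018-08-11",
                       "2014-11-01", "2016-02-25", "2018-10-07"))
)

# Same call as in the question: a left join in base R
combined <- merge(a1, a2, by = "patient_id", all.x = TRUE)

# Number of rows per patient after the merge
table(combined$patient_id)
```

Here patient 1 keeps all three scans, patient 3 keeps both, and patient 4 appears once with an NA mri_date, for seven rows in total.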
