
Get the nearest date in an R data frame and fetch the record

I have a data frame which looks as follows:

id    OrderDate_1    OrderDate_2    OrderDate_3    NewEnrollDate
1     05/01/2018     01/02/2019     NA             02/15/2019
2     03/02/2019     NA             NA             05/05/2019
3     12/15/2017     12/12/2018     05/01/2019     06/01/2019

I want logic that goes through each record of the data frame and flags the records for which the following condition is true:

NewEnrollDate >= OrderDate_X and OrderDate_X is nearest to NewEnrollDate

It should also return the OrderDate_X that satisfied the condition above, giving the following table:

id    OrderDate_1    OrderDate_2    OrderDate_3    NewEnrollDate   MatchDT
1     05/01/2018     01/02/2019     NA             02/15/2019      01/02/2019
2     03/02/2019     NA             NA             05/05/2019      03/02/2019
3     12/15/2017     12/12/2018     05/01/2019     06/01/2019      05/01/2019

Also, it would help to have an additional column that flags the records that satisfied NewEnrollDate >= OrderDate_X.

I have tried taking the differences between the dates and getting the minimum of them, but that does not seem to work too well with NA values, and it also does not return the MatchDT variable. Please help.

I managed to do this using {data.table}.

I have read your concerns about having multiple columns (more than 3) of order dates. To handle that, I use pattern matching to capture every column whose name contains "OrderDate".

For each of those columns, I create a new column that holds the order date if it is less than or equal to NewEnrollDate, and NA otherwise.

From these new columns, I then take the row-wise maximum, with the parameter na.rm = TRUE to handle missing values. Since only dates on or before NewEnrollDate remain, the latest of them is the one nearest to NewEnrollDate.

library(data.table)

DT <-
  data.table(id = 1:3,
             OrderDate_1 = as.POSIXct(c("2018-05-01", "2019-03-02", "2017-12-15")),
             OrderDate_2 = as.POSIXct(c("2019-01-02", NA, "2018-12-12")),
             OrderDate_3 = as.POSIXct(c(NA, NA, "2019-05-01")),
             NewEnrollDate = as.POSIXct(c("2019-02-15", "2019-05-05", "2019-06-01")))

# All order-date columns, however many there are
OldNames <- grep("OrderDate", names(DT), value = TRUE)
NewNames <- paste0(OldNames, "New")

for (i in seq_along(OldNames)) {

  # Temporarily rename the column so it can be referred to by a fixed name
  setnames(DT, OldNames[i], "PlaceHolder1")
  # Keep the order date only where it is on or before NewEnrollDate; other rows stay NA
  DT[NewEnrollDate >= PlaceHolder1, PlaceHolder2 := PlaceHolder1]
  setnames(DT, "PlaceHolder1", OldNames[i])
  setnames(DT, "PlaceHolder2", NewNames[i])

}

# Row-wise maximum across all the new columns, ignoring NAs
DT[, MatchDT := do.call(pmax, c(.SD, na.rm = TRUE)), .SDcols = NewNames]
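
For comparison, here is a minimal base R sketch of the same idea that also adds the flag column asked for in the question. It assumes the dates arrive as mm/dd/yyyy strings, as they are displayed in the question; the names df, order_cols, eligible and Matched are my own and do not come from the original post.

```r
# Sample data from the question, with dates as mm/dd/yyyy strings (an assumption).
df <- data.frame(
  id            = 1:3,
  OrderDate_1   = c("05/01/2018", "03/02/2019", "12/15/2017"),
  OrderDate_2   = c("01/02/2019", NA, "12/12/2018"),
  OrderDate_3   = c(NA, NA, "05/01/2019"),
  NewEnrollDate = c("02/15/2019", "05/05/2019", "06/01/2019"),
  stringsAsFactors = FALSE
)

# Pick up every order-date column by name, however many there are.
order_cols <- grep("^OrderDate_", names(df), value = TRUE)

# Parse the strings into Date objects.
date_cols <- c(order_cols, "NewEnrollDate")
df[date_cols] <- lapply(df[date_cols], as.Date, format = "%m/%d/%Y")

# Set to NA any order date that falls after the enrollment date.
eligible <- lapply(df[order_cols], function(d) {
  d[!is.na(d) & d > df$NewEnrollDate] <- NA
  d
})

# The latest remaining date per row is the nearest one on or before enrollment.
df$MatchDT <- do.call(pmax, c(unname(eligible), na.rm = TRUE))

# Flag rows where at least one order date passed the check.
df$Matched <- !is.na(df$MatchDT)
```

A row in which every order date is missing or later than NewEnrollDate ends up with MatchDT equal to NA and Matched equal to FALSE. In the sample data every row has a match, so Matched is TRUE throughout.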
