Matching two date fields using dplyr

Question

I have a dataframe (DF1) like this:

ID   key  date1
001   02  2018-02-16
001   02  2018-02-19
001   03  2018-02-17
001   03  2018-02-22
001   04  2017-01-01
002   11  2019-12-21
002   12  2019-12-21
002   13  2019-12-22

And another dataframe (DF2):

ID   key  date2
001   02  2018-02-20
001   03  2018-03-22
002   13  2019-12-22
002   13  2019-12-21

The task is conceptually straightforward:

For each date in DF2, I would like to find the previous date that exists in DF1.

What do I mean by that? In DF2, for example, we see a date of 2018-02-20 along with its ID and key. I go to DF1 and find the rows with the matching ID and key, which gives me two candidates (2018-02-16 and 2018-02-19). I need the most recent one that is not after the DF2 date, so it would be 2018-02-19. I will eventually calculate the number of days between the two dates.

The final df should look like this:

ID   key  date2           date1   day_diff
001   02  2018-02-20 2018-02-19          1
001   03  2018-03-22 2018-02-22         28
002   13  2019-12-22 2019-12-22          0
002   13  2019-12-21         NA         NA

Again, for each row of DF2 we just need the closest date in DF1 that is not after the date in that row. This also needs to return NA if there is no previous date.

Answer 1

Does this work?

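library(dplyr)
# Keep the latest date1 per ID/key, then right-join to df2 where date1 <= date2.
# Earlier date1 values are dropped before the join, so they are never matched.
df1 %>% group_by(ID, key) %>% filter(date1 == max(date1)) %>%
  fuzzyjoin::fuzzy_right_join(df2, by = c('ID' = 'ID', 'key' = 'key', 'date1' = 'date2'),
                              match_fun = list(`==`, `==`, `<=`)) %>%
  ungroup() %>% select('ID' = ID.y, 'key' = key.y, date2, date1) %>%
  mutate(day_diff = as.numeric(date2 - date1))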
# A tibble: 4 x 5
  ID    key   date2      date1      day_diff
  <chr> <chr> <date>     <date>        <dbl>
1 001   02    2018-02-20 2018-02-19        1
2 001   03    2018-03-22 2018-02-22       28
3 002   13    2019-12-22 2019-12-22        0
4 002   13    2019-12-21 NA               NA
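
For comparison, here is a minimal sketch of the same lookup using only dplyr joins, in case fuzzyjoin is not available. It is an illustration under assumptions, not the accepted method: the data is retyped from the question's tables, the dates are assumed to be stored as Date, and the name `result` is made up for this example. Instead of pre-filtering DF1 to its latest date, it compares every DF1 date for the same ID and key against date2, so it should also cover the case where the latest date1 falls after date2 but an earlier one does not.

```r
library(dplyr)

# Data retyped from the question; storing the dates as Date is an assumption.
df1 <- tibble(
  ID    = c("001", "001", "001", "001", "001", "002", "002", "002"),
  key   = c("02", "02", "03", "03", "04", "11", "12", "13"),
  date1 = as.Date(c("2018-02-16", "2018-02-19", "2018-02-17", "2018-02-22",
                    "2017-01-01", "2019-12-21", "2019-12-21", "2019-12-22"))
)

df2 <- tibble(
  ID    = c("001", "001", "002", "002"),
  key   = c("02", "03", "13", "13"),
  date2 = as.Date(c("2018-02-20", "2018-03-22", "2019-12-22", "2019-12-21"))
)

result <- df2 %>%
  # Pair every DF2 row with all DF1 rows sharing its ID and key.
  # (dplyr 1.1.0+ may warn about a many-to-many relationship; that is expected here.)
  left_join(df1, by = c("ID", "key")) %>%
  # Blank out candidate dates that fall after date2.
  mutate(date1 = if_else(date1 <= date2, date1, as.Date(NA))) %>%
  group_by(ID, key, date2) %>%
  # Latest remaining candidate per DF2 row, or NA when none is left.
  summarise(
    date1 = if (all(is.na(date1))) as.Date(NA) else max(date1, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(day_diff = as.numeric(date2 - date1))

print(result)
```

Two things to keep in mind with this sketch: the rows come back sorted by ID, key and date2 rather than in DF2's original order, and DF2 rows that are exact duplicates of each other would be collapsed into one by the `group_by()` / `summarise()` step.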

Answer 1: 1, accepted (2020-11-30 17:49:02)