Matching two date fields using dplyr
I have a data frame (DF1) like this:
ID key date1
001 02 2018-02-16
001 02 2018-02-19
001 03 2018-02-17
001 03 2018-02-22
001 04 2017-01-01
002 11 2019-12-21
002 12 2019-12-21
002 13 2019-12-22
And another data frame (DF2):
ID key date2
001 02 2018-02-20
001 03 2018-03-22
002 13 2019-12-22
002 13 2019-12-21
The task is conceptually straightforward: for each date in DF2, I would like to find the previous date in DF1 for the same ID and key.
What do I mean by that? In DF2, for example, we see the date 2018-02-20 along with its ID (001) and key (02). So I go to DF1 and find the rows with the matching ID and key, which gives me two possibilities: 2018-02-16 and 2018-02-19. I need the most recent one that is not after 2018-02-20, so the result would be 2018-02-19. I will eventually calculate the number of days between the two dates.
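The single-row lookup described above can be written out in base R. This is a minimal sketch in which the two candidate DF1 dates for ID 001, key 02 are copied by hand from the table; it keeps only the dates on or before the target and takes the latest of them.

```r
# DF1 dates for ID "001", key "02", copied from the table above
candidates <- as.Date(c("2018-02-16", "2018-02-19"))
target <- as.Date("2018-02-20")  # the DF2 date being looked up

# Keep dates on or before the target, then take the most recent one
previous <- max(candidates[candidates <= target])
day_diff <- as.numeric(target - previous)

previous  # 2018-02-19
day_diff  # 1
```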
The final data frame should look like this:
ID key date2 date1 day_diff
001 02 2018-02-20 2018-02-19 1
001 03 2018-03-22 2018-02-22 28
002 13 2019-12-22 2019-12-22 0
002 13 2019-12-21 NA NA
Again, for each row of DF2 we only need the latest DF1 date on or before its date2 (a same-day date counts, as in the third row, giving 0 days). This also needs to return NA if there is no such date.
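A package-free sketch of the whole task, assuming ID and key are kept as character (to preserve the leading zeros) and the dates are parsed with as.Date(). The helper prev_date() is a hypothetical name used only for illustration: it restricts DF1 to the same ID and key, keeps dates on or before the target, and returns NA when nothing qualifies, which covers the last row of the expected output.

```r
# Example data copied from the question; ID and key stay character to keep leading zeros
df1 <- data.frame(
  ID    = c("001", "001", "001", "001", "001", "002", "002", "002"),
  key   = c("02", "02", "03", "03", "04", "11", "12", "13"),
  date1 = as.Date(c("2018-02-16", "2018-02-19", "2018-02-17", "2018-02-22",
                    "2017-01-01", "2019-12-21", "2019-12-21", "2019-12-22"))
)
df2 <- data.frame(
  ID    = c("001", "001", "002", "002"),
  key   = c("02", "03", "13", "13"),
  date2 = as.Date(c("2018-02-20", "2018-03-22", "2019-12-22", "2019-12-21"))
)

# Hypothetical helper: latest df1$date1 with the same ID and key that falls
# on or before `target`, or NA when there is none
prev_date <- function(df1, id, k, target) {
  earlier <- df1$date1[df1$ID == id & df1$key == k & df1$date1 <= target]
  if (length(earlier) == 0) as.Date(NA) else max(earlier)
}

# Look up each df2 row; do.call(c, ...) keeps the Date class that sapply() would drop
df2$date1 <- do.call(c, lapply(seq_len(nrow(df2)), function(i) {
  prev_date(df1, df2$ID[i], df2$key[i], df2$date2[i])
}))
df2$day_diff <- as.numeric(df2$date2 - df2$date1)
df2
```

This scans DF1 once per DF2 row, which is fine for small tables but grows as rows(DF1) × rows(DF2) for large ones.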
Does this work? (It assumes date1 and date2 are already Date columns.)
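library(dplyr)
df1 %>%
  # pair each df2 row with every df1 date on or before it (same ID and key)
  fuzzyjoin::fuzzy_right_join(df2, by = c('ID' = 'ID', 'key' = 'key', 'date1' = 'date2'),
                              match_fun = list(`==`, `==`, `<=`)) %>%
  # per df2 row keep only the latest such date (or the NA row when nothing matched)
  group_by(ID.y, key.y, date2) %>% filter(is.na(date1) | date1 == max(date1)) %>%
  ungroup() %>% select('ID' = ID.y, 'key' = key.y, date2, date1) %>% mutate(day_diff = as.numeric(date2 - date1))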
# A tibble: 4 x 5
ID key date2 date1 day_diff
<chr> <chr> <date> <date> <dbl>
1 001 02 2018-02-20 2018-02-19 1
2 001 03 2018-03-22 2018-02-22 28
3 002 13 2019-12-22 2019-12-22 0
4 002 13 2019-12-21 NA NA
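With dplyr 1.1.0 or later, the same result can likely be obtained without fuzzyjoin through a rolling join. This is a sketch assuming that dplyr version is installed: join_by() with closest(date2 >= date1) matches each DF2 row to the DF1 row with the same ID and key whose date1 is the latest one on or before date2, and left_join() leaves NA where no such row exists.

```r
library(dplyr)  # join_by() and closest() require dplyr >= 1.1.0

# Example data copied from the question
df1 <- tibble(
  ID    = c("001", "001", "001", "001", "001", "002", "002", "002"),
  key   = c("02", "02", "03", "03", "04", "11", "12", "13"),
  date1 = as.Date(c("2018-02-16", "2018-02-19", "2018-02-17", "2018-02-22",
                    "2017-01-01", "2019-12-21", "2019-12-21", "2019-12-22"))
)
df2 <- tibble(
  ID    = c("001", "001", "002", "002"),
  key   = c("02", "03", "13", "13"),
  date2 = as.Date(c("2018-02-20", "2018-03-22", "2019-12-22", "2019-12-21"))
)

# Rolling join: equal ID and key, plus the closest date1 that is <= date2;
# df2 rows without such a match keep NA in date1
result <- df2 %>%
  left_join(df1, by = join_by(ID, key, closest(date2 >= date1))) %>%
  mutate(day_diff = as.numeric(date2 - date1))
result
```

The rolling join picks the single closest match directly instead of joining every earlier date and filtering afterwards. Using closest(date2 > date1) instead would exclude same-day matches.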