library(dplyr)  # tibble(), joins, coalesce()
library(tidyr)  # fill()
dfy <- tibble(ttc = c("830592962A", "701134213K", "620001491E", "500542890M", "400259766M", "800136692H", "701229741E"),
              CaseDate1 = c("01/04/2019", "01/04/2019", "02/04/2019", "02/04/2019", "02/04/2019", "02/04/2019", "03/04/2019"),
              Theatre = c("RIE_TH_06", "RIE_TH_06", "RIE_TH_08", "RIE_TH_08", "RIE_TH_06", "RIE_TH_06", "RIE_TH_08"))
dss <- tibble(ttc = c("400259766M", "800136692H", "701229741E", "830592962A", "701134213K", "620001491E", "500542890M"),
              D1 = c("NA", "01/04/2019", "NA", "01/04/2019", "01/04/2019", "02/04/2019", "NA"),
              D2 = c("02/04/2019", "NA", "NA", "NA", "NA", "NA", "02/04/2019"),
              D3 = c("NA", "NA", "04/04/2019", "NA", "NA", "NA", "NA"),
              C5 = c("APPLE", "ORANGE", "PINE", "MANGO", "CHERRY", "SUGAR", "GREEN"))
I want to join the two tables, matching
dfy(ttc & CaseDate1)
with
dss(ttc & coalesce(D1, D2, D3))
Secondly, where there is no exact match, I want to use a date one day before or one day after in dss(ttc & coalesce(D1, D2, D3)). For example,
dfy(701229741E & 03/04/2019)
will match dss(701229741E & 04/04/2019),
which is one day after.
I have used the following code, but it joins only the rows where ttc and the date match exactly:
dfy %>%
  left_join(dss %>% mutate(x = coalesce(D1, D2, D3)), by = c("ttc", "CaseDate1" = "x")) %>%
  select(ttc, CaseDate1, Theatre, C5)
coalesce() is not working as intended because "NA" in your data is a string, not a missing value. I fixed that with:
# Replace the literal string "NA" with a real missing value in each date column
for (col in c("D1", "D2", "D3")) {
  dss[[col]][dss[[col]] == "NA"] <- NA
}
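As an alternative to the loop, the same cleanup can be sketched with dplyr's na_if() inside across(). This assumes dplyr 1.0 or later (for across()). The name dss_clean is only illustrative, and a cut-down copy of dss is rebuilt here so the snippet runs on its own.

```r
library(dplyr)

# Small stand-in for dss, with "NA" stored as a string as in the question
dss <- tibble(
  ttc = c("400259766M", "800136692H", "701229741E"),
  D1  = c("NA", "01/04/2019", "NA"),
  D2  = c("02/04/2019", "NA", "NA"),
  D3  = c("NA", "NA", "04/04/2019")
)

# Replace the literal string "NA" with a real missing value in each date column
dss_clean <- dss %>%
  mutate(across(c(D1, D2, D3), ~ na_if(.x, "NA")))

# coalesce() can now pick the first non-missing date per row
dss_clean %>% mutate(x = coalesce(D1, D2, D3))
```

Unlike the loop, this leaves dss itself unchanged and returns a cleaned copy, which is convenient inside a longer pipeline.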
With that fix, your original code returns
# A tibble: 7 x 4
  ttc        CaseDate1  Theatre   C5
  <chr>      <chr>      <chr>     <chr>
1 830592962A 01/04/2019 RIE_TH_06 MANGO
2 701134213K 01/04/2019 RIE_TH_06 CHERRY
3 620001491E 02/04/2019 RIE_TH_08 SUGAR
4 500542890M 02/04/2019 RIE_TH_08 GREEN
5 400259766M 02/04/2019 RIE_TH_06 APPLE
6 800136692H 02/04/2019 RIE_TH_06 NA
7 701229741E 03/04/2019 RIE_TH_08 NA
For the missing date, my suggestion would be to use full_join instead of left_join, and then use the fill function on a grouped data frame:
dfy %>%
  full_join(dss %>% mutate(x = coalesce(D1, D2, D3)), by = c("ttc", "CaseDate1" = "x")) %>%
  select(ttc, CaseDate1, Theatre, C5) %>%
  group_by(ttc) %>%
  arrange(desc(CaseDate1)) %>%
  fill(C5) %>%
  filter(!is.na(Theatre)) %>%
  ungroup() %>%
  arrange(CaseDate1)
outputs
# A tibble: 7 x 4
  ttc        CaseDate1  Theatre   C5
  <chr>      <chr>      <chr>     <chr>
1 830592962A 01/04/2019 RIE_TH_06 MANGO
2 701134213K 01/04/2019 RIE_TH_06 CHERRY
3 620001491E 02/04/2019 RIE_TH_08 SUGAR
4 500542890M 02/04/2019 RIE_TH_08 GREEN
5 400259766M 02/04/2019 RIE_TH_06 APPLE
6 800136692H 02/04/2019 RIE_TH_06 NA
7 701229741E 03/04/2019 RIE_TH_08 PINE
The filter(!is.na(Theatre)) step here drops any row that was not in dfy (the "left" data frame).
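One caveat about the arrange(desc(CaseDate1)) step: CaseDate1 is a character column, so it is sorted as text rather than as dates. That happens to work for this sample because every date falls in April 2019, but dd/mm/yyyy strings sort incorrectly across months. A minimal illustration, using two made-up dates that are not part of the question's data:

```r
# Two illustrative dates (not from the question's data)
dates <- c("30/04/2019", "01/05/2019")

# Sorted as text: compared character by character, so "01/..." comes first
sorted_as_text <- sort(dates)

# Sorted as dates: parse with the dd/mm/yyyy format before sorting
sorted_as_date <- sort(as.Date(dates, format = "%d/%m/%Y"))

print(sorted_as_text)
print(sorted_as_date)
```

If the real data spans more than one month, converting the date columns with as.Date(..., format = "%d/%m/%Y") before arranging is probably the safer choice.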
If you want to fill in both directions, you can add the .direction argument to the fill function.
dfy %>%
  full_join(dss %>% mutate(x = coalesce(D1, D2, D3)), by = c("ttc", "CaseDate1" = "x")) %>%
  select(ttc, CaseDate1, Theatre, C5) %>%
  group_by(ttc) %>%
  arrange(desc(CaseDate1)) %>%
  fill(C5, .direction = "updown") %>%
  filter(!is.na(Theatre)) %>%
  ungroup() %>%
  arrange(CaseDate1)
and outputs
# A tibble: 7 x 4
  ttc        CaseDate1  Theatre   C5
  <chr>      <chr>      <chr>     <chr>
1 830592962A 01/04/2019 RIE_TH_06 MANGO
2 701134213K 01/04/2019 RIE_TH_06 CHERRY
3 620001491E 02/04/2019 RIE_TH_08 SUGAR
4 500542890M 02/04/2019 RIE_TH_08 GREEN
5 400259766M 02/04/2019 RIE_TH_06 APPLE
6 800136692H 02/04/2019 RIE_TH_06 ORANGE
7 701229741E 03/04/2019 RIE_TH_08 PINE
It is not clear to me that this is your intended output, but I hope it helps you in the right direction.
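If the goal is strictly "same date, or one day either side", note that fill() copies C5 within a ttc no matter how far apart the dates are. Below is a sketch of a stricter alternative: parse the dates, join on ttc alone, and keep C5 only when the two dates differ by at most one day. It assumes each ttc appears at most once in dss (true for the sample data) and that every date is in dd/mm/yyyy format. The names dss_dates, case_date and result are only illustrative.

```r
library(dplyr)

dfy <- tibble(
  ttc = c("830592962A", "701134213K", "620001491E", "500542890M",
          "400259766M", "800136692H", "701229741E"),
  CaseDate1 = c("01/04/2019", "01/04/2019", "02/04/2019", "02/04/2019",
                "02/04/2019", "02/04/2019", "03/04/2019"),
  Theatre = c("RIE_TH_06", "RIE_TH_06", "RIE_TH_08", "RIE_TH_08",
              "RIE_TH_06", "RIE_TH_06", "RIE_TH_08")
)
dss <- tibble(
  ttc = c("400259766M", "800136692H", "701229741E", "830592962A",
          "701134213K", "620001491E", "500542890M"),
  D1 = c("NA", "01/04/2019", "NA", "01/04/2019", "01/04/2019", "02/04/2019", "NA"),
  D2 = c("02/04/2019", "NA", "NA", "NA", "NA", "NA", "02/04/2019"),
  D3 = c("NA", "NA", "04/04/2019", "NA", "NA", "NA", "NA"),
  C5 = c("APPLE", "ORANGE", "PINE", "MANGO", "CHERRY", "SUGAR", "GREEN")
)

# Turn "NA" strings into real NA, coalesce, and parse the result as a Date
dss_dates <- dss %>%
  mutate(across(c(D1, D2, D3), ~ na_if(.x, "NA"))) %>%
  mutate(x = as.Date(coalesce(D1, D2, D3), format = "%d/%m/%Y")) %>%
  select(ttc, x, C5)

result <- dfy %>%
  mutate(case_date = as.Date(CaseDate1, format = "%d/%m/%Y")) %>%
  # Join on ttc only (assumes at most one dss row per ttc)
  left_join(dss_dates, by = "ttc") %>%
  # Keep C5 only when the dss date is within one day of the case date
  mutate(
    day_gap = abs(as.numeric(difftime(x, case_date, units = "days"))),
    C5 = if_else(!is.na(day_gap) & day_gap <= 1, C5, NA_character_)
  ) %>%
  select(ttc, CaseDate1, Theatre, C5)

result
```

On the sample data this gives the same C5 values as the .direction = "updown" version (ORANGE for 800136692H, PINE for 701229741E), but a dss date two or more days away from CaseDate1 would be left as NA instead of being filled in.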