The R code for a left join, working with dates that match and others that do not
library(dplyr)  # tibble(), joins, coalesce()
library(tidyr)  # fill()

dfy <- tibble(
  ttc = c("830592962A","701134213K","620001491E","500542890M","400259766M","800136692H","701229741E"),
  CaseDate1 = c("01/04/2019","01/04/2019","02/04/2019","02/04/2019","02/04/2019","02/04/2019","03/04/2019"),
  Theatre = c("RIE_TH_06","RIE_TH_06","RIE_TH_08","RIE_TH_08","RIE_TH_06","RIE_TH_06","RIE_TH_08"))
dss <- tibble(
  ttc = c("400259766M","800136692H","701229741E","830592962A","701134213K","620001491E","500542890M"),
  D1 = c("NA","01/04/2019","NA","01/04/2019","01/04/2019","02/04/2019","NA"),
  D2 = c("02/04/2019","NA","NA","NA","NA","NA","02/04/2019"),
  D3 = c("NA","NA","04/04/2019","NA","NA","NA","NA"),
  C5 = c("APPLE","ORANGE","PINE","MANGO","CHERRY","SUGAR","GREEN"))
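I want to left join dfy to dss, matching dfy(ttc & CaseDate1) against dss(ttc & coalesce(D1, D2, D3)).
Secondly, where there is no exact match in dss(ttc & coalesce(D1, D2, D3)), I would like to use the day after or the day before. For example, dfy(701229741E & 03/04/2019) should go to dss(701229741E & 04/04/2019).
I used the following code, and it only joined the rows where ttc and the date match exactly: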
dfy %>%
left_join(dss %>% mutate(x = coalesce(D1, D2, D3)), by = c("ttc", "CaseDate1" = "x")) %>%
select(ttc, CaseDate1, Theatre, C5)
coalesce() did not work as expected because "NA" in the data is a string, not a missing value. I used the following to convert it:
for (c in c('D1', 'D2', 'D3')) {
dss[c][dss[c] == 'NA'] = NA
}
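The same replacement can be written without a loop. This is only a sketch, assuming dplyr 1.0 or later (for across()) and using a cut-down stand-in table, dss_demo, rather than the full dss:

```r
library(dplyr)
library(tibble)

# Cut-down stand-in for dss: the string "NA" marks a missing date.
dss_demo <- tibble(
  ttc = c("400259766M", "800136692H", "701229741E"),
  D1  = c("NA", "01/04/2019", "NA"),
  D2  = c("02/04/2019", "NA", "NA"),
  D3  = c("NA", "NA", "04/04/2019")
)

# Turn the literal string "NA" into a real missing value in each date column,
# then take the first non-missing date per row.
dss_clean <- dss_demo %>%
  mutate(across(c(D1, D2, D3), ~ na_if(.x, "NA"))) %>%
  mutate(x = coalesce(D1, D2, D3))

dss_clean$x
```

Once the columns hold real NA values, coalesce() picks the single date present in each row.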
With that change, your same code returns
# A tibble: 7 x 4
ttc CaseDate1 Theatre C5
<chr> <chr> <chr> <chr>
1 830592962A 01/04/2019 RIE_TH_06 MANGO
2 701134213K 01/04/2019 RIE_TH_06 CHERRY
3 620001491E 02/04/2019 RIE_TH_08 SUGAR
4 500542890M 02/04/2019 RIE_TH_08 GREEN
5 400259766M 02/04/2019 RIE_TH_06 APPLE
6 800136692H 02/04/2019 RIE_TH_06 NA
7 701229741E 03/04/2019 RIE_TH_08 NA
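Rows 6 and 7 stay NA because their dates differ by one day between the two tables, so the exact join finds no partner. One way to list such rows is anti_join(). A small sketch, using cut-down copies of the tables (dfy_demo and dss_demo are illustrative names, and x stands for the coalesced date):

```r
library(dplyr)
library(tibble)

dfy_demo <- tibble(
  ttc       = c("400259766M", "800136692H", "701229741E"),
  CaseDate1 = c("02/04/2019", "02/04/2019", "03/04/2019")
)
dss_demo <- tibble(
  ttc = c("400259766M", "800136692H", "701229741E"),
  x   = c("02/04/2019", "01/04/2019", "04/04/2019")  # coalesced date
)

# Rows of dfy_demo that have no exact (ttc, date) partner in dss_demo.
unmatched <- anti_join(dfy_demo, dss_demo, by = c("ttc", "CaseDate1" = "x"))
unmatched
```

Here unmatched holds the rows for 800136692H and 701229741E, the two cases that need the day-before or day-after rule.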
For the missing dates, my suggestion is to use full_join instead of left_join, and to use the fill function on the grouped data frame:
dfy %>%
full_join(dss %>% mutate(x = coalesce(D1, D2, D3)), by = c("ttc", "CaseDate1" = "x")) %>%
select(ttc, CaseDate1, Theatre, C5) %>%
group_by(ttc) %>%
arrange(desc(CaseDate1)) %>%
fill(C5) %>%
filter(!is.na(Theatre)) %>%
ungroup() %>%
arrange(CaseDate1)
Output
# A tibble: 7 x 4
ttc CaseDate1 Theatre C5
<chr> <chr> <chr> <chr>
1 830592962A 01/04/2019 RIE_TH_06 MANGO
2 701134213K 01/04/2019 RIE_TH_06 CHERRY
3 620001491E 02/04/2019 RIE_TH_08 SUGAR
4 500542890M 02/04/2019 RIE_TH_08 GREEN
5 400259766M 02/04/2019 RIE_TH_06 APPLE
6 800136692H 02/04/2019 RIE_TH_06 NA
7 701229741E 03/04/2019 RIE_TH_08 PINE
filter(!is.na(Theatre)) here discards anything that is not in dfy (the "left" data frame).
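One caveat: CaseDate1 is a character column, so arrange(desc(CaseDate1)) sorts it as text. That agrees with calendar order in this data only because every date falls in April 2019. A sketch of the difference, using made-up dates and base R's as.Date():

```r
dates_chr <- c("30/04/2019", "01/05/2019", "15/04/2019")

# Text sort compares the day digits first, so 1 May lands before the April dates.
sort(dates_chr)

# Parse as day/month/year, then sort chronologically.
dates <- as.Date(dates_chr, format = "%d/%m/%Y")
sort(dates)
```

If the real data spans several months, it is probably safer to convert the date columns with as.Date() before joining and sorting.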
If you want to fill in both directions, you can add the .direction argument to the fill function.
dfy %>%
full_join(dss %>% mutate(x = coalesce(D1, D2, D3)), by = c("ttc", "CaseDate1" = "x")) %>%
select(ttc, CaseDate1, Theatre, C5) %>%
group_by(ttc) %>%
arrange(desc(CaseDate1)) %>%
fill(C5, .direction='updown') %>%
filter(!is.na(Theatre)) %>%
ungroup() %>%
arrange(CaseDate1)
And the output
# A tibble: 7 x 4
ttc CaseDate1 Theatre C5
<chr> <chr> <chr> <chr>
1 830592962A 01/04/2019 RIE_TH_06 MANGO
2 701134213K 01/04/2019 RIE_TH_06 CHERRY
3 620001491E 02/04/2019 RIE_TH_08 SUGAR
4 500542890M 02/04/2019 RIE_TH_08 GREEN
5 400259766M 02/04/2019 RIE_TH_06 APPLE
6 800136692H 02/04/2019 RIE_TH_06 ORANGE
7 701229741E 03/04/2019 RIE_TH_08 PINE
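If the "day before or day after" rule should be explicit rather than a side effect of fill(), another option is to join on ttc alone and keep a match only when the dates are close enough. This is a rough sketch under assumptions: the tables are cut-down copies with dates already parsed, max_gap is a hypothetical tolerance in days, and slice_min() needs dplyr 1.0 or later.

```r
library(dplyr)
library(tibble)

dfy_demo <- tibble(
  ttc       = c("400259766M", "800136692H", "701229741E"),
  CaseDate1 = as.Date(c("02/04/2019", "02/04/2019", "03/04/2019"), format = "%d/%m/%Y")
)
dss_demo <- tibble(
  ttc = c("400259766M", "800136692H", "701229741E"),
  x   = as.Date(c("02/04/2019", "01/04/2019", "04/04/2019"), format = "%d/%m/%Y"),
  C5  = c("APPLE", "ORANGE", "PINE")
)

max_gap <- 1  # hypothetical tolerance: accept dates at most one day apart

nearest <- dfy_demo %>%
  left_join(dss_demo, by = "ttc") %>%
  # Absolute distance in days between the case date and the coalesced date.
  mutate(gap = abs(as.numeric(difftime(CaseDate1, x, units = "days")))) %>%
  # Blank out C5 when the candidate is missing or too far away.
  mutate(C5 = if_else(!is.na(gap) & gap <= max_gap, C5, NA_character_)) %>%
  # Keep the closest candidate for each case.
  group_by(ttc, CaseDate1) %>%
  slice_min(gap, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(ttc, CaseDate1, C5)

nearest
```

With these three rows, 400259766M matches exactly, while 800136692H and 701229741E are matched to dates one day away, so they get ORANGE and PINE.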
I am not sure this is the output you want, but I hope it helps you move in the right direction.