
R code for a left join where some dates match and others do not

library(dplyr)  # tibble(), left_join(), full_join(), coalesce()
library(tidyr)  # fill()

dfy<-tibble(ttc= c("830592962A","701134213K","620001491E","500542890M","400259766M","800136692H","701229741E"),
            CaseDate1=c("01/04/2019","01/04/2019","02/04/2019","02/04/2019","02/04/2019","02/04/2019","03/04/2019"),
            Theatre=c("RIE_TH_06","RIE_TH_06","RIE_TH_08","RIE_TH_08","RIE_TH_06","RIE_TH_06","RIE_TH_08"))

dss<-tibble(ttc=c("400259766M","800136692H","701229741E","830592962A","701134213K","620001491E","500542890M"),
            D1=c("NA","01/04/2019","NA","01/04/2019","01/04/2019","02/04/2019","NA"),
            D2=c("02/04/2019","NA","NA","NA","NA","NA","02/04/2019"),
            D3=c("NA","NA","04/04/2019","NA","NA","NA","NA"),
            C5=c("APPLE","ORANGE","PINE","MANGO","CHERRY","SUGAR","GREEN"))
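  1. First, I want to left join the two tables on an exact match between

dfy(ttc & CaseDate1)

dss(ttc & coalesce(D1, D2, D3))

  2. Second, where there is no exact match in dss(ttc & coalesce(D1, D2, D3)), I want to match on the day after or the day before instead. For example, dfy(701229741E & 03/04/2019) should join to dss(701229741E & 04/04/2019).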

I used the following code, but it only joined the rows where both ttc and the date match exactly:

dfy %>% 
  left_join(dss %>% mutate(x = coalesce(D1, D2, D3)), by = c("ttc", "CaseDate1" = "x")) %>% 
  select(ttc, CaseDate1, Theatre, C5)

coalesce() did not work as expected because "NA" in your data is a character string, not a missing value. I converted those strings with:

# Turn the literal string "NA" into a real missing value so coalesce() can skip it
for (col in c("D1", "D2", "D3")) {
  dss[[col]][dss[[col]] == "NA"] <- NA
}
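
The same conversion can also be written inside a pipeline. This is only a sketch, assuming dplyr 1.0 or later (for across()); the dss below is a cut-down copy of the example data, and dss_clean is a name made up for illustration.

```r
library(dplyr)

# Cut-down copy of dss, with "NA" stored as a string as in the question
dss <- tibble(
  ttc = c("400259766M", "800136692H", "701229741E"),
  D1  = c("NA", "01/04/2019", "NA"),
  D2  = c("02/04/2019", "NA", "NA"),
  D3  = c("NA", "NA", "04/04/2019")
)

# na_if() replaces the string "NA" with a real missing value in each date column
dss_clean <- dss %>%
  mutate(across(c(D1, D2, D3), ~ na_if(.x, "NA")))

# coalesce() can now pick the first non-missing date in each row
dss_clean %>% mutate(x = coalesce(D1, D2, D3))
```

Unlike the loop, this leaves the original dss untouched and returns a cleaned copy.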

With that fix, your original code returns:

# A tibble: 7 x 4
  ttc        CaseDate1  Theatre   C5    
  <chr>      <chr>      <chr>     <chr> 
1 830592962A 01/04/2019 RIE_TH_06 MANGO 
2 701134213K 01/04/2019 RIE_TH_06 CHERRY
3 620001491E 02/04/2019 RIE_TH_08 SUGAR 
4 500542890M 02/04/2019 RIE_TH_08 GREEN 
5 400259766M 02/04/2019 RIE_TH_06 APPLE 
6 800136692H 02/04/2019 RIE_TH_06 NA    
7 701229741E 03/04/2019 RIE_TH_08 NA   

For the dates that do not match, I suggest using full_join instead of left_join and then calling fill() on the data frame grouped by ttc:

dfy %>% 
  full_join(dss %>% mutate(x = coalesce(D1, D2, D3)), by = c("ttc", "CaseDate1" = "x")) %>% 
  select(ttc, CaseDate1, Theatre, C5) %>%
  group_by(ttc) %>%
  arrange(desc(CaseDate1)) %>%
  fill(C5) %>%
  filter(!is.na(Theatre)) %>%
  ungroup() %>%
  arrange(CaseDate1)

Output:

# A tibble: 7 x 4
  ttc        CaseDate1  Theatre   C5    
  <chr>      <chr>      <chr>     <chr> 
1 830592962A 01/04/2019 RIE_TH_06 MANGO 
2 701134213K 01/04/2019 RIE_TH_06 CHERRY
3 620001491E 02/04/2019 RIE_TH_08 SUGAR 
4 500542890M 02/04/2019 RIE_TH_08 GREEN 
5 400259766M 02/04/2019 RIE_TH_06 APPLE 
6 800136692H 02/04/2019 RIE_TH_06 NA    
7 701229741E 03/04/2019 RIE_TH_08 PINE  

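Here, filter(!is.na(Theatre)) drops any row that does not come from dfy (the "left" data frame), that is, the rows that full_join added from dss alone.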
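
To see which rows that filter removes, it can help to look at the full join on its own. The sketch below uses only the two ttc values whose dates do not match exactly; the x column stands in for coalesce(D1, D2, D3) after the "NA" strings have been fixed, and joined and dss_only are names chosen for illustration.

```r
library(dplyr)

dfy <- tibble(
  ttc       = c("800136692H", "701229741E"),
  CaseDate1 = c("02/04/2019", "03/04/2019"),
  Theatre   = c("RIE_TH_06", "RIE_TH_08")
)

# x plays the role of coalesce(D1, D2, D3) from the full example
dss <- tibble(
  ttc = c("800136692H", "701229741E"),
  x   = c("01/04/2019", "04/04/2019"),
  C5  = c("ORANGE", "PINE")
)

# No date matches exactly, so every row from both tables survives unmatched
joined <- full_join(dfy, dss, by = c("ttc", "CaseDate1" = "x"))

# Rows that exist only in dss carry no Theatre; these are what the filter drops
dss_only <- joined %>% filter(is.na(Theatre))
dss_only
```

Those dss-only rows are what give fill() a C5 value to copy into the neighbouring dfy row of the same ttc before they are discarded.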

If you want to fill in both directions, add the .direction argument to fill():

dfy %>% 
  full_join(dss %>% mutate(x = coalesce(D1, D2, D3)), by = c("ttc", "CaseDate1" = "x")) %>% 
  select(ttc, CaseDate1, Theatre, C5) %>%
  group_by(ttc) %>%
  arrange(desc(CaseDate1)) %>%
  fill(C5, .direction='updown') %>%
  filter(!is.na(Theatre)) %>%
  ungroup() %>%
  arrange(CaseDate1)

And the output:

# A tibble: 7 x 4
  ttc        CaseDate1  Theatre   C5    
  <chr>      <chr>      <chr>     <chr> 
1 830592962A 01/04/2019 RIE_TH_06 MANGO 
2 701134213K 01/04/2019 RIE_TH_06 CHERRY
3 620001491E 02/04/2019 RIE_TH_08 SUGAR 
4 500542890M 02/04/2019 RIE_TH_08 GREEN 
5 400259766M 02/04/2019 RIE_TH_06 APPLE 
6 800136692H 02/04/2019 RIE_TH_06 ORANGE
7 701229741E 03/04/2019 RIE_TH_08 PINE  
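
One caveat about arrange(desc(CaseDate1)): CaseDate1 is a character column, so it is sorted as text. That happens to be fine here because every date falls in April 2019, but it would likely go wrong across months. A small sketch with two made-up dates (not from the question's data) shows the difference:

```r
# Hypothetical dates spanning two months, in the same dd/mm/yyyy format
d <- c("01/05/2019", "02/04/2019")

# Sorting the strings compares the day first, so 1 May lands before 2 April
sort(d)

# Parsing to Date first gives true chronological order
parsed <- as.Date(d, format = "%d/%m/%Y")
sort(parsed)
```

Converting the date columns with as.Date() before the join and the arrange() steps would avoid this.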

I am not sure this is exactly the output you want, but I hope it points you in the right direction.
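
If the "day before or day after" rule needs to be enforced explicitly, a different approach from the fill() one above is to join on ttc alone and then keep the dss row only when its date is within a tolerance. This is a rough sketch under several assumptions: a subset of the example data with the "NA" strings already converted, at most a handful of dss rows per ttc, and a tolerance max_gap of 1 day. The names dss_dates, case_date, gap and max_gap are invented for illustration.

```r
library(dplyr)

dfy <- tibble(
  ttc       = c("830592962A", "800136692H", "701229741E"),
  CaseDate1 = c("01/04/2019", "02/04/2019", "03/04/2019"),
  Theatre   = c("RIE_TH_06", "RIE_TH_06", "RIE_TH_08")
)
dss <- tibble(
  ttc = c("830592962A", "800136692H", "701229741E"),
  D1  = c("01/04/2019", "01/04/2019", NA),
  D2  = c(NA_character_, NA, NA),
  D3  = c(NA, NA, "04/04/2019"),
  C5  = c("MANGO", "ORANGE", "PINE")
)

max_gap <- 1  # assumed tolerance in days

# One real Date per dss row, taken from the first non-missing of D1, D2, D3
dss_dates <- dss %>%
  mutate(x = as.Date(coalesce(D1, D2, D3), format = "%d/%m/%Y")) %>%
  select(ttc, x, C5)

result <- dfy %>%
  mutate(case_date = as.Date(CaseDate1, format = "%d/%m/%Y")) %>%
  left_join(dss_dates, by = "ttc") %>%
  # Distance in days between the case date and the dss date
  mutate(gap = abs(as.numeric(case_date - x))) %>%
  # Keep the closest dss row for each dfy row (NA gaps sort last)
  arrange(case_date, gap) %>%
  distinct(ttc, CaseDate1, .keep_all = TRUE) %>%
  # Blank out C5 when even the closest date is outside the tolerance
  mutate(C5 = if_else(!is.na(gap) & gap <= max_gap, C5, NA_character_)) %>%
  select(ttc, CaseDate1, Theatre, C5)

result
```

Because the comparison is done on Date values, this does not depend on how the date strings happen to sort.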

