繁体   English   中英

左连接的 R 代码,使用匹配的日期和不匹配的其他日期

[英]The R code for left join, working with Dates that are matching and others not matching

dfy<-tibble(ttc= c("830592962A","701134213K","620001491E","500542890M","400259766M","800136692H","701229741E"),
            CaseDate1=c("01/04/2019","01/04/2019","02/04/2019","02/04/2019","02/04/2019","02/04/2019","03/04/2019"),
            Theatre=c("RIE_TH_06","RIE_TH_06","RIE_TH_08","RIE_TH_08","RIE_TH_06","RIE_TH_06","RIE_TH_08"))

dss<-tibble(ttc=c("400259766M","800136692H","701229741E","830592962A","701134213K","620001491E","500542890M"),
            D1=c("NA","01/04/2019","NA","01/04/2019","01/04/2019","02/04/2019","NA"),
            D2=c("02/04/2019","NA","NA","NA","NA","NA","02/04/2019"),
            D3=c("NA","NA","04/04/2019","NA","NA","NA","NA"),
            C5=c("APPLE","ORANGE","PINE","MANGO","CHERRY","SUGAR","GREEN"))
  1. 首先,我想根据完全匹配的文件离开联合文件
dfy(ttc&CaseDate1)

dss(ttc& coalesce(D1,D2,D3))
  1. 其次,在没有我想使用的完全匹配的情况下(在dss(ttc& coalesce(D1,D2,D3))

  2. dfy( 701229741E& 03/04/2019)将在后一天或前一天进入dss(701229741E&04/04/201)

我使用了以下代码并且只加入了匹配的 ttc& 日期

dfy %>% 
  left_join(dss %>% mutate(x = coalesce(D1, D2, D3)), by = c("ttc", "CaseDate1" = "x")) %>% 
  select(ttc, CaseDate1, Theatre, C5)

Coalesce 没有按预期工作,因为在数据中“NA”是一个字符串,而不是缺失的数据。 我用

for (c in c('D1', 'D2', 'D3')) {
  dss[c][dss[c] == 'NA'] = NA
}

现在您的相同代码返回

# A tibble: 7 x 4
  ttc        CaseDate1  Theatre   C5    
  <chr>      <chr>      <chr>     <chr> 
1 830592962A 01/04/2019 RIE_TH_06 MANGO 
2 701134213K 01/04/2019 RIE_TH_06 CHERRY
3 620001491E 02/04/2019 RIE_TH_08 SUGAR 
4 500542890M 02/04/2019 RIE_TH_08 GREEN 
5 400259766M 02/04/2019 RIE_TH_06 APPLE 
6 800136692H 02/04/2019 RIE_TH_06 NA    
7 701229741E 03/04/2019 RIE_TH_08 NA   

对于缺少的日期,我的建议是使用full_join而不是left_join ,并在分组 dataframe 中使用fill function:

dfy %>% 
  full_join(dss %>% mutate(x = coalesce(D1, D2, D3)), by = c("ttc", "CaseDate1" = "x")) %>% 
  select(ttc, CaseDate1, Theatre, C5) %>%
  group_by(ttc) %>%
  arrange(desc(CaseDate1)) %>%
  fill(C5) %>%
  filter(!is.na(Theatre)) %>%
  ungroup() %>%
  arrange(CaseDate1)

输出

# A tibble: 7 x 4
  ttc        CaseDate1  Theatre   C5    
  <chr>      <chr>      <chr>     <chr> 
1 830592962A 01/04/2019 RIE_TH_06 MANGO 
2 701134213K 01/04/2019 RIE_TH_06 CHERRY
3 620001491E 02/04/2019 RIE_TH_08 SUGAR 
4 500542890M 02/04/2019 RIE_TH_08 GREEN 
5 400259766M 02/04/2019 RIE_TH_06 APPLE 
6 800136692H 02/04/2019 RIE_TH_06 NA    
7 701229741E 03/04/2019 RIE_TH_08 PINE  

filter(.is.na(Theatre))在这里丢弃任何不在dfy (“左”数据框)中的内容。

如果要填充两个方向,可以在fill function 中添加.direction参数。

dfy %>% 
  full_join(dss %>% mutate(x = coalesce(D1, D2, D3)), by = c("ttc", "CaseDate1" = "x")) %>% 
  select(ttc, CaseDate1, Theatre, C5) %>%
  group_by(ttc) %>%
  arrange(desc(CaseDate1)) %>%
  fill(C5, .direction='updown') %>%
  filter(!is.na(Theatre)) %>%
  ungroup() %>%
  arrange(CaseDate1)

和输出

# A tibble: 7 x 4
  ttc        CaseDate1  Theatre   C5    
  <chr>      <chr>      <chr>     <chr> 
1 830592962A 01/04/2019 RIE_TH_06 MANGO 
2 701134213K 01/04/2019 RIE_TH_06 CHERRY
3 620001491E 02/04/2019 RIE_TH_08 SUGAR 
4 500542890M 02/04/2019 RIE_TH_08 GREEN 
5 400259766M 02/04/2019 RIE_TH_06 APPLE 
6 800136692H 02/04/2019 RIE_TH_06 ORANGE
7 701229741E 03/04/2019 RIE_TH_08 PINE  

我不清楚这是您想要的 output,但我希望它可以帮助您朝着正确的方向前进。

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM