
How do I use a left join in the tidyverse that keeps only the first match when joining two data frames on multiple, differently named variables?

Say I have two data frames:

library(dplyr)  # provides tibble(), %>% and left_join(); data_frame() is deprecated

df_1 <- tibble(
  dates = as.Date(c("2018-07-01", "2018-06-01", "2018-06-01", "2018-06-01", "2018-05-01")),
  x1    = c(10L, 11L, 21L, 21L, 13L),
  text1 = c("text a", "text b", "text c", "text d", "text e")
)
df_2 <- tibble(
  dates = as.Date(c("2018-07-01", "2018-06-01", "2018-05-01", "2018-04-01")),
  x2    = c(10L, 21L, 22L, 23L),
  text2 = c("text aa", "text bb", "text cc", "text dd")
)

I know I can use the join function in the plyr package to keep only the first match when joining on a single variable:

plyr::join(df_2, df_1, type = 'left', match = 'first', by = 'dates')

But with two variables, "dates" and "x" (named x1 in df_1 and x2 in df_2), it throws an error:

plyr::join(df_2, df_1, type = 'left', match = 'first', by = c('dates' = 'dates', 'x2' = 'x1'))

I could also use left_join in dplyr, which accepts multiple variables:

df_2 %>% 
  left_join(df_1, by = c('dates' = 'dates', 'x2' = 'x1'))

But it has no first-match argument. Any help is appreciated, thanks.

You can't, at least not directly: join operations in dplyr always return every matching combination. If you want only the first match, you could group the second table by the joining variables and use slice() on it before the join:

df_2 %>% 
  left_join(df_1 %>%
              group_by(dates, x1) %>%
              slice(1), by = c('dates' = 'dates', 'x2' = 'x1'))

# A tibble: 4 x 4
  dates         x2 text2   text1 
  <date>     <int> <chr>   <chr> 
1 2018-07-01    10 text aa text a
2 2018-06-01    21 text bb text c
3 2018-05-01    22 text cc NA    
4 2018-04-01    23 text dd NA    
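
The same idea (deduplicate the lookup table on the join keys, then join) can also be sketched with distinct() instead of group_by() plus slice(). This is a minimal sketch and not part of the original answer; the names df_1_first and result are made up for illustration. It assumes that "first" means the first row in df_1's current row order.

```r
library(dplyr)

# Example data from the question, rebuilt so the snippet is self-contained
df_1 <- tibble(
  dates = as.Date(c("2018-07-01", "2018-06-01", "2018-06-01", "2018-06-01", "2018-05-01")),
  x1    = c(10L, 11L, 21L, 21L, 13L),
  text1 = c("text a", "text b", "text c", "text d", "text e")
)
df_2 <- tibble(
  dates = as.Date(c("2018-07-01", "2018-06-01", "2018-05-01", "2018-04-01")),
  x2    = c(10L, 21L, 22L, 23L),
  text2 = c("text aa", "text bb", "text cc", "text dd")
)

# Keep only the first row of df_1 for each (dates, x1) combination;
# .keep_all = TRUE retains the remaining columns (here text1)
df_1_first <- distinct(df_1, dates, x1, .keep_all = TRUE)

# Each df_2 row can now match at most one row of the lookup table
result <- left_join(df_2, df_1_first, by = c("dates" = "dates", "x2" = "x1"))
print(result)
```

Because the lookup table has at most one row per key combination after distinct(), the left join cannot add rows to df_2.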

One (rather hacky) solution would be to rename the column that causes the error, like so:

colnames(df_1)[2] = "x2" 

and then run

plyr::join(df_2, df_1, type = 'left', match = 'first', by = c("dates", "x2"))

which yields

       dates x2   text2  text1
1 2018-07-01 10 text aa text a
2 2018-06-01 21 text bb text c
3 2018-05-01 22 text cc   <NA>
4 2018-04-01 23 text dd   <NA>
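
If you would rather not modify df_1 itself or rely on the column's position, the rename can be done on a copy and by name. The following is a sketch of that variation, not the original answer's code. The name df_1_renamed is hypothetical, plain data.frame() is used here as an assumption to keep the example free of other dependencies, and the plyr package must be installed.

```r
# Example data from the question as plain data frames
df_1 <- data.frame(
  dates = as.Date(c("2018-07-01", "2018-06-01", "2018-06-01", "2018-06-01", "2018-05-01")),
  x1    = c(10L, 11L, 21L, 21L, 13L),
  text1 = c("text a", "text b", "text c", "text d", "text e"),
  stringsAsFactors = FALSE
)
df_2 <- data.frame(
  dates = as.Date(c("2018-07-01", "2018-06-01", "2018-05-01", "2018-04-01")),
  x2    = c(10L, 21L, 22L, 23L),
  text2 = c("text aa", "text bb", "text cc", "text dd"),
  stringsAsFactors = FALSE
)

# Rename x1 to x2 on a copy, selecting the column by name rather than position,
# so df_1 itself stays unchanged
df_1_renamed <- df_1
names(df_1_renamed)[names(df_1_renamed) == "x1"] <- "x2"

# Both tables now share the key names, so plyr::join can use match = "first"
result <- plyr::join(df_2, df_1_renamed, type = "left", match = "first",
                     by = c("dates", "x2"))
print(result)
```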


