How to use a left join in the tidyverse to match only the first row when joining two dfs on multiple, differently named variables?

Question

Say I have two dfs:

library(dplyr)

# tibble() replaces the deprecated data_frame()
df_1 <- tibble(
  dates = as.Date(c("2018-07-01", "2018-06-01", "2018-06-01", "2018-06-01", "2018-05-01")),
  x1 = c(10L, 11L, 21L, 21L, 13L),
  text1 = c("text a", "text b", "text c", "text d", "text e")
)
df_2 <- tibble(
  dates = as.Date(c("2018-07-01", "2018-06-01", "2018-05-01", "2018-04-01")),
  x2 = c(10L, 21L, 22L, 23L),
  text2 = c("text aa", "text bb", "text cc", "text dd")
)

I know I can use the join function in the plyr package to match only the first row when joining on a single variable:

plyr::join(df_2, df_1, type = 'left', match = 'first', by = 'dates')

But with two variables, "dates" and "x" (x2 in df_2, x1 in df_1), it throws an error:

plyr::join(df_2, df_1, type = 'left', match = 'first', by = c('dates' = 'dates', 'x2' = 'x1'))

I could also use left_join in dplyr with multiple variables:

df_2 %>% 
  left_join(df_1, by = c('dates' = 'dates', 'x2' = 'x1'))

But it has no first-match argument. Any help is appreciated, thanks.

Answer 1

You can't, not directly: join operations in dplyr always return all matching combinations. If you want only the first match, you could group by the joining variables and use slice() on the second table before the join.

df_2 %>% 
  left_join(df_1 %>%
              group_by(dates, x1) %>%
              slice(1), by = c('dates' = 'dates', 'x2' = 'x1'))

# A tibble: 4 x 4
  dates         x2 text2   text1 
  <date>     <int> <chr>   <chr> 
1 2018-07-01    10 text aa text a
2 2018-06-01    21 text bb text c
3 2018-05-01    22 text cc NA    
4 2018-04-01    23 text dd NA
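
Two variations on the same idea, offered as a sketch rather than a definitive recipe. The first replaces group_by() plus slice(1) with distinct(..., .keep_all = TRUE), which keeps the first row for each key combination. The second assumes dplyr 1.1.0 or later, where the join functions accept a multiple argument; on older versions that argument is not available and only the first variant applies. The result names first_by_distinct and first_by_multiple are illustrative only.

```r
library(dplyr)

df_1 <- tibble(
  dates = as.Date(c("2018-07-01", "2018-06-01", "2018-06-01", "2018-06-01", "2018-05-01")),
  x1 = c(10L, 11L, 21L, 21L, 13L),
  text1 = c("text a", "text b", "text c", "text d", "text e")
)
df_2 <- tibble(
  dates = as.Date(c("2018-07-01", "2018-06-01", "2018-05-01", "2018-04-01")),
  x2 = c(10L, 21L, 22L, 23L),
  text2 = c("text aa", "text bb", "text cc", "text dd")
)

# Variant 1: drop duplicate key combinations from df_1 before joining.
# distinct() keeps the first row for each (dates, x1) pair.
first_by_distinct <- df_2 %>%
  left_join(distinct(df_1, dates, x1, .keep_all = TRUE),
            by = c("dates" = "dates", "x2" = "x1"))

# Variant 2 (assumption: dplyr >= 1.1.0): ask the join itself to keep
# only the first matching row of df_1 for each row of df_2.
first_by_multiple <- df_2 %>%
  left_join(df_1, by = c("dates" = "dates", "x2" = "x1"), multiple = "first")
```

Both variants should give one row per row of df_2, with "text c" rather than "text d" matched for 2018-06-01 / 21.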

Answer 2

One (rather cheating) solution would be to rename the column that throws an error, like so:

colnames(df_1)[2] = "x2"

and then do

plyr::join(df_2, df_1, type = 'left', match = 'first', by = c("dates", "x2"))

which yields

       dates x2   text2  text1
1 2018-07-01 10 text aa text a
2 2018-06-01 21 text bb text c
3 2018-05-01 22 text cc   <NA>
4 2018-04-01 23 text dd   <NA>
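
If modifying df_1 in place is undesirable, the rename can be done on a copy instead. The following is a minimal sketch of that variation, assuming both plyr and dplyr are installed; df_1_renamed and result are illustrative names, and plyr::join is called with its namespace prefix so plyr does not need to be attached alongside dplyr.

```r
library(dplyr)

df_1 <- tibble(
  dates = as.Date(c("2018-07-01", "2018-06-01", "2018-06-01", "2018-06-01", "2018-05-01")),
  x1 = c(10L, 11L, 21L, 21L, 13L),
  text1 = c("text a", "text b", "text c", "text d", "text e")
)
df_2 <- tibble(
  dates = as.Date(c("2018-07-01", "2018-06-01", "2018-05-01", "2018-04-01")),
  x2 = c(10L, 21L, 22L, 23L),
  text2 = c("text aa", "text bb", "text cc", "text dd")
)

# Rename x1 to x2 on a copy, so the original df_1 keeps its column names.
df_1_renamed <- rename(df_1, x2 = x1)

# With identical key names on both sides, plyr::join accepts a plain
# character vector for `by` and can keep only the first match.
result <- plyr::join(df_2, df_1_renamed, type = "left", match = "first",
                     by = c("dates", "x2"))
```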

Answer 1: 1, accepted, 2020-04-08 09:55:53

Answer 2: 0, 2020-04-08 09:54:23
