How can I use a left join in the tidyverse that keeps only the first match when joining on multiple, differently named variables in two data frames?
Say I have two data frames:
library(dplyr)

# tibble() replaces the deprecated data_frame()
df_1 <- tibble(
  dates = as.Date(c("2018-07-01", "2018-06-01", "2018-06-01", "2018-06-01", "2018-05-01")),
  x1 = c(10L, 11L, 21L, 21L, 13L),
  text1 = c("text a", "text b", "text c", "text d", "text e")
)
df_2 <- tibble(
  dates = as.Date(c("2018-07-01", "2018-06-01", "2018-05-01", "2018-04-01")),
  x2 = c(10L, 21L, 22L, 23L),
  text2 = c("text aa", "text bb", "text cc", "text dd")
)
I know I can use the join function in the plyr package to keep only the first match when joining on a single variable:
plyr::join(df_2, df_1, type = 'left', match = 'first', by = 'dates')
But when joining on two variables, "dates" and "x2"/"x1", it throws an error:
plyr::join(df_2, df_1, type = 'left', match = 'first', by = c('dates' = 'dates', 'x2' = 'x1'))
I can also use left_join in dplyr with multiple variables:
df_2 %>%
  left_join(df_1, by = c('dates' = 'dates', 'x2' = 'x1'))
But it has no argument for keeping only the first match. Any help is appreciated, thanks.
You can't, not directly: join operations in dplyr return all matching combinations. If you want only the first match, you could group by the joining variables and use slice() on the second table before the join.
df_2 %>%
  left_join(
    df_1 %>%
      group_by(dates, x1) %>%
      slice(1),
    by = c('dates' = 'dates', 'x2' = 'x1')
  )
# A tibble: 4 x 4
  dates         x2 text2   text1
  <date>     <int> <chr>   <chr>
1 2018-07-01    10 text aa text a
2 2018-06-01    21 text bb text c
3 2018-05-01    22 text cc NA
4 2018-04-01    23 text dd NA
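As a rough sketch of the same idea, distinct() with .keep_all = TRUE can stand in for group_by() plus slice(1): it keeps the first row of df_1 for each (dates, x1) pair and returns an ungrouped tibble. The names all_matches, df_1_first, first_match and first_match_builtin are made up for illustration. The last step assumes dplyr 1.1.0 or later, where, as far as I know, left_join() accepts multiple = "first"; it is guarded by a version check so the rest still runs on older installs.

```r
library(dplyr)

df_1 <- tibble(
  dates = as.Date(c("2018-07-01", "2018-06-01", "2018-06-01", "2018-06-01", "2018-05-01")),
  x1    = c(10L, 11L, 21L, 21L, 13L),
  text1 = c("text a", "text b", "text c", "text d", "text e")
)
df_2 <- tibble(
  dates = as.Date(c("2018-07-01", "2018-06-01", "2018-05-01", "2018-04-01")),
  x2    = c(10L, 21L, 22L, 23L),
  text2 = c("text aa", "text bb", "text cc", "text dd")
)

# A plain left_join keeps both df_1 rows that match (2018-06-01, 21),
# so the result has one more row than df_2.
all_matches <- left_join(df_2, df_1, by = c("dates" = "dates", "x2" = "x1"))

# Keep only the first df_1 row per (dates, x1) pair before joining.
df_1_first  <- distinct(df_1, dates, x1, .keep_all = TRUE)
first_match <- left_join(df_2, df_1_first, by = c("dates" = "dates", "x2" = "x1"))

# Assumption: dplyr >= 1.1.0, where left_join() has a `multiple` argument.
if (utils::packageVersion("dplyr") >= "1.1.0") {
  first_match_builtin <- left_join(
    df_2, df_1,
    by = c("dates" = "dates", "x2" = "x1"),
    multiple = "first"
  )
}
```

Comparing nrow(all_matches) with nrow(first_match) shows the extra row that the de-duplication removes.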
One (rather cheating) solution is to rename the column that causes the error, like so:
colnames(df_1)[2] = "x2"
and then run:
plyr::join(df_2, df_1, type = 'left', match = 'first', by = c("dates", "x2"))
which yields:
       dates x2   text2  text1
1 2018-07-01 10 text aa text a
2 2018-06-01 21 text bb text c
3 2018-05-01 22 text cc   <NA>
4 2018-04-01 23 text dd   <NA>
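If overwriting the column name in df_1 itself feels too invasive, a possible variant is to rename on a copy. This is only a sketch: df_1_renamed and result are illustrative names, and the plyr call is wrapped in requireNamespace() on the assumption that plyr may not be installed.

```r
library(dplyr)

df_1 <- tibble(
  dates = as.Date(c("2018-07-01", "2018-06-01", "2018-06-01", "2018-06-01", "2018-05-01")),
  x1    = c(10L, 11L, 21L, 21L, 13L),
  text1 = c("text a", "text b", "text c", "text d", "text e")
)
df_2 <- tibble(
  dates = as.Date(c("2018-07-01", "2018-06-01", "2018-05-01", "2018-04-01")),
  x2    = c(10L, 21L, 22L, 23L),
  text2 = c("text aa", "text bb", "text cc", "text dd")
)

# rename() returns a modified copy, so df_1 keeps its original x1 column.
df_1_renamed <- rename(df_1, x2 = x1)

# plyr::join() takes a plain character vector of shared column names in `by`.
if (requireNamespace("plyr", quietly = TRUE)) {
  result <- plyr::join(df_2, df_1_renamed,
                       by = c("dates", "x2"),
                       type = "left", match = "first")
}
```

Because the rename happens on a copy, any later code that still refers to df_1$x1 keeps working.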