How to do a conditional join in R using dplyr?

Question

For example, df1 looks like this:

X1      X2       X3      X4        X5
Apple   Belgium  Red     Purchase  100
Guava   Germany  Green   Sale      200
Grape   Italy    Purple  Purchase  500
Orange  India    Orange  Sale      2000

df2 looks like this:

X1      X2       X3      X4        X5
Apple   Belgium  Red     Purchase  10000
Guava   Germany  Green   Sale      20000
Grape   Italy    Purple  Purchase
Orange  India    Orange  Sale      2000

My output should look like this:

X1      X2       X3      X4        X5.x  X5.y
Apple   Belgium  Red     Purchase  100   10000
Guava   Germany  Green   Sale      200   20000
Grape   Italy    Purple  Purchase  500   NA

Multiple operations are involved here:

Pick the rows present in one data frame and not in the other, and vice versa.
Pick the mismatches in the X5 column (X5 is my target column) when the first 4 columns match.
I do not want the matches.

I tried a combination of inner_join, full_join and anti_join on the two data frames to get the first part. How do I perform the second part? Is there a conditional join available in R that picks only the mismatches and ignores when the target column is same?

I don't want to use sqldf. I know this can be achieved in SQL, but I want to do this in dplyr. Any help is much appreciated.

TIA.

Answer 1

left_join(df1, df2, by = c("X1", "X2", "X3", "X4")) %>%
  filter(X5.x != X5.y | is.na(X5.x) | is.na(X5.y))
#      X1      X2     X3       X4 X5.x  X5.y
# 1 Apple Belgium    Red Purchase  100 10000
# 2 Guava Germany  Green     Sale  200 20000
# 3 Grape   Italy Purple Purchase  500    NA
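
The two is.na() checks in the filter matter because comparing a value with NA yields NA rather than TRUE, and filter() drops rows whose condition is NA. A minimal sketch on a made-up two-row data frame (the names cmp, strict and safe are illustrative and not from the original post):

```r
library(dplyr)

# Toy "already joined" result: one plain mismatch, one row with a missing X5.y
cmp <- data.frame(X1   = c("Apple", "Grape"),
                  X5.x = c(100, 500),
                  X5.y = c(10000, NA))

# 500 != NA evaluates to NA, so filter() drops the Grape row
strict <- filter(cmp, X5.x != X5.y)

# With the is.na() checks, rows where either side is missing are kept
safe <- filter(cmp, X5.x != X5.y | is.na(X5.x) | is.na(X5.y))

nrow(strict)  # 1
nrow(safe)    # 2
```

Note that this condition also keeps a row where both X5 values are NA; whether that counts as a mismatch depends on the data.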

Is there a conditional join available in R that picks only the mismatches and ignores when the target column is same?

Yes, I think you could do this with non-equi joins in data.table. Or sqldf, as you mention.

I want to do this in dplyr.

dplyr only joins on equality (newer releases add inequality conditions through join_by(), but no "not equal" condition). So you join and then filter.

Using this data:

df1 <- read.table(text = "X1         X2     X3     X4         X5
Apple   Belgium   Red   Purchase   100
Guava   Germany   Green Sale       200
Grape   Italy     Purple Purchase   500
Orange India   Orange   Sale       2000", header = TRUE)

df2 <- read.table(text = "X1         X2     X3     X4         X5
Apple   Belgium   Red   Purchase   10000
Guava   Germany   Green Sale       20000
Grape   Italy     Purple Purchase   NA
Orange India   Orange   Sale       2000", header = TRUE)
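
The question also asks for rows that exist in only one of the two data frames, in either direction. A left_join keeps unmatched rows from df1 only, so one possible sketch swaps in full_join with the same filter. The data frame df2_extra and its Mango row are hypothetical, added here only so there is a row missing from df1:

```r
library(dplyr)

df1 <- read.table(text = "X1 X2 X3 X4 X5
Apple Belgium Red Purchase 100
Guava Germany Green Sale 200
Grape Italy Purple Purchase 500
Orange India Orange Sale 2000", header = TRUE)

# Hypothetical: the df2 used above plus a Mango row that df1 does not have
df2_extra <- read.table(text = "X1 X2 X3 X4 X5
Apple Belgium Red Purchase 10000
Guava Germany Green Sale 20000
Grape Italy Purple Purchase NA
Orange India Orange Sale 2000
Mango Brazil Yellow Sale 300", header = TRUE)

# full_join keeps unmatched rows from both sides; the filter then
# removes rows whose X5 values agree (here only Orange)
both_ways <- full_join(df1, df2_extra, by = c("X1", "X2", "X3", "X4")) %>%
  filter(X5.x != X5.y | is.na(X5.x) | is.na(X5.y))

both_ways
```

In this sketch an NA in X5.x or X5.y can mean either that the row was absent on that side or that its X5 value was missing there; the result alone does not distinguish the two cases.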

Answer 2

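# Drop df1 rows that match df2 on all five columns (X5 included),
# then attach df2's X5 to the rows that remain
df1 %>%
  anti_join(df2, by = c("X1", "X2", "X3", "X4", "X5")) %>%
  left_join(df2, by = c("X1", "X2", "X3", "X4"))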

    X1      X2     X3       X4 X5.x  X5.y
1 Apple Belgium    Red Purchase  100 10000
2 Guava Germany  Green     Sale  200 20000
3 Grape   Italy Purple Purchase  500    NA
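
The anti_join approach and the join-then-filter approach agree on the data above, but they can differ when X5 is missing on both sides, because dplyr joins treat NA as matching NA by default. A small sketch with made-up data frames a and b (hypothetical names and values, not from the original post):

```r
library(dplyr)

# Kiwi has a missing X5 in both data frames
a <- data.frame(X1 = c("Apple", "Kiwi"), X5 = c(100, NA))
b <- data.frame(X1 = c("Apple", "Kiwi"), X5 = c(10000, NA))

# Join, then filter: the is.na() checks keep the Kiwi row
via_filter <- left_join(a, b, by = "X1") %>%
  filter(X5.x != X5.y | is.na(X5.x) | is.na(X5.y))

# anti_join first: NA matches NA, so Kiwi counts as identical and is dropped
via_anti <- a %>%
  anti_join(b, by = c("X1", "X5")) %>%
  left_join(b, by = "X1")

nrow(via_filter)  # 2
nrow(via_anti)    # 1
```

Which behaviour is wanted depends on whether two missing values should count as a match.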


Answer 1
1 accepted 2018-10-04 14:02:36

Answer 2
1 2019-11-25 15:25:39
