Remove duplicates based on conditions in multiple columns across two data frames

Question

I have 2 data frames that I need to compare to remove duplicates. DF1 has columns A, B, C, D, E, F, and DF2 has columns A, B, C, G, H, I. I want to get all rows from DF1 where either column A or B matches either column A or B from DF2 AND DF2 column G is not "Y".

So something along the lines of:

DF3 <- subset (DF1, (A | B %in% DF2$A | DF2$B) & (C %in% DF2$C) & (DF2$G != "Y"))

But I can't get the logical operators to work within the subset. Is there any way to accomplish this?

Answer 1

You can do this using an inner join with sqldf.

Example data. Please provide this yourself in the future.

df1 <- data.frame(a = 1:10, b = 1:10, c = 1:10, g = tail(letters, 10))
set.seed(2019)
df2 <- as.data.frame(lapply(df1, function(x) sample(x, replace = TRUE)))

Inner join and output:

library(sqldf)
sqldf("
select  a.*
from    df1 a
        join df2 b      
          on  (a.a = b.a or a.b = b.b)
              and a.c = b.c
where   b.g <> 'y'
")

#   a b c g
# 1 2 2 2 r
# 2 1 1 1 q
# 3 5 5 5 u
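
For comparison, here is a minimal base R sketch of the same filter without sqldf. It is not part of the original answer, and the toy data below is made up for illustration. The attempt in the question likely fails for two reasons: `%in%` binds more tightly than `|`, so `A | B %in% DF2$A | DF2$B` is read as `A | (B %in% DF2$A) | DF2$B`, and `DF2$G != "Y"` yields one value per row of DF2, which does not line up with the rows of DF1. The sketch instead tests each row of df1 against all rows of df2, assuming no NA values in the compared columns.

```r
# Hypothetical toy data; column names follow the answer's lowercase a, b, c, g.
df1 <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40), c = c(5, 6, 7, 8))
df2 <- data.frame(a = c(1, 9, 3, 9), b = c(0, 20, 30, 40), c = c(5, 6, 7, 0),
                  g = c("n", "n", "y", "n"), stringsAsFactors = FALSE)

# For each row of df1, check whether at least one single row of df2 satisfies
# (a matches OR b matches) AND c matches AND g is not "y".
has_match <- vapply(seq_len(nrow(df1)), function(i) {
  any((df1$a[i] == df2$a | df1$b[i] == df2$b) &
        df1$c[i] == df2$c &
        df2$g != "y")
}, logical(1))

# Keep only the df1 rows that have such a partner in df2.
df3 <- df1[has_match, ]
```

Unlike the join, this keeps each df1 row at most once, even when several df2 rows match it. On large data frames the row-by-row loop may be slow, so the join is probably the better choice there.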

Solution 1
1 Accepted 2019-09-16 18:21:05
