Remove rows of one data frame from another data frame but keep duplicates in R
I'm working in R and I have two data frames: one is the base data frame, and the other holds the rows that I need to remove from the base one. But I can't use the setdiff() function, because it removes duplicated rows. Here's an example:
a <- data.frame(var1 = c(1, NA, 2, 2, 3, 4, 5),
var2 = c(1, 7, 2, 2, 3, 4, 5))
b <- data.frame(id = c(2, 4),
numero = c(2, 4))
And the result must be:
var1 var2
1 1
NA 7
2 2
3 3
5 5
It must be an efficient algorithm, too, because the base data frame has 3 million rows and 26 columns.
We may need to create a sequence column before joining, so that each row of b removes at most one matching row of a.
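To see what such a sequence column looks like, here is a minimal sketch using the example data from the question. The name `shown` is only illustrative, and `a` is built directly as a data.table for brevity.

```r
library(data.table)

a <- data.table(var1 = c(1, NA, 2, 2, 3, 4, 5),
                var2 = c(1, 7, 2, 2, 3, 4, 5))

# rowid() counts occurrences of each (var1, var2) combination in order of
# appearance: 1 for the first copy, 2 for the second, and so on
shown <- a[, .(var1, var2, rn = rowid(var1, var2))]
print(shown)
```

The two (2, 2) rows get different counters, so a join that also matches on the counter can tell them apart and remove only one of them.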
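library(data.table)
# Number repeated rows within each key combination (1, 2, ...)
setDT(a)[, rn := rowid(var1, var2)]
setDT(b)[, rn := rowid(id, numero)]
# Anti-join on the keys plus the counter, then drop the counter from the result
a[!b, on = .(var1 = id, var2 = numero, rn)][, rn := NULL][]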
-output
var1 var2
<num> <num>
1: 1 1
2: NA 7
3: 2 2
4: 3 3
5: 5 5
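As a possible shortcut, data.table's `fsetdiff()` with `all = TRUE` appears to cover the same case without building the counter by hand. The sketch below assumes both tables can be given identical column names, order and types, which `fsetdiff()` requires. The renamed copy `b2` is a hypothetical helper, not part of the original answer.

```r
library(data.table)

a <- data.table(var1 = c(1, NA, 2, 2, 3, 4, 5),
                var2 = c(1, 7, 2, 2, 3, 4, 5))
b <- data.table(id = c(2, 4), numero = c(2, 4))

# fsetdiff() needs identical column names and order, so rename a copy of b
# (copy() keeps the original b untouched)
b2 <- setnames(copy(b), c("id", "numero"), c("var1", "var2"))

# all = TRUE keeps max(n - m, 0) copies of a row that occurs n times in a
# and m times in b2, instead of collapsing duplicates
res <- fsetdiff(a, b2, all = TRUE)
print(res)
```

Whether this is fast enough on 3 million rows and 26 columns has not been measured here, so it is worth timing against the explicit join on real data.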