How to select rows in a data.table matching simultaneously two conditions in another data.table?
I have two data.tables, DT1 and DT2, where DT1 may be large and has more columns than DT2. I want to select the rows of DT1 where two columns of DT1 exactly match two columns in the same row of DT2. For example:
DT1 = data.table(x=rep(c("b","a","c"),each=3), y=c(1,3,6), z=1:9)
DT2 = data.table(f=c("a","b"), g=c(1,3))
The output I'm looking for, DT1sub, is:
x y z
1: a 1 4
2: b 3 2
My problem is that when I try to subset DT1, I also get the rows where only one of the two columns matches:
> DT1[x%in%DT2$f & y%in%DT2$g]
# x y z
# 1: b 1 1
# 2: b 3 2
# 3: a 1 4
# 4: a 3 5
I could get my desired output, DT1sub, with a clunky for loop like this:
DT1sub<-c()
for (i in 1:2)
DT1sub<-rbind(DT1sub,DT1[x==DT2$f[i] & y==DT2$g[i]])
DT1sub
but I was wondering whether there is a smarter data.table way to do this. This is probably straightforward, but I couldn't piece it together from example("data.table").
Are you looking for:
library(data.table)
DT1sub <- DT1[DT2, on = .(x = f, y = g)]
Output:
x y z
1: a 1 4
2: b 3 2
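This is basically a filtering join: it keeps only the rows of DT1 whose (x, y) pair equals the (f, g) pair of some row in DT2, so both columns have to match within the same row. One caveat: by default, a pair in DT2 that has no match in DT1 still shows up in the result with NA in z; adding nomatch = NULL to the call drops such rows.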
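To make the pair-wise matching concrete, here is a minimal sketch. It assumes a reasonably recent data.table (older releases spell the option nomatch = 0L), and the extra ("d", 6) row in DT2 is made up purely to show what happens to an unmatched pair.

```r
library(data.table)

DT1 <- data.table(x = rep(c("b", "a", "c"), each = 3), y = c(1, 3, 6), z = 1:9)
# Hypothetical extra pair ("d", 6) that does not occur in DT1
DT2 <- data.table(f = c("a", "b", "d"), g = c(1, 3, 6))

# Default join: one result row per DT2 row; the unmatched pair gets z = NA
DT1joined <- DT1[DT2, on = .(x = f, y = g)]

# nomatch = NULL keeps only the pairs found in both tables
DT1sub <- DT1[DT2, on = .(x = f, y = g), nomatch = NULL]
print(DT1sub)
```

With the original two-row DT2 both calls give the same result, so the extra argument only matters when DT2 may contain pairs that DT1 lacks.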
Another idea is to use setkey:
library(data.table)
DT1 = data.table(x=rep(c("b","a","c"),each=3), y=c(1,3,6), z=1:9)
DT2 = data.table(f=c("a","b"), g=c(1,3))
setkey(DT1, x, y)
setkey(DT2, f, g)
DT1[DT2]
# x y z
# 1: a 1 4
# 2: b 3 2
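One thing worth knowing about the keyed approach is that setkey sorts the table in place, so DT1 itself is reordered as a side effect. The sketch below just checks that on the same example data; the nomatch = NULL argument is my addition and is not needed for this particular DT2.

```r
library(data.table)

DT1 <- data.table(x = rep(c("b", "a", "c"), each = 3), y = c(1, 3, 6), z = 1:9)
DT2 <- data.table(f = c("a", "b"), g = c(1, 3))

setkey(DT1, x, y)  # sorts DT1 by x, then y, by reference
setkey(DT2, f, g)

print(key(DT1))    # the key columns now set on DT1
print(DT1$z)       # row order has changed: the "a" rows come first

# Keyed join; nomatch = NULL would drop DT2 pairs missing from DT1
DT1sub <- DT1[DT2, nomatch = NULL]
print(DT1sub)
```

If the original row order of DT1 matters, the on = form avoids this because it does not touch the table's key or order.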
The answers above work great, but I still prefer merge() for this task because its arguments are more expressive:
DT1sub <- merge(
x = DT1,
y = DT2,
by.x = c('x', 'y'), by.y = c('f', 'g'), all.x = FALSE, all.y = FALSE)
Of course, some of these arguments are redundant because they are the defaults, but writing them out makes it obvious at a glance whether you asked for an inner or an outer join.
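As a rough illustration of what those all.x / all.y switches change, here is a small sketch. The ("d", 6) row in DT2 is again a made-up pair with no match in DT1, added only so the two results differ.

```r
library(data.table)

DT1 <- data.table(x = rep(c("b", "a", "c"), each = 3), y = c(1, 3, 6), z = 1:9)
# Hypothetical extra pair ("d", 6) that does not occur in DT1
DT2 <- data.table(f = c("a", "b", "d"), g = c(1, 3, 6))

# Inner join: only pairs present in both tables
inner <- merge(DT1, DT2, by.x = c("x", "y"), by.y = c("f", "g"),
               all.x = FALSE, all.y = FALSE)

# all.y = TRUE also keeps DT2 pairs without a match; their z is NA
right <- merge(DT1, DT2, by.x = c("x", "y"), by.y = c("f", "g"),
               all.x = FALSE, all.y = TRUE)

print(inner)
print(right)
```

For the subsetting task in the question, the inner join is the one you want.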