How to select rows of a data.table that simultaneously match two conditions in another data.table?

Question

I have two data.tables, DT1 and DT2, with DT1 possibly large and having more columns than DT2. I want to select the rows of DT1 where two columns of DT1 exactly match two columns in the same row of DT2. For example:

library(data.table)

DT1 = data.table(x=rep(c("b","a","c"),each=3), y=c(1,3,6), z=1:9)
DT2 = data.table(f=c("a","b"), g=c(1,3))

The output I'm looking for, DT1sub, is

   x y z
1: a 1 4
2: b 3 2

My problem is that when I try to subset DT1, I also get rows where x and y each match something in DT2, but not in the same row of DT2:

> DT1[x%in%DT2$f & y%in%DT2$g]
#    x y z
# 1: b 1 1
# 2: b 3 2
# 3: a 1 4
# 4: a 3 5

I could get my desired output, DT1sub, with a clunky for loop like

DT1sub<-c()
for (i in 1:2)
  DT1sub<-rbind(DT1sub,DT1[x==DT2$f[i] & y==DT2$g[i]])
DT1sub

but I was wondering if there is a smarter data.table way of doing this. This is probably straightforward, but I couldn't piece it together from example("data.table").

Answer 1

Are you looking for:

library(data.table)

DT1sub <- DT1[DT2, on = .(x = f, y = g)]

Output:

   x y z
1: a 1 4
2: b 3 2

This is basically a filtering join: it keeps only the rows of DT1 whose (x, y) pair equals the (f, g) pair in some row of DT2, so both columns have to match within the same row.
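One caveat, shown as a small sketch rather than a definitive recipe: the join returns one row per row of DT2, so a pair in DT2 that does not occur in DT1 comes back with NA in z. Passing nomatch = NULL (available in data.table 1.12.0 and later) drops those rows. DT2b below is a hypothetical variant of DT2 that I made up for illustration; it is not part of the question.

```r
library(data.table)

DT1 <- data.table(x = rep(c("b", "a", "c"), each = 3), y = c(1, 3, 6), z = 1:9)
# Hypothetical variant of DT2 with one extra pair ("d", 1) that is absent from DT1
DT2b <- data.table(f = c("a", "b", "d"), g = c(1, 3, 1))

# Default behaviour: every row of DT2b is returned, unmatched pairs get NA in z
right <- DT1[DT2b, on = .(x = f, y = g)]

# nomatch = NULL keeps only the pairs that actually occur in DT1
inner <- DT1[DT2b, on = .(x = f, y = g), nomatch = NULL]
```

With the DT2 from the question every pair has a match, so both forms give the same two rows.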

Answer 2

Another idea is to use setkey.

library(data.table)

DT1 = data.table(x=rep(c("b","a","c"),each=3), y=c(1,3,6), z=1:9)
DT2 = data.table(f=c("a","b"), g=c(1,3))

setkey(DT1, x, y)
setkey(DT2, f, g)

DT1[DT2]
#    x y z
# 1: a 1 4
# 2: b 3 2

Answer 3

The above answers work great, but I still prefer to use merge() for this task because its arguments are more expressive:

DT1sub <- merge(
  x = DT1, 
  y = DT2, 
  by.x = c('x', 'y'), by.y = c('f', 'g'), all.x = FALSE, all.y = FALSE)

Of course, some of the arguments are redundant because they are set by default, but writing it out this way ensures you remember whether you've imposed an inner/outer join, etc.
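To make the inner/outer distinction concrete, here is a rough sketch of how all.y changes the result. DT2b is a hypothetical variant of DT2 with one extra pair that does not occur in DT1; it is only there for illustration.

```r
library(data.table)

DT1 <- data.table(x = rep(c("b", "a", "c"), each = 3), y = c(1, 3, 6), z = 1:9)
# Hypothetical variant of DT2 with an extra pair ("d", 1) that is not in DT1
DT2b <- data.table(f = c("a", "b", "d"), g = c(1, 3, 1))

# Inner join: only pairs present in both tables
inner <- merge(DT1, DT2b, by.x = c("x", "y"), by.y = c("f", "g"),
               all.x = FALSE, all.y = FALSE)

# Right outer join: the unmatched ("d", 1) pair is kept, with NA in z
right <- merge(DT1, DT2b, by.x = c("x", "y"), by.y = c("f", "g"),
               all.x = FALSE, all.y = TRUE)
```

For the subsetting asked about in the question, the inner join is the one you want.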
