Two data.tables number of matching columns

Question

I have two data.tables, dt1 and dt2, and I want the number of matches between their columns using if-then logic: where dt1$V1 == dt2$V1, does dt1$V2 == dt2$V2 as well? The key point is that the V2 comparison must be made only within the rows that match on V1. I would like to use data.table for its efficiency, since my actual dataset is large.

library(data.table)
# Unnamed columns get the default names V1 and V2
dt1 <- data.table(c("a","b","c","d","e"), c(1:5))
dt2 <- data.table(c("a","d","e","f","g"), c(3:7))

In this dummy example, there are 3 matches between the V1 columns (a, d, e), but only 2 of those also match on V2 (d and e). So the answer (perhaps using nrow, if I subset) would be 2.

Answer 1

I suppose you are looking for fintersect:

fintersect(dt1,dt2)

gives:

   V1 V2
1:  d  4
2:  e  5

To get the number of rows, add [, .N]:

fintersect(dt1,dt2)[, .N]

which gives:

 [1] 2
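
Note that fintersect compares entire rows across all columns, and by default (all = FALSE) it returns distinct rows, so a repeated match is counted once. A small sketch of the difference, using made-up tables x and y that are not from the question:

```r
library(data.table)

# Hypothetical tables with a duplicated matching row (d, 4)
x <- data.table(V1 = c("d", "d", "e"), V2 = c(4L, 4L, 5L))
y <- data.table(V1 = c("d", "d", "f"), V2 = c(4L, 4L, 6L))

# Default: distinct matching rows only
n_distinct <- fintersect(x, y)[, .N]

# all = TRUE keeps duplicates that occur in both tables
n_with_dups <- fintersect(x, y, all = TRUE)[, .N]
```

Here n_distinct is 1 and n_with_dups is 2, so which variant is right depends on whether repeated rows should count as separate matches.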

Answer 2

Well, this is not pretty, but it works:

sum(dt1[V1 %in% dt2$V1]$V2 == dt2[V1 %in% dt1[V1 %in% dt2$V1]$V1]$V2)

I just read your comment. If you want a data.table with the matching combinations, you can make it even longer, like this:

dt1[V1 %in% dt2$V1][dt1[V1 %in% dt2$V1]$V2 == dt2[V1 %in% dt1[V1 %in% dt2$V1]$V1]$V2]

    V1 V2
1:  d  4
2:  e  5
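
One caveat worth noting: the comparison above is element-wise by position, so it appears to rely on the matching V1 values occurring in the same order in both tables (and on V1 being unique). A minimal sketch of that assumption, where dt2_shuffled is a hypothetical reordered copy that is not part of the question:

```r
library(data.table)

dt1 <- data.table(c("a","b","c","d","e"), c(1:5))
dt2 <- data.table(c("a","d","e","f","g"), c(3:7))

# Hypothetical reordering of dt2 (same rows, different order)
dt2_shuffled <- dt2[c(3, 2, 1, 4, 5)]

# Positional comparison: pairs a with e, d with d, e with a
n_positional <- sum(dt1[V1 %in% dt2_shuffled$V1]$V2 ==
                    dt2_shuffled[V1 %in% dt1$V1]$V2)

# Row-wise set intersection does not depend on row order
n_setwise <- nrow(fintersect(dt1, dt2_shuffled))
```

Here n_positional is 1 while n_setwise is 2, so the positional version would need both tables sorted on V1 first.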

I'm definitely looking forward to seeing other answers :)

Answer 3

We can just do a join:

dt1[dt2, on = names(dt1), nomatch = 0]
#   V1 V2
#1:  d  4
#2:  e  5

Or use inner_join from dplyr:

library(dplyr)
inner_join(dt1, dt2)
#  V1 V2
#1  d  4
#2  e  5

Or with merge:

merge(dt1, dt2)
#   V1 V2
#1:  d  4
#2:  e  5

For all of the above, the number of matches can be found with nrow:

nrow(merge(dt1, dt2))
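
If the two-step logic from the question matters (first match on V1, then check V2 within those matches), a sketch along the following lines may make it explicit. It joins on V1 only, and data.table then prefixes the clashing V2 column coming from dt2 with i. in the result. The names matched_v1, n_v1 and n_both are illustrative, and nomatch = NULL assumes a reasonably recent data.table version (older ones use nomatch = 0).

```r
library(data.table)

dt1 <- data.table(c("a","b","c","d","e"), c(1:5))
dt2 <- data.table(c("a","d","e","f","g"), c(3:7))

# Step 1: inner join on V1 only; dt2's V2 shows up as i.V2
matched_v1 <- dt1[dt2, on = "V1", nomatch = NULL]
n_v1 <- nrow(matched_v1)

# Step 2: within the V1 matches, count rows where V2 also agrees
n_both <- matched_v1[V2 == i.V2, .N]
```

With the example data, n_v1 is 3 (a, d, e) and n_both is 2 (d, e), which matches the counts described in the question.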


solution1
6 ACCEPTED 2017-06-09 20:25:38

solution2
1 2017-06-09 20:24:25

solution3
1 2017-06-09 20:42:44
