
Two data.tables number of matching columns

I have two data.tables, dt1 and dt2, and I want to count the matches between their columns using if-then logic: where dt1$V1 == dt2$V1, does dt1$V2 == dt2$V2 as well? The key point is that the V2 comparison must be made within the rows that already match on V1. I would like to use data.table for its efficiency, since my actual dataset is large.

library(data.table)
dt1 <- data.table(c("a","b","c","d","e"), 1:5)
dt2 <- data.table(c("a","d","e","f","g"), 3:7)

In this dummy example, there are 3 matches between the V1 columns (a, d and e), but only 2 of those rows also match on V2 (d and e). So the answer (using nrow perhaps, if I subset) would be 2.

I suppose you are looking for fintersect :

fintersect(dt1,dt2)

gives:

   V1 V2
1:  d  4
2:  e  5

To get the number of rows, add [, .N] :

fintersect(dt1,dt2)[, .N]

which gives:

 [1] 2 
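One detail worth knowing when counting with fintersect: by default (all = FALSE) it drops duplicate rows from the result, so the count is of distinct matching rows. The sketch below uses hypothetical variants of the example tables (dt1_dup and dt2_dup, not part of the question) in which the row ("d", 4) appears twice in each table, and compares the default with all = TRUE.

```r
library(data.table)

# Hypothetical data: the row ("d", 4) occurs twice in both tables
dt1_dup <- data.table(V1 = c("a", "d", "d", "e"), V2 = c(1L, 4L, 4L, 5L))
dt2_dup <- data.table(V1 = c("a", "d", "d", "e", "f"), V2 = c(3L, 4L, 4L, 5L, 6L))

# Default all = FALSE: each distinct matching row is counted once
n_distinct <- fintersect(dt1_dup, dt2_dup)[, .N]

# all = TRUE: duplicated rows that occur in both tables are kept
n_with_dups <- fintersect(dt1_dup, dt2_dup, all = TRUE)[, .N]

print(c(distinct = n_distinct, with_duplicates = n_with_dups))
```

If the real data can contain repeated rows, which of the two counts is wanted depends on whether repeats should count as separate matches.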

Well, this is not pretty, but it works:

sum(dt1[V1 %in% dt2$V1]$V2 == dt2[V1 %in% dt1[V1 %in% dt2$V1]$V1]$V2)

I just read your comment. If you want a data.table containing the matching rows, you can make it even longer, like this:

dt1[V1 %in% dt2$V1][dt1[V1 %in% dt2$V1]$V2 == dt2[V1 %in% dt1[V1 %in% dt2$V1]$V1]$V2]

    V1 V2
1:  d  4
2:  e  5
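Note that the %in% approach compares the two V2 vectors position by position, so it relies on the matched rows appearing in the same order in both tables. A minimal sketch of an order-independent alternative (a different technique from the one above: a join on V1 followed by a filter) is shown here; the shuffled dt2 is an assumption made purely for illustration.

```r
library(data.table)

dt1 <- data.table(V1 = c("a", "b", "c", "d", "e"), V2 = 1:5)
# Same rows as dt2 in the question, deliberately shuffled (illustrative assumption)
dt2 <- data.table(V1 = c("g", "e", "a", "f", "d"), V2 = c(7L, 5L, 3L, 6L, 4L))

# Inner join on V1 only; dt2's V2 keeps its name, dt1's V2 appears as i.V2
joined <- dt2[dt1, on = "V1", nomatch = NULL]

# "If V1 matches ..." : rows whose V1 occurs in both tables
n_key_matches <- nrow(joined)

# "... does V2 match too?" : of those rows, the ones where V2 also agrees
n_full_matches <- joined[V2 == i.V2, .N]

print(c(key_matches = n_key_matches, full_matches = n_full_matches))
```

Because the rows are paired by the join rather than by position, the result does not depend on how either table is sorted.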

I'm definitely looking forward to seeing other answers :)

We can just do a join:

dt1[dt2, on = names(dt1), nomatch = 0]
#   V1 V2
#1:  d  4
#2:  e  5

or inner_join from dplyr

library(dplyr)
inner_join(dt1, dt2)
#  V1 V2
#1  d  4
#2  e  5

Or with merge

merge(dt1, dt2)
#   V1 V2
#1:  d  4
#2:  e  5

For all of the above, the number of matches can be found with nrow:

nrow(merge(dt1, dt2))
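As a quick cross-check, the sketch below computes the count three ways on the question's example data and confirms they agree. It spells out by = c("V1", "V2") in merge rather than relying on the default choice of columns, which (as far as I understand) can depend on any keys set on the tables; the column names are given explicitly here instead of using the auto-generated ones.

```r
library(data.table)

dt1 <- data.table(V1 = c("a", "b", "c", "d", "e"), V2 = 1:5)
dt2 <- data.table(V1 = c("a", "d", "e", "f", "g"), V2 = 3:7)

# Join on every column of dt1, keeping only rows present in both tables
n_join <- nrow(dt1[dt2, on = names(dt1), nomatch = NULL])

# merge with the matching columns named explicitly
n_merge <- nrow(merge(dt1, dt2, by = c("V1", "V2")))

# Set intersection of whole rows
n_fintersect <- fintersect(dt1, dt2)[, .N]

print(c(join = n_join, merge = n_merge, fintersect = n_fintersect))
```

On data without duplicated rows the three counts should coincide; with duplicates, the joins can return repeated rows while fintersect by default does not.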

