I have two data.tables, dt1 and dt2, and I want to count the matches between their columns using if-then logic: if dt1$V1 == dt2$V1, then does dt1$V2 == dt2$V2? It is key that the second comparison is made only within the rows where dt1$V1 matches dt2$V1. I would like to use data.table for its efficiency, since my actual dataset is large.
library(data.table)
dt1 <- data.table(c("a","b","c","d","e"), 1:5)
dt2 <- data.table(c("a","d","e","f","g"), 3:7)
In this dummy example, there are 3 matches between the V1s, but only 2 of those rows also match on V2. So the answer (using nrow, perhaps, if I subset) would be 2.
I suppose you are looking for fintersect:
fintersect(dt1,dt2)
gives:
   V1 V2
1:  d  4
2:  e  5
To get the number of rows, add [, .N]:
fintersect(dt1,dt2)[, .N]
which gives:
[1] 2
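One caveat worth illustrating: fintersect compares whole rows and, by default (all = FALSE), returns each matching row only once, so .N counts distinct matches. The sketch below uses two made-up tables, x and y (not from the question), that contain a duplicated row, to show how the count changes with all = TRUE.

```r
library(data.table)

# Hypothetical tables with a duplicated row (d, 4); names x and y are illustrative
x <- data.table(V1 = c("d", "d", "e"),      V2 = c(4L, 4L, 5L))
y <- data.table(V1 = c("d", "d", "e", "f"), V2 = c(4L, 4L, 5L, 6L))

# Default: duplicates are dropped, so only distinct common rows are counted
n_distinct <- fintersect(x, y)[, .N]

# all = TRUE: a row is kept as many times as it appears in both tables
n_all <- fintersect(x, y, all = TRUE)[, .N]
```

If the data cannot contain duplicated rows, the two counts are the same and the default is fine.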
Well, this is not pretty, but it works:
sum(dt1[V1 %in% dt2$V1]$V2 == dt2[V1 %in% dt1[V1 %in% dt2$V1]$V1]$V2)
I just read your comment. If you want a data.table with the matching combinations, you can make it even longer, like this:
dt1[V1 %in% dt2$V1][dt1[V1 %in% dt2$V1]$V2 == dt2[V1 %in% dt1[V1 %in% dt2$V1]$V1]$V2]
V1 V2
1: d 4
2: e 5
I'm definitely looking forward to seeing other answers :)
We can just do a join:
dt1[dt2, on = names(dt1), nomatch = 0]
# V1 V2
#1: d 4
#2: e 5
or use inner_join from dplyr:
library(dplyr)
inner_join(dt1, dt2)
# V1 V2
#1 d 4
#2 e 5
Or with merge:
merge(dt1, dt2)
# V1 V2
#1: d 4
#2: e 5
For all of the above, the number of matches can be found with nrow:
nrow(merge(dt1, dt2))
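The joins above match on every shared column at once. If it helps to see the question's two-step "if V1 matches, then does V2 match" logic spelled out, here is a minimal sketch that joins on V1 only and then compares the V2 columns. It assumes the dt1 and dt2 from the question (with the columns named explicitly); the variable names joined, n_v1_matches and n_both_matches are made up for illustration.

```r
library(data.table)

dt1 <- data.table(V1 = c("a", "b", "c", "d", "e"), V2 = 1:5)
dt2 <- data.table(V1 = c("a", "d", "e", "f", "g"), V2 = 3:7)

# Step 1: inner join on V1 only; dt2's V2 is carried along as i.V2
joined <- dt1[dt2, on = "V1", nomatch = NULL]

# Step 2: within the V1 matches, count the rows where V2 also agrees
n_v1_matches   <- nrow(joined)              # 3 (a, d, e)
n_both_matches <- joined[V2 == i.V2, .N]    # 2 (d, e)
```

This keeps both counts available, which the all-column joins do not.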