简体   繁体   中英

Should non-equi inner join (nomatch=0L) be bidirectional?

When doing a non-equi inner join, should the order of X[Y] and Y[X] matters? I am under the impression that it should not.

library(data.table) #data.table_1.12.2
dt1 <- data.table(ID=LETTERS[1:4], TIME=2L:5L)
cols1 <- names(dt1)
dt2 <- data.table(ID=c("A", "B"), START=c(1L, 20L), END=c(3L, 30L))
cols2 <- names(dt2)

> dt1
   ID TIME
1:  A    2
2:  B    3
3:  C    4
4:  D    5

> dt2
   ID START END
1:  A     1   3
2:  B    20  30

I am trying to filter for rows in dt1 such that 1) ID matches and 2) dt1$TIME lies between dt2$START and dt2$END. Desired output:

   ID TIME
1:  A    2

Since I wanted rows from dt1 , I started with using dt1 as i in data.table[ but I am getting either columns from dt2 or encountered errors:

#no error but using x. values
dt2[dt1, on=.(ID, START<TIME, END>TIME), nomatch=0L]

#error for the rest 
dt2[dt1, on=.(ID, START<TIME, END>TIME), nomatch=0L, mget(paste0("i.", cols1))]
dt2[dt1, on=.(ID, START<TIME, END>TIME), nomatch=0L, .SD]
dt2[dt1, on=.(ID, START<TIME, END>TIME), nomatch=0L, .(START)]

Error message:

Error in [.data.table (dt2, dt1, on = .(ID, START < TIME, END > TIME), : column(s) not found: START

So I had to use dt2 as the i as a workaround:

#need to type out all the columns:
dt1[dt2, on=.(ID, TIME>START, TIME<END), nomatch=0L, .(ID, TIME=x.TIME)]
#using setNames
dt1[dt2, on=.(ID, TIME>START, TIME<END), nomatch=0L, 
    setNames(mget(paste0("x.", cols1)), cols1)]

Or is this a simple case of my misunderstanding?

References:

I am trying to filter for rows in dt1 such that 1) ID matches and 2) dt1$TIME lies between dt2$START and dt2$END.

That sounds like a semi join: Perform a semi-join with data.table

dt1[
  dt1[dt2, on=.(ID, TIME >= START, TIME <= END), nomatch=0, which=TRUE]
]

#    ID TIME
# 1:  A    2

If it's possible that multiple rows of dt2 will match rows of dt1, then the "which" output can be wrapped in unique() as in the linked answer.

There are a couple linked feature requests for a more convenient way to do this: https://github.com/Rdatatable/data.table/issues/2158

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM