
R Joining with data.table

I'm trying to join two data frames (data.tables), x and y, on x$val >= y$start & x$val <= y$end. I can't use dplyr because the only way to do inequality joins in dplyr is to join the tables then filter on the inequality, and the tables I want to join have 315k and 84k records. That would blow out memory.

data.table does have inequality joins, but I cannot for the life of me figure out how the syntax works. See this result:

library(data.table)

x <- data.table(val = 1:5, id = "a")
y <- data.table(start = 1:5, end = 11:15, id = "a")

x[y, on=c("val>=start","val<=end"),
  .(start, val, end)]

    start val end
 1:     1   1  11
 2:     1   1  11
 3:     1   1  11
 4:     1   1  11
 5:     1   1  11
 6:     2   2  12
 7:     2   2  12
 8:     2   2  12
 9:     2   2  12
10:     3   3  13
11:     3   3  13
12:     3   3  13
13:     4   4  14
14:     4   4  14
15:     5   5  15

To show what I would expect to get, here is what dplyr produces:

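library(data.table)
library(dplyr)

x <- data.table(val = 1:5, id = "a")
y <- data.table(start = 1:5, end = 11:15, id = "a")

# Join on the shared id column first, then keep only rows inside the range
x %>% 
  inner_join(y, by = "id") %>% 
  filter(val >= start & val <= end)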

   val   id start end
1    1    a     1  11
2    2    a     1  11
3    2    a     2  12
4    3    a     1  11
5    3    a     2  12
6    3    a     3  13
7    4    a     1  11
8    4    a     2  12
9    4    a     3  13
10   4    a     4  14
11   5    a     1  11
12   5    a     2  12
13   5    a     3  13
14   5    a     4  14
15   5    a     5  15

Can anyone explain what it is I'm missing with the data.table syntax?

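You need to select the val column from the left table explicitly, using the x. prefix. In your result the unprefixed val comes back equal to start on every row, because in a non-equi join the join column takes its values from y rather than from x (for more information, see this answer):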

x[y, on=.(val >= start, val <= end), .(val = x.val, id, start, end)][order(val)]
 #                                           ^^
 #   val id start end
 #1:   1  a     1  11
 #2:   2  a     1  11
 #3:   2  a     2  12
 #4:   3  a     1  11
 #5:   3  a     2  12
 #6:   3  a     3  13
 #7:   4  a     1  11
 #8:   4  a     2  12
 #9:   4  a     3  13
#10:   4  a     4  14
#11:   5  a     1  11
#12:   5  a     2  12
#13:   5  a     3  13
#14:   5  a     4  14
#15:   5  a     5  15
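
To see the difference side by side, here is a minimal sketch using the same x and y as in the question. The names unprefixed and prefixed are only illustrative, and the comments describe the behaviour shown in the outputs above rather than anything beyond them.

```r
library(data.table)

x <- data.table(val = 1:5, id = "a")
y <- data.table(start = 1:5, end = 11:15, id = "a")

# Unprefixed: `val` in j is the join column, which carries y's `start` values,
# so val and start are identical on every row (the question's output).
unprefixed <- x[y, on = .(val >= start, val <= end), .(start, val, end)]

# Prefixed: x.val is the original value from x, which is what was wanted.
prefixed <- x[y, on = .(val >= start, val <= end),
              .(val = x.val, id, start, end)][order(val)]

print(unprefixed)
print(prefixed)
```

Both calls return the same 15 matched pairs; only the contents of the val column differ. The same x. prefix works for any other column of x that you need in j with its original value.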
