
Match values in each group of a data.table column to values in a vector

I recently started using the data.table package to identify values in a table's column that meet certain conditions. Although I have managed to get most things done, I'm now stuck on this problem:

I have a data table, table1, in which the first column, "labels", is a group ID, and the second column, "o.cell", is an integer. The key is on "labels".

I have another data table, table2, containing a single column: "cell".

Now, I'm trying to find, for each group in table1, the values from the column "o.cell" that are in the "cell" column of table2. table1 has some 400K rows divided into 800+ groups of unequal sizes. table2 has about 1.3M rows of unique cell numbers. Cell numbers in the "o.cell" column of table1 can appear in more than one group.

This seems like a simple task, but I can't find the right way to do it. Depending on how I structure my call, it either gives me a different result from what I expect, or it never completes and I have to kill the R task because it has frozen (my machine has 24 GB of RAM).

Here's an example of one of the variants of the call I have tried:

overlap <- table1[, list(over.cell =
              o.cell[!is.na(o.cell) & o.cell %in% table2$cell]),
              by = labels]

I'm pretty sure this is the wrong way to use data tables for this task, and on top of that I can't get the result I want.

I will greatly appreciate any help. Thanks.

Sounds like this is your setup:

dt1 = data.table(labels = c('a','b'), o.cell = 1:10)
dt2 = data.table(cell = 4:7)

And you just want to do a simple merge, joining on the cell values:

setkey(dt1, o.cell)
dt1[dt2]
#   o.cell labels
#1:      4      b
#2:      5      a
#3:      6      b
#4:      7      a
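
With the real data, many values in table2 will presumably have no match in table1, and the default join would return those rows with NA labels. A minimal sketch of one way to handle that, assuming a reasonably recent data.table (the toy values below are made up for illustration): join with `on =` so no key needs to be changed, drop unmatched rows with `nomatch = NULL`, then regroup by label.

```r
library(data.table)

# Toy data (hypothetical values): cells 20 and 21 have no match in dt1
dt1 <- data.table(labels = rep(c("a", "b"), 5), o.cell = 1:10)
dt2 <- data.table(cell = c(4L, 5L, 8L, 20L, 21L))

# Inner join on o.cell == cell without touching the key of dt1.
# nomatch = NULL drops rows of dt2 that have no match
# (on older data.table versions, nomatch = 0L does the same).
overlap <- dt1[dt2, on = c(o.cell = "cell"), nomatch = NULL]

# Sort by group so the matching cells of each label sit together
setorder(overlap, labels, o.cell)

# Number of matching cells per group
counts <- overlap[, .N, by = labels]
```

Because the join returns every row of dt1 that matches a cell, a cell number that appears in several groups should be kept once for each of those groups.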
