I recently started using the data.table package to identify values in a table's column that meet certain conditions. Although I've managed to get most things done, I'm now stuck on this problem:
I have a data table, table1, in which the first column (labels) is a group ID and the second column, o.cell, is an integer. The key is set on "labels".
I have another data table, table2, containing a single column: "cell".
Now I'm trying to find, for each group in table1, the values in column "o.cell" that also appear in the "cell" column of table2. table1 has about 400K rows divided into 800+ groups of unequal size. table2 has about 1.3M rows of unique cell numbers. A given cell number in the "o.cell" column of table1 can appear in more than one group.
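A small reproducible version of that setup might look like the following. The group names and cell numbers are made up for illustration; only the shape (a group ID, an integer cell column, a key on labels, cells shared between groups, and a table2 of unique cells) comes from the description above. One o.cell is NA because the attempt below filters out missing values.

```r
library(data.table)

# Hypothetical toy data mirroring the shape described in the question.
# table1: group ID (labels) + integer cell number (o.cell), keyed on labels.
# Cell 5 deliberately appears in two groups, and one o.cell is NA.
table1 <- data.table(labels = c("g1", "g1", "g2", "g2", "g3"),
                     o.cell = c(5L, 8L, 5L, 9L, NA),
                     key = "labels")

# table2: a single column of unique cell numbers.
table2 <- data.table(cell = c(5L, 9L, 20L, 30L))

# Desired result: for each group, the o.cell values present in table2$cell,
# i.e. g1 -> 5, g2 -> 5 and 9, g3 -> nothing.
```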
This seems like a simple task, but I can't find the right way to do it. Depending on how I structure the call, it either gives me a different result than I expect or it never completes and I have to kill the R process because it has frozen (my machine has 24 GB of RAM).
Here's one of the variants of the call I have tried:
```r
overlap <- table1[, list(over.cell =
                           o.cell[!is.na(o.cell) & o.cell %in% table2$cell]),
                  by = labels]
```
I'm pretty sure this is the wrong way to use data.table for this task, and on top of that I can't get the result I want.
I would greatly appreciate any help. Thanks.
It sounds like this is your setup:
```r
library(data.table)

dt1 = data.table(labels = c('a','b'), o.cell = 1:10)  # labels recycle: a, b, a, b, ...
dt2 = data.table(cell = 4:7)
```
If so, all you need is a simple keyed join (merge):
```r
setkey(dt1, o.cell)
dt1[dt2]
#    o.cell labels
# 1:      4      b
# 2:      5      a
# 3:      6      b
# 4:      7      a
```
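On data shaped like the question's, where table2 holds many cells that never occur in table1, the plain `dt1[dt2]` form keeps every row of dt2 and fills NA where a cell has no match in dt1. Below is a sketch of two alternatives that keep only the matches, using made-up toy values: an inner join with `nomatch = 0L` and the `on=` argument (available in data.table 1.9.6 and later, so no `setkey()` call is needed), and an equivalent one-pass filter with `%in%`.

```r
library(data.table)

# Hypothetical toy data in the shape the question describes:
# cell 5 occurs in two groups, one o.cell is NA, and table2 contains
# cells (20, 30) that never occur in table1.
table1 <- data.table(labels = c("g1", "g1", "g2", "g2", "g3"),
                     o.cell = c(5L, 8L, 5L, 9L, NA),
                     key = "labels")
table2 <- data.table(cell = c(5L, 9L, 20L, 30L))

# Option 1: inner join on o.cell == cell without re-keying table1.
# nomatch = 0L drops table2 cells that have no partner in table1.
overlap <- table1[table2, on = c(o.cell = "cell"), nomatch = 0L]

# Option 2: a single vectorized %in% over the whole column,
# rather than one %in% call per group. NA %in% ... is FALSE,
# so missing cells are excluded without an explicit is.na() test.
semi <- table1[o.cell %in% table2$cell]

# Per-group summary: how many matching cells each group has.
# Groups with no matches (g3 here) simply do not appear.
counts <- overlap[, .N, by = labels]
```

Because the cells in table2 are unique, each row of table1 matches at most one row of table2, so the join result can never be larger than table1.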