
Match values in each group of a data.table column to values in a vector

I recently started using the data.table package to identify values in a table column that meet certain conditions. Although I have managed to get most things done, I'm now stuck on this problem:

I have a data table, table1, in which the first column, "labels", is a group ID and the second column, "o.cell", is an integer. The key is set on "labels".

I have another data table, table2, containing a single column: "cell".

Now I'm trying to find, for each group in table1, the values from the column "o.cell" that are in the "cell" column of table2. table1 has some 400K rows divided into 800+ groups of unequal sizes. table2 has about 1.3M rows of unique cell numbers. A cell number in the "o.cell" column of table1 can appear in more than one group.

This seems like a simple task, but I can't find the right way to do it. Depending on how I structure my call, it either gives me a different result from what I expect, or it never completes and I have to kill the R task because it's frozen (my machine has 24 GB of RAM).

Here's an example of one of the variants of the call I have tried:

overlap <- table1[, list(over.cell =
              o.cell[!is.na(o.cell) & o.cell %in% table2$cell]),
              by = labels]

I'm pretty sure this is the wrong way to use data tables for this task, and on top of that I can't get the result I want.

I would greatly appreciate any help. Thanks.

Sounds like this is your setup:

library(data.table)
dt1 = data.table(labels = c('a','b'), o.cell = 1:10)
dt2 = data.table(cell = 4:7)

And you simply want to do a merge:

setkey(dt1, o.cell)
dt1[dt2]
#   o.cell labels
#1:      4      b
#2:      5      a
#3:      6      b
#4:      7      a
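
If you would rather leave the key of the original table alone, the same match can probably be written with the `on=` argument. This is only a sketch on the toy data above; the names `overlap` and `filtered` are made up for illustration, and `nomatch = 0L` is used so that cells in dt2 with no match in dt1 are dropped rather than returned as NA rows.

```r
library(data.table)

# Toy data mirroring the question: 'labels' is the group ID, 'o.cell' the cell number.
dt1 <- data.table(labels = rep(c("a", "b"), times = 5), o.cell = 1:10)
dt2 <- data.table(cell = 4:7)

# Inner join on o.cell == cell without changing dt1's key.
# nomatch = 0L drops dt2 cells that do not occur in dt1.
overlap <- dt1[dt2, on = c(o.cell = "cell"), nomatch = 0L]

# Sort the matches by group, as in the question's keyed table.
setkey(overlap, labels)

# The same rows via a plain filter: no 'by' is needed, because the
# labels column travels with each matching row.
filtered <- dt1[o.cell %in% dt2$cell]
setkey(filtered, labels)

print(overlap)
```

From there, per-group summaries such as `overlap[, .N, by = labels]` work as usual. Because the group label is carried along with every matching row, there should be no need to subset inside `by = labels` at all.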
