
Assigning a subset of data.table rows and columns by join

I'm trying to do something similar to, but different enough from, what's described here: Update subset of data.table based on join

Specifically, I'd like to assign column values from table control to the rows with matching key values (person_id is a key in both tables). ci is the column index. The statement below complains that 'with=F' was not used; when I delete those parts, it also doesn't work as expected. Any suggestions?

To rephrase: I'd like to set the subset of flatData that corresponds to control FROM control.

flatData[J(eval(control$person_id)), ci, with=F] = control[, ci, with=F]

To give a reproducible example using classic R:

x = data.frame(a = 1:3, b = 1:3, key = c('a', 'b', 'c'))
y = data.frame(a = c(2, 5), b = c(11, 2), key = c('a', 'b'))

colidx = match(c('a', 'b'), colnames(y))

x[x$key %in% y$key, colidx] = y[, colidx]

As an aside, someone please explain how to easily assign SETS of columns without using indices! Indices and data.table are a marriage made in hell.

You can use the := operator together with a join, as follows:

First, prepare the data:

require(data.table) ## >= 1.9.0
setDT(x)            ## converts DF to DT by reference
setDT(y)
setkey(x, key)      ## set key column
setkey(y, key)

Now the one-liner:

x[y, c("a", "b") := list(i.a, i.b)]

:= modifies by reference (in place). The rows to modify are given by the indices computed from the join in i.
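To see which rows the join selects and that the update really happens in place, here is a minimal sketch. It rebuilds x and y from the question, except that y's values are written as integers (my assumption, to keep the types aligned), and it uses data.table's which = TRUE and address() to inspect the matched rows and the table's memory address.

```r
library(data.table)

# Same data as in the question; y uses integer literals (an assumption)
x <- data.frame(a = 1:3, b = 1:3, key = c("a", "b", "c"), stringsAsFactors = FALSE)
y <- data.frame(a = c(2L, 5L), b = c(11L, 2L), key = c("a", "b"), stringsAsFactors = FALSE)
setDT(x); setDT(y)
setkey(x, key)
setkey(y, key)

# Row numbers of x matched by each row of y: these are the rows := will touch
matched <- x[y, which = TRUE]

addr_before <- address(x)            # memory address of x before the update
x[y, c("a", "b") := list(i.a, i.b)]  # update the matched rows by reference
addr_after <- address(x)

print(matched)
print(identical(addr_before, addr_after))
print(x)
```

Only the rows with key "a" and "b" are overwritten; the row with key "c" has no match in y and keeps its original values.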

i.a and i.b are the column names data.table generates internally, when performing a join of the form x[i], to give easy access to i's columns when both x and i have identical column names.
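Regarding the aside about assigning sets of columns by name rather than by index: one commonly used idiom, shown here as a sketch and not the only way to do it, is to keep the names in a character vector and build the i.-prefixed names with paste0(). The variable name cols is just an illustration, and on = "key" is used instead of setkey(), which assumes a reasonably recent data.table.

```r
library(data.table)

x <- data.frame(a = 1:3, b = 1:3, key = c("a", "b", "c"), stringsAsFactors = FALSE)
y <- data.frame(a = c(2L, 5L), b = c(11L, 2L), key = c("a", "b"), stringsAsFactors = FALSE)
setDT(x); setDT(y)

cols <- c("a", "b")  # hypothetical name: the columns to copy from y into x

# (cols) on the left: the parentheses make data.table read cols as a
# character vector of column names rather than a new column called "cols".
# mget() on the right fetches i.a and i.b (y's columns) as a list.
x[y, (cols) := mget(paste0("i.", cols)), on = "key"]

print(x)
```

Adding or removing a column from the update then only means editing cols.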

HTH

PS: In your example, y's columns a and b are of type numeric while x's are of type integer, so when you run this on your data you may get a warning that the types didn't match and that a coercion had to take place.
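One way to sidestep the type mismatch, sketched here under the assumption that the values in y are whole numbers, is to convert y's columns to integer before the join. Note that as.integer() truncates any fractional part, so if y could hold non-integers it would be safer to convert x's columns to numeric instead.

```r
library(data.table)

x <- data.frame(a = 1:3, b = 1:3, key = c("a", "b", "c"), stringsAsFactors = FALSE)
y <- data.frame(a = c(2, 5), b = c(11, 2), key = c("a", "b"), stringsAsFactors = FALSE)
setDT(x); setDT(y)

cols <- c("a", "b")  # hypothetical name for the columns being updated

# Convert y's numeric columns to integer so they match x's column types
y[, (cols) := lapply(.SD, as.integer), .SDcols = cols]

# Types now agree, so the update needs no coercion
x[y, c("a", "b") := list(i.a, i.b), on = "key"]

print(class(x$a))
print(x)
```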
