Assigning a subset of data.table rows and columns by join
I'm trying to do something similar to, but different enough from, what's described here: Update subset of data.table based on join
Specifically, I'd like to assign column values from the table control to the rows with matching key values (person_id is a key in both tables). ci is the column index. The statement below fails with the message 'with=F' was not used. When I delete those parts, it also doesn't work as expected. Any suggestions?
To rephrase: I'd like to set the subset of flatData that corresponds to control, using the values from control.
flatData[J(eval(control$person_id)), ci, with=F] = control[, ci, with=F]
To give a reproducible example using classic R:
x = data.frame(a = 1:3, b = 1:3, key = c('a', 'b', 'c'))
y = data.frame(a = c(2, 5), b = c(11, 2), key = c('a', 'b'))
colidx = match(c('a', 'b'), colnames(y))
x[x$key %in% y$key, colidx] = y[, colidx]
As an aside, could someone please explain how to easily assign SETS of columns without using indices? Indices and data.table are a marriage made in hell.
You can use the := operator together with the join, as follows:
First prepare the data:
require(data.table) ## >= 1.9.0
setDT(x) ## converts DF to DT by reference
setDT(y)
setkey(x, key) ## set key column
setkey(y, key)
Now the one-liner:
x[y, c("a", "b") := list(i.a, i.b)]
:= modifies by reference (in place). The rows to modify are given by the indices computed from the join in i.
i.a and i.b are the column names data.table generates internally, when performing a join of the form x[i], to give easy access to i's columns when x and i have identical column names.
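The update join above can be sketched end to end as follows, including a variant that names the set of columns in a character vector instead of using indices. This is a minimal sketch, not a definitive recipe: it assumes a reasonably recent data.table, builds y with integer columns so that no type coercion is involved, and the names cols, x1 and x2 are introduced here purely for illustration.

```r
library(data.table)

# Same shape as the example data, but with integer columns in y
# (an assumption made here to keep the column types identical).
x <- data.table(a = 1:3, b = 1:3, key = c("a", "b", "c"))
y <- data.table(a = c(2L, 5L), b = c(11L, 2L), key = c("a", "b"))
setkey(x, key)
setkey(y, key)

# Explicit form: i.a and i.b refer to y's columns inside the join.
x1 <- copy(x)
x1[y, c("a", "b") := list(i.a, i.b)]

# Name-vector form: (cols) on the left-hand side is evaluated as a
# character vector, and mget() looks up the matching "i."-prefixed columns.
cols <- c("a", "b")
x2 <- copy(x)
x2[y, (cols) := mget(paste0("i.", cols))]

print(x1)
print(isTRUE(all.equal(x1, x2)))
```

Rows whose key has no match in y (here the row with key "c") are left untouched, and copy() is used only so that both forms start from the same unmodified table.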
HTH
PS: In your example, y's columns a and b are of type numeric and x's are of type integer, so when you run this on your data you'll get a warning that the types didn't match and a coercion had to take place.
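One way to sidestep the type mismatch is to align the column types before joining. The sketch below converts y's columns to integer first; it assumes the values are whole numbers, so that as.integer() loses nothing, and the name cols is introduced here for illustration.

```r
library(data.table)

x <- data.table(a = 1:3, b = 1:3, key = c("a", "b", "c"))
y <- data.table(a = c(2, 5), b = c(11, 2), key = c("a", "b"))  # numeric columns

# Convert y's columns to integer by reference so they match x's types.
# Only safe when the values are whole numbers, as they are here.
cols <- c("a", "b")
y[, (cols) := lapply(.SD, as.integer), .SDcols = cols]

setkey(x, key)
setkey(y, key)

# Both sides are now integer, so the update join needs no coercion.
x[y, (cols) := list(i.a, i.b)]

print(sapply(x, class))
```

If x's columns should hold non-integer values instead, the conversion can go the other way, turning x's columns into numeric with as.numeric().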