R data.table merge tables grouping by multiple columns
I have two huge data tables (dt1 and dt2) that are almost identical except for one column. I want to join the tables on the other p-1 columns, where p <- ncol(dt1). Should I setkey() on those p-1 columns and join using dt1[dt2]? If so, how do I pass the column names to setkey(), since I can't give it a quoted string as an argument?
Here is some simulated data:
library(data.table)
dt1 <- data.table(matrix(rnorm(260), 10, 26))
setnames(dt1, letters)
dt2 <- copy(dt1)
dt2[,z:=rnorm(10)]
## Sections below won't run
setkey(dt1, get(letters[-which(letters=="z")]))
setkey(dt2, get(letters[-which(letters=="z")]))
dt1[dt2]
Use setkeyv, which takes the key columns as a character vector:
setkeyv(dt1, letters[-which(letters=="z")])
setkeyv(dt2, letters[-which(letters=="z")])
dt1[dt2]
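To see what the keyed join returns, here is a minimal sketch on a smaller table. The column names a, b and z, the values, and the helper variable key_cols are assumptions chosen for illustration; setdiff() is used as an alternative way to drop the differing column from the key.

```r
library(data.table)

# Small stand-in tables (hypothetical data): identical except for column z
dt1 <- data.table(a = 1:3, b = c("u", "v", "w"), z = c(0.1, 0.2, 0.3))
dt2 <- copy(dt1)
dt2[, z := c(1.1, 1.2, 1.3)]

# Key on every column except the one that differs
key_cols <- setdiff(names(dt1), "z")
setkeyv(dt1, key_cols)
setkeyv(dt2, key_cols)

# Keyed join: for each row of dt2, look up the matching row of dt1
joined <- dt1[dt2]
print(names(joined))
```

In the result, z holds the values from dt1 and the clashing column from dt2 appears with an i. prefix, as i.z.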
If you know the name of the column that differs, this works:
merge(dt1,dt2,names(dt1)[-grep("z",names(dt1))])
It also preserves both versions of the differing column in the result, as z.x (from dt1) and z.y (from dt2).
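A minimal sketch of the merge approach on a smaller table, assuming hypothetical columns a, b and z. It passes the join columns through the by argument and shows the optional suffixes argument for renaming the two versions of the differing column; the suffix strings here are arbitrary.

```r
library(data.table)

# Small stand-in tables (hypothetical data): identical except for column z
dt1 <- data.table(a = 1:3, b = c("u", "v", "w"), z = c(0.1, 0.2, 0.3))
dt2 <- copy(dt1)
dt2[, z := c(1.1, 1.2, 1.3)]

# Join on every column except the one that differs
by_cols <- setdiff(names(dt1), "z")

# Default suffixes: the clashing column becomes z.x and z.y
merged <- merge(dt1, dt2, by = by_cols)
print(names(merged))

# Custom suffixes make the origin of each column explicit
merged_named <- merge(dt1, dt2, by = by_cols, suffixes = c("_dt1", "_dt2"))
print(names(merged_named))
```

Unlike the keyed join, merge() does not require setting keys on the inputs first, since the join columns are given explicitly in by.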