

R data.table grouped sum for column referenced by name stored in a variable

The problem is as follows: I have a data.table with columns A and B. I need a summary column (a count of rows per A/B group), and the name of that column is passed as a character string in the variable var1.

I have been looking for an answer for some time now; see e.g. this and this SO post. Unable to find a proper solution, I have to ask this myself.

With a data.frame, what I want to do is:

tmp[, var1] <- rep(1, nrow(tmp))
tmp <- aggregate(formula(paste(var1, "~ A + B")), tmp, sum)

but I fail to do the same with data.table. My latest and best attempt is

tmp <- tmp[, list(..var1 = .N), by = list(A, B)]

What is wrong with my code, and how do I fix it?

Note that I do NOT want to use the := operator, because I want the result to be exactly what aggregate() would return.

Edit 1: A working example:

library(data.table)
tmp <- data.table(A=c("R","G","G","B","B","B"), B=c(1,1,1,2,1,2))
print(tmp)

var1 <- "C"

tmp[, var1] <- rep(1, nrow(tmp))
tmp2 <- aggregate(formula(paste(var1, "~ A + B")), tmp, sum)
print(tmp2)

tmp3 <- tmp[, list(..var1 = .N), by = list(A, B)]
print(tmp3)
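A short sketch of what the two calls in the example return, using the same data (with a plain data.frame copy for the aggregate() call). aggregate() sums the column of ones, i.e. counts rows per (A, B) group, and returns the groups sorted by B, then A. The data.table call counts correctly, but a name written on the left of `=` inside list() is taken literally and never looked up as a variable, so the new column is called `..var1` instead of the value of var1.

```r
library(data.table)

tmp  <- data.table(A = c("R","G","G","B","B","B"), B = c(1,1,1,2,1,2))
var1 <- "C"

# aggregate() on a plain data.frame: sum a column of ones per (A, B) group,
# which is a row count. Groups come back sorted by B, then A.
df <- as.data.frame(tmp)
df[[var1]] <- 1
agg <- aggregate(formula(paste(var1, "~ A + B")), df, sum)
print(agg)

# data.table attempt: the tag `..var1` inside list() is a literal name,
# so the count column is named "..var1", not "C".
bad <- tmp[, list(..var1 = .N), by = list(A, B)]
names(bad)   # "A" "B" "..var1"
```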

Hope I did not misread your question. Here are some options:

1) using stats::setNames

DT[, setNames(.(.N), var1), by=.(A, B)]

2) using data.table::setnames

setnames(DT[, .N, by=.(A, B)], "N", var1)[]

3) using base::structure followed by base::as.list

DT[, as.list(structure(.N, names=var1)), by=.(A, B)]
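Options 1 and 3 both come down to building a one-element list whose name is the string stored in var1; data.table then uses that name for the new column. The base-R part can be checked on its own, with a fixed integer standing in for .N of a single group (`n` is a placeholder introduced only for this illustration).

```r
var1 <- "myCol"
n    <- 2L  # hypothetical stand-in for .N within one group

# Option 1: inside j, .() is data.table's alias for list()
opt1 <- setNames(list(n), var1)

# Option 3: a named length-one vector, converted to a named list
opt3 <- as.list(structure(n, names = var1))

identical(opt1, list(myCol = 2L))  # TRUE
identical(opt3, list(myCol = 2L))  # TRUE
```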

Data used above:

DT <- data.table(A=c(1,1,2,2), B=c(1,1,2,3))
var1 <- "myCol"
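As a sketch, here are the three options run on the question's own tmp data with var1 <- "C". data.table's by= keeps groups in order of first appearance, whereas aggregate() sorts them by B, then A, so if the row order must match aggregate() exactly, setorderv() can reorder the result in place. The setorderv() and as.data.frame() steps are additions for that purpose, not part of the answer above.

```r
library(data.table)

# Data and var1 from the question's Edit 1
tmp  <- data.table(A = c("R","G","G","B","B","B"), B = c(1,1,1,2,1,2))
var1 <- "C"

r1 <- tmp[, setNames(.(.N), var1), by = .(A, B)]
r2 <- setnames(tmp[, .N, by = .(A, B)], "N", var1)[]
r3 <- tmp[, as.list(structure(.N, names = var1)), by = .(A, B)]

# All three give the same table: columns A, B and C, one row per (A, B)
# group, with groups in order of first appearance.
stopifnot(isTRUE(all.equal(r1, r2)), isTRUE(all.equal(r1, r3)))

# Reorder by B, then A (aggregate()'s order) and drop to a plain data.frame.
setorderv(r1, c("B", "A"))
res <- as.data.frame(r1)
print(res)
```

Note that the count column here is an integer (.N), while aggregate() summing a column of 1s returns doubles.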
