Assign multiple columns using := in data.table, by group

Question

What is the best way to assign to multiple columns using data.table? For example:

f <- function(x) {c("hi", "hello")}
x <- data.table(id = 1:10)

I would like to do something like this (of course, this syntax is incorrect):

x[ , (col1, col2) := f(), by = "id"]

To extend that, I may have many columns whose names are stored in a variable (say col_names), and I would like to do:

x[ , col_names := another_f(), by = "id", with = FALSE]

What is the correct way to do something like this?

Answer 1

This now works in v1.8.3 on R-Forge. Thanks for highlighting it!

x <- data.table(a = 1:3, b = 1:6) 
f <- function(x) {list("hi", "hello")} 
x[ , c("col1", "col2") := f(), by = a][]
#    a b col1  col2
# 1: 1 1   hi hello
# 2: 2 2   hi hello
# 3: 3 3   hi hello
# 4: 1 4   hi hello
# 5: 2 5   hi hello
# 6: 3 6   hi hello
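Note that f() here returns a list, whereas the function in the question returned a character vector. The sketch below illustrates that difference under a couple of assumptions: f_list and f_vec are hypothetical helper names, and wrapping the vector in as.list() is just one possible way to adapt it, not something the answer prescribes.

```r
library(data.table)

x <- data.table(a = c(1L, 2L, 3L, 1L, 2L, 3L), b = 1:6)

# Hypothetical helper: returns one list element per target column.
f_list <- function() list("hi", "hello")
x[, c("col1", "col2") := f_list(), by = a]

# Hypothetical helper mirroring the question: returns a character vector.
f_vec <- function() c("hi", "hello")

# as.list() turns the vector into one element per column (an assumed workaround).
x[, c("col3", "col4") := as.list(f_vec()), by = a]

print(x)
```

With the list form, each element maps to exactly one of the named columns and is recycled over the rows of the group.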

x[ , c("mean", "sum") := list(mean(b), sum(b)), by = a][]
#    a b col1  col2 mean sum
# 1: 1 1   hi hello  2.5   5
# 2: 2 2   hi hello  3.5   7
# 3: 3 3   hi hello  4.5   9
# 4: 1 4   hi hello  2.5   5
# 5: 2 5   hi hello  3.5   7
# 6: 3 6   hi hello  4.5   9 

mynames = c("Name1", "Longer%")
x[ , (mynames) := list(mean(b) * 4, sum(b) * 3), by = a][]
#    a b col1  col2 mean sum Name1 Longer%
# 1: 1 1   hi hello  2.5   5    10      15
# 2: 2 2   hi hello  3.5   7    14      21
# 3: 3 3   hi hello  4.5   9    18      27
# 4: 1 4   hi hello  2.5   5    10      15
# 5: 2 5   hi hello  3.5   7    14      21
# 6: 3 6   hi hello  4.5   9    18      27

x[ , get("mynames") := list(mean(b) * 4, sum(b) * 3), by = a][]  # same
#    a b col1  col2 mean sum Name1 Longer%
# 1: 1 1   hi hello  2.5   5    10      15
# 2: 2 2   hi hello  3.5   7    14      21
# 3: 3 3   hi hello  4.5   9    18      27
# 4: 1 4   hi hello  2.5   5    10      15
# 5: 2 5   hi hello  3.5   7    14      21
# 6: 3 6   hi hello  4.5   9    18      27

x[ , eval(mynames) := list(mean(b) * 4, sum(b) * 3), by = a][]   # same
#    a b col1  col2 mean sum Name1 Longer%
# 1: 1 1   hi hello  2.5   5    10      15
# 2: 2 2   hi hello  3.5   7    14      21
# 3: 3 3   hi hello  4.5   9    18      27
# 4: 1 4   hi hello  2.5   5    10      15
# 5: 2 5   hi hello  3.5   7    14      21
# 6: 3 6   hi hello  4.5   9    18      27
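For the case in the question where many column names are held in a variable, the (mynames) := form can be combined with lapply over .SD. This is a minimal sketch, not part of the original answer: the column c, the variables src and new_names, and the "_mean" suffix are all made up for illustration.

```r
library(data.table)

x <- data.table(a = rep(1:3, 2), b = 1:6, c = 6:1)

# Hypothetical setup: source columns and the names of the columns to create.
src <- c("b", "c")
new_names <- paste0(src, "_mean")

# lapply(.SD, mean) returns a list with one element per column in .SDcols,
# which lines up with the names held in new_names.
x[, (new_names) := lapply(.SD, mean), by = a, .SDcols = src]

print(x)
```

The parentheses around new_names tell data.table to use the variable's contents as column names rather than creating a column literally called new_names.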

Older version using the with argument (we discourage this argument when possible):

x[ , mynames := list(mean(b) * 4, sum(b) * 3), by = a, with = FALSE][] # same
#    a b col1  col2 mean sum Name1 Longer%
# 1: 1 1   hi hello  2.5   5    10      15
# 2: 2 2   hi hello  3.5   7    14      21
# 3: 3 3   hi hello  4.5   9    18      27
# 4: 1 4   hi hello  2.5   5    10      15
# 5: 2 5   hi hello  3.5   7    14      21
# 6: 3 6   hi hello  4.5   9    18      27

Answer 2

The following shorthand notation might be useful. All credit goes to Andrew Brooks, specifically this article.

dt[,`:=`(avg=mean(mpg), med=median(mpg), min=min(mpg)), by=cyl]
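The line above does not say where dt comes from. The sketch below assumes it is the built-in mtcars dataset converted to a data.table, which matches the mpg and cyl column names but is only a guess about the original article's data.

```r
library(data.table)

# Assumption: dt is mtcars as a data.table (it has mpg and cyl columns).
dt <- as.data.table(mtcars)

# Functional form of := : each named argument becomes a new column,
# computed once per cyl group and recycled over that group's rows.
dt[, `:=`(avg = mean(mpg), med = median(mpg), min = min(mpg)), by = cyl]

print(head(dt[, .(cyl, mpg, avg, med, min)]))
```

This form keeps each column name next to the expression that computes it, which can be easier to read than a character vector of names on the left and a list on the right.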


Answer 1: score 143, accepted, 2012-10-06 08:48:38

Answer 2: score 35, 2018-04-01 11:37:02
