
R data.table: mean for many columns

I would like to use the data.table package in R to calculate column means for many columns, grouped by another set of columns. I know how to do this for a few columns, and I provide an example below. However, in my real (non-toy) data I have tens of variables I would like to do this for, so I would like to find a way to do it from a vector of the column names. Is this possible?

library(data.table)

# creates data table
dfo <- data.frame(bananas = 1:5, 
             melonas = 6:10,
             yeah = 11:15,
             its = c(1,1,1,2,2)
             )
dto <- data.table(dfo)

# gets column means by 'its' column
dto[,
.('bananas_mean' = mean(bananas),
  'melonas_mean' = mean(melonas),
  'yeah_mean' = mean(yeah)
  ),
by = .(its)]

The OP has asked to calculate column means for many columns from a vector of the column names. In addition, the OP's sample code shows that the resulting columns should be renamed.

Neither the accepted answer nor the solution suggested in this comment fully meets all these requirements. The accepted answer computes means for all columns of the data.table and doesn't rename the results. The solution in the comments does use a vector of column names and renames the results, but it modifies the original data.table, whereas the OP expects a new object.
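The comment's code is not reproduced here. As a rough sketch, assuming it used `:=` to add the mean columns, the following shows what "modifies the original data.table" means in practice and how `copy()` avoids it. The names `new_cols` and `dto_copy` are made up for this illustration.

```r
library(data.table)

# same toy data as in the question
dto <- data.table(bananas = 1:5, melonas = 6:10, yeah = 11:15,
                  its = c(1, 1, 1, 2, 2))
cols <- c("bananas", "melonas")
new_cols <- paste(cols, "mean", sep = "_")   # hypothetical helper name

# := adds columns by reference to the table it is called on,
# so work on an explicit copy to leave dto untouched
dto_copy <- copy(dto)
dto_copy[, (new_cols) := lapply(.SD, mean), by = its, .SDcols = cols]

names(dto)       # the four original columns only
names(dto_copy)  # original columns plus bananas_mean and melonas_mean
```

Note that this keeps one row per original row (the group mean is repeated within each group), which is different from the one-row-per-group summary the OP asked for.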

The OP's requirements can be met using the code below:

# define columns to compute mean of
cols <- c("bananas", "melonas")
# compute means for selected columns and rename the output
result <- dto[, lapply(.SD, mean), .SDcols = cols, by = its
              ][, setnames(.SD, cols, paste(cols, "mean", sep = "_"))]

result
#   its bananas_mean melonas_mean
#1:   1          2.0          7.0
#2:   2          4.5          9.5

Means are computed only for the columns named in the character vector, the output columns have been renamed, and dto is unchanged.
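As a possible variation (not part of the original answer), the renaming can also be done inside `j` with base R's `setNames()`, which avoids the second chained step. This is a minimal sketch on the same toy data; `result2` is just an illustrative name.

```r
library(data.table)

# same toy data as in the question
dto <- data.table(bananas = 1:5, melonas = 6:10, yeah = 11:15,
                  its = c(1, 1, 1, 2, 2))
cols <- c("bananas", "melonas")

# lapply() returns a named list; setNames() swaps in the new names
# before data.table assembles the per-group results
result2 <- dto[, setNames(lapply(.SD, mean), paste(cols, "mean", sep = "_")),
               by = its, .SDcols = cols]
result2
```

Here too, dto itself is left unchanged, because nothing is assigned by reference.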

Edit: Thanks to this comment and this answer, there is a way to make data.table rename the output columns automagically:

result <- dto[, sapply(.SD, function(x) list(mean = mean(x))), .SDcols = cols, by = its]
result
#   its bananas.mean melonas.mean
#1:   1          2.0          7.0
#2:   2          4.5          9.5
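The automatic names appear to come from how `sapply()` simplifies its result rather than from data.table itself. A small base R sketch, using a made-up list `x` in place of `.SD`, shows the same naming outside of data.table:

```r
# a plain named list standing in for .SD (illustrative data only)
x <- list(bananas = 1:5, melonas = 6:10)

# each call returns a one-element list named "mean"; sapply() flattens
# these, joining the outer and inner names with a dot
res <- sapply(x, function(v) list(mean = mean(v)))
names(res)
```

The resulting names combine the column name and the inner list name, which is why the output columns above end in `.mean` rather than `_mean`.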

Using data.table:

library(data.table)
d <- dto[, lapply(.SD, mean), by=its]

d

   its bananas melonas yeah
1:   1     2.0     7.0 12.0
2:   2     4.5     9.5 14.5

Obviously, other functions could be used and combined. Hope it helps.
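As one way to read "used and combined", the sketch below computes both the mean and the standard deviation for a chosen set of columns. The `cols` vector, the `_mean`/`_sd` suffixes, and the name `result3` are assumptions made for this example, not part of the original answer.

```r
library(data.table)

# same toy data as in the question
dto <- data.table(bananas = 1:5, melonas = 6:10, yeah = 11:15,
                  its = c(1, 1, 1, 2, 2))
cols <- c("bananas", "melonas")

# c() concatenates the two named lists into one row per group
result3 <- dto[, c(setNames(lapply(.SD, mean), paste0(cols, "_mean")),
                   setNames(lapply(.SD, sd),   paste0(cols, "_sd"))),
               by = its, .SDcols = cols]
result3
```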


 