Summarizing by two grouping variables in R using data.table

Question

I'm trying to use data.table in R to summarize the following data table:

SiteNo Var1 Var2 Var3 ... Var18 Group
1      0.1 0.3  1         0.3     1
2      0.3 0.1  0.9       0.2     1
etc.

There are 668,944 observations, 43 sites, 3 groups, and 19 variables. I'd like to apply a function (e.g., mean) that summarizes each column/variable by both site and group, so the result should have 43 sites x 3 groups x the number of summary statistics (e.g., mean). I've used the following code:

e.dt<-data.table(e)
setkey(e.dt, Group) # set key to group number

# get mean for each column/variable
e.dt.mean<-e.dt[,lapply(.SD,mean), by="SiteNo"]

Using the above, I get the 43 sites, but not the 3 groups I was after. I could split the original data table into the three groups, but I was wondering whether there is a way to summarize by two variables (SiteNo and Group) using data.table.

I'm still reading the data.table manual, but so far I haven't found the answer to the above.

Answer 1

Try setting your key to both "Group" and "SiteNo":

From the example under ?key:

keycols <- c("SiteNo", "Group")
setkeyv(e.dt, keycols)

Then pass the key columns to by:

e.dt[, lapply(.SD,mean), by = key(e.dt)]

Alternatively, you can use:

e.dt[, lapply(.SD,mean), by = "SiteNo,Group"]

or

e.dt[, lapply(.SD, mean), by = list(SiteNo, Group)]
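
To make the grouping concrete, here is a minimal sketch on made-up data. The table toy and its values are hypothetical and only stand in for e.dt; the point is that by with two columns returns one row per (SiteNo, Group) combination. Note that by does not need a key to be set, so no setkey call appears here.

```r
library(data.table)

# Hypothetical toy data: 2 sites x 2 groups, two measured variables
toy <- data.table(
  SiteNo = c(1, 1, 1, 1, 2, 2, 2, 2),
  Group  = c(1, 1, 2, 2, 1, 1, 2, 2),
  Var1   = c(0.1, 0.3, 0.5, 0.7, 0.2, 0.4, 0.6, 0.8),
  Var2   = c(1, 3, 5, 7, 2, 4, 6, 8)
)

# .SD holds every column not named in by, so mean is applied to Var1 and Var2
# within each (SiteNo, Group) combination
toy.mean <- toy[, lapply(.SD, mean), by = list(SiteNo, Group)]
print(toy.mean)
```

With the real data, the same call on e.dt should give up to 43 x 3 rows, one for each site and group combination that actually occurs.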

Answer 1: 11, accepted, 2012-12-17 18:31:18
