Summarize based on two grouping variables in R using data.table
I'm trying to use data.table in R to summarize the following data table:
SiteNo Var1 Var2 Var3 ... Var18 Group
1 0.1 0.3 1 0.3 1
2 0.3 0.1 0.9 0.2 1
etc.
There are 668,944 observations, 43 sites, 3 groups, and 19 variables. I'd like to get the results of a function (e.g., mean) that summarizes each column/variable by both site and group. So there should be 43 sites x 3 groups x the number of summary stats (e.g., mean). I've used the following code:
e.dt<-data.table(e)
setkey(e.dt, Group) # set key to group number
# get mean for each column/variable
e.dt.mean<-e.dt[,lapply(.SD,mean), by="SiteNo"]
Using the above, I get 43 sites, but not the 3 groups I was after. I could split the original data table into the three groups, but was wondering if there was a way of summarizing by two variables (SiteNo and Group) using data.table.
I'm still reading the manual for data.table, but so far I haven't found the answer to the above.
Try setting your key to both "Group" and "SiteNo":
From the example under ?key:
keycols <- c("SiteNo", "Group")
setkeyv(e.dt, keycols)
Then, use by as:
e.dt[, lapply(.SD,mean), by = key(e.dt)]
Alternatively, you can use:
e.dt[, lapply(.SD,mean), by = "SiteNo,Group"]
or
e.dt[, lapply(.SD, mean), by = list(SiteNo, Group)]
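To check that grouping by both columns gives one row per site/group combination, here is a minimal sketch on made-up data. The table toy and its values are hypothetical stand-ins for e.dt (only the column names SiteNo, Group, Var1 and Var2 follow the question), and .() is data.table's shorthand for list(). No key is set here, since by does not require one.

```r
library(data.table)

# Hypothetical toy data: 2 sites x 3 groups, 2 observations per combination
set.seed(1)
toy <- data.table(
  SiteNo = rep(1:2, each = 6),
  Group  = rep(rep(1:3, each = 2), times = 2),
  Var1   = runif(12),
  Var2   = runif(12)
)

# Mean of every non-grouping column for each SiteNo/Group combination
toy.mean <- toy[, lapply(.SD, mean), by = .(SiteNo, Group)]

# One row per combination: 2 sites x 3 groups
nrow(toy.mean)
```

With the real data, the same call should return 43 x 3 rows, provided every site actually contains observations from all three groups; combinations that never occur in the data produce no row.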