R: Getting the sum of columns in a data.frame grouped by a certain column
I have a sample data.frame, shown below, and I want to create another data.frame that holds summary statistics of that table grouped by a certain column. How can I do that?
For example, in the data.frame below, I would like to get the sum of each column by Chart.
Sample data.frame:
Chart Sum Sum_Squares Count Average
Chart1 2 4 4 1
Chart1 3 9 3 1.5
Chart2 4 16 5 2
Chart2 5 25 2 2.5
Desired output:
Chart Sum_sum Sum_square_sum Count_sum Average_sum
Chart1 5 13 7 2.5
Chart2 9 41 7 4.5
I have tried the code below, but the returned table only contains Chart and V1. sum_stat is the data.frame.
sum_stat = data.table(spc_point[,c("CHART", "SUM", "SUM_SQUARES", "COUNT", "AVERAGE")])[,c(SUM_SUM=sum(SUM), SUM_SQUARE_SUM=sum(SUM_SQUARES), COUNT_SUM=sum(COUNT), AVERAGE_SUM=sum(AVERAGE)),by=list(CHART)]
Thanks in advance.
I'm going to advocate using data.table. Try this:
library(data.table)
# Build the sample table, keyed (and therefore sorted) by Chart
data <- data.table(Chart = c("Chart1", "Chart1", "Chart2", "Chart2"),
                   Sum = c(2, 3, 4, 5),
                   Sum_Squares = c(4, 9, 16, 25),
                   Count = c(4, 3, 5, 2),
                   Average = c(1, 1.5, 2, 2.5),
                   key = "Chart")
and then simply:
summed.data<-data[,lapply(.SD,sum),by=Chart]
Find the data.table package, read the vignette and FAQ, and use it :)
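The lapply(.SD, sum) call above sums every non-grouping column and keeps the original column names. If only some columns should be summed, or the result should carry a suffix as in the desired output, a minimal sketch could look like the following. The names dt, value_cols and summed are illustrative, not from the original answer, and the suffix is applied mechanically, so the second column comes out as Sum_Squares_sum rather than Sum_square_sum.

```r
library(data.table)

# Sample data from the question (dt is an illustrative name)
dt <- data.table(
  Chart       = c("Chart1", "Chart1", "Chart2", "Chart2"),
  Sum         = c(2, 3, 4, 5),
  Sum_Squares = c(4, 9, 16, 25),
  Count       = c(4, 3, 5, 2),
  Average     = c(1, 1.5, 2, 2.5)
)

# Columns to aggregate: everything except the grouping column
value_cols <- setdiff(names(dt), "Chart")

# .SDcols restricts .SD to the listed columns before summing per group
summed <- dt[, lapply(.SD, sum), by = Chart, .SDcols = value_cols]

# Append "_sum" to the aggregated column names (modifies summed by reference)
setnames(summed, value_cols, paste0(value_cols, "_sum"))

print(summed)
```

Listing the columns in .SDcols also guards against accidentally summing a non-numeric column that is added to the table later.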
You may consider dplyr. Suppose df is your data frame; the following will produce the desired result.
library(dplyr)
# Group by Chart, then sum each column within each group
df %>%
  group_by(Chart) %>%
  summarise(Sum = sum(Sum),
            Sum_Squares = sum(Sum_Squares),
            Count = sum(Count),
            Average = sum(Average))
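Writing out one sum() per column gets tedious as the table grows. Assuming dplyr 1.0.0 or later is installed, across() can apply the same function to every non-grouping column. This is a sketch of that alternative, not part of the original answer; the name result is illustrative.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  Chart       = c("Chart1", "Chart1", "Chart2", "Chart2"),
  Sum         = c(2, 3, 4, 5),
  Sum_Squares = c(4, 9, 16, 25),
  Count       = c(4, 3, 5, 2),
  Average     = c(1, 1.5, 2, 2.5)
)

# Inside summarise(), everything() skips the grouping column, so sum is
# applied to Sum, Sum_Squares, Count and Average within each Chart.
# .names appends "_sum" to each output column name.
result <- df %>%
  group_by(Chart) %>%
  summarise(across(everything(), sum, .names = "{.col}_sum"))

print(result)
```

The result is a tibble with one row per Chart; wrap it in as.data.frame() if a plain data.frame is needed.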
Or it can be laid out the same way in data.table:
library(data.table)
dt = as.data.table(df)
# list(...) in j gives one named output column per element
dt[, list(Sum = sum(Sum),
          Sum_Squares = sum(Sum_Squares),
          Count = sum(Count),
          Average = sum(Average)),
   by = Chart]
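This layout also explains why the code in the question returned only Chart and V1: it wrapped the sums in c(...) rather than list(...). The sketch below contrasts the two on a cut-down version of the sample data; the names stacked and wide are illustrative.

```r
library(data.table)

# Cut-down sample data: two value columns are enough to show the effect
dt <- data.table(
  Chart = c("Chart1", "Chart1", "Chart2", "Chart2"),
  Sum   = c(2, 3, 4, 5),
  Count = c(4, 3, 5, 2)
)

# c(...) returns one atomic vector per group, so data.table stacks the
# values into a single column named V1 (two rows per group here)
stacked <- dt[, c(Sum_sum = sum(Sum), Count_sum = sum(Count)), by = Chart]

# list(...) returns one element per output column, so the names survive
wide <- dt[, list(Sum_sum = sum(Sum), Count_sum = sum(Count)), by = Chart]

print(names(stacked))  # "Chart" "V1"
print(names(wide))     # "Chart" "Sum_sum" "Count_sum"
```

In data.table, .() is an alias for list() inside j, so dt[, .(Sum_sum = sum(Sum)), by = Chart] behaves the same way as the list() form.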
In base R:
aggregate(df[,2:5],by=list(df$Chart),FUN=sum)
# Group.1 Sum Sum_Squares Count Average
# 1 Chart1 5 13 7 2.5
# 2 Chart2 9 41 7 4.5
As @AnandaMahto points out, the formula syntax for aggregate(...) is simpler and cleaner.
aggregate(. ~ Chart, df, sum)
# Chart Sum Sum_Squares Count Average
# 1 Chart1 5 13 7 2.5
# 2 Chart2 9 41 7 4.5
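For a self-contained run of the formula approach, the sketch below rebuilds the sample data and then renames the summed columns with a "_sum" suffix, close to the desired output in the question. The name agg is illustrative, and the suffix is applied mechanically, so the second column becomes Sum_Squares_sum.

```r
# Sample data from the question
df <- data.frame(
  Chart       = c("Chart1", "Chart1", "Chart2", "Chart2"),
  Sum         = c(2, 3, 4, 5),
  Sum_Squares = c(4, 9, 16, 25),
  Count       = c(4, 3, 5, 2),
  Average     = c(1, 1.5, 2, 2.5)
)

# "." on the left-hand side means "all columns other than Chart"
agg <- aggregate(. ~ Chart, data = df, FUN = sum)

# Rename every column except the grouping column
names(agg)[-1] <- paste0(names(agg)[-1], "_sum")

print(agg)
```

One difference between the two aggregate() forms is worth keeping in mind: the formula method defaults to na.action = na.omit, which drops any row containing an NA in any column before summing, whereas the by= form passes NA values through to FUN.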