How to calculate the mean in a data frame using aggregate function in R?
I have a data frame df1:
number=c(4,3,2,3,4,1)
year=c("2000","2000","2000", "2015", "2015", "2015")
items=c(12, 10, 15, 5, 10, 7)
df1=data.frame(number, year, items)
library(data.table)  # setDT() and := come from data.table
setDT(df1)[, Prop := number/sum(number), by = year]
such that it looks like this:
number year items Prop
1: 4 2000 12 0.4444444
2: 3 2000 10 0.3333333
3: 2 2000 15 0.2222222
4: 3 2015 5 0.3750000
5: 4 2015 10 0.5000000
6: 1 2015 7 0.1250000
I want to get the mean of the number of items per year, so I tried using this function:
mean.df1=aggregate((df1$number*df1$Prop),list(df1$year), mean)
but it returns the wrong values for the mean. I want it to return:
Group.1 x
1 2000 2.918918
2 2015 2.296296
where Group.1 is the year and x is the correct mean.
Thanks!
To aggregate the mean number of items per year:
aggregate(number ~ year, data=df1, mean)
# year number
# 1 2000 3.000000
# 2 2015 2.666667
For the weighted average in base R, you could do a standard split-apply-combine:
sapply(split(df1, df1$year), function(x) weighted.mean(x$number, w=x$items))
or
sapply(split(df1, df1$year), function(x) sum(x$number*x$items)/sum(x$items))
# 2000 2015
# 2.918919 2.818182
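Since the question asks specifically about aggregate, the same weighted mean can also be sketched with a single aggregate call that sums the weighted values and the weights per year, then divides. This is only one possible way to write it; the column names wx and w and the result name wmean are hypothetical, chosen for this example.

```r
# Same data as in the question
df1 <- data.frame(
  number = c(4, 3, 2, 3, 4, 1),
  year   = c("2000", "2000", "2000", "2015", "2015", "2015"),
  items  = c(12, 10, 15, 5, 10, 7)
)

# Sum number * items and items within each year in one aggregate() call.
# `wx` and `w` are hypothetical column names used only in this sketch.
sums <- aggregate(cbind(wx = number * items, w = items) ~ year,
                  data = df1, FUN = sum)

# Weighted mean per year = sum(number * items) / sum(items)
wmean <- data.frame(year = sums$year, mean = sums$wx / sums$w)
print(wmean)
```

With the sample data this should agree with the split-apply-combine result above (about 2.918919 for 2000 and 2.818182 for 2015).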
How about using the dplyr package:
library(dplyr)
df1 %>% group_by(year) %>% summarise(mean = sum(number * items)/sum(items))
which gives
year mean
1 2000 2.918919
2 2015 2.818182
I just needed to switch "mean" to "sum" in my aggregate function, so that it becomes:
mean.df1=aggregate((df1$number*df1$Prop),list(df1$year), sum)
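To see what the sum and mean variants actually compute, here is a small base R check, offered as a sketch rather than a definitive answer. It rebuilds Prop with ave instead of data.table, and the names by_sum, by_mean and check are hypothetical. Because Prop already sums to 1 within each year, summing number * Prop gives a mean of number weighted by number itself, while taking the mean divides that total by the group size a second time.

```r
# Same data as in the question
df1 <- data.frame(
  number = c(4, 3, 2, 3, 4, 1),
  year   = c("2000", "2000", "2000", "2015", "2015", "2015"),
  items  = c(12, 10, 15, 5, 10, 7)
)

# Per-year proportions in base R: ave() returns the group sum on every row
df1$Prop <- df1$number / ave(df1$number, df1$year, FUN = sum)

# The two variants from the question
by_sum  <- aggregate(df1$number * df1$Prop, list(df1$year), sum)
by_mean <- aggregate(df1$number * df1$Prop, list(df1$year), mean)

# The sum variant equals a mean of `number` weighted by `number`
check <- sapply(split(df1, df1$year),
                function(d) weighted.mean(d$number, w = d$number))

print(by_sum)
print(by_mean)
print(check)
```

With the sample data, the sum variant should give about 3.222222 for 2000 and 3.25 for 2015. Those weights are number, not items, so the result differs from the items-weighted values (2.918919 and 2.818182) in the other answers.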