Sum by months of the year with decades of data in R
I have a data frame with monthly data covering two decades:
year month value
1960 January 925
1960 February 903
1960 March 1006
...
1969 December 892
1970 January 990
1970 February 866
...
1979 December 120
I would like to create a data frame that sums the values by month within each decade, as follows:
year month value
decade_60s January 4012
decade_60s February 8678
decade_60s March 9317
...
decade_60s December 3995
decade_70s January 8005
decade_70s February 9112
...
decade_70s December 325
I have been looking at the aggregate function, but this doesn't appear to be the right option.
I looked instead at some careful subsetting using the which function, but this quickly became too messy.
For this kind of problem, what would be the correct approach? Will I need to use apply at some point, and if so, how?
I feel a growing temptation to use a for loop, but I don't think that would be the best way to improve my skills in R.
Thanks for the advice.
PS: The month value is an ordinal factor, if this matters.
aggregate is one way to go using base R.
First define the decade:
yourdata$decade <- cut(yourdata$year, breaks=c(1960,1970,1980), labels=c(60,70),
include.lowest=TRUE, right=FALSE)
Then aggregate the data:
aggregate(value ~ decade + month, data=yourdata , sum)
Then order the result by decade and month to get the required output.
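The three steps can be put together as a minimal sketch. The sample rows are the ones visible in the question's excerpt, the month column is assumed to be an ordered factor built from R's built-in month.name, and the name res and the decade_60s-style labels are illustrative choices made to match the requested layout rather than part of the answer itself.

```r
# Sample rows taken from the excerpt in the question; the real data has
# one row per month for every year.
yourdata <- data.frame(
  year  = c(1960, 1960, 1960, 1969, 1970, 1970, 1979),
  month = factor(c("January", "February", "March", "December",
                   "January", "February", "December"),
                 levels = month.name, ordered = TRUE),  # assumed calendar order
  value = c(925, 903, 1006, 892, 990, 866, 120)
)

# Step 1: define the decade, with intervals [1960, 1970) and [1970, 1980]
yourdata$decade <- cut(yourdata$year, breaks = c(1960, 1970, 1980),
                       labels = c(60, 70), include.lowest = TRUE, right = FALSE)

# Step 2: sum value for each decade/month combination
res <- aggregate(value ~ decade + month, data = yourdata, sum)

# Step 3: sort by decade, then by month (factors sort by level order)
res <- res[order(res$decade, res$month), ]

# Optional: relabel to the layout asked for in the question
res$year <- paste0("decade_", res$decade, "s")
res <- res[, c("year", "month", "value")]
rownames(res) <- NULL
res
```

Because month is an ordered factor, order() sorts it in calendar order rather than alphabetically, so no extra lookup table is needed.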
plyr's count + gsub are definitely your friends here:
library(plyr)
dat <- structure(list(year = c(1960L, 1960L, 1960L, 1969L, 1970L, 1970L, 1979L),
month = structure(c(3L, 2L, 4L, 1L, 3L, 2L, 1L),
.Label = c("December", "February", "January", "March"),
class = "factor"),
value = c(925L, 903L, 1006L, 892L, 990L, 866L, 120L)),
.Names = c("year", "month", "value"),
class = "data.frame", row.names = c(NA, -7L))
dat$decade <- gsub("[0-9]$", "0", dat$year)
count(dat, .(decade, month), wt_var=.(value))
## decade month freq
## 1 1960 December 892
## 2 1960 February 903
## 3 1960 January 925
## 4 1960 March 1006
## 5 1970 December 120
## 6 1970 February 866
## 7 1970 January 990
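The months in the output above come out alphabetically because the sample's factor levels are alphabetical. As a hedged sketch, the result could be put in calendar order and relabeled to match the layout in the question. Here out is a hypothetical name holding a hand-typed copy of the printed result, so the snippet runs without plyr; in practice it would be the return value of count().

```r
# Hand-typed copy of the printed result above (hypothetical name: out)
out <- data.frame(
  decade = c("1960", "1960", "1960", "1960", "1970", "1970", "1970"),
  month  = c("December", "February", "January", "March",
             "December", "February", "January"),
  freq   = c(892, 903, 925, 1006, 120, 866, 990)
)

# Re-level month in calendar order using R's built-in month.name
out$month <- factor(out$month, levels = month.name, ordered = TRUE)

# Sort by decade, then by calendar month
out <- out[order(out$decade, out$month), ]

# Relabel: "1960" -> "decade_60s", and rename freq to value
out$year <- paste0("decade_", substr(out$decade, 3, 4), "s")
names(out)[names(out) == "freq"] <- "value"
out <- out[, c("year", "month", "value")]
rownames(out) <- NULL
out
```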