
Sum by months of the year with decades of data in R

I have a data frame with monthly data spanning two decades:

year    month   value
1960    January  925
1960    February 903
1960    March    1006
    ...
1969    December 892
1970    January  990
1970    February 866
    ...
1979    December 120

I would like to create a data frame that sums the values by month within each decade, as follows:

year        month    value
decade_60s  January  4012
decade_60s  February 8678
decade_60s  March    9317
    ...
decade_60s  December 3995
decade_70s  January  8005
decade_70s  February 9112
    ...
decade_70s  December 325

I have been looking at the aggregate function, but it doesn't appear to be the right option.
I then tried some careful subsetting with the which function, but that quickly became too messy.

What is the correct approach for this kind of problem? Will I need to use apply at some point, and if so, how?

I feel the temptation to use a for loop growing, but I don't think that would be the best way to improve my R skills.

Thanks for the advice.

PS: The month column is an ordinal factor, if this matters.

aggregate is one way to do this in base R.

First, define the decade:

yourdata$decade <- cut(yourdata$year, breaks = c(1960, 1970, 1980), labels = c(60, 70),
                       include.lowest = TRUE, right = FALSE)
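
To see what this binning does, here is a minimal sketch on a few made-up years (the `years` vector is hypothetical, not the asker's data). It also shows integer division as one possible alternative that does not need explicit breaks; that alternative is not part of the original answer.

```r
# Hypothetical sample years, chosen to hit the bin edges
years <- c(1960, 1965, 1969, 1970, 1979)

# Left-closed bins [1960, 1970) and [1970, 1980], as in the call above
by_cut <- cut(years, breaks = c(1960, 1970, 1980), labels = c(60, 70),
              include.lowest = TRUE, right = FALSE)

# Arithmetic alternative: zero out the last digit of the year
by_div <- (years %/% 10) * 10

data.frame(years, by_cut, by_div)
```

With right = FALSE, include.lowest = TRUE closes the last bin on the right, so a year of exactly 1980 would also fall into the 70 bin.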

Then aggregate the data:

aggregate(value ~ decade + month, data = yourdata, FUN = sum)

Finally, sort the result (for example with order) to get the required output.
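
The answer stops before showing the sorting step, so the following is only a sketch of how the three steps might fit together. The sample rows reuse the values quoted in the question plus one made-up 1961 row so that a sum is actually visible. The `totals` name and the `decade_60s` labelling via paste0 are assumptions made for illustration.

```r
# Rows from the question, plus one hypothetical row (1961, January, 100)
yourdata <- data.frame(
  year  = c(1960, 1960, 1960, 1961, 1969, 1970, 1970, 1979),
  month = factor(c("January", "February", "March", "January", "December",
                   "January", "February", "December"),
                 levels = month.name, ordered = TRUE),
  value = c(925, 903, 1006, 100, 892, 990, 866, 120)
)

# Step 1: define the decade
yourdata$decade <- cut(yourdata$year, breaks = c(1960, 1970, 1980),
                       labels = c(60, 70), include.lowest = TRUE, right = FALSE)

# Step 2: sum value within each decade/month combination
totals <- aggregate(value ~ decade + month, data = yourdata, FUN = sum)

# Step 3: sort by decade, then by month in factor-level (calendar) order
totals <- totals[order(totals$decade, totals$month), ]
rownames(totals) <- NULL

# Label rows in the style the question asks for, e.g. "decade_60s"
totals$year <- paste0("decade_", totals$decade, "s")
totals[, c("year", "month", "value")]
```

Because month is an ordered factor, order sorts it by its levels rather than alphabetically, which is why the PS in the question does matter here.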

plyr's count plus gsub are definitely your friends here:

library(plyr)

dat <- structure(list(year = c(1960L, 1960L, 1960L, 1969L, 1970L, 1970L, 1979L),
                      month = structure(c(3L, 2L, 4L, 1L, 3L, 2L, 1L), 
                      .Label = c("December", "February", "January", "March"), 
                      class = "factor"), 
                      value = c(925L, 903L, 1006L, 892L, 990L, 866L, 120L)), 
                      .Names = c("year", "month", "value"), 
                      class = "data.frame", row.names = c(NA, -7L))

dat$decade <- gsub("[0-9]$", "0", dat$year)

count(dat, .(decade, month), wt_var=.(value))
##  decade    month freq
## 1   1960 December  892
## 2   1960 February  903
## 3   1960  January  925
## 4   1960    March 1006
## 5   1970 December  120
## 6   1970 February  866
## 7   1970  January  990
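
Note that the output above lists months alphabetically, because the sample `dat` builds its factor with alphabetical levels. If calendar order is wanted, one option (a sketch, not part of the original answer) is to re-level the factor with R's built-in month.name before counting. The `months` and `months_cal` names below are hypothetical.

```r
# A factor built without explicit levels gets them in alphabetical order
months <- factor(c("January", "February", "March", "December"))
levels(months)

# Re-level against the built-in month.name vector so sorting follows the calendar
months_cal <- factor(as.character(months), levels = month.name, ordered = TRUE)
sort(months_cal)
```

Applied to the example, this would mean running the same factor(...) call on dat$month before calling count.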
