
Conditional Rolling Sum (rolling average) in R

ID  Year  Firm  Score
1   2005  A     2
1   2006  A     5
1   2006  B     1
1   2007  A     36
1   2007  E     69
1   2008  E     8
1   2008  B     54
1   2009  A     25
1   2009  C     2
1   2010  E     2
1   2010  B     2
1   2011  A     5
1   2011  B     5
1   2012  A     4
1   2012  B     1

Data

In the data above, I want to compute a 5-year rolling sum of 'Score' for each individual (ID), conditional on the firms the person is working in during the current year. Let me explain with an example. Suppose I want the rolling sum of 'Score' for the year 2009. It should first check which firms the person (ID) is working in that year. In 2009, the person is working in firms A and C, so the 5-year rolling sum of 'Score' should be calculated only over firms A and C. The 5-year rolling sum for 2009 is therefore 2 (2005, firm A) + 5 (2006, firm A) + 36 (2007, firm A) + 27 (2009, firms A and C: 25 + 2) = 70. [Note: 2008 is ignored because the person's 2008 records are in firms E and B, neither of which is firm A or C.]

I also want to compute a rolling average along the same lines. [Note: the original data has around 30 million observations.]
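As a check on the 2009 example, the arithmetic can be reproduced step by step. This is only a sketch: the variable names (`target`, `firms`, `keep`) are made up for illustration, and the rolling average is assumed to mean the mean of 'Score' over the same matching rows, which the question does not spell out.

```r
# Sample data from the question (a single ID).
rs <- data.frame(
  ID    = 1,
  Year  = c(2005, 2006, 2006, 2007, 2007, 2008, 2008, 2009,
            2009, 2010, 2010, 2011, 2011, 2012, 2012),
  Firm  = c("A", "A", "B", "A", "E", "E", "B", "A",
            "C", "E", "B", "A", "B", "A", "B"),
  Score = c(2, 5, 1, 36, 69, 8, 54, 25, 2, 2, 2, 5, 5, 4, 1),
  stringsAsFactors = FALSE
)

target <- 2009

# Firms the person is in during the target year: "A" and "C".
firms <- rs$Firm[rs$Year == target]

# Rows inside the 5-year window (2005 to 2009) that belong to those firms.
keep <- rs$Year > target - 5 & rs$Year <= target & rs$Firm %in% firms

print(rs[keep, ])                   # the five contributing rows
roll_sum  <- sum(rs$Score[keep])    # 2 + 5 + 36 + 25 + 2 = 70
roll_mean <- mean(rs$Score[keep])   # 70 / 5 = 14 (assumed definition)
print(c(sum = roll_sum, mean = roll_mean))
```

The two 2008 rows (firms E and B) fall inside the window but are dropped by the firm filter, which is why they do not contribute.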

Set up the data frame

rs <- as.data.frame(matrix(nrow =15, ncol = 4))

colnames(rs) <- c("ID", "Year", "Firm", "Score")

rs$ID <- 1
rs$Year <- c(2005,
             2006,
             2006,
             2007,
             2007,
             2008,
             2008,
             2009,
             2009,
             2010,
             2010,
             2011,
             2011,
             2012,
             2012)

rs$Firm <- c("A", "A", "B", "A", "E",
             "E", "B", "A", "C", "E", 
             "B", "A", "B", "A", "B")

rs$Score <- c(2, 5, 1, 36, 69, 8, 
              54, 25, 2, 2, 2, 5, 5, 4,
              1)

Loop over the unique years

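# Years present in the data
years <- unique(rs$Year)

for (i in years) {

  # Firms the person works in during year i
  firms <- rs$Firm[rs$Year == i]
  # Rows from the five years ending in i (i - 4 to i) for those firms
  d <- rs[rs$Year > i - 5 & rs$Year <= i & rs$Firm %in% firms, ]
  print(paste(i, sum(d$Score)))

}

Output:

[1] "2005 2"
[1] "2006 8"
[1] "2007 112"
[1] "2008 132"
[1] "2009 70"
[1] "2010 136"
[1] "2011 127"
[1] "2012 96"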
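The loop above works on a single ID and only prints its results. One way it might be generalised is sketched below: a function that goes through every (ID, Year) pair, takes the window length as an argument, and returns a data frame with both a sum and an average. The function name `rolling_by_firm`, the `window` argument, and the output column names are hypothetical, and the average is assumed to be the mean of 'Score' over the matching rows. With around 30 million observations a row-by-row pass like this would probably be too slow; it is meant to pin down the logic, not performance.

```r
# Sample data from the question (a single ID).
rs <- data.frame(
  ID    = 1,
  Year  = c(2005, 2006, 2006, 2007, 2007, 2008, 2008, 2009,
            2009, 2010, 2010, 2011, 2011, 2012, 2012),
  Firm  = c("A", "A", "B", "A", "E", "E", "B", "A",
            "C", "E", "B", "A", "B", "A", "B"),
  Score = c(2, 5, 1, 36, 69, 8, 54, 25, 2, 2, 2, 5, 5, 4, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each (ID, Year), sum and average the scores from
# the last `window` years (inclusive of Year), restricted to the firms that
# the ID is in during Year.
rolling_by_firm <- function(df, window = 5) {
  keys <- unique(df[, c("ID", "Year")])
  res <- lapply(seq_len(nrow(keys)), function(k) {
    id <- keys$ID[k]
    yr <- keys$Year[k]
    # Firms this ID is in during the current year
    firms <- df$Firm[df$ID == id & df$Year == yr]
    # Same ID, inside the window, and in one of those firms
    keep <- df$ID == id &
      df$Year > yr - window & df$Year <= yr &
      df$Firm %in% firms
    data.frame(ID = id, Year = yr,
               RollSum  = sum(df$Score[keep]),
               RollMean = mean(df$Score[keep]))
  })
  do.call(rbind, res)
}

out <- rolling_by_firm(rs, window = 5)
print(out)
```

For 2009 this gives a sum of 70 and a mean of 14, in line with the example in the question. For data of the size mentioned there, a join-based or grouped approach would likely be a better fit than a per-row `lapply`.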
