
Sum consecutive day values

In keeping with a previous question, imagine I have a data set:

Date       rain code
2009-04-01  0.0 0 
2009-04-02  0.0 0 
2009-04-03  0.0 0 
2009-04-04  0.7 1 
2009-04-05 54.2 1  
2009-04-06  0.0 0 
2009-04-07  5.0 1 
2009-04-08  9.0 0 
2009-04-09  0.0 0 
2009-04-10  0.0 0 
2009-04-11  0.0 0 
2009-04-12  5.3 1  
2009-04-13 10.1 1  
2009-04-14  6.0 1  
2009-04-15  8.7 1  
2009-04-16  0.0 0 
2009-04-17  0.0 0 
2009-04-18  0.0 0 
2009-04-19  2.0 0 
2009-04-20  3.0 0 
2009-04-21  0.0 0 
2009-04-22  0.0 0 
2009-04-23  0.0 0 
2009-04-24  0.0 0 
2009-04-25  4.3 1  
2009-04-26 42.2 1  
2009-04-27 45.6 1  
2009-04-28 12.6 1  
2009-04-29  6.2 1  
2009-04-30  1.0 1  

DT = structure(list(Date = structure(c(14335, 14336, 14337, 14338, 
14339, 14340, 14341, 14342, 14343, 14344, 14345, 14346, 14347, 
14348, 14349, 14350, 14351, 14352, 14353, 14354, 14355, 14356, 
14357, 14358, 14359, 14360, 14361, 14362, 14363, 14364), class = "Date"), 
    rain = c(0, 0, 0, 0.7, 54.2, 0, 5, 9, 0, 0, 0, 5.3, 10.1, 
    6, 8.7, 0, 0, 0, 2, 3, 0, 0, 0, 0, 4.3, 42.2, 45.6, 12.6, 
    6.2, 1), code = c(0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 
    0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
    1L, 1L, 1L, 1L, 1L)), .Names = c("Date", "rain", "code"), row.names = c(NA, 
-30L), class = "data.frame")

I am trying to collapse the data set so that each run of consecutive days with code equal to 1 is replaced by the sum of its rain values. The sum should run through the day after the event, inclusive. For example, I want the rain totals for 2009-04-04 to 2009-04-06 and for 2009-04-07 to 2009-04-08, separately. So I am looking for a way to identify the days where code equals 1 together with the following day. The final result should look like:

Date       rain code
2009-04-01  0.0 0 
2009-04-02  0.0 0 
2009-04-03  0.0 0  
2009-04-06  54.9 1  
2009-04-08  14.0 1 
2009-04-09  0.0 0 
2009-04-10  0.0 0 
2009-04-11  0.0 0 
2009-04-16 30.1 1 
2009-04-17  0.0 0 
2009-04-18  0.0 0 
2009-04-19  2.0 0 
2009-04-20  3.0 0 
2009-04-21  0.0 0 
2009-04-22  0.0 0 
2009-04-23  0.0 0 
2009-04-24  0.0 0  
2009-04-30  111.9 1  (if last entry of data frame)

Any help on the above problem would be greatly appreciated.

Here's one way:

library(data.table)
setDT(DT)

res = DT[, .(
  Date = Date[.N], 
  rain = sum(rain),
  code = code[1L]
), by=.(g = cumsum(shift(!code, fill=FALSE)))]

res[, g := NULL]

          Date  rain code
 1: 2009-04-01   0.0    0
 2: 2009-04-02   0.0    0
 3: 2009-04-03   0.0    0
 4: 2009-04-06  54.9    1
 5: 2009-04-08  14.0    1
 6: 2009-04-09   0.0    0
 7: 2009-04-10   0.0    0
 8: 2009-04-11   0.0    0
 9: 2009-04-16  30.1    1
10: 2009-04-17   0.0    0
11: 2009-04-18   0.0    0
12: 2009-04-19   2.0    0
13: 2009-04-20   3.0    0
14: 2009-04-21   0.0    0
15: 2009-04-22   0.0    0
16: 2009-04-23   0.0    0
17: 2009-04-24   0.0    0
18: 2009-04-30 111.9    1

How it works:

  • shift takes the value from the prior row, so shift(!code, fill=FALSE) is TRUE wherever the previous row has code 0
  • When a logical vector such as !code is added up (here with cumsum), TRUE/FALSE are treated as 1/0, so the counter increases by one after every code 0 row
  • .N is the number of rows in the current by= group, so Date[.N] is the last date in that group

The general syntax is DT[, j, by], where j is evaluated once for each subset of the data defined by by.
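To make the grouping key easier to follow, here is a minimal sketch that builds it by hand for the first nine days of the data. It uses base R only, and c(FALSE, head(x, -1)) is my stand-in for data.table's shift(x, fill=FALSE); the names prev_was_zero, g and totals are just illustrative.

```r
# First nine days of the question's data
code <- c(0, 0, 0, 1, 1, 0, 1, 0, 0)
rain <- c(0, 0, 0, 0.7, 54.2, 0, 5, 9, 0)

# TRUE where the previous row has code 0 (the first row has no previous row)
prev_was_zero <- c(FALSE, head(code == 0, -1))

# Running count of those TRUEs: a new group id starts after each code 0 row
g <- cumsum(prev_was_zero)
print(g)  # 0 1 2 3 3 3 4 4 5

# Rain summed per group; group 3 covers rows 4 to 6, group 4 covers rows 7 to 8
totals <- tapply(rain, g, sum)
print(totals)
```

Rows 4 to 6 share one id because only row 4 follows a code 0 row, which is why the day after each rain event ends up in the same group as the event itself.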

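If you want to use base R, you can use diff to find where the rain starts and stops.

df <- as.data.frame(DT)  # the question's data as a plain data frame
# Rows where code switches 0 -> 1 (start) and 1 -> 0 (the day after the run)
start <- which(diff(df$code) == 1) + 1
end <- c(which(diff(df$code) == -1) + 1, nrow(df))  # final run lasts to the end
# Map returns a list of row-index vectors, one per run
l <- Map(":", start, end)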
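As a quick check of what the diff step produces, this sketch runs it on the code column from the question, copied out as a plain vector. Note the assumption that the data ends inside a rain run; appending the last row index to end is only correct in that case.

```r
# The 30 code values from the question's data
code <- c(0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1)

# diff(code) is 1 where code goes 0 -> 1 and -1 where it goes 1 -> 0;
# adding 1 moves from the position of the difference to the later row
start <- which(diff(code) == 1) + 1
end <- c(which(diff(code) == -1) + 1, length(code))

print(start)  # 4 7 12 25
print(end)    # 6 8 16 30

# One vector of row indices per rain event, including the day after it
runs <- Map(":", start, end)
```

Rows 4 to 6 correspond to 2009-04-04 through 2009-04-06, and rows 7 to 8 to 2009-04-07 through 2009-04-08, matching the ranges asked for in the question.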

Collapsing the data is then a matter of keeping, for each run, only its last row (the day the rain stops) and replacing that row's rain value with the total for the run:

lapply(l, function(x) {
  df[x,][length(x),"rain"] <- sum(df[x,"rain"])
  df[x,][length(x),]
})

[[1]]
        Date rain code
6 2009-04-06 54.9    0
[[2]]
        Date rain code
8 2009-04-08   14    0
[[3]]
         Date rain code
16 2009-04-16 30.1    0
[[4]]
         Date  rain code
30 2009-04-30 111.9    1
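The list above holds only the rain events. If you want a single data frame that also keeps the dry days, as in the desired output, one possible base R sketch is to reuse the grouping rule from the data.table answer with split. This is an assumption on my part rather than part of either answer, and it is shown on the first nine days only; the names g and collapsed are illustrative.

```r
# First nine days of the question's data
df <- data.frame(
  Date = as.Date("2009-04-01") + 0:8,
  rain = c(0, 0, 0, 0.7, 54.2, 0, 5, 9, 0),
  code = c(0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L)
)

# A new group starts on every row whose previous row has code 0
g <- cumsum(c(FALSE, head(df$code == 0, -1)))

# Per group: last date, total rain, code of the first row
collapsed <- do.call(rbind, lapply(split(df, g), function(d) {
  data.frame(Date = d$Date[nrow(d)], rain = sum(d$rain), code = d$code[1])
}))
rownames(collapsed) <- NULL
print(collapsed)
```

Dry days form single-row groups and pass through unchanged, while each rain event collapses to one row dated on the day after the rain stops.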
