Cumsum ignoring NAs with reset

I have a conditional cumulative sum that resets at zero.

criteria1 <- c(rep(0,2), rep(1,5), rep(0,3), rep(1,6),rep(0,2))
criteria1[c(6,9,12,13,14,15)] <- NA

#cumsum function, working before the first NA
ave(criteria1, cumsum(criteria1 == 0), FUN = cumsum )
[1]  0  0  1  1  1 NA  1  0 NA  0  1 NA NA NA NA  1  0  0

# the desired output would be:
# NAs are replaced with the last accumulated value;
# a run of more than three NAs stays NA
0 0 1 2 3 3 4 0 0 0 1 NA NA NA NA 2 0 0

Some conditions:

  • NAs cannot be replaced with zero (or one),
  • the vector must keep the same length (so dropping the NAs is not an option),
  • at most three consecutive NAs should be ignored. If a run is longer than three, its elements should remain NA and the sum should continue from the last non-NA value.

There are some answers on the same topic, but I am not sure how to put them all together.
Thanks

With base R you can do it as follows. Generate the data:

criteria1 <- c(rep(0,2), rep(1,5), rep(0,3), rep(1,6),rep(0,2))
criteria1[c(6,9,12,13)] <- NA

Get the result:

l <- length(criteria1)
cum <- cumsum(ifelse(!is.na(criteria1),criteria1,0))
zero <- which(criteria1 == 0)

res <- cum - rep(cum[zero], c(zero[2:length(zero)],l+1)-zero)

Optional dplyr solution:

res <- cum - rep(cum[zero], dplyr::coalesce(dplyr::lead(zero),l+1L)-zero)

Detect runs of more than three consecutive NAs and set them back to NA:

NAs <- rle(is.na(criteria1))
# start and end index of every run
ends <- cumsum(NAs$lengths)
starts <- ends - NAs$lengths + 1
# runs of NA longer than three are set back to NA
for (i in which(NAs$lengths > 3 & NAs$values)) {
  res[starts[i]:ends[i]] <- NA
}
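The base R steps above can be collected into one function. The following is only a sketch: the name `cumsum_reset_na` and the `max_na_run` parameter are made up for illustration, and it assumes the vector starts with a zero (otherwise the `rep()` lengths do not cover the whole vector).

```r
# Hypothetical helper, not part of any package.
# Assumes x[1] is 0 so that every element belongs to a reset group.
cumsum_reset_na <- function(x, max_na_run = 3) {
  n <- length(x)
  # running total with NA counted as 0
  cum <- cumsum(ifelse(is.na(x), 0, x))
  # positions where the sum resets
  zero <- which(x == 0)
  # subtract the running total reached at the most recent reset
  res <- cum - rep(cum[zero], c(zero[-1], n + 1) - zero)
  # put NA back for runs of NA longer than max_na_run
  runs <- rle(is.na(x))
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  for (i in which(runs$values & runs$lengths > max_na_run)) {
    res[starts[i]:ends[i]] <- NA
  }
  res
}

# data from the question (four consecutive NAs at positions 12 to 15)
x <- c(rep(0, 2), rep(1, 5), rep(0, 3), rep(1, 6), rep(0, 2))
x[c(6, 9, 12, 13, 14, 15)] <- NA
cumsum_reset_na(x)
# [1]  0  0  1  2  3  3  4  0  0  0  1 NA NA NA NA  2  0  0
```

On the question's data this reproduces the desired output, including the run of four NAs that is left in place.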

Since NAs are treated as zero when summed, but are grouped as if they had the same value as the preceding element, you can handle NA differently in the value argument and in the grouping argument of ave:

library(data.table); library(dplyr); library(zoo);

ave(coalesce(criteria1, 0), rleid(na.locf(criteria1 != 0)), FUN = cumsum)
# with criteria1[c(6,9,12,13)] <- NA (no run of more than three NAs):
# [1] 0 0 1 2 3 3 4 0 0 0 1 1 1 2 3 4 0 0
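The same idea (NA counts as zero in the values but never starts a new group) can also be sketched without extra packages. This is a variant for illustration, not the answer's own code: it groups with `cumsum()` over the non-NA zeros instead of `rleid()` and `na.locf()`, and then uses `rle()` to put NA back for runs longer than three, which the `ave()` call by itself does not do.

```r
# data from the question (four consecutive NAs at positions 12 to 15)
x <- c(rep(0, 2), rep(1, 5), rep(0, 3), rep(1, 6), rep(0, 2))
x[c(6, 9, 12, 13, 14, 15)] <- NA

# values: NA contributes 0 to the sum
vals <- ifelse(is.na(x), 0, x)
# groups: a new group starts at every non-NA zero; NAs stay in the current group
grp <- cumsum(!is.na(x) & x == 0)
res <- ave(vals, grp, FUN = cumsum)

# restore NA for runs of more than three consecutive NAs
runs <- rle(is.na(x))
res[rep(runs$values & runs$lengths > 3, runs$lengths)] <- NA
res
# [1]  0  0  1  2  3  3  4  0  0  0  1 NA NA NA NA  2  0  0
```

Expanding the per-run flag with `rep(..., runs$lengths)` gives a logical mask of the same length as `x`, so no loop over the runs is needed.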
