dplyr / R cumulative sum with reset
I'd like to generate cumulative sums with a reset if the "current" sum reaches or exceeds some threshold, using dplyr. In the example below, I want to cumsum over 'a'.
library(dplyr)
library(tibble)
tib <- tibble(
t = c(1,2,3,4,5,6),
a = c(2,3,1,2,2,3)
)
# what I want
## thresh = 5
# A tibble: 6 x 4
# t a g c
# <dbl> <dbl> <int> <dbl>
# 1 1.00 2.00 0 2.00
# 2 2.00 3.00 0 5.00
# 3 3.00 1.00 1 1.00
# 4 4.00 2.00 1 3.00
# 5 5.00 2.00 1 5.00
# 6 6.00 3.00 2 3.00
# what I want
## thresh = 4
# A tibble: 6 x 4
# t a g c
# <dbl> <dbl> <int> <dbl>
# 1 1.00 2.00 0 2.00
# 2 2.00 3.00 0 5.00
# 3 3.00 1.00 1 1.00
# 4 4.00 2.00 1 3.00
# 5 5.00 2.00 1 5.00
# 6 6.00 3.00 2 3.00
# what I want
## thresh = 6
# A tibble: 6 x 4
# t a g c
# <dbl> <dbl> <int> <dbl>
# 1 1.00 2.00 0 2.00
# 2 2.00 3.00 0 5.00
# 3 3.00 1.00 0 6.00
# 4 4.00 2.00 1 2.00
# 5 5.00 2.00 1 4.00
# 6 6.00 3.00 1 7.00
I've examined many of the similar questions here (such as "resetting cumsum if value goes to negative in r") and have gotten what I hoped was close, but no.
I've tried variants of
thresh <- 5
tib %>%
group_by(g = cumsum(lag(cumsum(a) >= thresh, default = FALSE))) %>%
mutate(c = cumsum(a)) %>%
ungroup()
which returns
# A tibble: 6 x 4
t a g c
<dbl> <dbl> <int> <dbl>
1 1.00 2.00 0 2.00
2 2.00 3.00 0 5.00
3 3.00 1.00 1 1.00
4 4.00 2.00 2 2.00
5 5.00 2.00 3 2.00
6 6.00 3.00 4 3.00
You can see that the "group" is not getting reset after the first time.
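One way to see what is going wrong is to print the intermediate vectors of that attempt. The sketch below uses base R only, with c(FALSE, head(x, -1)) standing in for dplyr's lag(x, default = FALSE); the variable names are just for illustration.

```r
a <- c(2, 3, 1, 2, 2, 3)
thresh <- 5

running <- cumsum(a)                    # global running total, never reset
crossed <- running >= thresh            # TRUE on every row from the first crossing on
shifted <- c(FALSE, head(crossed, -1))  # stand-in for lag(crossed, default = FALSE)
g <- cumsum(shifted)                    # so the id increases on every later row

print(running)  # 2 5 6 8 10 13
print(g)        # 0 0 1 2 3 4
```

Because cumsum(a) is computed once over the whole column, it stays above the threshold after the first crossing, so every subsequent row opens a new group.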
I think you can use accumulate() from purrr here to help. I've also made a wrapper function to use for different thresholds:
library(purrr)  # provides accumulate()

# Returns a function that computes a cumulative sum which restarts
# after the running total has reached `thresh`.
sum_reset_at <- function(thresh) {
  function(x) {
    accumulate(x, ~if_else(.x >= thresh, .y, .x + .y))
  }
}
tib %>% mutate(c = sum_reset_at(5)(a))
# t a c
# <dbl> <dbl> <dbl>
# 1 1 2 2
# 2 2 3 5
# 3 3 1 1
# 4 4 2 3
# 5 5 2 5
# 6 6 3 3
tib %>% mutate(c = sum_reset_at(4)(a))
# t a c
# <dbl> <dbl> <dbl>
# 1 1 2 2
# 2 2 3 5
# 3 3 1 1
# 4 4 2 3
# 5 5 2 5
# 6 6 3 3
tib %>% mutate(c = sum_reset_at(6)(a))
# t a c
# <dbl> <dbl> <dbl>
# 1 1 2 2
# 2 2 3 5
# 3 3 1 6
# 4 4 2 2
# 5 5 2 4
# 6 6 3 7
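The expected output in the question also has a group column g. Here is a minimal sketch of one way to derive it from the resetting sum, assuming a new group starts on the row after the sum has reached the threshold. It uses base R's Reduce() in place of accumulate() so that it runs without purrr; sum_reset and group_from_sum are hypothetical helper names, not part of any package.

```r
# Base R counterpart of the accumulate() call: restart once the
# running sum has reached the threshold, otherwise keep adding.
sum_reset <- function(x, thresh) {
  Reduce(function(acc, y) if (acc >= thresh) y else acc + y,
         x, accumulate = TRUE)
}

# A new group begins on the row after the sum reached the threshold.
group_from_sum <- function(cs, thresh) {
  reached <- cs >= thresh
  cumsum(c(FALSE, head(reached, -1)))
}

a <- c(2, 3, 1, 2, 2, 3)
cs <- sum_reset(a, 5)
print(cs)                     # 2 5 1 3 5 3
print(group_from_sum(cs, 5))  # 0 0 1 1 1 2
```

With a threshold of 6 the same two helpers give the sums 2 5 6 2 4 7 and the groups 0 0 0 1 1 1, which matches the third table in the question.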
If you're interested in building the groups based on cumsum <= threshold, you can use the following base R function:
cumSumReset <- function(x, thresh = 4) {
  ans <- numeric()
  i <- 0
  while (length(x) > 0) {
    cs_over <- cumsum(x)
    # how many leading elements fit within thresh (assumes non-negative values);
    # at least 1, so a single value above thresh still gets its own group
    ntimes <- max(1, sum(cs_over <= thresh))
    x <- x[-seq_len(ntimes)]
    ans <- c(ans, rep(i, ntimes))
    i <- i + 1
  }
  return(ans)
}
call:
tib %>% mutate(g = cumSumReset(a, 5))
result:
# A tibble: 6 x 3
# t a g
# <dbl> <dbl> <dbl>
#1 1 2 0
#2 2 3 0
#3 3 1 1
#4 4 2 1
#5 5 2 1
#6 6 3 2
With g you can now do whatever you like.
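For instance, the per-group running sum asked for in the question can be recovered from g. A small sketch using base R's ave(), with the g values copied from the result above (c_sum is just an illustrative name):

```r
a <- c(2, 3, 1, 2, 2, 3)
g <- c(0, 0, 1, 1, 1, 2)  # group ids copied from the result above

# ave() applies FUN within each level of g and returns a vector
# aligned with the input, so this is a cumulative sum per group.
c_sum <- ave(a, g, FUN = cumsum)
print(c_sum)  # 2 5 1 3 5 3
```

Inside a dplyr pipeline the equivalent would be group_by(g) followed by mutate(c = cumsum(a)), as in the question's own attempt.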
I know this is a somewhat old question, but I came across it while searching for a similar one and thought I'd include this alternative approach here too.
The MESS library has a built-in function, cumsumbinning(), for this kind of requirement. Since here you need to cross the threshold before stopping, you can use it like this: pass threshold - 1 as the second argument and set cutwhenpassed = TRUE as the third. Note that the threshold - 1 trick relies on the values in a being whole numbers, and that the trailing - 1 only shifts the group ids so they start at 0.
library(tidyverse)
library(MESS)
tib <- tibble(
t = c(1,2,3,4,5,6),
a = c(2,3,1,2,2,3)
)
n <- 5 # threshold
tib %>%
group_by(g = cumsumbinning(a, n-1, TRUE) -1) %>%
mutate(c = cumsum(a))
#> # A tibble: 6 x 4
#> # Groups: g [3]
#> t a g c
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 0 2
#> 2 2 3 0 5
#> 3 3 1 1 1
#> 4 4 2 1 3
#> 5 5 2 1 5
#> 6 6 3 2 3
n <- 4 # threshold
tib %>%
group_by(g = cumsumbinning(a, n-1, TRUE) -1) %>%
mutate(c = cumsum(a))
#> # A tibble: 6 x 4
#> # Groups: g [3]
#> t a g c
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 0 2
#> 2 2 3 0 5
#> 3 3 1 1 1
#> 4 4 2 1 3
#> 5 5 2 1 5
#> 6 6 3 2 3
n <- 6 # threshold
tib %>%
group_by(g = cumsumbinning(a, n-1, TRUE) -1) %>%
mutate(c = cumsum(a))
#> # A tibble: 6 x 4
#> # Groups: g [2]
#> t a g c
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 0 2
#> 2 2 3 0 5
#> 3 3 1 0 6
#> 4 4 2 1 2
#> 5 5 2 1 4
#> 6 6 3 1 7