Find consecutive days with condition
I want to add a new column to my dataframe that counts consecutive days meeting a condition: count the consecutive days on which "return" is higher than 3.
Here is my dataset:
library(tibble)
df <- tibble(
  date = lubridate::today() + 0:9,
  return = c(1, 2.5, 2, 3, 5, 6.5, 1, 9, 3, 2))
My dataframe should look like this:
   date       return Consec_days
   <date>      <dbl>       <dbl>
 1 2019-02-20    1            NA
 2 2019-02-21    2.5          NA
 3 2019-02-22    2            NA
 4 2019-02-23    3            NA
 5 2019-02-24    5             1
 6 2019-02-25    6.5           2
 7 2019-02-26    1            NA
 8 2019-02-27    9             1
 9 2019-02-28    3            NA
10 2019-03-01    2            NA
If the condition is not met, the value should be "NA" or "0".
I already tried:
df$Consec_Days <- with(df, ave(return, data.table::rleid(return > 3),
FUN = function(x) ifelse(return > 3, seq_along(x), 0L)))
But it does not work. Can someone help me?
An option using base R ave and data.table::rleid:
library(data.table)
df$Consec_days <- with(df, (return > 3) * ave(return, rleid(return > 3), FUN = seq_along))
# date return Consec_days
# <date> <dbl> <dbl>
# 1 2019-02-20 1 0
# 2 2019-02-21 2.5 0
# 3 2019-02-22 2 0
# 4 2019-02-23 3 0
# 5 2019-02-24 5 1
# 6 2019-02-25 6.5 2
# 7 2019-02-26 1 0
# 8 2019-02-27 9 1
# 9 2019-02-28 3 0
#10 2019-03-01 2 0
Using rleid(return > 3) we create groups, one per run of consecutive rows where the condition has the same value, and then use seq_along to number the observations within each group:
with(df, ave(return, rleid(return > 3), FUN = seq_along))
# [1] 1 2 3 4 1 2 1 1 1 2
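To make the grouping step easier to follow, the run IDs can be printed on their own. This is only a small sketch: `ret` is a standalone copy of the `return` column from the question, and `run_id` and `pos` are names introduced here for illustration.

```r
library(data.table)

# Same values as the `return` column in the question
ret <- c(1, 2.5, 2, 3, 5, 6.5, 1, 9, 3, 2)

# A new run ID starts every time the condition flips
run_id <- rleid(ret > 3)
run_id
# [1] 1 1 1 1 2 2 3 4 5 5

# Position of each observation within its run
pos <- ave(ret, run_id, FUN = seq_along)
pos
# [1] 1 2 3 4 1 2 1 1 1 2
```

Rows 5 and 6 share run ID 2, which is why they are numbered 1 and 2.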
We multiply it by (return > 3) to keep the counts for observations that are greater than 3; all the rest are turned to 0.
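If NA is preferred over 0, as the question allows, one possible variation is to do the test inside FUN. The sketch below is close to the attempt in the question, which appears to fail because its FUN tests the whole `return` column rather than the group's own values `x`. Here `ret` is a standalone copy of the column and `consec_na` is a name made up for this example.

```r
library(data.table)

# Same values as the `return` column in the question
ret <- c(1, 2.5, 2, 3, 5, 6.5, 1, 9, 3, 2)

# Test the group's values (x), not the full column, inside FUN
consec_na <- ave(ret, rleid(ret > 3),
                 FUN = function(x) ifelse(x > 3, seq_along(x), NA))
consec_na
# [1] NA NA NA NA  1  2 NA  1 NA NA
```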
Translating it into dplyr (rleid still comes from data.table), we can do:
library(dplyr)
df %>%
group_by(group = rleid(return > 3)) %>%
mutate(consec_days = (return > 3) * row_number()) %>%
ungroup() %>%
select(-group)
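If loading data.table is not an option, a similar result can probably be obtained in base R alone with rle() and sequence(). This is a sketch of an alternative technique, not part of the approach above; `ret`, `above`, `pos` and `consec` are illustrative names.

```r
# Same values as the `return` column in the question
ret <- c(1, 2.5, 2, 3, 5, 6.5, 1, 9, 3, 2)

above <- ret > 3

# rle() gives the length of each run; sequence() counts 1..n within each run
pos <- sequence(rle(above)$lengths)

# Zero out the rows where the condition is not met
consec <- above * pos
consec
# [1] 0 0 0 0 1 2 0 1 0 0
```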
One dplyr possibility could be:
df %>%
group_by(return_rleid = {return_rleid = rle(return > 3); rep(seq_along(return_rleid$lengths), return_rleid$lengths)}) %>%
mutate(Consec_days = ifelse(return <= 3, NA, seq_along(return_rleid))) %>%
ungroup() %>%
select(-return_rleid)
date return Consec_days
<date> <dbl> <int>
1 2019-02-20 1.00 NA
2 2019-02-21 2.50 NA
3 2019-02-22 2.00 NA
4 2019-02-23 3.00 NA
5 2019-02-24 5.00 1
6 2019-02-25 6.50 2
7 2019-02-26 1.00 NA
8 2019-02-27 9.00 1
9 2019-02-28 3.00 NA
10 2019-03-01 2.00 NA
First, it groups by the run-length group ID. Second, if "return" is greater than 3, it creates a sequence along the run-length group ID; otherwise it assigns NA. Finally, it ungroups and removes the redundant variable.
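The grouping expression is the densest part of the pipeline, so it may help to run it on its own. A minimal sketch, with `ret` standing in for the `return` column and `r` and `group_id` as names chosen for this example:

```r
# Same values as the `return` column in the question
ret <- c(1, 2.5, 2, 3, 5, 6.5, 1, 9, 3, 2)

r <- rle(ret > 3)
r$lengths
# [1] 4 2 1 1 2
r$values
# [1] FALSE  TRUE FALSE  TRUE FALSE

# Repeat each run's index by the run's length to get one ID per row
group_id <- rep(seq_along(r$lengths), r$lengths)
group_id
# [1] 1 1 1 1 2 2 3 4 5 5
```

This yields the same IDs as data.table::rleid(ret > 3), without the extra dependency.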
Or the same, but generating the sequence with gl():
df %>%
group_by(return_rleid = {return_rleid = rle(return > 3); rep(seq_along(return_rleid$lengths), return_rleid$lengths)}) %>%
mutate(Consec_days = ifelse(return <= 3, NA, gl(length(return_rleid), 1))) %>%
ungroup() %>%
select(-return_rleid)
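For readers unfamiliar with gl(): gl(n, 1) builds a factor with n levels, each used once, so its integer codes run from 1 to n. The sketch below only illustrates that point; `n` and `codes` are names introduced here.

```r
n <- 3

# A factor with levels "1", "2", "3", each appearing once
f <- gl(n, 1)

# The underlying integer codes match a plain 1..n sequence
codes <- as.integer(f)
codes
# [1] 1 2 3
```

Within a group of length n this gives the same numbers as seq_len(n), which is arguably the more direct way to write it.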