
Find consecutive days with condition

I want to add a new column to my data frame that counts consecutive days meeting a condition: count the consecutive days on which "return" is higher than 3.

Here is my dataset:

library(tibble)

df <- tibble(
  date = lubridate::today() + 0:9,
  return = c(1, 2.5, 2, 3, 5, 6.5, 1, 9, 3, 2))

My data frame should look like this:

   date       return    Consec_days
   <date>      <dbl>      <dbl>
 1 2019-02-20    1         NA
 2 2019-02-21    2.5       NA
 3 2019-02-22    2         NA
 4 2019-02-23    3         NA
 5 2019-02-24    5         1
 6 2019-02-25    6.5       2
 7 2019-02-26    1         NA
 8 2019-02-27    9         1
 9 2019-02-28    3         NA
10 2019-03-01    2         NA

If the condition is not met, the value should be "NA" or "0".

I already tried:

df$Consec_Days <- with(df, ave(return, data.table::rleid(return > 3), 
                               FUN = function(x) ifelse(return > 3, seq_along(x), 0L)))

But it does not work. Can someone help me?

An option using base R ave and data.table::rleid:

library(data.table)
df$Consec_days <- with(df, (return > 3) * ave(return, rleid(return > 3), FUN = seq_along))


#     date       return Consec_days
#   <date>      <dbl>       <dbl>
# 1 2019-02-20    1             0
# 2 2019-02-21    2.5           0
# 3 2019-02-22    2             0
# 4 2019-02-23    3             0
# 5 2019-02-24    5             1
# 6 2019-02-25    6.5           2
# 7 2019-02-26    1             0
# 8 2019-02-27    9             1
# 9 2019-02-28    3             0
#10 2019-03-01    2             0

Using rleid(return > 3), we create one group for each run of consecutive TRUE or FALSE values, then use seq_along to number the observations within each group:

with(df, ave(return, rleid(return > 3), FUN = seq_along))
# [1] 1 2 3 4 1 2 1 1 1 2

We multiply the result by (return > 3) to keep the counts for observations greater than 3; all other rows become 0.
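If loading data.table is not an option, the same run-based counting can be sketched in base R alone. This is only an illustration under assumptions: `ret` is a stand-in for the `return` column, and `sequence()` on the run lengths from `rle()` replaces the `ave()`/`rleid()` pair used above.

```r
# Sample values copied from the question's "return" column
ret <- c(1, 2.5, 2, 3, 5, 6.5, 1, 9, 3, 2)

above <- ret > 3       # TRUE where the condition holds
runs  <- rle(above)    # lengths of consecutive TRUE/FALSE runs

# sequence() restarts the count at 1 for every run;
# multiplying by the logical vector zeroes out the FALSE runs
consec_days <- sequence(runs$lengths) * above
consec_days
# [1] 0 0 0 0 1 2 0 1 0 0
```

Replacing the multiplication with `ifelse(above, sequence(runs$lengths), NA)` would give NA instead of 0 for rows that do not meet the condition.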


Translated into dplyr, this becomes:

library(dplyr)

df %>%
  group_by(group = data.table::rleid(return > 3)) %>%
  mutate(consec_days = (return > 3) * row_number()) %>%
  ungroup() %>%
  select(-group)
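As for the attempt in the question, a likely reason it fails is that `return` inside `FUN` refers to the whole column rather than to the values of the current group, so `ifelse()` produces a vector of the wrong length for each group. A minimal sketch of a repaired version follows, assuming it is enough to test the per-group argument `x` instead. The names `ret`, `run_id` and `fixed` are illustrative, and a base R run ID stands in for `data.table::rleid` so the snippet has no package dependencies.

```r
# Sample values copied from the question's "return" column
ret <- c(1, 2.5, 2, 3, 5, 6.5, 1, 9, 3, 2)

# Base R stand-in for data.table::rleid(ret > 3): one ID per run
runs   <- rle(ret > 3)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Test the per-group values x inside FUN, not the full column
fixed <- ave(ret, run_id, FUN = function(x) ifelse(x > 3, seq_along(x), 0L))
fixed
# [1] 0 0 0 0 1 2 0 1 0 0
```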

One dplyr possibility could be:

df %>%
 group_by(return_rleid = {return_rleid = rle(return > 3); rep(seq_along(return_rleid$lengths), return_rleid$lengths)}) %>%
 mutate(Consec_days = ifelse(return <= 3, NA, seq_along(return_rleid))) %>%
 ungroup() %>% 
 select(-return_rleid)

   date       return Consec_days
   <date>      <dbl>       <int>
 1 2019-02-20   1.00          NA
 2 2019-02-21   2.50          NA
 3 2019-02-22   2.00          NA
 4 2019-02-23   3.00          NA
 5 2019-02-24   5.00           1
 6 2019-02-25   6.50           2
 7 2019-02-26   1.00          NA
 8 2019-02-27   9.00           1
 9 2019-02-28   3.00          NA
10 2019-03-01   2.00          NA

First, it groups the rows by a run-length group ID built from rle(). Second, if "return" is greater than 3, it numbers the rows within that run-length group; otherwise it assigns NA. Finally, it ungroups and removes the redundant grouping variable.
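To make these steps easier to follow, here is a small base R sketch that prints the intermediate values. It is an illustration rather than part of the answer: `ret`, `run_id`, `pos_in_run` and `consec` are hypothetical names, and `ave()` is used here only to mimic the grouped `mutate()`.

```r
# Sample values copied from the question's "return" column
ret <- c(1, 2.5, 2, 3, 5, 6.5, 1, 9, 3, 2)

# Step 1: run-length encode the condition
runs <- rle(ret > 3)
runs$lengths  # 4 2 1 1 2
runs$values   # FALSE TRUE FALSE TRUE FALSE

# Step 2: expand the runs into one group ID per row
run_id <- rep(seq_along(runs$lengths), runs$lengths)
run_id        # 1 1 1 1 2 2 3 4 5 5

# Step 3: number the rows within each run, then blank out
# the rows where the condition is not met
pos_in_run <- ave(seq_along(ret), run_id, FUN = seq_along)
consec <- ifelse(ret > 3, pos_in_run, NA_integer_)
consec        # NA NA NA NA 1 2 NA 1 NA NA
```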

Or the same, but generating the sequence with gl():

df %>%
 group_by(return_rleid = {return_rleid = rle(return > 3); rep(seq_along(return_rleid$lengths), return_rleid$lengths)}) %>%
 mutate(Consec_days = ifelse(return <= 3, NA, gl(length(return_rleid), 1))) %>%
 ungroup() %>% 
 select(-return_rleid)
