Finding consecutive days that meet a condition

Question

I want to add a new column to my dataframe that counts consecutive days meeting a condition: count the consecutive days on which "return" is higher than 3.

Here is my dataset:

library(tibble)

df <- tibble(
  date = lubridate::today() + 0:9,
  return = c(1, 2.5, 2, 3, 5, 6.5, 1, 9, 3, 2))

My dataframe should look like this:

   date       return    Consec_days
   <date>      <dbl>      <dbl>
 1 2019-02-20    1         NA
 2 2019-02-21    2.5       NA
 3 2019-02-22    2         NA
 4 2019-02-23    3         NA
 5 2019-02-24    5         1
 6 2019-02-25    6.5       2
 7 2019-02-26    1         NA
 8 2019-02-27    9         1
 9 2019-02-28    3         NA
10 2019-03-01    2         NA

If the condition is not met, the column should contain NA or 0.

I already tried:

df$Consec_Days <- with(df, ave(return, data.table::rleid(return > 3), 
                               FUN = function(x) ifelse(return > 3, seq_along(x), 0L)))

But it does not work. Can someone help me?

Answer 1

An option using base R ave and data.table::rleid:

library(data.table)
df$Consec_days <- with(df, (return > 3) * ave(return, rleid(return > 3), FUN = seq_along))


#     date       return Consec_days
#   <date>      <dbl>       <dbl>
# 1 2019-02-20    1             0
# 2 2019-02-21    2.5           0
# 3 2019-02-22    2             0
# 4 2019-02-23    3             0
# 5 2019-02-24    5             1
# 6 2019-02-25    6.5           2
# 7 2019-02-26    1             0
# 8 2019-02-27    9             1
# 9 2019-02-28    3             0
#10 2019-03-01    2             0

rleid(return > 3) assigns a group ID to each run of consecutive TRUE or FALSE values, and seq_along then numbers the observations within each group:

with(df, ave(return, rleid(return > 3), FUN = seq_along))
# [1] 1 2 3 4 1 2 1 1 1 2

We multiply this by (return > 3) so that the counter is kept where return is greater than 3 and every other row becomes 0.
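The same grouping and multiplication steps can also be sketched without data.table, using only base R's rle() and sequence(). This is a minimal illustration on the question's return values; the variable names (ret, above, run_pos) are made up for the example and are not part of the original answer.

```r
# Same return values as in the question
ret <- c(1, 2.5, 2, 3, 5, 6.5, 1, 9, 3, 2)

above   <- ret > 3                  # TRUE on days that meet the condition
runs    <- rle(above)               # lengths of consecutive TRUE/FALSE runs
run_pos <- sequence(runs$lengths)   # position of each day within its run
consec_days <- above * run_pos      # FALSE counts as 0, so failing days become 0

consec_days
# [1] 0 0 0 0 1 2 0 1 0 0
```

Multiplying by the logical vector works because R treats TRUE as 1 and FALSE as 0 in arithmetic.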

Translated into dplyr, this becomes:

library(dplyr)

df %>%
  # data.table::rleid() gives each run of TRUE/FALSE its own group ID
  group_by(group = data.table::rleid(return > 3)) %>%
  mutate(Consec_days = (return > 3) * row_number()) %>%
  ungroup() %>%
  select(-group)

Answer 2

One dplyr possibility could be:

df %>%
 group_by(return_rleid = {return_rleid = rle(return > 3); rep(seq_along(return_rleid$lengths), return_rleid$lengths)}) %>%
 mutate(Consec_days = ifelse(return <= 3, NA, seq_along(return_rleid))) %>%
 ungroup() %>% 
 select(-return_rleid)

   date       return Consec_days
   <date>      <dbl>       <int>
 1 2019-02-20   1.00          NA
 2 2019-02-21   2.50          NA
 3 2019-02-22   2.00          NA
 4 2019-02-23   3.00          NA
 5 2019-02-24   5.00           1
 6 2019-02-25   6.50           2
 7 2019-02-26   1.00          NA
 8 2019-02-27   9.00           1
 9 2019-02-28   3.00          NA
10 2019-03-01   2.00          NA

First, it groups by the run-length group ID. Second, if "return" is bigger than 3, it creates a sequence along the rows of that run-length group; otherwise it assigns NA. Finally, it ungroups and removes the helper variable.
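To make the run-length group ID concrete, here is a small base R sketch of the steps described above, run on the question's return values outside of any dplyr pipeline. The names run_id, counter and consec_days_na are illustrative only.

```r
ret <- c(1, 2.5, 2, 3, 5, 6.5, 1, 9, 3, 2)

runs <- rle(ret > 3)
# One ID per run, repeated once for each day in that run
run_id <- rep(seq_along(runs$lengths), runs$lengths)
run_id
# [1] 1 1 1 1 2 2 3 4 5 5

# Number the days within each run, then blank out days that fail the condition
counter <- ave(seq_along(ret), run_id, FUN = seq_along)
consec_days_na <- ifelse(ret > 3, counter, NA_integer_)
consec_days_na
# [1] NA NA NA NA  1  2 NA  1 NA NA
```

Using NA_integer_ rather than a bare NA simply keeps the result an integer vector; either gives the NA rows shown in the output above.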

Or the same, but generating the sequence with gl():

df %>%
 group_by(return_rleid = {return_rleid = rle(return > 3); rep(seq_along(return_rleid$lengths), return_rleid$lengths)}) %>%
 mutate(Consec_days = ifelse(return <= 3, NA, gl(length(return_rleid), 1))) %>%
 ungroup() %>% 
 select(-return_rleid)
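As a side note on why gl() can stand in for seq_along() here: gl(n, 1) builds a factor with n levels, each appearing once, so its underlying integer codes run from 1 to n. The snippet below is only an illustration with n = 3; converting explicitly with as.integer() is an optional safeguard, not something the answer itself does.

```r
f <- gl(3, 1)     # factor with levels "1", "2", "3", one observation per level
levels(f)
# [1] "1" "2" "3"

as.integer(f)     # the integer codes behind the factor
# [1] 1 2 3
```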
