
R dplyr cumulative difftime with condition

Let's say I have a data frame like this:

dt <-
  data.frame(
    date = as.Date(
      c("2022-01-01", "2022-01-03", "2022-01-05", "2022-01-06", "2022-01-07", "2022-02-01", "2022-02-01"))
  )

I would like to calculate sequences of dates in which the difftime between the first and the last date of a sequence is less than or equal to 2 days. Once a sequence reaches its last possible day, I would like to start new sequences from the remaining (upcoming) dates.

In other words: the dataset, and therefore the first sequence, starts with 2022-01-01, so it will be marked by 0. 2022-01-03 will be marked by 1 because it is part of the sequence that started on 2022-01-01.

2022-01-05 can't be marked by 0 because the difftime between 2022-01-01 and 2022-01-05 is greater than 2 days. This date is the beginning of a new sequence, and all following dates whose difftime from it is less than or equal to 2 days (2022-01-06 and 2022-01-07) will be marked by 0.

The same applies to 2022-02-01 (please note that the dataset can contain duplicate dates).
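The difftime comparisons described above can be checked directly in base R. This sketch only reproduces the arithmetic of the rule for two of the dates; d1 and d2 are illustrative names, not part of the question.

```r
# Gap between the first date of the dataset and 2022-01-05
d1 <- difftime(as.Date("2022-01-05"), as.Date("2022-01-01"), units = "days")
as.numeric(d1)        # 4
as.numeric(d1) <= 2   # FALSE: 2022-01-05 cannot belong to the first sequence

# Gap between the start of the second sequence and 2022-01-07
d2 <- difftime(as.Date("2022-01-07"), as.Date("2022-01-05"), units = "days")
as.numeric(d2)        # 2
as.numeric(d2) <= 2   # TRUE: 2022-01-07 still belongs to that sequence
```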

I prefer a dplyr solution, but if you can come up with another one, that is welcome too. I really appreciate your help.

Expected result:

result <-
  data.frame(
    date = as.Date(
      c("2022-01-01", "2022-01-03", "2022-01-05", "2022-01-06", "2022-01-07", "2022-02-01", "2022-02-01")),
    flag = c(0, 1, 1, 0, 0, 1, 0)
  )

We may use diff to get the difference between adjacent 'date' values, convert it to a logical vector (with >) and coerce the logical to binary with + or as.integer.

library(dplyr)
dt <- dt %>% 
   mutate(flag = +(c(0, diff(date) >  1)))

-output

dt
       date flag
1 2022-01-01   0
2 2022-01-03   1
3 2022-01-05   1
4 2022-01-06   0
5 2022-01-07   0
6 2022-02-01   1
7 2022-02-01   0
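To see what each step of the diff-based approach produces, the same computation can be unrolled in base R on the sample data (the mutate call above does this inside the data frame). The names gaps and flag are only illustrative.

```r
dt <- data.frame(
  date = as.Date(c("2022-01-01", "2022-01-03", "2022-01-05", "2022-01-06",
                   "2022-01-07", "2022-02-01", "2022-02-01"))
)

# Gaps (in days) between each date and the one before it; length is n - 1
gaps <- diff(dt$date)
as.numeric(gaps)
# 2 2 1 1 25 0

# TRUE where the gap is more than 1 day
gaps > 1
# TRUE TRUE FALSE FALSE TRUE FALSE

# Prepend 0 for the first row (it has no predecessor) and coerce to 0/1
flag <- +(c(0, gaps > 1))
flag
# 0 1 1 0 0 1 0
```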

Or with lag and difftime

dt %>% 
  mutate(flag = +(difftime(date, lag(date, default = first(date)),
                           units = "days") > 1))
        date flag
1 2022-01-01    0
2 2022-01-03    1
3 2022-01-05    1
4 2022-01-06    0
5 2022-01-07    0
6 2022-02-01    1
7 2022-02-01    0
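Both answers compare each date only with the date immediately before it. That reproduces the expected flags for this sample, but it is not the same as the rule stated in the question (at most 2 days from the first date of the sequence): for example, 2022-01-01, 2022-01-02, 2022-01-03, 2022-01-04 has no gap larger than 1 day, yet 2022-01-04 is 3 days after the start. Below is a minimal sketch of the anchored rule, assuming the dates are sorted ascending. seq_start, start, new_seq and seq_id are hypothetical names introduced here, and base R's Reduce(accumulate = TRUE) is used to carry the running start date.

```r
library(dplyr)

dt <- data.frame(
  date = as.Date(c("2022-01-01", "2022-01-03", "2022-01-05", "2022-01-06",
                   "2022-01-07", "2022-02-01", "2022-02-01"))
)

# Hypothetical helper: running start date of the current sequence.
# A date stays in the current sequence while it is at most `max_days`
# after that sequence's first date; otherwise it starts a new sequence.
# Assumes `dates` is sorted in ascending order.
seq_start <- function(dates, max_days = 2) {
  x <- as.numeric(dates)  # days since 1970-01-01
  starts <- Reduce(
    function(start, d) if (d - start <= max_days) start else d,
    x,
    accumulate = TRUE
  )
  as.Date(starts, origin = "1970-01-01")
}

res <- dt %>%
  mutate(
    start   = seq_start(date),
    new_seq = as.integer(!duplicated(start)),  # 1 on the first row of each sequence
    seq_id  = cumsum(new_seq)                  # running sequence number
  )
res
```

On this sample the sequences are 2022-01-01 to 2022-01-03, 2022-01-05 to 2022-01-07, and the two 2022-02-01 rows (seq_id 1, 2 and 3); duplicate dates stay in the same sequence because their difference is 0 days. The new_seq column uses its own 0/1 encoding (1 marks a sequence start), which is not the same as the flag column in the expected result, so map it to whatever encoding you need.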
