R dplyr cumulative difftime with condition
Let's say I have a data frame like this:
dt <-
  data.frame(
    date = as.Date(
      c("2022-01-01", "2022-01-03", "2022-01-05", "2022-01-06", "2022-01-07", "2022-02-01", "2022-02-01"))
  )
I would like to calculate sequences of dates where the difftime between the first and the last date in a sequence is less than or equal to 2 days. Once a sequence reaches its last possible day, I would like to create new sequences from all the following dates.
In other words: the dataset, and therefore the first sequence, starts with 2022-01-01, so that date will be marked by 0. 2022-01-03 will be marked by 1 because it is part of the sequence that started on 2022-01-01.
2022-01-05 can't be marked by 0 because the difftime between 2022-01-01 and 2022-01-05 is greater than 2 days. This date is the beginning of a new sequence, and all upcoming dates where the difftime is less than or equal to 2 days (2022-01-06 and 2022-01-07) will be marked by 0.
Similarly with 2022-02-01 (please note that there could be identical dates in the dataset).
I prefer a dplyr solution, but if you can provide another one, I would really appreciate your help.
result <-
  data.frame(
    date = as.Date(
      c("2022-01-01", "2022-01-03", "2022-01-05", "2022-01-06", "2022-01-07", "2022-02-01", "2022-02-01")),
    flag = c(0, 1, 1, 0, 0, 1, 0)
  )
We may use diff to get the difference between adjacent 'date' values, convert it to a logical vector (>), and coerce the logical to binary with + or as.integer
library(dplyr)
dt <- dt %>%
  mutate(flag = +(c(0, diff(date) > 1)))
-output
dt
        date flag
1 2022-01-01    0
2 2022-01-03    1
3 2022-01-05    1
4 2022-01-06    0
5 2022-01-07    0
6 2022-02-01    1
7 2022-02-01    0
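To make the three steps in the explanation visible (difference, comparison, coercion), here is a minimal base R sketch on the same dates. The intermediate names (`dates`, `gaps`, `is_gap`, `flag_plus`, `flag_int`) are introduced only for illustration and are not part of the original answer. Note that this compares each date with the one immediately before it.

```r
# Same toy dates as in the question (variable names here are illustrative).
dates <- as.Date(c("2022-01-01", "2022-01-03", "2022-01-05", "2022-01-06",
                   "2022-01-07", "2022-02-01", "2022-02-01"))

# Step 1: differences between adjacent dates (a difftime of length n - 1).
gaps <- diff(dates)

# Step 2: logical vector, TRUE where the gap to the previous date exceeds 1 day.
is_gap <- gaps > 1

# Step 3: pad the first position (it has no predecessor) and coerce to 0/1.
flag_plus <- +c(FALSE, is_gap)             # unary + turns logical into integer
flag_int  <- as.integer(c(FALSE, is_gap))  # explicit equivalent

as.numeric(gaps)                 # 2 2 1 1 25 0
flag_plus                        # 0 1 1 0 0 1 0
identical(flag_plus, flag_int)   # TRUE
```

Both coercions give the same integer vector; `as.integer` is arguably easier to read, while `+` is shorter.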
Or with lag and difftime
dt %>%
  mutate(flag = +(difftime(date, lag(date, default = first(date)),
                           units = "day") > 1))
        date flag
1 2022-01-01    0
2 2022-01-03    1
3 2022-01-05    1
4 2022-01-06    0
5 2022-01-07    0
6 2022-02-01    1
7 2022-02-01    0
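For readers who want to see what the `lag(date, default = first(date))` step contributes, the following base R sketch mimics it by shifting the vector by one position and filling the first slot with the first date. The names `dates`, `prev`, `gap_days`, and `flag` are hypothetical, and the shifted vector is only a stand-in for `dplyr::lag`, not its implementation.

```r
# Same toy dates as in the question (variable names here are illustrative).
dates <- as.Date(c("2022-01-01", "2022-01-03", "2022-01-05", "2022-01-06",
                   "2022-01-07", "2022-02-01", "2022-02-01"))

# Base R stand-in for lag(date, default = first(date)):
# shift everything down by one and reuse the first date in position 1.
prev <- c(dates[1], head(dates, -1))

# Gap to the previous date in days; the first element is 0 by construction.
gap_days <- difftime(dates, prev, units = "days")

# Flag rows whose gap to the previous date is more than 1 day.
flag <- as.integer(gap_days > 1)

as.numeric(gap_days)   # 0 2 2 1 1 25 0
flag                   # 0 1 1 0 0 1 0
```

Because the first date is compared with itself, its gap is 0 and its flag is 0, so no separate padding is needed as in the `diff` version.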