[英]Calculate maximum date interval - R
The challenge is a data.frame with with one group variable ( id
) and two date variables ( start
and stop
). 面临的挑战是具有一个组变量( id
)和两个日期变量( start
和stop
)的data.frame。 The date intervals are irregular and I'm trying to calculate the uninterrupted interval in days starting from the first start
date per group. 日期间隔是不规则的,我正在尝试计算从每个组的第一个start
日期start
天数。
Example data: 示例数据:
data <- data.frame(
id = c(1, 2, 2, 3, 3, 3, 3, 3, 4, 5),
start = as.Date(c("2016-02-18", "2016-12-07", "2016-12-12", "2015-04-10",
"2015-04-12", "2015-04-14", "2015-05-15", "2015-07-14",
"2010-12-08", "2011-03-09")),
stop = as.Date(c("2016-02-19", "2016-12-12", "2016-12-13", "2015-04-13",
"2015-04-22", "2015-05-13", "2015-07-13", "2015-07-15",
"2010-12-10", "2011-03-11"))
)
> data
id start stop
1 1 2016-02-18 2016-02-19
2 2 2016-12-07 2016-12-12
3 2 2016-12-12 2016-12-13
4 3 2015-04-10 2015-04-13
5 3 2015-04-12 2015-04-22
6 3 2015-04-14 2015-05-13
7 3 2015-05-15 2015-07-13
8 3 2015-07-14 2015-07-15
9 4 2010-12-08 2010-12-10
10 5 2011-03-09 2011-03-11
The aim would a data.frame like this: 目标是这样的data.frame:
id start stop duration_from_start
1 1 2016-02-18 2016-02-19 2
2 2 2016-12-07 2016-12-12 7
3 2 2016-12-12 2016-12-13 7
4 3 2015-04-10 2015-04-13 34
5 3 2015-04-12 2015-04-22 34
6 3 2015-04-14 2015-05-13 34
7 3 2015-05-15 2015-07-13 34
8 3 2015-07-14 2015-07-15 34
9 4 2010-12-08 2010-12-10 3
10 5 2011-03-09 2011-03-11 3
Or this: 或这个:
id start stop duration_from_start
1 1 2016-02-18 2016-02-19 2
2 2 2016-12-07 2016-12-13 7
3 3 2015-04-10 2015-05-13 34
4 4 2010-12-08 2010-12-10 3
5 5 2011-03-09 2011-03-11 3
It's important to identify the gap from row 6
to 7
and to take this point as the maximum interval ( 34
days). 确定从第6
行到第7
行的间隔并将此点作为最大间隔( 34
天)非常重要。 The interval 2018-10-01
to 2018-10-01
would be counted as 1
. 从2018-10-01
到2018-10-01
的时间间隔将被计为1
。
My usual lubridate
approaches don't work with this example ( interval %within lag(interval)
). 我通常使用的lubridate
方法在该示例中不起作用( interval %within lag(interval)
)。
Any idea? 任何想法?
library(magrittr)
library(data.table)
setDT(data)
first_int <- function(start, stop){
ind <- rleid((start - shift(stop, fill = Inf)) > 0) == 1
list(start = min(start[ind]),
stop = max(stop[ind]))
}
newdata <-
data[, first_int(start, stop), by = id] %>%
.[, duration := stop - start + 1]
# id start stop duration
# 1: 1 2016-02-18 2016-02-19 2 days
# 2: 2 2016-12-07 2016-12-13 7 days
# 3: 3 2015-04-10 2015-05-13 34 days
# 4: 4 2010-12-08 2010-12-10 3 days
# 5: 5 2011-03-09 2011-03-11 3 days
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.