
Calculate maximum date interval - R

The challenge is a data.frame with one group variable (id) and two date variables (start and stop). The date intervals are irregular, and I'm trying to calculate the uninterrupted interval in days starting from the first start date per group.

Example data:

data <- data.frame(
  id = c(1, 2, 2, 3, 3, 3, 3, 3, 4, 5),
  start = as.Date(c("2016-02-18", "2016-12-07", "2016-12-12", "2015-04-10", 
                    "2015-04-12", "2015-04-14", "2015-05-15", "2015-07-14", 
                    "2010-12-08", "2011-03-09")),
  stop = as.Date(c("2016-02-19", "2016-12-12", "2016-12-13", "2015-04-13", 
                   "2015-04-22", "2015-05-13", "2015-07-13", "2015-07-15", 
                   "2010-12-10", "2011-03-11"))
)

> data
   id      start       stop
1   1 2016-02-18 2016-02-19
2   2 2016-12-07 2016-12-12
3   2 2016-12-12 2016-12-13
4   3 2015-04-10 2015-04-13
5   3 2015-04-12 2015-04-22
6   3 2015-04-14 2015-05-13
7   3 2015-05-15 2015-07-13
8   3 2015-07-14 2015-07-15
9   4 2010-12-08 2010-12-10
10  5 2011-03-09 2011-03-11

The aim would be a data.frame like this:

   id      start       stop duration_from_start
1   1 2016-02-18 2016-02-19                   2
2   2 2016-12-07 2016-12-12                   7
3   2 2016-12-12 2016-12-13                   7
4   3 2015-04-10 2015-04-13                  34
5   3 2015-04-12 2015-04-22                  34
6   3 2015-04-14 2015-05-13                  34
7   3 2015-05-15 2015-07-13                  34
8   3 2015-07-14 2015-07-15                  34
9   4 2010-12-08 2010-12-10                   3
10  5 2011-03-09 2011-03-11                   3

Or this:

  id      start       stop duration_from_start
1  1 2016-02-18 2016-02-19                   2
2  2 2016-12-07 2016-12-13                   7
3  3 2015-04-10 2015-05-13                  34
4  4 2010-12-08 2010-12-10                   3
5  5 2011-03-09 2011-03-11                   3

It's important to identify the gap between rows 6 and 7 and to take this point as the end of the maximum interval (34 days). The interval 2018-10-01 to 2018-10-01 would be counted as 1.
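To make the inclusive counting rule concrete, here is a minimal sketch in base R. The variable names are illustrative only and are not part of the original post.

```r
# Subtracting two Dates gives a difftime in days; adding 1 counts both endpoints.
same_day <- as.numeric(as.Date("2018-10-01") - as.Date("2018-10-01")) + 1

# First uninterrupted block of id 3 in the example data.
block_id3 <- as.numeric(as.Date("2015-05-13") - as.Date("2015-04-10")) + 1

print(c(same_day = same_day, block_id3 = block_id3))
```

Under this rule a one-day interval counts as 1, and the block from 2015-04-10 to 2015-05-13 counts as 34.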

My usual lubridate approaches don't work with this example (interval %within% lag(interval)).

Any idea?

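library(magrittr)    # for %>%
library(data.table)
setDT(data)          # convert data to a data.table by reference

# Return the first uninterrupted interval of one group.
# Assumes the rows are sorted by start within each id.
first_int <- function(start, stop){
  # TRUE where a row starts after the previous row's stop (a gap);
  # rleid(...) == 1 keeps only the rows before the first gap.
  ind <- rleid((start - shift(stop, fill = Inf)) > 0) == 1
  list(start = min(start[ind]),
       stop  = max(stop[ind]))
}

newdata <- 
  data[, first_int(start, stop), by = id] %>% 
     .[, duration := stop - start + 1]  # inclusive day count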


#    id      start       stop duration
# 1:  1 2016-02-18 2016-02-19   2 days
# 2:  2 2016-12-07 2016-12-13   7 days
# 3:  3 2015-04-10 2015-05-13  34 days
# 4:  4 2010-12-08 2010-12-10   3 days
# 5:  5 2011-03-09 2011-03-11   3 days
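The code above compares each start only with the stop of the row directly before it. As a hedged alternative, the sketch below uses base R and compares each start with the latest stop seen so far (a running maximum), which should also cover an interval nested inside an earlier, longer one. The helper `first_run` and its `tolerance` argument are hypothetical names introduced here; `tolerance = 0` is an assumption chosen to mirror the `> 0` rule above, under which a start one day after the previous stop already counts as a gap.

```r
data <- data.frame(
  id = c(1, 2, 2, 3, 3, 3, 3, 3, 4, 5),
  start = as.Date(c("2016-02-18", "2016-12-07", "2016-12-12", "2015-04-10",
                    "2015-04-12", "2015-04-14", "2015-05-15", "2015-07-14",
                    "2010-12-08", "2011-03-09")),
  stop = as.Date(c("2016-02-19", "2016-12-12", "2016-12-13", "2015-04-13",
                   "2015-04-22", "2015-05-13", "2015-07-13", "2015-07-15",
                   "2010-12-10", "2011-03-11"))
)

# Hypothetical helper: first uninterrupted run of one group.
first_run <- function(start, stop, tolerance = 0) {
  ord   <- order(start)
  start <- start[ord]
  stop  <- stop[ord]
  # Latest stop seen so far, as day numbers (running maximum).
  reach <- cummax(as.numeric(stop))
  # A row breaks the run when it starts more than `tolerance` days
  # after the latest stop seen in the rows before it.
  breaks <- c(FALSE, as.numeric(start[-1]) - reach[-length(reach)] > tolerance)
  keep   <- cumsum(breaks) == 0
  data.frame(start = start[1],
             stop  = as.Date(max(reach[keep]), origin = "1970-01-01"))
}

# Apply the helper per id and bind the pieces into one data.frame.
pieces <- lapply(split(data, data$id), function(g) {
  data.frame(id = g$id[1], first_run(g$start, g$stop))
})
result <- do.call(rbind, pieces)
rownames(result) <- NULL
result$duration_from_start <- as.numeric(result$stop - result$start) + 1
print(result)

# Join the per-group duration back onto every original row.
merged <- merge(data, result[, c("id", "duration_from_start")], by = "id")
print(merged)
```

`result` corresponds to the one-row-per-id layout from the question, and `merged` to the layout that repeats the duration on every row. Setting `tolerance = 1` would instead treat back-to-back days as uninterrupted, which may be closer to the intent of the inclusive day count.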
