
Calculate duration of a time interval within a given period

I have a dataframe with start times and length (in seconds):

dates<-data.frame(start=as.POSIXct(c("2010-04-03 03:02:38 UTC","2010-04-03 06:03:14 UTC","2010-04-20 03:05:52 UTC","2010-04-20 03:17:42 UTC","2010-04-21 03:09:38 UTC","2010-04-21 07:10:14 UTC","2010-04-21 08:12:52 UTC","2010-04-23 03:13:42 UTC","2010-04-23 03:25:42 UTC","2010-04-23 03:36:38 UTC","2010-04-23 08:58:14 UTC","2010-04-24 03:21:52 UTC","2010-04-24 03:22:42 UTC","2010-04-24 07:24:19 UTC","2010-04-24 07:55:19 UTC")),length=c(3600,300,900,3600,300,900,3600,300,900,3600,300,900,3600,300,900))

> dates
                 start length
1  2010-04-03 03:02:38   3600
2  2010-04-03 06:03:14    300
3  2010-04-20 03:05:52    900
4  2010-04-20 03:17:42   3600
5  2010-04-21 03:09:38    300
6  2010-04-21 07:10:14    900
7  2010-04-21 08:12:52   3600
8  2010-04-23 03:13:42    300
9  2010-04-23 03:25:42    900
10 2010-04-23 03:36:38   3600
11 2010-04-23 08:58:14    300
12 2010-04-24 03:21:52    900
13 2010-04-24 03:22:42   3600
14 2010-04-24 07:24:19    300
15 2010-04-24 07:55:19    900

I need to find the total duration (length) for the period from 2010-04-02 00:00:00 to 2010-04-21 09:00:00, and for the period from 2010-04-23 03:15:00 to 2010-04-24 08:00:00.

The tricky part is that an event's length can run past the end of the specified period, and I don't want to count that extra duration.

I expect to get:

  • 12428 seconds for 2010-04-02 00:00:00 to 2010-04-21 09:00:00
  • 10103 seconds for 2010-04-23 03:15:00 to 2010-04-24 08:00:00

I was thinking of using lubridate to define an interval for each row and then sum the durations, but I can't figure out how.

It's not entirely clear what was being asked. The other answer simply sums length for start times that fall within the specified interval. I interpreted the question as wanting to handle events whose length runs past the end of the specified period, without counting the time past the end (and likewise for events that start before the period begins). For example, row 7 runs well past 2010-04-21 09:00:00. This is why providing expected output is helpful!

Regardless, here is a way to do what I thought you meant, wrapped in a function. The approach is to compute each event's end, then replace the start or end with the edge of the specified interval whenever the event would run over it. I may have missed some edge cases; improvements welcome!

dates<-data.frame(start=as.POSIXct(c("2010-04-03 03:02:38 UTC","2010-04-03 06:03:14 UTC","2010-04-20 03:05:52 UTC","2010-04-20 03:17:42 UTC","2010-04-21 03:09:38 UTC","2010-04-21 07:10:14 UTC","2010-04-21 08:12:52 UTC","2010-04-23 03:13:42 UTC","2010-04-23 03:25:42 UTC","2010-04-23 03:36:38 UTC","2010-04-23 08:58:14 UTC","2010-04-24 03:21:52 UTC","2010-04-24 03:22:42 UTC","2010-04-24 07:24:19 UTC","2010-04-24 07:55:19 UTC")),length=c(3600,300,900,3600,300,900,3600,300,900,3600,300,900,3600,300,900))
library(dplyr)
library(lubridate)

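length_within <- function(tbl, interval_start, interval_end){
  intv_start <- as.POSIXct(interval_start)
  intv_end <- as.POSIXct(interval_end)
  tbl %>%
    mutate(
      end = start + length,
      # ifelse() drops the POSIXct class, so these two columns are plain
      # numbers (seconds since the epoch); their difference is in seconds
      counted_start = ifelse(start < intv_start, intv_start, start),
      counted_end = ifelse(end > intv_end, intv_end, end),
      seconds = counted_end - counted_start
    ) %>%
    # events entirely outside the interval get a negative value; drop them
    filter(seconds >= 0) %>%
    summarise(total = sum(seconds)) %>%
    pull(total)
}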

length_within(dates,"2010-04-02 00:00:00", "2010-04-21 09:00:00")
#> [1] 12428
length_within(dates,"2010-04-23 03:15:00", "2010-04-24 08:00:00")
#> [1] 10103
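
As a cross-check, the same clipping idea can be sketched in base R without dplyr. This is only a minimal sketch: the function name clipped_seconds and the explicit tz = "UTC" are my own choices for illustration (the code above parses the timestamps in the local time zone instead). Each event contributes max(0, min(end, window end) - max(start, window start)) seconds.

```r
# Sample data from the question, parsed explicitly as UTC (an assumption)
dates <- data.frame(
  start = as.POSIXct(c(
    "2010-04-03 03:02:38", "2010-04-03 06:03:14", "2010-04-20 03:05:52",
    "2010-04-20 03:17:42", "2010-04-21 03:09:38", "2010-04-21 07:10:14",
    "2010-04-21 08:12:52", "2010-04-23 03:13:42", "2010-04-23 03:25:42",
    "2010-04-23 03:36:38", "2010-04-23 08:58:14", "2010-04-24 03:21:52",
    "2010-04-24 03:22:42", "2010-04-24 07:24:19", "2010-04-24 07:55:19"
  ), tz = "UTC"),
  length = rep(c(3600, 300, 900), times = 5)
)

# Hypothetical helper: total seconds of all events that fall inside the window
clipped_seconds <- function(df, win_start, win_end, tz = "UTC") {
  a <- as.numeric(as.POSIXct(win_start, tz = tz))  # window start, epoch seconds
  b <- as.numeric(as.POSIXct(win_end, tz = tz))    # window end, epoch seconds
  s <- as.numeric(df$start)                        # event starts
  e <- s + df$length                               # event ends
  overlap <- pmin(e, b) - pmax(s, a)               # negative if no overlap
  sum(pmax(overlap, 0))
}

clipped_seconds(dates, "2010-04-02 00:00:00", "2010-04-21 09:00:00")
#> [1] 12428
clipped_seconds(dates, "2010-04-23 03:15:00", "2010-04-24 08:00:00")
#> [1] 10103
```

Working on epoch seconds sidesteps difftime picking its own units, at the cost of losing the date-time class along the way.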

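Another possible solution uses the first and last functions from dplyr. After filtering to the events that overlap the period and sorting by start, the total length is adjusted for the first and last rows only: the part of the first event before the period start and the part of the last event after the period end are subtracted. This assumes that only those two rows can cross the period boundaries.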

library(dplyr)
calculate_duration <- function(df, start_time, end_time){
  start_time <- as.POSIXct(start_time)
  end_time <- as.POSIXct(end_time)

  df %>%
    # keep only events that overlap the period at all
    filter((start + length) >= start_time & start < end_time) %>%
    arrange(start) %>%
    summarise(
      last_time = last(start) + last(length),
      sum = sum(length) -
        # trim the part of the last event that runs past the period end
        ifelse(last_time > end_time,
               difftime(last_time, end_time, units = 'secs'), 0L) -
        # trim the part of the first event that precedes the period start
        ifelse(first(start) < start_time,
               difftime(start_time, first(start), units = 'secs'), 0L)
    ) %>%
    select(sum)
}

calculate_duration(dates,"2010-04-02 00:00:00", "2010-04-21 09:00:00")
#    sum
#1 12428

calculate_duration(dates,"2010-04-23 03:15:00", "2010-04-24 08:00:00")
#    sum
#1 10103
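
One thing to be aware of with the first/last adjustment: it matches per-row clipping only when no other row crosses a boundary. Here is a small base R sketch of a case where the two differ. It uses rows 3 and 4 of the data and a window end of 2010-04-20 03:20:00 that I picked for illustration (it is not one of the periods in the question), with timestamps parsed as UTC by assumption.

```r
# Rows 3 and 4 of the sample data; both run past the chosen window end
ev <- data.frame(
  start = as.POSIXct(c("2010-04-20 03:05:52", "2010-04-20 03:17:42"), tz = "UTC"),
  length = c(900, 3600)
)
a <- as.numeric(as.POSIXct("2010-04-20 00:00:00", tz = "UTC"))  # window start
b <- as.numeric(as.POSIXct("2010-04-20 03:20:00", tz = "UTC"))  # window end
s <- as.numeric(ev$start)
e <- s + ev$length
n <- nrow(ev)

# Adjust only the first and last rows (rows are already sorted by start)
first_last <- sum(ev$length) - max(e[n] - b, 0) - max(a - s[1], 0)

# Clip every row to the window
per_row <- sum(pmax(pmin(e, b) - pmax(s, a), 0))

first_last
#> [1] 1038
per_row
#> [1] 986
```

The 52-second gap is the part of row 3 (which ends at 03:20:52) that lies past the window end but is not trimmed, because row 3 is not the last row. For the two periods in the question this situation does not occur, so both approaches give the same totals there.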


# Data

dates<-data.frame(start=as.POSIXct(c("2010-04-03 03:02:38 UTC","2010-04-03 06:03:14 UTC",
"2010-04-20 03:05:52 UTC","2010-04-20 03:17:42 UTC","2010-04-21 03:09:38 UTC",
"2010-04-21 07:10:14 UTC","2010-04-21 08:12:52 UTC","2010-04-23 03:13:42 UTC",
"2010-04-23 03:25:42 UTC","2010-04-23 03:36:38 UTC","2010-04-23 08:58:14 UTC",
"2010-04-24 03:21:52 UTC","2010-04-24 03:22:42 UTC","2010-04-24 07:24:19 UTC",
"2010-04-24 07:55:19 UTC")),
length=c(3600,300,900,3600,300,900,3600,300,900,3600,300,900,3600,300,900))

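Here is an example using %within% from lubridate. Note that it sums the full length of every event whose start falls inside the period, so an event that runs past the end of the period is not clipped: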

library(lubridate)

t0 <- as.POSIXct('2010-04-02 00:00:00')
t1 <- as.POSIXct('2010-04-21 09:00:00')

sum(dates$length[dates$start %within% interval(t0,t1)])
# [1] 13200
