
Find duration by hour

I have the following data frame (length is in seconds):

dates <- data.frame(
  start = as.POSIXct(c(
    "2010-04-03 03:02:38 UTC", "2010-04-03 06:03:14 UTC", "2010-04-20 03:05:52 UTC",
    "2010-04-20 03:17:42 UTC", "2010-04-21 03:09:38 UTC", "2010-04-21 07:10:14 UTC",
    "2010-04-21 08:12:52 UTC", "2010-04-23 03:13:42 UTC", "2010-04-23 03:25:42 UTC",
    "2010-04-23 03:36:38 UTC", "2010-04-23 08:58:14 UTC", "2010-04-24 03:21:52 UTC",
    "2010-04-24 03:22:42 UTC", "2010-04-24 07:24:19 UTC", "2010-04-24 07:55:19 UTC"
  )),
  length = c(3600, 300, 900, 3600, 300, 900, 3600, 300, 900, 3600, 300, 900, 3600, 300, 3600)
)

> dates
                 start length
1  2010-04-03 03:02:38   3600
2  2010-04-03 06:03:14    300
3  2010-04-20 03:05:52    900
4  2010-04-20 03:17:42   3600
5  2010-04-21 03:09:38    300
6  2010-04-21 07:10:14    900
7  2010-04-21 08:12:52   3600
8  2010-04-23 03:13:42    300
9  2010-04-23 03:25:42    900
10 2010-04-23 03:36:38   3600
11 2010-04-23 08:58:14    300
12 2010-04-24 03:21:52    900
13 2010-04-24 03:22:42   3600
14 2010-04-24 07:24:19    300
15 2010-04-24 07:55:19   3600

I want to calculate the total duration by hour, e.g. from 00:00:00 to 01:00:00, from 01:00:00 to 02:00:00, and so on. Sometimes, though, an interval crosses an hour boundary: in the last row the start is 07:55:19 and the duration is 3600, so I need to split it in two and count 281 sec towards the 07:00:00 to 08:00:00 period and 3319 sec towards the 08:00:00 to 09:00:00 period.
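To make the arithmetic for that last row concrete, here is a minimal base R sketch. The variable names are illustrative only, and the time zone is fixed to UTC as an assumption.

```r
# Last row of the example: starts at 07:55:19 and lasts 3600 seconds
start <- as.POSIXct("2010-04-24 07:55:19", tz = "UTC")
len   <- 3600

# Top of the next hour: truncate the start to the hour, then add one hour
next_hour <- as.POSIXct(trunc(start, units = "hours")) + 3600

# Seconds that fall in the starting hour, capped at the interval length
first_part  <- min(len, as.numeric(difftime(next_hour, start, units = "secs")))
# Whatever is left spills into the following hour
second_part <- len - first_part

first_part   # 281
second_part  # 3319
```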

I would find the total duration for the 03:00:00-04:00:00 period like this:

library(lubridate)

dates$endTime<-dates$start+dates$length
dates$newTime<-format(dates$start, format="%H:%M:%S")
dates$endTime<-format(dates$endTime, format="%H:%M:%S")
dates$dur3<-ifelse(hms(dates$endTime)<hms("04:00:00"), seconds(hms(dates$endTime)-hms(dates$newTime)), seconds(hms("04:00:00")-hms(dates$newTime)))

sum(dates[dates$dur3>0,"dur3"])
12920

I was thinking of calculating the duration within each of the 24 periods for each row and then summing those, but is there a more efficient way to do this?

Here's my take on the question, even though I'm not entirely certain I understand the task. First, I calculate how many seconds of each interval spill over into the next hour:

dates$rest <- 3600 - as.numeric(format(dates$start, "%M"))*60 - as.numeric(format(dates$start, "%S"))
dates$excess <- dates$length - dates$rest

Next, we loop over the rows whose length extends into the next hour. Bear in mind that this only works if the lengths are at most 3600, as in the example; if not, the loop needs to be extended a bit.

for (row in which(dates$excess > 0)) {
  row_to_copy <- dates[row, ]
  # Keep only the part that falls within the starting hour
  dates[row, "length"] <- dates[row, "length"] - row_to_copy$excess
  # The copy carries the excess into the following hour
  row_to_copy$start <- row_to_copy$start + 3600
  row_to_copy$length <- row_to_copy$excess
  dates <- rbind(dates, row_to_copy)
}
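If lengths can exceed 3600, one possible way to extend the idea is to split every interval into as many hourly pieces as it touches. The sketch below is only an illustration: split_by_hour() is a hypothetical helper, not part of any package, and it assumes the time zone has a whole-hour offset from UTC so that clock hours line up with multiples of 3600 epoch seconds.

```r
# Hypothetical helper: split intervals (start, length in seconds) at hour boundaries
split_by_hour <- function(start, length) {
  tz <- attr(start, "tzone")
  if (is.null(tz)) tz <- ""
  s <- as.numeric(start)                       # seconds since the epoch
  e <- s + length                              # interval end
  first <- floor(s / 3600)                     # index of the hour containing the start
  last  <- pmax(ceiling(e / 3600) - 1, first)  # index of the hour containing the last second
  n     <- last - first + 1                    # number of clock hours each row touches
  row   <- rep(seq_along(s), n)                # source row of every piece
  hour  <- first[row] + sequence(n) - 1        # hour index of every piece
  lo <- pmax(hour * 3600, s[row])              # piece start, clipped to the interval
  hi <- pmin((hour + 1) * 3600, e[row])        # piece end, clipped to the interval
  data.frame(
    start  = as.POSIXct(lo, origin = "1970-01-01", tz = tz),
    length = hi - lo
  )
}

# A 7500-second interval starting at 07:55:19 touches four clock hours
x <- split_by_hour(as.POSIXct("2010-04-24 07:55:19", tz = "UTC"), 7500)
x$length                                       # 281 3600 3600 19
tapply(x$length, format(x$start, "%H"), sum)   # totals per hour of day
```

The output has one row per (interval, hour) pair, so it can be grouped in the same way as the data frame produced by the loop.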

With the finished data set, we now define the column to group by, here the hour of day. Note that we could also group by "Date - Hour" if we wish.

library(dplyr)  # provides %>%, group_by() and summarize()

dates$hours <- format(dates$start, "%H")
res_df <-
  dates %>%
  group_by(hours) %>%
  summarize(length_total = sum(length))

This gives the result:

> res_df
# A tibble: 6 x 2
  hours length_total
  <chr>        <dbl>
1 03           12920
2 04            4780
3 06             300
4 07            1481
5 08            6253
6 09             966
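
The "Date - Hour" grouping mentioned above could look roughly like the following base R sketch. The small pieces data frame is a stand-in built from the first split rows of the example, and date_hour is an illustrative column name.

```r
# Stand-in for the split data: row 1 of the example cut at 04:00:00, plus row 3
pieces <- data.frame(
  start  = as.POSIXct(c("2010-04-03 03:02:38", "2010-04-03 04:02:38",
                        "2010-04-20 03:05:52"), tz = "UTC"),
  length = c(3442, 158, 900)
)

# Group key that keeps the calendar date, so 03:00 on different days stays separate
pieces$date_hour <- format(pieces$start, "%Y-%m-%d %H")

res <- aggregate(length ~ date_hour, data = pieces, FUN = sum)
res
```

Grouping by "%H" alone would instead merge the two 03:00 rows into a single total of 4342.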
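
An alternative with lubridate, run on the original dates data frame (before the loop above appends rows):

> library(lubridate)
> a=dates$start
> # b: seconds from each start to the top of the next hour
> b=difftime(a+hours(1)-second(a)-minutes(minute(a)),a,units="secs")
> # d: seconds falling in the starting hour, followed by seconds spilling into the next hour
> d=c(pmin(b,dates$length),replace(e<-dates$length-b,e<0,0))
> tapply(d,c(hour(a),hour(a)+1),sum)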
    3     4     6     7     8     9 
12920  4780   300  1481  6253   966 
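
As a cross-check on these totals, the overlap of every interval with every clock hour can also be computed directly as max(0, min(end, hour_end) - max(start, hour_start)). The base R sketch below assumes that no interval crosses midnight and parses the timestamps as UTC; the names s, e, h and overlap are illustrative.

```r
dates <- data.frame(
  start = as.POSIXct(c(
    "2010-04-03 03:02:38", "2010-04-03 06:03:14", "2010-04-20 03:05:52",
    "2010-04-20 03:17:42", "2010-04-21 03:09:38", "2010-04-21 07:10:14",
    "2010-04-21 08:12:52", "2010-04-23 03:13:42", "2010-04-23 03:25:42",
    "2010-04-23 03:36:38", "2010-04-23 08:58:14", "2010-04-24 03:21:52",
    "2010-04-24 03:22:42", "2010-04-24 07:24:19", "2010-04-24 07:55:19"
  ), tz = "UTC"),
  length = c(3600, 300, 900, 3600, 300, 900, 3600, 300, 900, 3600, 300, 900, 3600, 300, 3600)
)

s <- as.numeric(dates$start) %% 86400  # start as seconds since midnight (UTC)
e <- s + dates$length                  # end as seconds since midnight
h <- (0:23) * 3600                     # start of each clock hour

# overlap[i, j]: seconds of interval i that fall inside hour j
overlap <- outer(e, h + 3600, pmin) - outer(s, h, pmax)
overlap[overlap < 0] <- 0              # negative values mean no overlap at all

totals <- colSums(overlap)
names(totals) <- sprintf("%02d", 0:23)
totals[totals > 0]
#    03    04    06    07    08    09
# 12920  4780   300  1481  6253   966
```

This builds a 15 x 24 matrix, which is cheap here but grows with the number of rows, so the split-based approaches may scale better for large inputs.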


 