简体   繁体   中英

Calculating difference between two dates grouped by a variable

I'm looking for some help writing more efficient code. I have the following data set.

Report| ReportPeriod|ObsDate
1     |     15      |2017-12-31 00:00:00
1     |     15      |2017-12-31 06:00:00
1     |     15      |2017-12-31 12:30:00
2     |     11      |2018-01-01 07:00:00
2     |     11      |2018-01-01 13:00:00
2     |     11      |2018-01-01 16:30:00

First column is "Report" which is a unique identifier for a particular report. In the data set, there are only two reports (1 & 2). Second column is "ReportPeriod", which is same for a particular report. Report 1 is 15 hrs long and Report 2 is 11 hrs long. Column three "ObsDate" is different observations in a particular report.

Problem: I need to find out the time difference between observations grouped by "Report". I did that with the following code.

example<- data.frame(Report=c(1,1,1,2,2,2), ReportPeriod=c(15,15,15,11,11,11),
                     ObsDate=c(as.POSIXct("2017-12-31 00:00:00"), as.POSIXct("2017-12-31 06:00:00"),
                               as.POSIXct("2017-12-31 12:30:00"), as.POSIXct("2018-01-01 07:00:00"),
                               as.POSIXct("2018-01-01 13:00:00"), as.POSIXct("2018-01-01 16:30:00")))

example<- example %>% group_by(Report) %>% 
  mutate(DiffPeriod= (ObsDate-lag(ObsDate)))

The output is:

Report| ReportPeriod|ObsDate            |DiffPeriod
1     |     15      |2017-12-31 00:00:00|NA
1     |     15      |2017-12-31 06:00:00|6.0
1     |     15      |2017-12-31 12:30:00|6.5
2     |     11      |2018-01-01 07:00:00|NA
2     |     11      |2018-01-01 13:00:00|6.0
2     |     11      |2018-01-01 16:30:00|3.5

Now the first two entries of the "Report" are NA. These values should be the sum of the DiffPeriod subtracted from the total report period "ReportPeriod".

I did that using the following code.

xyz<- data.frame()
for (i in unique(example$Report)) {
  df<- example %>% filter(Report==i)
  hrs<- sum(df$DiffPeriod, na.rm = TRUE)
  tot<- df$ReportPeriod[1]
  bal<- tot-hrs
  df$DiffPeriod[1]<- bal
  xyz<- xyz %>% bind_rows(df)
}

The final output is :

Report| ReportPeriod|ObsDate            |DiffPeriod
1     |     15      |2017-12-31 00:00:00|2.5
1     |     15      |2017-12-31 06:00:00|6.0
1     |     15      |2017-12-31 12:30:00|6.5
2     |     11      |2018-01-01 07:00:00|1.5
2     |     11      |2018-01-01 13:00:00|6.0
2     |     11      |2018-01-01 16:30:00|3.5

Is there a better/more efficient way to do what I did in the for-loop above in the tidyverse ?

Thanks.

Assuming ReportPeriod would always be in hours we can first get the difference between ObsDate and lag(ObsDate) and then replace NA which would be only first row by taking difference between first value of ReportPeriod with sum of DiffPeriod for each group ( Report ).

library(dplyr)

example %>% 
  group_by(Report) %>% 
  mutate(DiffPeriod= difftime(ObsDate, lag(ObsDate), units = "hours"), 
         DiffPeriod = replace(DiffPeriod, is.na(DiffPeriod), 
                      ReportPeriod[1] - sum(DiffPeriod, na.rm = TRUE)))


# Report ReportPeriod ObsDate             DiffPeriod
#   <dbl>        <dbl> <dttm>              <time>    
#1      1           15 2017-12-31 00:00:00 2.5 hours 
#2      1           15 2017-12-31 06:00:00 6.0 hours 
#3      1           15 2017-12-31 12:30:00 6.5 hours 
#4      2           11 2018-01-01 07:00:00 1.5 hours 
#5      2           11 2018-01-01 13:00:00 6.0 hours 
#6      2           11 2018-01-01 16:30:00 3.5 hours 

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM