Calculating difference between two dates grouped by a variable

Question

I'm looking for some help writing more efficient code. I have the following data set.

Report| ReportPeriod|ObsDate
1     |     15      |2017-12-31 00:00:00
1     |     15      |2017-12-31 06:00:00
1     |     15      |2017-12-31 12:30:00
2     |     11      |2018-01-01 07:00:00
2     |     11      |2018-01-01 13:00:00
2     |     11      |2018-01-01 16:30:00

The first column, "Report", is a unique identifier for a particular report. This data set has only two reports (1 and 2). The second column, "ReportPeriod", is the total length of the report in hours and is the same for every row of a report: Report 1 is 15 hrs long and Report 2 is 11 hrs long. The third column, "ObsDate", holds the timestamps of the individual observations within a report.

Problem: I need to find the time difference between consecutive observations, grouped by "Report". I did that with the following code.

library(dplyr)  # needed for %>%, group_by(), mutate() and lag()

example<- data.frame(Report=c(1,1,1,2,2,2), ReportPeriod=c(15,15,15,11,11,11),
                     ObsDate=c(as.POSIXct("2017-12-31 00:00:00"), as.POSIXct("2017-12-31 06:00:00"),
                               as.POSIXct("2017-12-31 12:30:00"), as.POSIXct("2018-01-01 07:00:00"),
                               as.POSIXct("2018-01-01 13:00:00"), as.POSIXct("2018-01-01 16:30:00")))

example<- example %>% group_by(Report) %>% 
  mutate(DiffPeriod= (ObsDate-lag(ObsDate)))

The output is:

Report| ReportPeriod|ObsDate            |DiffPeriod
1     |     15      |2017-12-31 00:00:00|NA
1     |     15      |2017-12-31 06:00:00|6.0
1     |     15      |2017-12-31 12:30:00|6.5
2     |     11      |2018-01-01 07:00:00|NA
2     |     11      |2018-01-01 13:00:00|6.0
2     |     11      |2018-01-01 16:30:00|3.5

Now the first entry of each "Report" is NA. Each of these values should instead be the total report period "ReportPeriod" minus the sum of the other DiffPeriod values in that report. For Report 1, that is 15 - (6.0 + 6.5) = 2.5 hours.

I did that using the following code.

xyz<- data.frame()
for (i in unique(example$Report)) {
  df<- example %>% filter(Report==i)
  hrs<- sum(df$DiffPeriod, na.rm = TRUE)
  tot<- df$ReportPeriod[1]
  bal<- tot-hrs
  df$DiffPeriod[1]<- bal
  xyz<- xyz %>% bind_rows(df)
}

The final output is:

Report| ReportPeriod|ObsDate            |DiffPeriod
1     |     15      |2017-12-31 00:00:00|2.5
1     |     15      |2017-12-31 06:00:00|6.0
1     |     15      |2017-12-31 12:30:00|6.5
2     |     11      |2018-01-01 07:00:00|1.5
2     |     11      |2018-01-01 13:00:00|6.0
2     |     11      |2018-01-01 16:30:00|3.5

Is there a better/more efficient way to do what I did in the for-loop above, using the tidyverse?

Thanks.

Answer 1

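Assuming ReportPeriod is always in hours, we can first compute the difference between ObsDate and lag(ObsDate) within each group (Report). The only NA in each group is then its first row, and we replace it with the first value of ReportPeriod minus the sum of the remaining DiffPeriod values for that group. Passing units = "hours" to difftime() keeps the result in hours rather than letting R choose the unit automatically.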

library(dplyr)

example %>% 
  group_by(Report) %>% 
  mutate(DiffPeriod= difftime(ObsDate, lag(ObsDate), units = "hours"), 
         DiffPeriod = replace(DiffPeriod, is.na(DiffPeriod), 
                      ReportPeriod[1] - sum(DiffPeriod, na.rm = TRUE)))


# Report ReportPeriod ObsDate             DiffPeriod
#   <dbl>        <dbl> <dttm>              <time>    
#1      1           15 2017-12-31 00:00:00 2.5 hours 
#2      1           15 2017-12-31 06:00:00 6.0 hours 
#3      1           15 2017-12-31 12:30:00 6.5 hours 
#4      2           11 2018-01-01 07:00:00 1.5 hours 
#5      2           11 2018-01-01 13:00:00 6.0 hours 
#6      2           11 2018-01-01 16:30:00 3.5 hours
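
If a plain numeric column is preferred over a difftime, the same idea can be sketched as below. This is only a minimal variant under assumptions that are not in the original: the rows are already sorted by ObsDate within each Report, the timestamps are parsed as UTC (the tz argument is added here), and DiffHours and result are hypothetical names chosen for illustration.

```r
library(dplyr)

# Rebuild the example data. tz = "UTC" is an assumption added here so the
# result does not depend on the local time zone.
example <- data.frame(
  Report       = c(1, 1, 1, 2, 2, 2),
  ReportPeriod = c(15, 15, 15, 11, 11, 11),
  ObsDate      = as.POSIXct(c("2017-12-31 00:00:00", "2017-12-31 06:00:00",
                              "2017-12-31 12:30:00", "2018-01-01 07:00:00",
                              "2018-01-01 13:00:00", "2018-01-01 16:30:00"),
                            tz = "UTC")
)

result <- example %>%
  group_by(Report) %>%
  mutate(
    # Gap to the previous observation in hours, as a plain number
    # (NA on the first row of each report).
    DiffHours = as.numeric(difftime(ObsDate, lag(ObsDate), units = "hours")),
    # Fill that first row with the balance of the report period.
    DiffHours = coalesce(DiffHours,
                         first(ReportPeriod) - sum(DiffHours, na.rm = TRUE))
  ) %>%
  ungroup()
```

For the example data, DiffHours comes out as 2.5, 6.0, 6.5 for Report 1 and 1.5, 6.0, 3.5 for Report 2, matching the output above. Converting to numeric avoids mixing difftime and double values, at the cost of losing the unit label.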
