
Calculating difference between two dates grouped by a variable

I'm looking for some help writing more efficient code. I have the following data set.

Report| ReportPeriod|ObsDate
1     |     15      |2017-12-31 00:00:00
1     |     15      |2017-12-31 06:00:00
1     |     15      |2017-12-31 12:30:00
2     |     11      |2018-01-01 07:00:00
2     |     11      |2018-01-01 13:00:00
2     |     11      |2018-01-01 16:30:00

The first column, "Report", is a unique identifier for a particular report. The data set contains only two reports (1 and 2). The second column, "ReportPeriod", is the same for every row of a given report: Report 1 is 15 hours long and Report 2 is 11 hours long. The third column, "ObsDate", holds the individual observation times within a report.

Problem: I need to find the time difference between consecutive observations, grouped by "Report". I did that with the following code.

library(dplyr)

example <- data.frame(Report = c(1,1,1,2,2,2), ReportPeriod = c(15,15,15,11,11,11),
                      ObsDate = c(as.POSIXct("2017-12-31 00:00:00"), as.POSIXct("2017-12-31 06:00:00"),
                                  as.POSIXct("2017-12-31 12:30:00"), as.POSIXct("2018-01-01 07:00:00"),
                                  as.POSIXct("2018-01-01 13:00:00"), as.POSIXct("2018-01-01 16:30:00")))

# Difference to the previous observation within each report
example <- example %>% group_by(Report) %>%
  mutate(DiffPeriod = (ObsDate - lag(ObsDate)))

The output is:

Report| ReportPeriod|ObsDate            |DiffPeriod
1     |     15      |2017-12-31 00:00:00|NA
1     |     15      |2017-12-31 06:00:00|6.0
1     |     15      |2017-12-31 12:30:00|6.5
2     |     11      |2018-01-01 07:00:00|NA
2     |     11      |2018-01-01 13:00:00|6.0
2     |     11      |2018-01-01 16:30:00|3.5

Now the first entry of each "Report" is NA. Each of these values should instead be the total report period "ReportPeriod" minus the sum of the other DiffPeriod values for that report.

I did that using the following code.

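xyz <- data.frame()
for (i in unique(example$Report)) {
  df <- example %>% filter(Report == i)      # rows of one report
  hrs <- sum(df$DiffPeriod, na.rm = TRUE)    # hours already accounted for
  tot <- df$ReportPeriod[1]                  # total length of the report
  bal <- tot - hrs                           # remaining balance
  df$DiffPeriod[1] <- bal                    # fill the NA in the first row
  xyz <- xyz %>% bind_rows(df)
}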

The final output is:

Report| ReportPeriod|ObsDate            |DiffPeriod
1     |     15      |2017-12-31 00:00:00|2.5
1     |     15      |2017-12-31 06:00:00|6.0
1     |     15      |2017-12-31 12:30:00|6.5
2     |     11      |2018-01-01 07:00:00|1.5
2     |     11      |2018-01-01 13:00:00|6.0
2     |     11      |2018-01-01 16:30:00|3.5
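As a hand check of the first-row values in the table above, the balance for each report can be recomputed from the numbers already shown. This is only a small sketch; the names report_period, known_diffs and balance are made up for illustration and are not part of the original code.

```r
# Total length of each report in hours, taken from ReportPeriod
report_period <- c(`1` = 15, `2` = 11)

# The non-NA DiffPeriod values of each report, in hours
known_diffs <- list(`1` = c(6.0, 6.5),
                    `2` = c(6.0, 3.5))

# Balance = total report period minus the sum of the known differences
balance <- report_period - sapply(known_diffs, sum)
balance   # Report 1 -> 2.5, Report 2 -> 1.5
```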

Is there a better/more efficient way in the tidyverse to do what I did in the for-loop above?

Thanks.

Assuming ReportPeriod is always in hours, we can first take the difference between ObsDate and lag(ObsDate) in hours. The only NA in each group ( Report ) is then the first row, which we replace with the first value of ReportPeriod minus the sum of DiffPeriod for that group.

library(dplyr)

example %>% 
  group_by(Report) %>% 
  mutate(DiffPeriod= difftime(ObsDate, lag(ObsDate), units = "hours"), 
         DiffPeriod = replace(DiffPeriod, is.na(DiffPeriod), 
                      ReportPeriod[1] - sum(DiffPeriod, na.rm = TRUE)))


# Report ReportPeriod ObsDate             DiffPeriod
#   <dbl>        <dbl> <dttm>              <time>    
#1      1           15 2017-12-31 00:00:00 2.5 hours 
#2      1           15 2017-12-31 06:00:00 6.0 hours 
#3      1           15 2017-12-31 12:30:00 6.5 hours 
#4      2           11 2018-01-01 07:00:00 1.5 hours 
#5      2           11 2018-01-01 13:00:00 6.0 hours 
#6      2           11 2018-01-01 16:30:00 3.5 hours 
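If a plain numeric column is easier to work with downstream than a difftime column, the same idea can be sketched with as.numeric() and coalesce(). This is a variant under stated assumptions, not part of the answer above: the time zone is fixed to UTC so the result does not depend on the local setting, and each report is assumed to have its only NA in the first row (no missing ObsDate values). The name result is made up for illustration.

```r
library(dplyr)

# Same data as in the question; tz = "UTC" is an assumption made here
example <- data.frame(
  Report       = c(1, 1, 1, 2, 2, 2),
  ReportPeriod = c(15, 15, 15, 11, 11, 11),
  ObsDate      = as.POSIXct(c("2017-12-31 00:00:00", "2017-12-31 06:00:00",
                              "2017-12-31 12:30:00", "2018-01-01 07:00:00",
                              "2018-01-01 13:00:00", "2018-01-01 16:30:00"),
                            tz = "UTC")
)

result <- example %>%
  group_by(Report) %>%
  mutate(
    # Gap to the previous observation, as plain numeric hours
    DiffPeriod = as.numeric(difftime(ObsDate, lag(ObsDate), units = "hours")),
    # The first row of each group is NA: fill it with the remaining balance
    DiffPeriod = coalesce(DiffPeriod,
                          first(ReportPeriod) - sum(DiffPeriod, na.rm = TRUE))
  ) %>%
  ungroup()

result$DiffPeriod
```

Passing units = "hours" explicitly matters in both versions: subtracting two date-times with `-` lets R pick the units automatically, so a gap shorter than an hour would be reported in minutes and could no longer be subtracted directly from ReportPeriod.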
