
Time intervals from data across multiple rows

I have a data structure similar to the one below:

# A tibble: 5 x 4
  group  task start end  
  <chr> <dbl> <chr> <chr>
1 a         1 01:00 01:30
2 a         2 02:00 02:25
3 b         3 01:05 01:40
4 b         4 01:50 02:30
5 a         5 03:00 03:30

Basically, I need to compute the time difference between the end of the previous task and the start of the next one, for each group. The tasks must be taken in chronological order and must belong to the same group.

Desired output:

# A tibble: 5 x 7
  group last_task last_end next_task next_start next_end interval
  <chr>     <dbl> <chr>        <dbl> <chr>      <chr>    <chr>   
1 a            NA NA               1 01:00      01:30    NA      
2 a             1 01:30            2 02:00      02:25    00:30   
3 b            NA NA               3 01:05      01:40    NA      
4 b             3 01:40            4 01:50      02:30    00:10   
5 a             2 02:25            5 03:00      03:30    00:35   

Here is an approach with lead and lag from dplyr.

The output differs from your expected output, but because of the grouping I believe it matches what you asked for in words.

I use lubridate since your times are actually factors rather than date-times. It will fail for tasks that run past midnight, because every time is pasted onto today's date.

library(dplyr)
library(lubridate)
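data %>%
  group_by(group) %>%
  arrange(task) %>%
  # lag() and lead() look at the previous and next row within each group
  mutate(last_task = lag(task),
         last_end = lag(end),
         next_task = lead(task),
         next_start = lead(start),
         # paste today's date onto each time so ymd_hm() can parse it, then subtract
         interval = ymd_hm(paste(today(),start,sep = " ")) - ymd_hm(paste(today(),lag(end),sep = " ")))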
# A tibble: 5 x 9
  group  task start end   last_task last_end next_task next_start interval
  <fct> <int> <fct> <fct>     <int> <fct>        <int> <fct>      <drtn>  
1 a         1 01:00 01:30        NA NA               2 02:00      NA mins 
2 a         2 02:00 02:25         1 01:30            5 03:00      30 mins 
3 b         3 01:05 01:40        NA NA               4 01:50      NA mins 
4 b         4 01:50 02:30         3 01:40           NA NA         10 mins 
5 a         5 03:00 03:30         2 02:25           NA NA         35 mins 
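
The midnight limitation mentioned above comes from pasting every time onto the same date. As a minimal sketch in base R, and only under the assumption that a gap between two tasks is never longer than 24 hours, the times can be converted to minutes since midnight and the difference wrapped with modulo arithmetic. The helper names `to_minutes` and `gap_minutes` are hypothetical and not part of dplyr or lubridate.

```r
# Hypothetical helper: convert "HH:MM" text (or a factor) to minutes since midnight.
to_minutes <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]), numeric(1))
}

# Gap in minutes from the previous end to the next start.
# Assumption: a gap is always shorter than 24 hours, so a negative raw
# difference means the next task started after midnight.
gap_minutes <- function(prev_end, next_start) {
  (to_minutes(next_start) - to_minutes(prev_end)) %% 1440
}

gap_minutes("01:30", "02:00")  # 30
gap_minutes("23:50", "00:15")  # 25
```

Inside the pipeline this could stand in for the `ymd_hm()` subtraction, for example `interval = gap_minutes(lag(end), start)`, which gives plain numbers of minutes instead of a difftime.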

If you're set on the interval format, we can hack that together:

data %>%
  group_by(group) %>%
  arrange(task) %>%
  mutate(last_task = lag(task),
         last_end = lag(end),
         next_task = lead(task),
         next_start = lead(start),
         interval = ymd_hm(paste(today(),start,sep = " ")) - ymd_hm(paste(today(),lag(end),sep = " ")),
         # turn the difftime into "hours:minutes" text, keeping NA where there is no previous task
         interval = ifelse(is.na(interval),
                           NA,
                           paste(hour(as.period(interval)),minute(as.period(interval)),sep = ":")))
# A tibble: 5 x 9
  group  task start end   last_task last_end next_task next_start interval
  <fct> <int> <fct> <fct>     <int> <fct>        <int> <fct>      <chr>   
1 a         1 01:00 01:30        NA NA               2 02:00      NA      
2 a         2 02:00 02:25         1 01:30            5 03:00      0:30    
3 b         3 01:05 01:40        NA NA               4 01:50      NA      
4 b         4 01:50 02:30         3 01:40           NA NA         0:10    
5 a         5 03:00 03:30         2 02:25           NA NA         0:35   
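
The result above prints 0:30 where the desired output shows 00:30. If the zero-padded form matters, one possible sketch in base R is to format a number of minutes with `sprintf()`. The helper name `format_hhmm` is hypothetical, and it assumes the interval has already been converted to minutes.

```r
# Hypothetical helper: format a number of minutes as zero-padded "HH:MM".
format_hhmm <- function(mins) {
  mins <- as.numeric(mins)
  out <- sprintf("%02d:%02d", as.integer(mins %/% 60), as.integer(mins %% 60))
  # sprintf() turns NA into the text "NA", so restore real missing values
  out[is.na(mins)] <- NA_character_
  out
}

format_hhmm(c(NA, 30, 10, 35))  # NA "00:30" "00:10" "00:35"
```

With the difftime column from the pipeline, this could be applied as `interval = format_hhmm(as.numeric(interval, units = "mins"))`.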
