Time intervals from data across multiple rows
I have a data structure similar to the one below:
# A tibble: 5 x 4
group task start end
<chr> <dbl> <chr> <chr>
1 a 1 01:00 01:30
2 a 2 02:00 02:25
3 b 3 01:05 01:40
4 b 4 01:50 02:30
5 a 5 03:00 03:30
Basically, I need to compute the time difference between the end of the previous task and the start of the next one, for each group. Tasks must be taken in chronological order and compared only with tasks from the same group.
Desired output:
# A tibble: 5 x 7
group last_task last_end next_task next_start next_end interval
<chr> <dbl> <chr> <dbl> <chr> <chr> <chr>
1 a NA NA 1 01:00 01:30 NA
2 a 1 01:30 2 02:00 02:25 00:30
3 b NA NA 3 01:05 01:40 NA
4 b 3 01:40 4 01:50 02:30 00:10
5 a 2 02:25 5 03:00 03:30 00:35
Here is an approach with lead() and lag() from dplyr.
The output differs from your expected output, but I believe it matches what you asked for in words, because lag() and lead() are applied within each group.
I use lubridate to parse the times, since your start and end columns are actually factors. Every time is pasted onto today's date, so this will give wrong results for tasks that cross midnight.
library(dplyr)
library(lubridate)

data %>%
  group_by(group) %>%
  arrange(task) %>%
  mutate(last_task = lag(task),
         last_end = lag(end),
         next_task = lead(task),
         next_start = lead(start),
         # attach today's date to each clock time so the two can be subtracted
         interval = ymd_hm(paste(today(), start, sep = " ")) -
           ymd_hm(paste(today(), lag(end), sep = " ")))
# A tibble: 5 x 9
group task start end last_task last_end next_task next_start interval
<fct> <int> <fct> <fct> <int> <fct> <int> <fct> <drtn>
1 a 1 01:00 01:30 NA NA 2 02:00 NA mins
2 a 2 02:00 02:25 1 01:30 5 03:00 30 mins
3 b 3 01:05 01:40 NA NA 4 01:50 NA mins
4 b 4 01:50 02:30 3 01:40 NA NA 10 mins
5 a 5 03:00 03:30 2 02:25 NA NA 35 mins
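For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data and computes the gap in minutes without pasting today's date. It is not the code above: to_minutes is a hypothetical helper I made up for illustration, the times are assumed to be "HH:MM" strings (or factors), and the %% 1440 step assumes that a gap is always shorter than 24 hours, so a negative difference means the next task started after midnight.

```r
library(dplyr)

# Sample data from the question, with times kept as "HH:MM" strings
data <- tibble(
  group = c("a", "a", "b", "b", "a"),
  task  = c(1, 2, 3, 4, 5),
  start = c("01:00", "02:00", "01:05", "01:50", "03:00"),
  end   = c("01:30", "02:25", "01:40", "02:30", "03:30")
)

# Hypothetical helper: convert "HH:MM" (character or factor) to minutes since midnight
to_minutes <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]), numeric(1))
}

result <- data %>%
  arrange(task) %>%
  group_by(group) %>%
  mutate(
    last_task = lag(task),
    last_end  = lag(end),
    # Assumption: gaps are shorter than 24 hours, so wrapping with %% 1440
    # turns a negative difference into a gap that crosses midnight
    interval_min = (to_minutes(start) - lag(to_minutes(end))) %% 1440
  ) %>%
  ungroup()

print(result)
```

The first task of each group has no predecessor, so its interval_min is NA, the same as in the output above.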
If you're set on the interval format, we can hack that together:
data %>%
  group_by(group) %>%
  arrange(task) %>%
  mutate(last_task = lag(task),
         last_end = lag(end),
         next_task = lead(task),
         next_start = lead(start),
         interval = ymd_hm(paste(today(), start, sep = " ")) -
           ymd_hm(paste(today(), lag(end), sep = " ")),
         interval = ifelse(is.na(interval), NA,
                           paste(hour(as.period(interval)),
                                 minute(as.period(interval)), sep = ":")))
# A tibble: 5 x 9
group task start end last_task last_end next_task next_start interval
<fct> <int> <fct> <fct> <int> <fct> <int> <fct> <chr>
1 a 1 01:00 01:30 NA NA 2 02:00 NA
2 a 2 02:00 02:25 1 01:30 5 03:00 0:30
3 b 3 01:05 01:40 NA NA 4 01:50 NA
4 b 4 01:50 02:30 3 01:40 NA NA 0:10
5 a 5 03:00 03:30 2 02:25 NA NA 0:35
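Note that paste() does not zero-pad, which is why the result reads 0:30 rather than the 00:30 in the question, and a 5 minute gap would come out as 0:5. If the padded form matters, one option is sprintf(). The sketch below uses a hypothetical helper, format_hhmm, that takes a number of minutes; the name and the choice of minutes as input are my own assumptions, not part of dplyr or lubridate.

```r
# Hypothetical helper: format a number of minutes as zero-padded "HH:MM"
format_hhmm <- function(minutes) {
  minutes <- as.numeric(minutes)
  out <- sprintf("%02d:%02d",
                 as.integer(minutes %/% 60),
                 as.integer(minutes %% 60))
  # sprintf() renders missing values as "NA:NA", so restore real NAs
  out[is.na(minutes)] <- NA_character_
  out
}

format_hhmm(c(NA, 30, 10, 35, 65))
# [1] NA      "00:30" "00:10" "00:35" "01:05"
```

With the difftime column from the first snippet, this could be applied as mutate(interval = format_hhmm(as.numeric(interval, units = "mins"))).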