Group, then create a 'break' if the datetime exceeds a certain time, creating a new value within original grouped column (R, dplyr)
I have a dataset, df:
Subject Folder Message Date
A Out 9/9/2019 5:46:38 PM
A Out 9/9/2019 5:46:40 PM
A Out 9/9/2019 5:46:42 PM
A Out 9/9/2019 5:46:43 PM
A Out 9/9/2019 9:30:00 PM
A Out 9/9/2019 9:30:01 PM
B Out 9/9/2019 9:35:00 PM
B Out 9/9/2019 9:35:01 PM
I am trying to group this by Subject, find the duration, and create a new Duration column. I also want to apply a threshold when the gap between consecutive Date values exceeds a certain amount of time. My dilemma is that within group A, the time jumps from 5:46 in the 4th row to 9:30 in the 5th row, which gives an inaccurate duration for group A. I wish to 'break' at that point and compute a new duration, creating a new value (A1) in Subject whenever the gap exceeds 10 minutes. I am not sure if I should use a loop for this? Desired output:
Subject Duration Group
A 5 sec outdata1
A1 1 sec outdata2
B 1 sec outdata3
Here is my dput:
structure(list(Subject = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L), .Label = c("A", "B"), class = "factor"), Folder = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Out", class = "factor"),
Message = c("", "", "", "", "", "", "", ""), Date = structure(1:8, .Label = c("9/9/2019 5:46:38 PM",
"9/9/2019 5:46:40 PM", "9/9/2019 5:46:42 PM", "9/9/2019 5:46:43 PM",
"9/9/2019 9:30:00 PM", "9/9/2019 9:30:01 PM", "9/9/2019 9:35:00 PM",
"9/9/2019 9:35:01 PM"), class = "factor")), row.names = c(NA,
-8L), class = "data.frame")
This is what I tried:
thresh <- duration(10, units = "minutes")
df %>%
mutate(Date = mdy_hms(Date)) %>%
transmute(Subject, Duration = diff = difftime(as.POSIXct(Date, format =
"%m/%d/%Y %I:%M:%S %p"),as.POSIXct(Date,
format = "%m/%d/%Y %I:%M:%S %p" ), units = "secs")) %>%
ungroup %>%
distinct %>%
mutate(grp = str_c("Outdata", row_number()))
mutate(delta = if_else(grp < thresh1, grp, NA_real_))
We can calculate the time difference between consecutive Date values to create a new group whenever that difference exceeds the threshold, and then calculate the difference in time between min and max in each group.
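library(dplyr)
thresh <- 10  # break threshold, in minutes
df %>%
  # parse the 12-hour timestamps into POSIXct
  mutate(Date = as.POSIXct(Date, format = "%m/%d/%Y %I:%M:%S %p")) %>%
  # start a new group whenever the gap to the previous row exceeds thresh
  group_by(Subject, Group = cumsum(difftime(Date,
      lag(Date, default = first(Date)), units = "mins") > thresh)) %>%
  # duration of each run, from its first to its last timestamp
  summarise(Duration = difftime(max(Date), min(Date), units = "secs")) %>%
  ungroup() %>%
  mutate(Group = paste0('outdata', row_number()))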
# A tibble: 3 x 3
# Subject Group Duration
# <fct> <chr> <drtn>
#1 A outdata1 5 secs
#2 A outdata2 1 secs
#3 B outdata3 1 secs
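The output above keeps Subject as A for both runs, whereas the question asks for the second run to be labelled A1. A minimal sketch of one way to get that label follows. It rebuilds the sample data as plain strings, computes the gaps within each Subject, and appends the break index to Subject when it is non-zero. The column names gap and brk, the object name result, and the use of tz = "UTC" are assumptions made for illustration, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Sample data from the question, as character columns (assumption: no factors)
df <- data.frame(
  Subject = c("A", "A", "A", "A", "A", "A", "B", "B"),
  Date = c("9/9/2019 5:46:38 PM", "9/9/2019 5:46:40 PM",
           "9/9/2019 5:46:42 PM", "9/9/2019 5:46:43 PM",
           "9/9/2019 9:30:00 PM", "9/9/2019 9:30:01 PM",
           "9/9/2019 9:35:00 PM", "9/9/2019 9:35:01 PM"),
  stringsAsFactors = FALSE
)

thresh <- 10  # break threshold, in minutes

result <- df %>%
  # tz = "UTC" is an assumption, used only to keep the parsing reproducible
  mutate(Date = as.POSIXct(Date, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")) %>%
  group_by(Subject) %>%
  mutate(
    # minutes since the previous row of the same Subject (0 for the first row)
    gap = as.numeric(difftime(Date, lag(Date, default = first(Date)), units = "mins")),
    # running count of breaks: 0 for the first run, 1 for the second, ...
    brk = cumsum(gap > thresh)
  ) %>%
  group_by(Subject, brk) %>%
  summarise(Duration = difftime(max(Date), min(Date), units = "secs"),
            .groups = "drop") %>%
  mutate(
    # keep "A" for the first run, use "A1", "A2", ... for later runs
    Subject = if_else(brk == 0, Subject, paste0(Subject, brk)),
    Group = paste0("outdata", row_number())
  ) %>%
  select(Subject, Duration, Group)

print(result)
```

Grouping by Subject before calling lag() means a gap is never measured across two different subjects, so the sketch should also behave sensibly when the last row of one subject and the first row of the next are far apart in time.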