
Group, then create a 'break' if the datetime exceeds a certain time, creating a new value within the original grouped column (R, dplyr)

I have a dataset, df:

  Subject      Folder     Message    Date
  A            Out                   9/9/2019 5:46:38 PM
  A            Out                   9/9/2019 5:46:40 PM
  A            Out                   9/9/2019 5:46:42 PM
  A            Out                   9/9/2019 5:46:43 PM
  A            Out                   9/9/2019 9:30:00 PM
  A            Out                   9/9/2019 9:30:01 PM
  B            Out                   9/9/2019 9:35:00 PM
  B            Out                   9/9/2019 9:35:01 PM

I am trying to group this by Subject, find the duration of each group, and store it in a new Duration column. I also want to apply a threshold: if the gap between consecutive Date values exceeds a certain amount of time, the group should be split. My dilemma is that within group A, the time jumps from 5:46 in the 4th row to 9:30 in the 5th row, which gives an inaccurate duration for group A. I wish to 'break' at that point and compute a new duration, creating a new value (A1) in Subject whenever the gap exceeds 10 minutes. I am not sure if I should use a loop for this?

This is the output I am after:

 Subject   Duration   Group
 A         5 sec      outdata1
 A1        1 sec      outdata2
 B         1 sec      outdata3

Here is my dput:

structure(list(Subject = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L), .Label = c("A", "B"), class = "factor"), Folder = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "Out", class = "factor"), 
Message = c("", "", "", "", "", "", "", ""), Date = structure(1:8, .Label = c("9/9/2019 5:46:38 PM", 
"9/9/2019 5:46:40 PM", "9/9/2019 5:46:42 PM", "9/9/2019 5:46:43 PM", 
"9/9/2019 9:30:00 PM", "9/9/2019 9:30:01 PM", "9/9/2019 9:35:00 PM", 
"9/9/2019 9:35:01 PM"), class = "factor")), row.names = c(NA, 
-8L), class = "data.frame")

This is what I tried:

thresh <- duration(10, units = "minutes")

df %>%
  mutate(Date = mdy_hms(Date)) %>%
  transmute(Subject, Duration = diff = difftime(
    as.POSIXct(Date, format = "%m/%d/%Y %I:%M:%S %p"),
    as.POSIXct(Date, format = "%m/%d/%Y %I:%M:%S %p"),
    units = "secs")) %>%
  ungroup %>%
  distinct %>%
  mutate(grp = str_c("Outdata", row_number()))

mutate(delta = if_else(grp < thresh1, grp, NA_real_))

We can calculate the duration between consecutive Date values to create a new group, and then calculate the time difference between the min and max Date in each group.

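library(dplyr)
thresh <- 10  # threshold in minutes

df %>%
  # parse the 12-hour timestamps into POSIXct
  mutate(Date = as.POSIXct(Date, format = "%m/%d/%Y %I:%M:%S %p")) %>%
  # start a new group whenever the gap to the previous row exceeds thresh
  group_by(Subject, Group = cumsum(difftime(Date, 
            lag(Date, default = first(Date)), units = "mins") > thresh)) %>%
  # duration of each group: last timestamp minus first
  summarise(Duration = difftime(max(Date), min(Date), units = "secs")) %>%
  ungroup %>%
  # relabel the groups sequentially
  mutate(Group = paste0('outdata', row_number()))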

# A tibble: 3 x 3
#  Subject Group    Duration
#  <fct>   <chr>    <drtn>  
#1 A       outdata1 5 secs  
#2 A       outdata2 1 secs  
#3 B       outdata3 1 secs  
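
The output above keeps Subject as A for both blocks, whereas the question asks for the second block to be labelled A1. One possible way to get that label is sketched below; it is not part of the original answer. It assumes dplyr 1.0.0 or later (for the `.groups` argument), rebuilds the sample data with plain character columns, and parses the timestamps with `tz = "UTC"` purely to sidestep daylight-saving effects. The column names `gap_mins` and `block` are made up for illustration.

```r
library(dplyr)

# Sample data from the question, rebuilt with character columns (assumption:
# only Subject and Date are needed for this step)
df <- data.frame(
  Subject = c("A", "A", "A", "A", "A", "A", "B", "B"),
  Date = c("9/9/2019 5:46:38 PM", "9/9/2019 5:46:40 PM",
           "9/9/2019 5:46:42 PM", "9/9/2019 5:46:43 PM",
           "9/9/2019 9:30:00 PM", "9/9/2019 9:30:01 PM",
           "9/9/2019 9:35:00 PM", "9/9/2019 9:35:01 PM"),
  stringsAsFactors = FALSE
)

thresh <- 10  # minutes; a longer gap starts a new block

res <- df %>%
  mutate(Date = as.POSIXct(Date, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")) %>%
  group_by(Subject) %>%
  mutate(
    # gap to the previous row within the same Subject (0 for the first row)
    gap_mins = as.numeric(difftime(Date, lag(Date, default = first(Date)),
                                   units = "mins")),
    # running count of gaps above the threshold: 0, 0, ..., 1, 1, ...
    block = cumsum(gap_mins > thresh)
  ) %>%
  ungroup() %>%
  # block 0 keeps the original label; later blocks get a numeric suffix
  mutate(Subject = if_else(block == 0, Subject, paste0(Subject, block))) %>%
  group_by(Subject) %>%
  summarise(Duration = difftime(max(Date), min(Date), units = "secs"),
            .groups = "drop") %>%
  mutate(Group = paste0("outdata", row_number()))

res
```

On the sample data this should give the subjects A, A1 and B with durations of 5, 1 and 1 seconds. Two caveats: the suffixing would collide if a Subject named A1 already existed in the data, and `summarise` orders rows by the new label rather than by time, so the `outdata` numbering follows label order.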
