
Calculate duration based on conditions, while grouping but not aggregating

Objective:

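I have a dataset, df, that I would like to group by ID and, based on certain conditions, find the duration: Focus == True, Read == True and ID != "". However, I do not want to aggregate the IDs, because I want each consecutive run of an ID to stay in its own separate "chunk".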

ID            Date                   Focus        Read


A             1/2/2020 5:00:00 AM    True         True
A             1/2/2020 5:00:05 AM    True         True
              1/3/2020 6:00:00 AM    True
              1/3/2020 6:00:05 AM    True         
B             1/4/2020 7:00:00 AM    True         True
B             1/4/2020 7:00:02 AM    True         True
B             1/4/2020 7:00:10 AM    True         True
A             1/2/2020 7:30:00 AM    True         True
A             1/2/2020 7:30:20 AM    True         True

I would like this output:

ID                          Duration               Date

A                           5 sec                  1/2/2020
B                           10 sec                 1/4/2020
A                           20 sec                 1/2/2020

Input (dput):

structure(list(ID = structure(c(2L, 2L, 1L, 1L, 3L, 3L, 3L, 2L, 
2L), .Label = c("", "A", "B"), class = "factor"), Date = structure(c(1L, 
2L, 5L, 6L, 7L, 8L, 9L, 3L, 4L), .Label = c("1/2/2020 5:00:00 AM", 
"1/2/2020 5:00:05 AM", "1/2/2020 7:30:00 AM", "1/2/2020 7:30:20 AM", 
"1/3/2020 6:00:00 AM", "1/3/2020 6:00:05 AM", "1/4/2020 7:00:00 AM", 
"1/4/2020 7:00:02 AM", "1/4/2020 7:00:10 AM"), class = "factor"), 
Focus = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "True ", class = "factor"), 
Read = structure(c(2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("", 
"True "), class = "factor")), class = "data.frame", row.names = c(NA, 
-9L))

This works well, but instead of aggregating by ID, how can I keep the runs separate?

 library(dplyr)
 library(lubridate)
 df %>% 
 filter(as.logical(trimws(Read)), as.logical(trimws(Focus))) %>%
 mutate(Date = mdy_hms(Date)) %>%
 group_by(ID) %>% 
 summarise(Duration = difftime(last(Date), first(Date), units = "secs"))

Any suggestions are appreciated.
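As a side note on the filter step in the question's code, here is a minimal base R sketch of why `trimws()` is needed before `as.logical()`. The vector `read` is a made-up stand-in for the `Read` column, whose non-blank level is "True " with a trailing space.

```r
# Hypothetical stand-in for the Read column: "True " (trailing space) or blank.
read <- factor(c("True ", "", "True "))

# Without trimming, "True " is not a recognised logical string, so it becomes NA.
untrimmed <- as.logical(as.character(read))

# After trimming, "True" parses to TRUE; the blank value still becomes NA.
flag <- as.logical(trimws(as.character(read)))

# dplyr::filter() drops rows whose condition is NA; in base R we make that explicit.
keep <- !is.na(flag) & flag
keep
```

So the blank `Read` rows are dropped because they parse to `NA`, not because they parse to `FALSE`.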

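We can create a grouping variable that starts a new group whenever adjacent elements of 'ID' differ, using the run-length-encoding id function rleid (from data.table), and then apply difftime on 'Date' after converting it to date-time.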

library(dplyr)
library(lubridate)
library(data.table)  # for rleid()
df %>% 
 # keep rows where both flags parse to TRUE (blank values become NA and are dropped)
 filter(as.logical(trimws(Read)), as.logical(trimws(Focus))) %>%
 mutate(Date = mdy_hms(Date)) %>%
 # rleid(ID) numbers each run of consecutive identical IDs
 group_by(grp = rleid(ID), ID) %>%   
 summarise(Duration = difftime(last(Date), first(Date), units = "secs"),
         Date = as.Date(first(Date))) %>%
 ungroup() %>%
 select(-grp)
# A tibble: 3 x 3
#  ID    Duration Date      
#  <fct> <drtn>   <date>    
#1 A      5 secs  2020-01-02
#2 B     10 secs  2020-01-04
#3 A     20 secs  2020-01-02
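
To show what the run id is doing, here is a minimal base R sketch of the same idea using `rle()` instead of `data.table::rleid()`. This is an illustration, not part of the answer above. The vectors `id` and `when` are retyped by hand from the seven rows that survive the filter, with timestamps in ISO format and UTC assumed so that parsing does not depend on the locale.

```r
# Illustration only: the seven rows left after the Read/Focus filter.
id <- c("A", "A", "B", "B", "B", "A", "A")
when <- as.POSIXct(c("2020-01-02 05:00:00", "2020-01-02 05:00:05",
                     "2020-01-04 07:00:00", "2020-01-04 07:00:02",
                     "2020-01-04 07:00:10", "2020-01-02 07:30:00",
                     "2020-01-02 07:30:20"), tz = "UTC")

# rle() returns the length of each run of identical values. Repeating the run
# index by its length gives one run id per row, like rleid().
# rle() needs an atomic vector, so a factor must go through as.character() first.
runs <- rle(id)
run_id <- rep(seq_along(runs$lengths), times = runs$lengths)

# Row index of the first and last element of each run.
last_row <- cumsum(runs$lengths)
first_row <- last_row - runs$lengths + 1

# One output row per run: the second run of "A" stays separate from the first.
result <- data.frame(
  ID = runs$values,
  Duration = as.numeric(difftime(when[last_row], when[first_row], units = "secs")),
  Date = as.Date(when[first_row])
)
result
```

Because the grouping key is the run index rather than the ID value, the two separate runs of A produce two rows instead of being merged into one.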

