
R-Studio pivot data frame to get start and end datetimes per ID and per Status

I have a df that I want to pivot to get the start and end datetimes of status 14 for each ID.

This sounds like a challenge, and I have been trying to solve it without any luck.

So, my data frame looks like this:

id  changes_dttm        old_status_cd   new_status_cd
1   29/01/2020 08:45    13              14
2   29/01/2020 09:39    13              14
2   29/01/2020 06:24    14              13
2   28/01/2020 20:11    13              14
2   26/01/2020 17:34    14              13
2   26/01/2020 16:12    13              14
2   26/01/2020 09:42    12              13
3   26/01/2020 13:58    13              14
3   26/01/2020 09:47    14              13
3   25/01/2020 13:43    -3              14
3   25/01/2020 06:01    12              -3
4   23/01/2020 05:54    -2              20
4   22/01/2020 10:24    14              -2
4   21/01/2020 11:44    13              14

Expected result:

id  changes_dttm        old_status_cd   new_status_cd       14 start            14 end
1   29/01/2020 08:45    13              14                  29/01/2020 08:45    
2   29/01/2020 09:39    13              14                  29/01/2020 09:39    
2   28/01/2020 20:11    13              14                  28/01/2020 20:11    29/01/2020 06:24
2   26/01/2020 16:12    13              14                  26/01/2020 16:12    26/01/2020 17:34
3   26/01/2020 13:58    13              14                  26/01/2020 13:58    
3   25/01/2020 13:43    -3              14                  25/01/2020 13:43    26/01/2020 09:47
4   21/01/2020 11:44    13              14                  21/01/2020 11:44    22/01/2020 10:24

The blank end values are correct :) They belong to each ID's most recent change into status 14, which has no end datetime yet.

My code to create the sample above:

library(data.table)
library(lubridate)

id <- c(1,2,2,2,2,2,2,3,3,3,3,4,4,4)
changes_dttm <- c("29/01/2020 08:45","29/01/2020 09:39", "29/01/2020 06:24","28/01/2020 20:11","26/01/2020 17:34","26/01/2020 16:12","26/01/2020 09:42","26/01/2020 13:58","26/01/2020 09:47","25/01/2020 13:43","25/01/2020 06:01","23/01/2020 05:54","22/01/2020 10:24","21/01/2020 11:44")
old_status_cd <- c(13,13,14,13,14,13,12,13,14,-3,12,-2,14,13)
new_status_cd <- c(14,14,13,14,13,14,13,14,13,14,-3,20,-2,14)

# Parse the character timestamps into POSIXct while building the data frame
df <- data.frame(id,
                 changes_dttm = as.POSIXct(changes_dttm, format = "%d/%m/%Y %H:%M"),
                 old_status_cd,
                 new_status_cd)

dplyr solution:

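library(dplyr)
library(lubridate)

df %>%
    arrange(id, changes_dttm) %>%              # chronological order within each id
    group_by(id) %>%
    mutate(Start = changes_dttm,               # a change into status 14 starts a period
           End = lead(changes_dttm)) %>%       # the id's next change ends it (NA if none yet)
    filter(new_status_cd == 14) %>%            # keep only rows that start a status-14 period
    arrange(id, desc(changes_dttm))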

     id changes_dttm        old_status_cd new_status_cd Start               End                
  <dbl> <dttm>                      <dbl>         <dbl> <dttm>              <dttm>             
1     1 2020-01-29 08:45:00            13            14 2020-01-29 08:45:00 NA                 
2     2 2020-01-29 09:39:00            13            14 2020-01-29 09:39:00 NA                 
3     2 2020-01-28 20:11:00            13            14 2020-01-28 20:11:00 2020-01-29 06:24:00
4     2 2020-01-26 16:12:00            13            14 2020-01-26 16:12:00 2020-01-26 17:34:00
5     3 2020-01-26 13:58:00            13            14 2020-01-26 13:58:00 NA                 
6     3 2020-01-25 13:43:00            -3            14 2020-01-25 13:43:00 2020-01-26 09:47:00
7     4 2020-01-21 11:44:00            13            14 2020-01-21 11:44:00 2020-01-22 10:24:00
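Since the question already loads data.table, here is a sketch of the same "next change ends the period" logic in data.table, for comparison. It is not part of the original answer: it assumes the sample data above, parses the timestamps as UTC (an assumption; the question uses the session's default time zone), uses data.table's `shift(type = "lead")` in place of dplyr's `lead()`, and the column names `start_14` and `end_14` are illustrative only.

```r
library(data.table)

# Sample data from the question; tz = "UTC" is an assumption for reproducibility
dt <- data.table(
  id = c(1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4),
  changes_dttm = as.POSIXct(
    c("29/01/2020 08:45", "29/01/2020 09:39", "29/01/2020 06:24", "28/01/2020 20:11",
      "26/01/2020 17:34", "26/01/2020 16:12", "26/01/2020 09:42", "26/01/2020 13:58",
      "26/01/2020 09:47", "25/01/2020 13:43", "25/01/2020 06:01", "23/01/2020 05:54",
      "22/01/2020 10:24", "21/01/2020 11:44"),
    format = "%d/%m/%Y %H:%M", tz = "UTC"),
  old_status_cd = c(13, 13, 14, 13, 14, 13, 12, 13, 14, -3, 12, -2, 14, 13),
  new_status_cd = c(14, 14, 13, 14, 13, 14, 13, 14, 13, 14, -3, 20, -2, 14)
)

# Sort chronologically within each id (in place)
setorder(dt, id, changes_dttm)

# Each row's period runs from its own change to the id's next change;
# shift(type = "lead") yields NA on the last row of each id (period still open)
dt[, `:=`(start_14 = changes_dttm,
          end_14   = shift(changes_dttm, type = "lead")), by = id]

# Keep only the changes into status 14, newest first within each id
result <- dt[new_status_cd == 14]
setorder(result, id, -changes_dttm)

print(result)
```

`setorder()` is used for the final descending sort because it takes column names with a `-` prefix, which avoids applying unary minus to a POSIXct column.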


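Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.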

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM