R - add a date to a time from a different column, adding an extra day if the time crosses midnight

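I couldn't find anything on this here, hence my question. Because of how the data is stored (part of it is missing), I need to manipulate a column so that a time-only column also carries a date.

I have a data frame with a date and time in EVENT_START_DTTM and only a time in EVENT_END_TM. My logic is:
- if the time of day in EVENT_START_DTTM <= EVENT_END_TM, the event ends on the same day;
- if the time of day in EVENT_START_DTTM > EVENT_END_TM, the event crossed midnight and I need to add one day to the date.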

EVENT_START_DTTM      EVENT_END_TM
2020-01-03 09:34:13   10:33:37
2020-01-03 07:57:24   23:04:38
2019-12-04 23:42:40   03:38:33
2019-12-04 22:33:33   00:07:13

I would also like to compute the difference in minutes. The desired result is:

EVENT_START_DTTM     EVENT_END_DTTM         Difference_min
2020-01-03 09:34:13  2020-01-03 10:33:37    59
2020-01-03 07:57:24  2020-01-03 23:04:38    907
2019-12-04 23:42:40  2019-12-05 03:38:33    236
2019-12-04 22:33:33  2019-12-05 00:07:13    94

Here is my code:

library(data.table)
library(lubridate)
EVENT_START_DTTM <- c("2020-01-03 09:34:13", "2020-01-03 07:57:24","2019-12-04 23:42:40", "2019-12-04 22:33:33")
EVENT_END_DTTM <- c("2020-01-03 10:33:37", "2020-01-03 23:04:38","2019-12-05 03:38:33", "2019-12-05 00:07:13")
df_dttm <- data.frame(as.POSIXct(EVENT_START_DTTM), as.POSIXct(EVENT_END_DTTM ))
setnames(df_dttm, c("EVENT_START_DTTM","EVENT_END_DTTM") )

You can use within() and some arithmetic. To compare the hours, use substr().

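d <- within(d, {
  # Parse the start timestamp (stored as text) into POSIXct
  EVENT_START_DTTM <- as.POSIXct(EVENT_START_DTTM)
  # Put the end time on the start date, then add one day (in seconds) when the
  # start hour is later than the end hour. Only the hour is compared here.
  EVENT_END_TM <- as.POSIXct(paste(substr(EVENT_START_DTTM, 1, 10), EVENT_END_TM)) +
    (as.numeric(substr(d[, 1], 12, 13)) > as.numeric(substr(d[, 2], 1, 2))) * 24*60*60
  # Ask for minutes explicitly rather than relying on automatic unit selection
  Difference_min <- difftime(EVENT_END_TM, EVENT_START_DTTM, units = "mins")
})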
d
#      EVENT_START_DTTM        EVENT_END_TM Difference_min
# 1 2020-01-03 09:34:13 2020-01-03 10:33:37  59.40000 mins
# 2 2020-01-03 07:57:24 2020-01-03 23:04:38 907.23333 mins
# 3 2019-12-04 23:42:40 2019-12-05 03:38:33 235.88333 mins
# 4 2019-12-04 22:33:33 2019-12-05 00:07:13  93.66667 mins

Data:

d <- structure(list(EVENT_START_DTTM = structure(4:1, .Label = c("2019-12-04 22:33:33", 
"2019-12-04 23:42:40", "2020-01-03 07:57:24", "2020-01-03 09:34:13"
), class = "factor"), EVENT_END_TM = structure(c(3L, 4L, 2L, 
1L), .Label = c("00:07:13", "03:38:33", "10:33:37", "23:04:38"
), class = "factor")), class = "data.frame", row.names = c(NA, 
-4L))
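
Because the snippet above compares only the hour, a start and end that fall in the same clock hour (say 09:34 to 09:10 the next day) would be treated as same-day. A minimal sketch of one way around that, assuming the inputs are text in the formats shown in the question: build the end timestamp on the start date first, then add a day wherever it lands before the start. The helper name `end_datetime` and the `tz = "UTC"` default are illustrative assumptions, not part of the original answer.

```r
# Hypothetical helper: combine a start timestamp with an end time of day,
# rolling the end over to the next day when it would precede the start.
end_datetime <- function(start_dttm, end_tm, tz = "UTC") {
  start_chr <- as.character(start_dttm)  # also handles factor columns
  end_chr   <- as.character(end_tm)
  start <- as.POSIXct(start_chr, format = "%Y-%m-%d %H:%M:%S", tz = tz)
  # End time placed on the start date
  end <- as.POSIXct(paste(substr(start_chr, 1, 10), end_chr),
                    format = "%Y-%m-%d %H:%M:%S", tz = tz)
  # An end before the start means the event crossed midnight: add 24 hours.
  # With tz = "UTC" (an assumption) every day is exactly 86400 seconds long.
  end + as.numeric(end < start) * 86400
}

d <- data.frame(
  EVENT_START_DTTM = c("2020-01-03 09:34:13", "2020-01-03 07:57:24",
                       "2019-12-04 23:42:40", "2019-12-04 22:33:33"),
  EVENT_END_TM = c("10:33:37", "23:04:38", "03:38:33", "00:07:13"),
  stringsAsFactors = FALSE
)

d$EVENT_END_DTTM <- end_datetime(d$EVENT_START_DTTM, d$EVENT_END_TM)
d$Difference_min <- as.numeric(difftime(d$EVENT_END_DTTM,
                                        as.POSIXct(d$EVENT_START_DTTM, tz = "UTC"),
                                        units = "mins"))
round(d$Difference_min, 2)
# values: 59.40, 907.23, 235.88, 93.67
```

Comparing the full timestamps rather than substrings also means equal start and end times count as same-day, which matches the `<=` rule in the question.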

You can do this with dplyr's mutate() function and by formatting the date-times:

library(data.table)  # for setnames()
library(dplyr)

# Create the data frame: full start timestamps, end times as "HH:MM:SS" strings
EVENT_START_DTTM <- c("2020-01-03 09:34:13", "2020-01-03 07:57:24",
                      "2019-12-04 23:42:40", "2019-12-04 22:33:33")

EVENT_END_DTTM <- c("10:33:37", "23:04:38",
                    "03:38:33", "00:07:13")

df_dttm <- data.frame(as.POSIXct(EVENT_START_DTTM), EVENT_END_DTTM, stringsAsFactors = FALSE)
setnames(df_dttm, c("EVENT_START_DTTM","EVENT_END_DTTM") )

result <-
  df_dttm %>%
  as_tibble() %>%
  # Split the start timestamp into its local date and its time of day
  mutate(start_date = as.Date(format(EVENT_START_DTTM, "%Y-%m-%d")),
         start_time = format(EVENT_START_DTTM, "%H:%M:%S")) %>%
  # Zero-padded "HH:MM:SS" strings sort chronologically, so compare them directly:
  # an end time earlier than the start time means the event crossed midnight
  mutate(end_date = if_else(start_time <= EVENT_END_DTTM, start_date, start_date + 1)) %>%
  # Rebuild the full end timestamp and take the rounded difference in minutes
  mutate(EVENT_END_DTTM = as.POSIXct(paste(end_date, EVENT_END_DTTM)),
         Difference_min = round(difftime(EVENT_END_DTTM, EVENT_START_DTTM, units="mins"), 0))

result

# # A tibble: 4 x 6
# EVENT_START_DTTM    EVENT_END_DTTM      start_date start_time end_date   Difference_min
# <dttm>              <dttm>              <date>     <chr>      <date>     <drtn>        
# 2020-01-03 09:34:13 2020-01-03 10:33:37 2020-01-03 09:34:13   2020-01-03  59 mins      
# 2020-01-03 07:57:24 2020-01-03 23:04:38 2020-01-03 07:57:24   2020-01-03 907 mins      
# 2019-12-04 23:42:40 2019-12-05 03:38:33 2019-12-04 23:42:40   2019-12-05 236 mins      
# 2019-12-04 22:33:33 2019-12-05 00:07:13 2019-12-04 22:33:33   2019-12-05  94 mins  
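
If every event is known to last less than 24 hours, the duration can also be computed without deciding the end date first: convert both times of day to seconds since midnight and take the difference modulo 86400. The sketch below uses base R only; the helper name `hms_to_seconds` and the choice of `tz = "UTC"` are assumptions made for illustration and do not come from either answer.

```r
# Hypothetical helper: convert "HH:MM:SS" strings to seconds since midnight
hms_to_seconds <- function(hms) {
  parts <- strsplit(as.character(hms), ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

start_chr <- c("2020-01-03 09:34:13", "2020-01-03 07:57:24",
               "2019-12-04 23:42:40", "2019-12-04 22:33:33")
end_tm    <- c("10:33:37", "23:04:38", "03:38:33", "00:07:13")

start_sec <- hms_to_seconds(substr(start_chr, 12, 19))
end_sec   <- hms_to_seconds(end_tm)

# In R, %% with a positive divisor never returns a negative value,
# so a negative gap wraps around to the next day
duration_sec <- (end_sec - start_sec) %% 86400

# tz = "UTC" is an assumption that keeps every day exactly 86400 seconds long
start_dttm <- as.POSIXct(start_chr, tz = "UTC")
result <- data.frame(
  EVENT_START_DTTM = start_dttm,
  EVENT_END_DTTM   = start_dttm + duration_sec,
  Difference_min   = round(duration_sec / 60)
)
result$Difference_min
# values: 59, 907, 236, 94
```

The end timestamp falls out as start plus duration, so the midnight rollover needs no explicit branch. The trade-off is the under-24-hours assumption, which the data alone cannot confirm.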

