Summarizing observations within date and time boundaries in R

I have a dataframe called ActiveData that shows the dates and times (StartDT and EndDT) when individual IDs were active. ActiveData is structured like this, where StartDT and EndDT are formatted as mdy_hms:

ID         StartDT               EndDT
1  05/05/2021 8:15:00    05/05/2021 9:15:00
2  05/05/2021 8:15:00    05/05/2021 9:15:00
3  05/05/2021 8:15:00    05/05/2021 10:15:00
…

I have another dataframe called Observations that records each time an ID observed itself or another ID satisfying some variable. Here, ID denotes the observer and IDobserved denotes which ID was observed satisfying the variable (IDs can also observe themselves).

ID       DT                          IDobserved
1      05/05/2021 8:19:00                1
1      05/05/2021 8:20:00                1      
1      05/05/2021 8:19:00                2
2      05/05/2021 8:19:20                1
2      05/05/2021 8:19:45                3
3      05/05/2021 8:19:00                1
3      05/05/2021 8:20:00                1
3      05/05/2021 8:25:00                1
3      05/05/2021 8:45:00                3
3      05/05/2021 8:19:00                2
…

I want to summarize the number of times that each ID observed the other IDs (including themselves) satisfying the variable within the time constraints specified by StartDT and EndDT in the ActiveData dataframe. The final table should give the number of observations and the time in seconds between the boundaries during which each ID was actively observing (between StartDT and EndDT in ActiveData). So for the data above, the final table would look like this:

ID   IDobserved   Observations      TimeElapsed
1         1            2             3600
1         2            1             3600
2         1            1             3600
2         3            1             3600
3         1            3             7200
3         2            1             7200
3         3            1             7200
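
To make the TimeElapsed column concrete: it is the length of each ID's active window in seconds. A minimal base R check for ID 3, assuming the timestamps are month/day/year and using UTC (the time zone is an assumption for illustration, not part of the data):

```r
# Parse ID 3's active window (format assumed to be month/day/year).
fmt   <- "%m/%d/%Y %H:%M:%S"
start <- as.POSIXct("05/05/2021 8:15:00",  format = fmt, tz = "UTC")
end   <- as.POSIXct("05/05/2021 10:15:00", format = fmt, tz = "UTC")

# Window length in seconds, as a plain number: two hours = 7200.
elapsed <- as.numeric(difftime(end, start, units = "secs"))
elapsed
```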

How can this be done?

Here is a method using data.table.

  1. Convert each data.frame to a data.table with setDT
  2. Convert the 'DT', 'StartDT', 'EndDT' columns in both datasets to date-time class ( mdy_hms )
  3. Create the 'Observations' column in the second dataset (df2) as the number of observations per group of ID, IDobserved
  4. Join with the first dataset on the ID column, compute 'TimeElapsed' as the sum of the differences between DT and StartDT and between EndDT and DT using difftime, and return the unique rows

library(data.table)
library(lubridate)
# Steps 1-2: convert in place and parse the month/day/year timestamps
setDT(df1)[, c('StartDT', 'EndDT') := lapply(.SD, mdy_hms),
       .SDcols = 2:3]
setDT(df2)[, DT := mdy_hms(DT)]
# Step 3: number of observations per (observer, observed) pair
df2[, Observations := .N, .(ID, IDobserved)]
# Step 4: join on ID; DT cancels out of the sum, so TimeElapsed equals
# EndDT - StartDT. Rows are not filtered by whether DT lies in the window.
unique(df2[df1, .(IDobserved, Observations,
      TimeElapsed = as.numeric(difftime(DT, StartDT, units = 'secs') +
        difftime(EndDT, DT, units = 'secs'))), on = .(ID), by = .EACHI])

-output

    ID IDobserved Observations TimeElapsed
1:  1          1            2        3600
2:  1          2            1        3600
3:  2          1            1        3600
4:  2          3            1        3600
5:  3          1            3        7200
6:  3          3            1        7200
7:  3          2            1        7200
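
The code above counts every observation for an ID, whether or not its DT falls between StartDT and EndDT. For the sample data all observations are inside their windows, so the result is the same. If out-of-window rows are possible, one option is to filter after the join. The following is only a sketch under assumptions: the names active, obs and parse_dt are made up for illustration, UTC is assumed as the time zone, and the last row of obs is a hypothetical extra observation added to show the filter working.

```r
library(data.table)

# Helper (hypothetical name): parse month/day/year timestamps; UTC is an assumption.
parse_dt <- function(x) as.POSIXct(x, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

active <- data.table(
  ID      = 1:3,
  StartDT = parse_dt(rep("05/05/2021 8:15:00", 3)),
  EndDT   = parse_dt(c("05/05/2021 9:15:00", "05/05/2021 9:15:00",
                       "05/05/2021 10:15:00"))
)

obs <- data.table(
  ID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 1L),
  DT = parse_dt(c("05/05/2021 8:19:00", "05/05/2021 8:20:00", "05/05/2021 8:19:00",
                  "05/05/2021 8:19:20", "05/05/2021 8:19:45", "05/05/2021 8:19:00",
                  "05/05/2021 8:20:00", "05/05/2021 8:25:00", "05/05/2021 8:45:00",
                  "05/05/2021 8:19:00",
                  "05/05/2021 9:30:00")),  # hypothetical row, after ID 1's EndDT
  IDobserved = c(1L, 1L, 2L, 1L, 3L, 1L, 1L, 1L, 3L, 2L, 2L)
)

# Attach each observer's window to their observations, then keep in-window rows only.
joined <- active[obs, on = .(ID), nomatch = NULL]
inside <- joined[DT >= StartDT & DT <= EndDT]

# Count per (observer, observed) pair; the window length is constant within an ID.
result <- inside[, .(Observations = .N,
                     TimeElapsed  = as.numeric(difftime(EndDT[1], StartDT[1],
                                                        units = "secs"))),
                 by = .(ID, IDobserved)]
result
```

With the hypothetical 9:30 row excluded, the counts match the table requested in the question.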

data


df1 <- structure(list(ID = 1:3, StartDT = c("05/05/2021 8:15:00", 
"05/05/2021 8:15:00", 
"05/05/2021 8:15:00"), EndDT = c("05/05/2021 9:15:00", "05/05/2021 9:15:00", 
"05/05/2021 10:15:00")), class = "data.frame", row.names = c(NA, 
-3L))

df2 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L), 
    DT = c("05/05/2021 8:19:00", "05/05/2021 8:20:00", "05/05/2021 8:19:00", 
    "05/05/2021 8:19:20", "05/05/2021 8:19:45", "05/05/2021 8:19:00", 
    "05/05/2021 8:20:00", "05/05/2021 8:25:00", "05/05/2021 8:45:00", 
    "05/05/2021 8:19:00"), IDobserved = c(1L, 1L, 2L, 1L, 3L, 
    1L, 1L, 1L, 3L, 2L)), class = "data.frame", row.names = c(NA, 
-10L))

Cool question! With your data

ActiveData <- tibble::tribble(
  ~ID, ~StartDT,             ~EndDT,
  1,   "05/05/2021 8:15:00", "05/05/2021 9:15:00",
  2,   "05/05/2021 8:15:00", "05/05/2021 9:15:00",
  3,   "05/05/2021 8:15:00", "05/05/2021 10:15:00"
)

Observations <- tibble::tribble(
  ~ID, ~DT,                ~IDobserved,
  1,   "05/05/2021 8:19:00", 1,
  1,   "05/05/2021 8:20:00", 1,
  1,   "05/05/2021 8:19:00", 2,
  2,   "05/05/2021 8:19:20", 1,
  2,   "05/05/2021 8:19:45", 3,
  3,   "05/05/2021 8:19:00", 1,
  3,   "05/05/2021 8:20:00", 1,
  3,   "05/05/2021 8:25:00", 1,
  3,   "05/05/2021 8:45:00", 3,
  3,   "05/05/2021 8:19:00", 2
)

I would do

library(dplyr)

fmt <- "%m/%d/%Y %H:%M:%S"  # month/day/year, matching the mdy_hms format in the question

ActiveData %>%
  mutate(across(-ID, ~ as.POSIXct(., format = fmt))) %>%
  purrr::pmap(\(...) {
    args <- list(...)
    Observations %>%
      mutate(DT = as.POSIXct(DT, format = fmt)) %>%
      filter(DT >= args$StartDT, DT <= args$EndDT, ID == args$ID) %>%
      count(ID, IDobserved, name = "Observations") %>%
      mutate(TimeElapsed = difftime(args$EndDT,
                                    args$StartDT,
                                    units =  "secs"))
  }) %>%
  bind_rows()

returning

# A tibble: 7 x 4
     ID IDobserved Observations TimeElapsed
  <dbl>      <dbl>        <int> <drtn>
1     1          1            2 3600 secs
2     1          2            1 3600 secs
3     2          1            1 3600 secs
4     2          3            1 3600 secs
5     3          1            3 7200 secs
6     3          2            1 7200 secs
7     3          3            1 7200 secs
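
If a plain numeric TimeElapsed is preferred (the question's table shows bare seconds rather than a duration column), the same result can probably be reached with a single join instead of mapping over rows. This is a sketch, not a drop-in replacement: the names active, obs and result are illustrative, the month/day/year format follows the question, and UTC is an assumed time zone.

```r
library(dplyr)

fmt <- "%m/%d/%Y %H:%M:%S"  # month/day/year, as stated in the question

# Sample data from the question, parsed up front (tz = "UTC" is an assumption).
active <- data.frame(
  ID      = c(1, 2, 3),
  StartDT = rep("05/05/2021 8:15:00", 3),
  EndDT   = c("05/05/2021 9:15:00", "05/05/2021 9:15:00", "05/05/2021 10:15:00")
) %>%
  mutate(across(-ID, ~ as.POSIXct(.x, format = fmt, tz = "UTC")))

obs <- data.frame(
  ID = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3),
  DT = c("05/05/2021 8:19:00", "05/05/2021 8:20:00", "05/05/2021 8:19:00",
         "05/05/2021 8:19:20", "05/05/2021 8:19:45", "05/05/2021 8:19:00",
         "05/05/2021 8:20:00", "05/05/2021 8:25:00", "05/05/2021 8:45:00",
         "05/05/2021 8:19:00"),
  IDobserved = c(1, 1, 2, 1, 3, 1, 1, 1, 3, 2)
) %>%
  mutate(DT = as.POSIXct(DT, format = fmt, tz = "UTC"))

result <- obs %>%
  inner_join(active, by = "ID") %>%           # attach each observer's window
  filter(DT >= StartDT, DT <= EndDT) %>%      # keep in-window observations only
  group_by(ID, IDobserved) %>%
  summarise(
    Observations = n(),
    # as.numeric() drops the "secs" unit so the column is a plain number
    TimeElapsed  = as.numeric(difftime(first(EndDT), first(StartDT),
                                       units = "secs")),
    .groups = "drop"
  )
result
```

The join keeps everything vectorised, which tends to matter more as ActiveData grows; for a handful of IDs either approach should be fine.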
