Summarizing observations within date and time boundaries in R
I have a dataframe called ActiveData that shows the dates and times (StartDT and EndDT) when individual IDs were active. ActiveData is structured like this, where StartDT and EndDT are formatted as mdy_hms:
ID StartDT EndDT
1 05/05/2021 8:15:00 05/05/2021 9:15:00
2 05/05/2021 8:15:00 05/05/2021 9:15:00
3 05/05/2021 8:15:00 05/05/2021 10:15:00
…
I have another dataframe called Observations that shows observations where each ID observed themselves or another ID satisfying some variable. Here, ID denotes the observer and IDobserved denotes which ID was observed satisfying the variable (IDs can also observe themselves).
ID DT IDobserved
1 05/05/2021 8:19:00 1
1 05/05/2021 8:20:00 1
1 05/05/2021 8:19:00 2
2 05/05/2021 8:19:20 1
2 05/05/2021 8:19:45 3
3 05/05/2021 8:19:00 1
3 05/05/2021 8:20:00 1
3 05/05/2021 8:25:00 1
3 05/05/2021 8:45:00 3
3 05/05/2021 8:19:00 2
…
I want to summarize the number of times that each ID observed the other IDs (including themselves) satisfying the variable within the time constraints specified by StartDT and EndDT in the ActiveData dataframe. The final table should give the number of observations and the time in seconds during which each ID was actively observing (the span between StartDT and EndDT in ActiveData). For the data above, the final table would look like this:
ID IDobserved Observations TimeElapsed
1 1 2 3600
1 2 1 3600
2 1 1 3600
2 3 1 3600
3 1 3 7200
3 2 1 7200
3 3 1 7200
How can this be done?
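Here is a method using data.table:
1. Convert each data.frame to a data.table with setDT.
2. Convert the 'DT', 'StartDT', and 'EndDT' columns in both datasets to a date-time class (mdy_hms).
3. Count the rows by ID and IDobserved in the second dataset to create the Observations column.
4. Join with the first dataset on the ID column, compute TimeElapsed as the sum of the differences between the DT, StartDT, and EndDT columns using difftime, and return the unique rows.
library(data.table)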
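library(lubridate)
# Convert to data.table by reference and parse the date-time strings
setDT(df1)[, c('StartDT', 'EndDT') := lapply(.SD, mdy_hms), .SDcols = 2:3]
setDT(df2)[, DT := mdy_hms(DT)]
# Count observations per observer/observed pair
df2[, Observations := .N, .(ID, IDobserved)]
# Join on ID; the two difftime terms add up to EndDT - StartDT in seconds
unique(df2[df1, .(IDobserved, Observations,
                  TimeElapsed = as.numeric(difftime(DT, StartDT, units = 'secs') +
                                           difftime(EndDT, DT, units = 'secs'))),
           on = .(ID), by = .EACHI])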
-output
ID IDobserved Observations TimeElapsed
1: 1 1 2 3600
2: 1 2 1 3600
3: 2 1 1 3600
4: 2 3 1 3600
5: 3 1 3 7200
6: 3 3 1 7200
7: 3 2 1 7200
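As a quick check on the TimeElapsed expression in the join above: the two difftime terms telescope, so their sum is EndDT minus StartDT whatever the value of DT. A minimal base R sketch for one row of the example data (the variable names and the choice of tz = "UTC" are assumptions made here for illustration):

```r
# One observation by ID 3, taken from the example data.
fmt   <- "%m/%d/%Y %H:%M:%S"
start <- as.POSIXct("05/05/2021 8:15:00",  format = fmt, tz = "UTC")
end   <- as.POSIXct("05/05/2021 10:15:00", format = fmt, tz = "UTC")
obs   <- as.POSIXct("05/05/2021 8:45:00",  format = fmt, tz = "UTC")

# Two-term form, as in the join: (DT - StartDT) + (EndDT - DT)
two_term <- as.numeric(difftime(obs, start, units = "secs") +
                       difftime(end, obs, units = "secs"))

# Direct form: EndDT - StartDT
direct <- as.numeric(difftime(end, start, units = "secs"))

print(c(two_term = two_term, direct = direct))
```

Both values are 7200 seconds, matching the TimeElapsed shown for ID 3.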
df1 <- structure(list(ID = 1:3, StartDT = c("05/05/2021 8:15:00",
"05/05/2021 8:15:00",
"05/05/2021 8:15:00"), EndDT = c("05/05/2021 9:15:00", "05/05/2021 9:15:00",
"05/05/2021 10:15:00")), class = "data.frame", row.names = c(NA,
-3L))
df2 <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L),
DT = c("05/05/2021 8:19:00", "05/05/2021 8:20:00", "05/05/2021 8:19:00",
"05/05/2021 8:19:20", "05/05/2021 8:19:45", "05/05/2021 8:19:00",
"05/05/2021 8:20:00", "05/05/2021 8:25:00", "05/05/2021 8:45:00",
"05/05/2021 8:19:00"), IDobserved = c(1L, 1L, 2L, 1L, 3L,
1L, 1L, 1L, 3L, 2L)), class = "data.frame", row.names = c(NA,
-10L))
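Note that the join above matches on ID only, so Observations counts every row for an ID/IDobserved pair, including any that might fall outside the StartDT to EndDT window. If the window should also restrict the count, a non-equi join is one option. The following is a sketch under that assumption, not the answer's method; the table names active and obs, and tz = "UTC", are choices made here for illustration.

```r
library(data.table)

fmt <- "%m/%d/%Y %H:%M:%S"
to_dt <- function(x) as.POSIXct(x, format = fmt, tz = "UTC")  # illustrative helper

active <- data.table(
  ID      = 1:3,
  StartDT = to_dt(rep("05/05/2021 8:15:00", 3)),
  EndDT   = to_dt(c("05/05/2021 9:15:00", "05/05/2021 9:15:00",
                    "05/05/2021 10:15:00"))
)
obs <- data.table(
  ID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L, 3L),
  DT = to_dt(c("05/05/2021 8:19:00", "05/05/2021 8:20:00", "05/05/2021 8:19:00",
               "05/05/2021 8:19:20", "05/05/2021 8:19:45", "05/05/2021 8:19:00",
               "05/05/2021 8:20:00", "05/05/2021 8:25:00", "05/05/2021 8:45:00",
               "05/05/2021 8:19:00")),
  IDobserved = c(1L, 1L, 2L, 1L, 3L, 1L, 1L, 1L, 3L, 2L)
)

# Length of each ID's active window, in seconds
active[, TimeElapsed := as.numeric(difftime(EndDT, StartDT, units = "secs"))]

# Keep only observations with the same ID and StartDT <= DT <= EndDT
joined <- obs[active, on = .(ID, DT >= StartDT, DT <= EndDT), nomatch = NULL]

# Count per observer/observed pair; keyby sorts the result
res <- joined[, .(Observations = .N, TimeElapsed = TimeElapsed[1L]),
              keyby = .(ID, IDobserved)]
print(res)
```

For the example data every observation lies inside its window, so the counts agree with the output above; the two approaches would differ only when some observations fall outside.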
Cool question! With your data
ActiveData <- tibble::tribble(
~ID, ~StartDT, ~EndDT,
1, "05/05/2021 8:15:00", "05/05/2021 9:15:00",
2, "05/05/2021 8:15:00", "05/05/2021 9:15:00",
3, "05/05/2021 8:15:00", "05/05/2021 10:15:00"
)
Observations <- tibble::tribble(
~ID, ~DT, ~IDobserved,
1, "05/05/2021 8:19:00", 1,
1, "05/05/2021 8:20:00", 1,
1, "05/05/2021 8:19:00", 2,
2, "05/05/2021 8:19:20", 1,
2, "05/05/2021 8:19:45", 3,
3, "05/05/2021 8:19:00", 1,
3, "05/05/2021 8:20:00", 1,
3, "05/05/2021 8:25:00", 1,
3, "05/05/2021 8:45:00", 3,
3, "05/05/2021 8:19:00", 2
)
I would do
library(dplyr)
# Month comes first in the data (mdy_hms), so use %m/%d/%Y
fmt <- "%m/%d/%Y %H:%M:%S"
ActiveData %>%
  mutate(across(-ID, ~ as.POSIXct(., format = fmt))) %>%
  # Process one ActiveData row (one ID and its active window) at a time
  purrr::pmap(\(...) {
    args <- list(...)
    Observations %>%
      mutate(DT = as.POSIXct(DT, format = fmt)) %>%
      # Keep this ID's observations that fall inside its window
      filter(DT >= args$StartDT, DT <= args$EndDT, ID == args$ID) %>%
      count(ID, IDobserved, name = "Observations") %>%
      mutate(TimeElapsed = difftime(args$EndDT,
                                    args$StartDT,
                                    units = "secs"))
  }) %>%
  bind_rows()
returning
# A tibble: 7 x 4
ID IDobserved Observations TimeElapsed
<dbl> <dbl> <int> <drtn>
1 1 1 2 3600 secs
2 1 2 1 3600 secs
3 2 1 1 3600 secs
4 2 3 1 3600 secs
5 3 1 3 7200 secs
6 3 2 1 7200 secs
7 3 3 1 7200 secs
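The same result can probably be reached without iterating over rows, by joining the two tables on ID once and then filtering on the window. This is a sketch of that alternative, not a replacement for the code above; the helper parse_dt, the name result, and tz = "UTC" are assumptions introduced here.

```r
library(dplyr)

fmt <- "%m/%d/%Y %H:%M:%S"
parse_dt <- function(x) as.POSIXct(x, format = fmt, tz = "UTC")  # illustrative helper

ActiveData <- data.frame(
  ID      = c(1, 2, 3),
  StartDT = rep("05/05/2021 8:15:00", 3),
  EndDT   = c("05/05/2021 9:15:00", "05/05/2021 9:15:00", "05/05/2021 10:15:00")
)
Observations <- data.frame(
  ID = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3),
  DT = c("05/05/2021 8:19:00", "05/05/2021 8:20:00", "05/05/2021 8:19:00",
         "05/05/2021 8:19:20", "05/05/2021 8:19:45", "05/05/2021 8:19:00",
         "05/05/2021 8:20:00", "05/05/2021 8:25:00", "05/05/2021 8:45:00",
         "05/05/2021 8:19:00"),
  IDobserved = c(1, 1, 2, 1, 3, 1, 1, 1, 3, 2)
)

result <- Observations %>%
  mutate(DT = parse_dt(DT)) %>%
  # Attach each observer's active window to its observations
  inner_join(ActiveData %>% mutate(across(c(StartDT, EndDT), parse_dt)),
             by = "ID") %>%
  # Keep observations inside the window
  filter(DT >= StartDT, DT <= EndDT) %>%
  mutate(TimeElapsed = as.numeric(difftime(EndDT, StartDT, units = "secs"))) %>%
  count(ID, IDobserved, TimeElapsed, name = "Observations") %>%
  select(ID, IDobserved, Observations, TimeElapsed)

print(result)
```

Here TimeElapsed is stored as a plain number of seconds rather than a difftime, which matches the table requested in the question.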