Merging dataframes by ranges of datetimes in R
I have two dataframes. The first dataframe, Observations, shows the date/times at which a surveyor recorded events of interest, and a unique ID number for each type of event:
library(lubridate)
Observations <- data.frame(Time = dmy_hms(paste(c("13-7-2022 10:01:01","13-7-2022 14:02:01","15-7-2022 10:01:01", "15-7-2022 16:01:01"))), ID = c(1,3,1))
The second dataframe, Sites, shows the date/times at which the surveyor started and stopped looking for events (i.e., the possible times at which events could have been observed). TimeStart is the time the surveyor began looking for an event, and TimeEnd is when they stopped. Sites also contains the latitude and longitude where the surveyor was looking for events between TimeStart and TimeEnd.
Sites <- data.frame(TimeStart = dmy_hms(paste(c("13-7-2022 10:00:00","13-7-2022 14:00:00","15-7-2022 10:00:00", "15-7-2022 16:00:00"))),
TimeEnd = dmy_hms(paste(c("13-7-2022 10:05:00","13-7-2022 14:05:00","15-7-2022 10:05:00", "15-7-2022 16:05:00"))),
Latitude = c("11.1111", "11.2222", "11.1234", "11.1487"),
Longitude = c("99.1257", "99.3478", "99.6241", "99.6214"))
So the Time at which each event was recorded by the surveyor (i.e., recorded in Observations) falls within one of the time ranges given by Sites$TimeStart and Sites$TimeEnd.
I would like to merge these two dataframes so that the row for each event (ID) recorded in Observations contains the Latitude and Longitude where the surveyor was searching at the corresponding Time, as well as when they started (TimeStart) and stopped (TimeEnd) searching in that period.
In the end, Observations would look like this:
Time                 ID  Latitude  Longitude  TimeStart            TimeEnd
2022-07-13 10:01:01  1   11.1111   99.1257    2022-07-13 10:00:00  2022-07-13 10:05:00
2022-07-13 14:02:01  3   11.2222   99.3478    2022-07-13 14:00:00  2022-07-13 14:05:00
2022-07-15 10:01:01  2   11.1234   99.6241    2022-07-15 10:00:00  2022-07-15 10:05:00
2022-07-15 16:01:01  1   11.1487   99.6214    2022-07-15 16:00:00  2022-07-15 16:05:00
How can we merge this data by time, so that each Observations$Time is matched to the range given by Sites$TimeStart and Sites$TimeEnd that it falls within?
We can do this by retrieving the index (row number) of the row in Sites that fulfills the time condition:
Observations$siteindex <- sapply(Observations$Time, function(x) which(x<=Sites$TimeEnd&x>=Sites$TimeStart)[1]) # first matching row into Sites
Sites$siteindex <- 1:nrow(Sites)
result <- merge(Observations, Sites, by="siteindex")
siteindex Time ID TimeStart TimeEnd Latitude Longitude
1 1 2022-07-13 10:01:01 1 2022-07-13 10:00:00 2022-07-13 10:05:00 11.1111 99.1257
2 2 2022-07-13 14:02:01 3 2022-07-13 14:00:00 2022-07-13 14:05:00 11.2222 99.3478
3 3 2022-07-15 10:01:01 1 2022-07-15 10:00:00 2022-07-15 10:05:00 11.1234 99.6241
4 4 2022-07-15 16:01:01 2 2022-07-15 16:00:00 2022-07-15 16:05:00 11.1487 99.6214
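The output may differ slightly from your expected result because the data.frame() call for Observations in the question fails as written: ID has three values but Time has four, so R reports "arguments imply differing number of rows: 4, 3". The output above was produced with four ID values (1, 3, 1, 2).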
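One caveat with the which(...)[1] approach: an observation that falls outside every period gets an NA index and is silently dropped by merge(), and overlapping periods are hidden because only the first match is kept. The following is a minimal base R sketch of the same idea that counts the matches per observation and keeps unmatched rows. It makes several assumptions that are not in the question: times are parsed with as.POSIXct() in UTC instead of lubridate, the ID values are made up to get four valid rows, and a fifth observation is added purely to show the no-match case.

```r
# Base R only; all times are parsed as UTC so comparisons do not depend on the local time zone.
Observations <- data.frame(
  Time = as.POSIXct(c("2022-07-13 10:01:01", "2022-07-13 14:02:01",
                      "2022-07-15 10:01:01", "2022-07-15 16:01:01",
                      "2022-07-16 09:00:00"),  # hypothetical row outside every period
                    tz = "UTC"),
  ID = c(1, 3, 1, 2, 4)                        # assumed IDs, for illustration only
)

Sites <- data.frame(
  TimeStart = as.POSIXct(c("2022-07-13 10:00:00", "2022-07-13 14:00:00",
                           "2022-07-15 10:00:00", "2022-07-15 16:00:00"), tz = "UTC"),
  TimeEnd   = as.POSIXct(c("2022-07-13 10:05:00", "2022-07-13 14:05:00",
                           "2022-07-15 10:05:00", "2022-07-15 16:05:00"), tz = "UTC"),
  Latitude  = c("11.1111", "11.2222", "11.1234", "11.1487"),
  Longitude = c("99.1257", "99.3478", "99.6241", "99.6214")
)

# For each observation, find every row of Sites whose period contains it.
matches <- lapply(seq_len(nrow(Observations)), function(i) {
  obs_time <- Observations$Time[i]
  which(obs_time >= Sites$TimeStart & obs_time <= Sites$TimeEnd)
})

# 0 means no period covers the observation; more than 1 means periods overlap.
n_matches <- lengths(matches)

# Keep the first matching row, or NA when there is none.
Observations$siteindex <- vapply(
  matches,
  function(m) if (length(m) > 0) m[1] else NA_integer_,
  integer(1)
)
Sites$siteindex <- seq_len(nrow(Sites))

# all.x = TRUE keeps observations without a matching period (site columns become NA).
result <- merge(Observations, Sites, by = "siteindex", all.x = TRUE)
result <- result[order(result$Time), ]
```

Inspecting n_matches before merging shows whether any observation has zero or several candidate periods. Whether to keep, drop, or flag such rows depends on the survey design, so all.x = TRUE here is only one possible choice.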