
Merging dataframes by ranges of datetimes in R

I have two dataframes. The first dataframe, Observations, shows the date/times at which a surveyor recorded events of interest, and a unique ID number for each type of event:

library(lubridate)
Observations <- data.frame(Time = dmy_hms(paste(c("13-7-2022 10:01:01","13-7-2022 14:02:01","15-7-2022 10:01:01", "15-7-2022 16:01:01"))), ID = c(1,3,1))

The second dataframe, Sites, shows the date/times at which the surveyor started and stopped looking for events (i.e., the possible times during which events could have been observed). TimeStart is the time the surveyor began looking for an event; TimeEnd is when they stopped. Sites also contains the latitude and longitude where the surveyor was looking for events between TimeStart and TimeEnd.

Sites <- data.frame(TimeStart = dmy_hms(paste(c("13-7-2022 10:00:00","13-7-2022 14:00:00","15-7-2022 10:00:00", "15-7-2022 16:00:00"))),
                    TimeEnd = dmy_hms(paste(c("13-7-2022 10:05:00","13-7-2022 14:05:00","15-7-2022 10:05:00", "15-7-2022 16:05:00"))),
                    Latitude = c("11.1111", "11.2222", "11.1234", "11.1487"),
                    Longitude = c("99.1257", "99.3478", "99.6241", "99.6214"))

So each Time at which the surveyor recorded an event (i.e., each Time in Observations) falls within one of the time ranges given by Sites$TimeStart and Sites$TimeEnd.

I would like to merge these two dataframes so that the row for each event (ID) recorded in Observations contains the Latitude and Longitude where the surveyor was searching at the corresponding Time, as well as the TimeStart and TimeEnd of that search period.

In the end, Observations would look like this:

Time                ID  Latitude Longitude  TimeStart              TimeEnd    
2022-07-13 10:01:01 1   11.1111   99.1257   2022-07-13 10:00:00   2022-07-13 10:05:00
2022-07-13 14:02:01 3   11.2222   99.3478   2022-07-13 14:00:00   2022-07-13 14:05:00
2022-07-15 10:01:01 2   11.1234   99.6241   2022-07-15 10:00:00   2022-07-15 10:05:00
2022-07-15 16:01:01 1   11.1487   99.6214   2022-07-15 16:00:00   2022-07-15 16:05:00

How can we merge these data so that each Observations$Time is matched to the range of times in Sites$TimeStart and Sites$TimeEnd that contains it?

We can do this by retrieving the index (row number) of the row in Sites that fulfills the time condition:

Observations$siteindex <- sapply(Observations$Time, function(x) which(x<=Sites$TimeEnd&x>=Sites$TimeStart)[1]) # first matching row into Sites
Sites$siteindex <- 1:nrow(Sites)
result <- merge(Observations, Sites, by="siteindex")
  siteindex                Time ID           TimeStart             TimeEnd Latitude Longitude
1         1 2022-07-13 10:01:01  1 2022-07-13 10:00:00 2022-07-13 10:05:00  11.1111   99.1257
2         2 2022-07-13 14:02:01  3 2022-07-13 14:00:00 2022-07-13 14:05:00  11.2222   99.3478
3         3 2022-07-15 10:01:01  1 2022-07-15 10:00:00 2022-07-15 10:05:00  11.1234   99.6241
4         4 2022-07-15 16:01:01  2 2022-07-15 16:00:00 2022-07-15 16:05:00  11.1487   99.6214

The output may differ slightly from what you expect because the data.frame() call for Observations in the question fails with the error "arguments imply differing number of rows: 4, 3": Time has four values but ID has only three. The output above was produced with ID values 1, 3, 1, 2.
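
For reference, here is a minimal end-to-end sketch of the same index-lookup approach in base R only. It rests on a few assumptions that are not in the question: the fourth ID value (2) is filled in so that both columns have four rows, as.POSIXct with tz = "UTC" stands in for dmy_hms, and all.x = TRUE is added to merge so that an observation falling outside every window is kept with NA site columns rather than dropped.

```r
# Assumption: ID is given a fourth value (2) so the data.frame can be built.
Observations <- data.frame(
  Time = as.POSIXct(c("2022-07-13 10:01:01", "2022-07-13 14:02:01",
                      "2022-07-15 10:01:01", "2022-07-15 16:01:01"),
                    tz = "UTC"),
  ID = c(1, 3, 1, 2)
)

Sites <- data.frame(
  TimeStart = as.POSIXct(c("2022-07-13 10:00:00", "2022-07-13 14:00:00",
                           "2022-07-15 10:00:00", "2022-07-15 16:00:00"),
                         tz = "UTC"),
  TimeEnd = as.POSIXct(c("2022-07-13 10:05:00", "2022-07-13 14:05:00",
                         "2022-07-15 10:05:00", "2022-07-15 16:05:00"),
                       tz = "UTC"),
  Latitude = c("11.1111", "11.2222", "11.1234", "11.1487"),
  Longitude = c("99.1257", "99.3478", "99.6241", "99.6214")
)

# For each observation, look up the first row of Sites whose
# [TimeStart, TimeEnd] window contains its Time.
# which(...)[1] is NA when no window matches.
Observations$siteindex <- vapply(
  seq_len(nrow(Observations)),
  function(i) {
    t <- Observations$Time[i]
    which(t >= Sites$TimeStart & t <= Sites$TimeEnd)[1]
  },
  integer(1)
)

# Give Sites a matching key and join on it.
# all.x = TRUE keeps observations that matched no window.
Sites$siteindex <- seq_len(nrow(Sites))
result <- merge(Observations, Sites, by = "siteindex", all.x = TRUE)
print(result)
```

Because only the first matching row is taken, this sketch assumes the windows in Sites do not overlap; if they can overlap, an observation would be attached to just one of the windows that contain it.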
