
How to mark the observations with given information

Consider data collected at 5-minute intervals, with a numeric variable a and a discrete variable acc that indicates whether an incident happened (0 for no incident, 1 for an incident):

a<-c(1:(288*4))
t<-seq(as.POSIXct("2016-01-01 00:05:00"), as.POSIXct("2016-01-05 00:00:00"), by = '5 min')
acc<-rep(0,288*4)
df<-data.frame(t,a,acc)

Now I have another data set that holds the times (accurate to 1 second) at which incidents happened during the collection period:

T<-sample(seq(as.POSIXct("2016-01-01 00:05:00"), as.POSIXct("2016-01-05 00:00:00"), by = '1 sec'),size = 5)

I want to set acc to 1 for the 2 nearest prior observations, according to the times in T. For example, if an incident happened at 2016-01-02 07:13:23, the observations with t of 2016-01-02 07:05:00 and 2016-01-02 07:10:00 are marked as 1.

How could I manage to do this?

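# findInterval() returns, for each incident, the index of the last df$t at or before it
ind <- findInterval(T, df$t)
# Mark that observation and the one after it, i.e. the two observations bracketing each incident
df$acc[c(ind, ind + 1)] <- 1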
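The question's example asks for the two observations before the incident (07:05:00 and 07:10:00 for an incident at 07:13:23). A minimal sketch of that variant follows, assuming the same 5-minute grid as in the question; the explicit tz = "UTC" and the names incident and prior are assumptions added here for reproducibility and illustration, not part of the original code.

```r
# Rebuild the question's 5-minute grid (tz = "UTC" is an assumption for reproducibility)
t <- seq(as.POSIXct("2016-01-01 00:05:00", tz = "UTC"),
         as.POSIXct("2016-01-05 00:00:00", tz = "UTC"), by = "5 min")
df <- data.frame(t, a = seq_along(t), acc = 0)

# Hypothetical single incident, taken from the question's example
incident <- as.POSIXct("2016-01-02 07:13:23", tz = "UTC")

# Index of the last observation at or before the incident (here 07:10:00)
ind <- findInterval(incident, df$t)

# Mark that observation and the one before it; drop any index below 1
prior <- c(ind - 1, ind)
df$acc[prior[prior >= 1]] <- 1

# The marked observations: 07:05:00 and 07:10:00 on 2016-01-02
format(df$t[df$acc == 1], "%Y-%m-%d %H:%M:%S")
```

Filtering with prior >= 1 guards against an incident that falls in the very first interval, where ind - 1 would be 0.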

One way could be:

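library(lubridate)
# One interval per observation, spanning df$t ± 4 min 59 s
win <- interval(df$t - minutes(4) - seconds(59), df$t + minutes(4) + seconds(59))
# For each observation, count the incidents in T that fall inside its interval
df$acc <- apply(sapply(T, function(x) x %within% win), 1, sum)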

lubridate makes date-times easy to manipulate; minutes(x) and seconds(x) create periods of x minutes or x seconds that can be added to or subtracted from a date-time.
interval() creates, for each observation, a time interval spanning df$t ± 4 min 59 s.
sapply() checks, for each time in T, which of these intervals contain it (it outputs one column per element of T).
apply() collapses the results of sapply() by summing each row, giving the number of incidents that fall in each observation's interval.

If T contains a value exactly equal to one in df$t, such as 2016-01-04 12:05:00 CET, only that single observation is marked with 1.
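As a cross-check of the window logic described above, the same ± 4 min 59 s test can be sketched in base R without lubridate. This is only an illustration under assumptions: the two incident times, the tz = "UTC" setting, and the names inc, hit and acc are made up here, and outer() on second counts is swapped in for interval() and %within%.

```r
# Same 5-minute grid as in the question (tz = "UTC" is an assumption)
t <- seq(as.POSIXct("2016-01-01 00:05:00", tz = "UTC"),
         as.POSIXct("2016-01-05 00:00:00", tz = "UTC"), by = "5 min")

# Hypothetical incidents: one between two observations, one exactly on an observation
inc <- as.POSIXct(c("2016-01-02 07:13:23", "2016-01-04 12:05:00"), tz = "UTC")

# Rows are observations, columns are incidents; TRUE when the incident lies
# within 4 min 59 s (299 s) of the observation
hit <- abs(outer(as.numeric(t), as.numeric(inc), "-")) <= 299

# Collapse to a 0/1 flag per observation (a flag rather than a count)
acc <- as.integer(rowSums(hit) > 0)

# Flags 07:10 and 07:15 on 01-02 for the first incident, and only 12:05 on 01-04
format(t[acc == 1], "%m-%d %H:%M")
```

Note that an incident between two observations flags the observation on either side of it, while an incident exactly on an observation flags only that one. Using a flag instead of a row sum keeps acc at 1 even when several incidents fall near the same observation.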
