简体   繁体   中英

Conditional merging between time values of 2 dataframe in R

I have 2 dataframes with different structures. The first one contains data from a continuos and repeated analysis over few samples (multiple rows with time and value for each single measurement), the second one reports the sample ID and the start and finish time of the measurement.

##example
df.analysis <- data.frame(var= rnorm(321,mean=50),
                  time= seq(strptime("2018-1-1 0:0:0","%Y-%m-%d %H:%M:%S"), strptime("2018-1-1 8:0:0","%Y-%m-%d %H:%M:%S"), by= 90))

df.sample <- data.frame(sample= rep_len(1:8, 30),
                  start=seq(strptime("2018-1-1 0:0:0","%Y-%m-%d %H:%M:%S"), strptime("2018-1-1 7:45:0","%Y-%m-%d %H:%M:%S"),length.out=30),
                  end=seq(strptime("2018-1-1 0:15:0","%Y-%m-%d %H:%M:%S"), strptime("2018-1-1 8:0:0","%Y-%m-%d %H:%M:%S"),length.out=30))

I should insert the sample ID corresponding to each measured value, having in mind that not all the measurements corrispond to a sample. I tried with the following code but it doesn't work because now it compares the rows from the first database with the corresponding rows from the second database. While I need that every single row from the first database to be compared with all the rows from the second database

if df.analysis$time >df.sample[,"start"] & df.analysis$time < df.sample[,"end"] {
  df.analysis$sample <-  df.sample$sample
  }

I thought to use a for loop or a lapply but I can't make work them properly.

We can use a non-equi join

library(data.table)
setDT(df.analysis)[df.sample, sample := sample, on = .(time > start, time <end)]

One option using sqldf package can be achieved by having a inner join and then a left outer join as:

library(sqldf)

sqldf("select analysis.*, matchedSample.sample from
  'df.analysis' analysis 
  left outer join 
     (select sample.sample, analysis.time 
      from 'df.sample' sample,'df.analysis' analysis 
      where analysis.time > sample.start 
      and analysis.time < sample.end) matchedSample on
   analysis.time = matchedSample.time")

#          var                time sample
# 1   49.41763 2018-01-01 00:00:00     NA
# 2   50.20399 2018-01-01 00:01:30      1
# 3   48.80242 2018-01-01 00:03:00      1
# 4   50.56982 2018-01-01 00:04:30      1
# 5   50.08948 2018-01-01 00:06:00      1
# 6   50.32223 2018-01-01 00:07:30      1
# 7   49.60842 2018-01-01 00:09:00      1
# 8   50.82316 2018-01-01 00:10:30      1
# ....
# .... 313 more rows 

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM