
Conditional merging between time values of 2 data frames in R

I have 2 data frames with different structures. The first contains data from a continuous, repeated analysis of a few samples (multiple rows, with a time and a value for each measurement); the second reports the sample ID and the start and end time of each measurement.

##example
df.analysis <- data.frame(var= rnorm(321,mean=50),
                  time= seq(strptime("2018-1-1 0:0:0","%Y-%m-%d %H:%M:%S"), strptime("2018-1-1 8:0:0","%Y-%m-%d %H:%M:%S"), by= 90))

df.sample <- data.frame(sample= rep_len(1:8, 30),
                  start=seq(strptime("2018-1-1 0:0:0","%Y-%m-%d %H:%M:%S"), strptime("2018-1-1 7:45:0","%Y-%m-%d %H:%M:%S"),length.out=30),
                  end=seq(strptime("2018-1-1 0:15:0","%Y-%m-%d %H:%M:%S"), strptime("2018-1-1 8:0:0","%Y-%m-%d %H:%M:%S"),length.out=30))

I need to add the sample ID corresponding to each measured value, keeping in mind that not all measurements correspond to a sample. I tried the following code, but it doesn't work because it compares each row of the first data frame with the corresponding row of the second. Instead, I need every row of the first data frame to be compared with all rows of the second.

if df.analysis$time >df.sample[,"start"] & df.analysis$time < df.sample[,"end"] {
  df.analysis$sample <-  df.sample$sample
  }

I thought of using a for loop or lapply, but I can't get them to work properly.

We can use a non-equi join:

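library(data.table)
# Update join: for each row of df.sample, tag the df.analysis rows whose time
# lies strictly inside (start, end); rows with no match keep sample = NA
setDT(df.analysis)[df.sample, sample := i.sample, on = .(time > start, time < end)]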
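
For comparison, the same matching can be sketched in base R, which also shows the "every row against all rows" logic that the `if` attempt was missing (`if` expects a single TRUE or FALSE, not a vector). This is only an illustrative sketch: the data frames `analysis` and `samples` and the helper `match_sample` are made-up names with made-up values, and it assumes the sample windows do not overlap.

```r
# Small illustrative data (hypothetical values, not the question's data)
origin <- as.POSIXct("2018-01-01 00:00:00", tz = "UTC")
analysis <- data.frame(time = origin + c(0, 90, 180, 1000, 1100))
samples <- data.frame(
  sample = c(1L, 2L),
  start  = origin + c(0, 1050),
  end    = origin + c(900, 1950)
)

# For one time stamp, return the sample whose (start, end) window strictly
# contains it, or NA when no window matches.
# Assumption: windows do not overlap, so the first hit is the only hit.
match_sample <- function(t, windows) {
  hit <- which(t > windows$start & t < windows$end)
  if (length(hit) == 0L) NA_integer_ else windows$sample[hit[1L]]
}

# Compare every measurement against all windows
analysis$sample <- vapply(
  seq_len(nrow(analysis)),
  function(i) match_sample(analysis$time[i], samples),
  integer(1)
)
print(analysis)
```

This row-by-row version does one pass over all windows per measurement, so for large inputs the non-equi join is likely the better choice.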

Another option uses the sqldf package: an inner join finds the matching sample for each measurement, and a left outer join then attaches the result to the full analysis data:

library(sqldf)

sqldf("select analysis.*, matchedSample.sample from
  'df.analysis' analysis 
  left outer join 
     (select sample.sample, analysis.time 
      from 'df.sample' sample,'df.analysis' analysis 
      where analysis.time > sample.start 
      and analysis.time < sample.end) matchedSample on
   analysis.time = matchedSample.time")

#          var                time sample
# 1   49.41763 2018-01-01 00:00:00     NA
# 2   50.20399 2018-01-01 00:01:30      1
# 3   48.80242 2018-01-01 00:03:00      1
# 4   50.56982 2018-01-01 00:04:30      1
# 5   50.08948 2018-01-01 00:06:00      1
# 6   50.32223 2018-01-01 00:07:30      1
# 7   49.60842 2018-01-01 00:09:00      1
# 8   50.82316 2018-01-01 00:10:30      1
# ....
# .... 313 more rows 
