Conditional merging between time values of 2 data frames in R
I have 2 data frames with different structures. The first contains data from a continuous, repeated analysis over a few samples (multiple rows, with a time and a value for each single measurement). The second reports the sample ID and the start and finish time of each measurement.
##example
df.analysis <- data.frame(var= rnorm(321,mean=50),
time= seq(strptime("2018-1-1 0:0:0","%Y-%m-%d %H:%M:%S"), strptime("2018-1-1 8:0:0","%Y-%m-%d %H:%M:%S"), by= 90))
df.sample <- data.frame(sample= rep_len(1:8, 30),
start=seq(strptime("2018-1-1 0:0:0","%Y-%m-%d %H:%M:%S"), strptime("2018-1-1 7:45:0","%Y-%m-%d %H:%M:%S"),length.out=30),
end=seq(strptime("2018-1-1 0:15:0","%Y-%m-%d %H:%M:%S"), strptime("2018-1-1 8:0:0","%Y-%m-%d %H:%M:%S"),length.out=30))
I need to insert the sample ID corresponding to each measured value, bearing in mind that not all measurements correspond to a sample. I tried the following code, but it doesn't work because it compares each row of the first data frame only with the corresponding row of the second data frame, whereas I need every row of the first data frame to be compared with all the rows of the second:
if df.analysis$time >df.sample[,"start"] & df.analysis$time < df.sample[,"end"] {
df.analysis$sample <- df.sample$sample
}
I thought of using a for loop or lapply, but I can't make them work properly.
We can use a non-equi join from data.table:
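library(data.table)
# setDT() converts df.analysis to a data.table by reference.
# Update join: rows of df.analysis whose time lies strictly between
# start and end of a df.sample row get that sample ID; the rest stay NA.
setDT(df.analysis)[df.sample, sample := sample, on = .(time > start, time < end)]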
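For readers without data.table, the same "compare every measurement with every interval" idea can be sketched in base R. This is only a minimal illustration, not the answerer's method: the names `analysis`, `samples` and `match_sample` are hypothetical, the toy data is much smaller than in the question, and it assumes the intervals do not overlap (if they did, only the first match would be kept).

```r
# Toy data (hypothetical, smaller than the question's example).
t0 <- as.POSIXct("2018-01-01 00:00:00", tz = "UTC")
analysis <- data.frame(var  = c(1, 2, 3, 4, 5),
                       time = t0 + c(0, 90, 900, 960, 1200))
samples <- data.frame(sample = c(1, 2),
                      start  = t0 + c(0, 950),
                      end    = t0 + c(900, 1850))

# For one measurement time, return the ID of the first interval that
# strictly contains it, or NA if no interval matches.
match_sample <- function(tm, samples) {
  hit <- which(tm > samples$start & tm < samples$end)
  if (length(hit) == 0) NA_real_ else samples$sample[hit[1]]
}

# Apply the lookup to every row of the measurement table.
analysis$sample <- vapply(
  seq_len(nrow(analysis)),
  function(i) match_sample(analysis$time[i], samples),
  numeric(1)
)
print(analysis)
```

Each measurement is checked against all intervals, so the cost grows with the product of the two row counts. That is fine for a few hundred rows, but the non-equi join is likely the better choice for large tables.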
One option using the sqldf package is an inner join followed by a left outer join:
library(sqldf)
sqldf("select analysis.*, matchedSample.sample from
'df.analysis' analysis
left outer join
(select sample.sample, analysis.time
from 'df.sample' sample,'df.analysis' analysis
where analysis.time > sample.start
and analysis.time < sample.end) matchedSample on
analysis.time = matchedSample.time")
# var time sample
# 1 49.41763 2018-01-01 00:00:00 NA
# 2 50.20399 2018-01-01 00:01:30 1
# 3 48.80242 2018-01-01 00:03:00 1
# 4 50.56982 2018-01-01 00:04:30 1
# 5 50.08948 2018-01-01 00:06:00 1
# 6 50.32223 2018-01-01 00:07:30 1
# 7 49.60842 2018-01-01 00:09:00 1
# 8 50.82316 2018-01-01 00:10:30 1
# ....
# .... 313 more rows