Merge two dataframes based on an exact match in one column and a match within a tolerance in another column in R
I have a dataframe
df1:
A time B C
a1 t1 b1 c1
a2 t2 b2 c2
a3 t3 b3 c3
and another dataframe
df2:
A time D E
a1 t4 d1 e1
a2 t5 d2 e2
a3 t6 d3 e3
Assume time is in the format yyyy-mm-dd hh:mm:ss, e.g. 2019-08-16 15:06:38,
and let's assume:
t4 - t1 = 40 seconds
t5 - t2 = -5 seconds
t6 - t3 = 120 seconds
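To make the example reproducible, here is a minimal sketch that builds df1 and df2 as R data frames. Only the timestamp 2019-08-16 15:06:38 and the three offsets come from the question; the other two df1 timestamps and the name times1 are hypothetical. It parses the yyyy-mm-dd hh:mm:ss strings into POSIXct and checks the offsets in seconds.

```r
# Hypothetical df1 timestamps (t1, t2, t3); only the first one is from the question.
times1 <- as.POSIXct(c("2019-08-16 15:06:38", "2019-08-16 16:00:00",
                       "2019-08-16 17:30:00"),
                     format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

df1 <- data.frame(A = c("a1", "a2", "a3"), time = times1,
                  B = c("b1", "b2", "b3"), C = c("c1", "c2", "c3"),
                  stringsAsFactors = FALSE)
# df2 times t4, t5, t6 = t1 + 40 s, t2 - 5 s, t3 + 120 s (POSIXct arithmetic is in seconds)
df2 <- data.frame(A = c("a1", "a2", "a3"), time = times1 + c(40, -5, 120),
                  D = c("d1", "d2", "d3"), E = c("e1", "e2", "e3"),
                  stringsAsFactors = FALSE)

# Signed difference df2$time - df1$time in seconds; rows are already aligned on A here
secs <- as.numeric(difftime(df2$time, df1$time, units = "secs"))
print(secs)
```

With a 1-minute tolerance, only the a1 and a2 pairs (40 s and -5 s) qualify.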
I would like to merge these dataframes based on an exact match on column A and a match on column time that allows some acceptable difference between the dataframes, for example within 1 min.
So my output would look like:
df3:
A B C D E time(from df1)
a1 b1 c1 d1 e1 t1
a2 b2 c2 d2 e2 t2
Note that a3 is not there because, even though it matches on column A, the difference in time exceeds the acceptable limit.
How can I do this? If not for the "acceptable difference" part, I would do something like:
merge(df1, df2, by = c("A", "time"))
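One base-R way to add the tolerance (a swapped-in technique, not the data.table approach shown in the answer) is to merge on A alone and then drop the pairs whose times differ by more than 60 seconds. This is a minimal sketch that assumes POSIXct time columns and rebuilds the example data with hypothetical timestamps; tol is an illustrative name.

```r
# Hypothetical example data (only the first timestamp and the offsets come from the question)
times1 <- as.POSIXct(c("2019-08-16 15:06:38", "2019-08-16 16:00:00",
                       "2019-08-16 17:30:00"), tz = "UTC")
df1 <- data.frame(A = c("a1", "a2", "a3"), time = times1,
                  B = c("b1", "b2", "b3"), C = c("c1", "c2", "c3"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(A = c("a1", "a2", "a3"), time = times1 + c(40, -5, 120),
                  D = c("d1", "d2", "d3"), E = c("e1", "e2", "e3"),
                  stringsAsFactors = FALSE)

tol <- 60  # acceptable difference in seconds ("within 1 min")

# Exact match on A only; the shared column time becomes time.df1 / time.df2
m <- merge(df1, df2, by = "A", suffixes = c(".df1", ".df2"))

# Keep pairs whose absolute time difference is within the tolerance
keep <- abs(as.numeric(difftime(m$time.df1, m$time.df2, units = "secs"))) <= tol
df3 <- m[keep, c("A", "B", "C", "D", "E", "time.df1")]
names(df3)[names(df3) == "time.df1"] <- "time"  # time taken from df1
rownames(df3) <- NULL
print(df3)
```

Because it first forms every df1/df2 pair that shares a value of A, this is fine for small data but can grow large when many rows share the same key; the interval and non-equi joins avoid building those pairs.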
I've done similar joins using sqldf or foverlaps from data.table.
Or use a non-equi inner join from data.table:
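library(data.table)
setDT(df1); setDT(df2)  # convert in place; time columns must be POSIXct
# +/- 60 s window around each df1 time (POSIXct arithmetic is in seconds)
df1[, c("start", "end") := .(time - 60, time + 60)]
df2[df1, on = .(A, time >= start, time <= end), nomatch = 0L,
    .(A, B, C, D, E, time = i.time)]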
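If several df2 rows can fall inside the window, the non-equi join above returns one row per match. A different technique, sketched here under stated assumptions (data.table installed, POSIXct times, 60-second tolerance), is a rolling join with roll = "nearest" followed by a distance filter, which keeps at most the single closest df2 row per df1 row. The names time2 and tol are hypothetical.

```r
library(data.table)

# Hypothetical example data (only the first timestamp and the offsets come from the question)
times1 <- as.POSIXct(c("2019-08-16 15:06:38", "2019-08-16 16:00:00",
                       "2019-08-16 17:30:00"), tz = "UTC")
dt1 <- data.table(A = c("a1", "a2", "a3"), time = times1,
                  B = c("b1", "b2", "b3"), C = c("c1", "c2", "c3"))
dt2 <- data.table(A = c("a1", "a2", "a3"), time = times1 + c(40, -5, 120),
                  D = c("d1", "d2", "d3"), E = c("e1", "e2", "e3"))

tol <- 60  # seconds

# Copy df2's timestamp: in the joined result, the join column shows i's (dt1's) values
dt2[, time2 := time]

# For each dt1 row, take the dt2 row with the same A and the nearest time
near <- dt2[dt1, on = .(A, time), roll = "nearest"]

# Drop pairs further apart than the tolerance (rows with no match give NA and are dropped)
df3 <- near[abs(as.numeric(difftime(time2, time, units = "secs"))) <= tol,
            .(A, B, C, D, E, time)]
print(df3)
```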