Merge two dataframes based on an exact match in one column and a match within a tolerance in another column in R

I have a dataframe

df1:

A  time B  C
a1 t1   b1 c1 
a2 t2   b2 c2
a3 t3   b3 c3

and another dataframe

df2:

A  time D E 
a1 t4   d1 e1
a2 t5   d2 e2
a3 t6   d3 e3

Assume time is in the format yyyy-mm-dd hh:mm:ss (e.g. 2019-08-16 15:06:38), and let's assume:

t4 - t1 = 40 seconds
t5 - t2 = -5 seconds
t6 - t3 = 120 seconds
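For anyone who wants to reproduce this, here is a minimal sketch of the two dataframes with POSIXct times. The concrete timestamps and the 10-minute spacing between rows are assumptions made up for illustration; only the 40, -5 and 120 second offsets come from the question.

```r
# Hypothetical sample data: the values stand in for a1..a3, t1..t6, etc.
t0 <- as.POSIXct("2019-08-16 15:06:38", tz = "UTC")

df1 <- data.frame(
  A    = c("a1", "a2", "a3"),
  time = t0 + c(0, 600, 1200),        # t1, t2, t3 (spacing is arbitrary)
  B    = c("b1", "b2", "b3"),
  C    = c("c1", "c2", "c3"),
  stringsAsFactors = FALSE
)

df2 <- data.frame(
  A    = c("a1", "a2", "a3"),
  time = df1$time + c(40, -5, 120),   # t4, t5, t6 with the stated offsets
  D    = c("d1", "d2", "d3"),
  E    = c("e1", "e2", "e3"),
  stringsAsFactors = FALSE
)

# Time differences df2 - df1 in seconds
offsets <- as.numeric(difftime(df2$time, df1$time, units = "secs"))
offsets
```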

I would like to merge these dataframes based on an exact match on A and an approximate match on the time column, where the two times may differ by up to some acceptable amount, for example 1 minute.

So my output would look like:

df3:

A  B  C  D  E time(from df1)
a1 b1 c1 d1 e1 t1
a2 b2 c2 d2 e2 t2

Note that a3 is missing: even though it matches on column A, its time difference (120 seconds) exceeds the acceptable limit.

How can I do this? Without the "acceptable difference" requirement I would simply do:

merge(df1, df2, by = c("A", "time"))
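One possible base R route, sketched under the assumption that time is POSIXct in both dataframes, is to merge on A alone and then drop the pairs whose times are too far apart. The sample data and the name tol are made up for illustration. Because merging on A alone pairs every df1 row with every df2 row sharing the same A, this may get expensive when A has many repeats.

```r
# Hypothetical sample data with the offsets from the question (40, -5, 120 s)
t0 <- as.POSIXct("2019-08-16 15:06:38", tz = "UTC")
df1 <- data.frame(A = c("a1", "a2", "a3"), time = t0 + c(0, 600, 1200),
                  B = c("b1", "b2", "b3"), C = c("c1", "c2", "c3"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(A = c("a1", "a2", "a3"), time = df1$time + c(40, -5, 120),
                  D = c("d1", "d2", "d3"), E = c("e1", "e2", "e3"),
                  stringsAsFactors = FALSE)

tol <- 60  # acceptable difference in seconds (illustrative name)

# Join on A only; the clashing time columns become time.x (df1) and time.y (df2)
m <- merge(df1, df2, by = "A")

# Keep only pairs whose times differ by at most tol seconds
gap  <- abs(as.numeric(difftime(m$time.y, m$time.x, units = "secs")))
df3  <- m[gap <= tol, c("A", "B", "C", "D", "E", "time.x")]
names(df3)[names(df3) == "time.x"] <- "time"   # time taken from df1
df3
```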

I've done similar joins using sqldf or foverlaps from data.table.
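A rough sketch of the foverlaps variant, assuming the data.table package is installed and time is POSIXct: each df1 time is widened into a 60-second window on either side, each df2 time is treated as a zero-width interval, and rows are kept when the point falls within the window. The sample data and the names dt1, dt2 and res are made up for illustration.

```r
library(data.table)

# Hypothetical sample data with the offsets from the question (40, -5, 120 s)
t0  <- as.POSIXct("2019-08-16 15:06:38", tz = "UTC")
dt1 <- data.table(A = c("a1", "a2", "a3"), time = t0 + c(0, 600, 1200),
                  B = c("b1", "b2", "b3"), C = c("c1", "c2", "c3"))
dt2 <- data.table(A = c("a1", "a2", "a3"), time = dt1$time + c(40, -5, 120),
                  D = c("d1", "d2", "d3"), E = c("e1", "e2", "e3"))

# Window of +/- 60 seconds around each dt1 time
dt1[, `:=`(start = time - 60, end = time + 60)]
# Each dt2 time as a zero-width interval
dt2[, `:=`(start = time, end = time)]

# foverlaps needs the second table keyed, with the interval columns last
setkey(dt1, A, start, end)

# Keep dt2 points lying within a dt1 window for the same A
res <- foverlaps(dt2, dt1, by.x = c("A", "start", "end"),
                 type = "within", nomatch = 0L)

# Columns clashing with dt1 get an "i." prefix on the dt2 side,
# so the unprefixed time here is the one from dt1
res[, .(A, B, C, D, E, time)]
```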

Or a non-equi inner join from data.table:

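library(data.table)
setDT(df1); setDT(df2)  # both must be data.tables, with time stored as POSIXct
# accept df2 times within 60 seconds (1 minute) of each df1 time
df1[, c("start", "end") := .(time - 60, time + 60)]
df2[df1, on = .(A, time >= start, time <= end), nomatch = 0L, .(A, B, C, D, E, time = i.time)]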
