
Pandas merge_asof will not merge with a pd.Timedelta tolerance, giving error "must be compat with type int64"

I am trying to merge the following files:

df1

unix_time,hk1,hk2,val2,hint
1560752700,10,15,3,6:25am
1560753900,20,25,5,6:45am
1560756600,10,10,-1,7:30am

df2

unix_time,hk1,hk2,val,hint
1560751200,10,15,1,6am
1560754800,20,25,2,7am
1560758400,10,10,3,8am

on the unix_time column.

I am trying to do this as follows:

merged = pd.merge_asof(df2.sort_values('unix_time'),
              df1.sort_values('unix_time'),
              by=['hk1', 'hk2'],
              on='unix_time',
              tolerance=pd.Timedelta(seconds=1800),
              direction='nearest')

According to the docs, the merge_asof tolerance can be specified as a pd.Timedelta. But when I run the code above, I get:

pandas.errors.MergeError: incompatible tolerance <class 'pandas._libs.tslibs.timedeltas.Timedelta'>, must be compat with type int64

How do I fix it?

Thank you.

The expected joined values (val from df2, val2 from df1) for the example above:

val | val2
1   | 3
2   | 5
3   | -1

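Use tolerance=1800. Because unix_time is an integer (int64) column, the tolerance must also be an integer, expressed in the same unit as the column (seconds here):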

merged = pd.merge_asof(df2.sort_values('unix_time'),
              df1.sort_values('unix_time'),
              by=['hk1', 'hk2'],
              on='unix_time',
              tolerance=1800,
              direction='nearest')
print (merged)
    unix_time  hk1  hk2  val hint_x  val2  hint_y
0  1560751200   10   15    1    6am     3  6:25am
1  1560754800   20   25    2    7am     5  6:45am
2  1560758400   10   10    3    8am    -1  7:30am
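For anyone who wants to reproduce this end to end, here is a minimal sketch. It assumes the two files are loaded with pd.read_csv; the io.StringIO wrappers only stand in for the real files. It first shows the dtype of the key column, then triggers the error from the question, then runs the integer-tolerance version.

```python
import io
import pandas as pd

# Stand-ins for the two files from the question (io.StringIO is an assumption;
# real code would pass file paths to pd.read_csv instead).
df1 = pd.read_csv(io.StringIO("""unix_time,hk1,hk2,val2,hint
1560752700,10,15,3,6:25am
1560753900,20,25,5,6:45am
1560756600,10,10,-1,7:30am
"""))
df2 = pd.read_csv(io.StringIO("""unix_time,hk1,hk2,val,hint
1560751200,10,15,1,6am
1560754800,20,25,2,7am
1560758400,10,10,3,8am
"""))

# The merge key is an integer column, so merge_asof expects an integer tolerance.
print(df1['unix_time'].dtype)

left = df2.sort_values('unix_time')
right = df1.sort_values('unix_time')

# A Timedelta tolerance on an integer key raises MergeError.
error_message = None
try:
    pd.merge_asof(left, right, by=['hk1', 'hk2'], on='unix_time',
                  tolerance=pd.Timedelta(seconds=1800), direction='nearest')
except pd.errors.MergeError as exc:
    error_message = str(exc)
print(error_message)

# An integer tolerance in the key's own unit (seconds) works.
merged = pd.merge_asof(left, right, by=['hk1', 'hk2'], on='unix_time',
                       tolerance=1800, direction='nearest')
print(merged[['val', 'val2']])
```

Note that the tolerance bound is inclusive: the 8am row is exactly 1800 seconds from its 7:30am partner and still matches.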

Or, if you want to keep pd.Timedelta as the tolerance, convert both columns to datetimes before calling merge_asof:

df1['unix_time'] = pd.to_datetime(df1['unix_time'], unit='s')
df2['unix_time'] = pd.to_datetime(df2['unix_time'], unit='s')

merged = pd.merge_asof(df2.sort_values('unix_time'),
              df1.sort_values('unix_time'),
              by=['hk1', 'hk2'],
              on='unix_time',
              tolerance=pd.Timedelta(seconds=1800),
              direction='nearest')

print (merged)
            unix_time  hk1  hk2  val hint_x  val2  hint_y
0 2019-06-17 06:00:00   10   15    1    6am     3  6:25am
1 2019-06-17 07:00:00   20   25    2    7am     5  6:45am
2 2019-06-17 08:00:00   10   10    3    8am    -1  7:30am
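If you would rather not overwrite unix_time, one possible variation is to merge on a separate datetime column and keep the original integers. This is only a sketch: the column name ts and the suffix _df1 are made up for illustration, and the hint column is left out for brevity.

```python
import pandas as pd

# Same data as in the question, without the hint column.
df1 = pd.DataFrame({'unix_time': [1560752700, 1560753900, 1560756600],
                    'hk1': [10, 20, 10],
                    'hk2': [15, 25, 10],
                    'val2': [3, 5, -1]})
df2 = pd.DataFrame({'unix_time': [1560751200, 1560754800, 1560758400],
                    'hk1': [10, 20, 10],
                    'hk2': [15, 25, 10],
                    'val': [1, 2, 3]})

# Add a datetime key ('ts' is a hypothetical name) instead of replacing unix_time.
left = df2.assign(ts=pd.to_datetime(df2['unix_time'], unit='s')).sort_values('ts')
right = df1.assign(ts=pd.to_datetime(df1['unix_time'], unit='s')).sort_values('ts')

# With a datetime key, a Timedelta tolerance is accepted.
merged = pd.merge_asof(left, right, on='ts', by=['hk1', 'hk2'],
                       tolerance=pd.Timedelta(seconds=1800),
                       direction='nearest',
                       suffixes=('', '_df1'))

# Both original integer timestamps survive, so the distance in seconds
# between each row and its match can be checked directly.
merged['gap_seconds'] = (merged['unix_time'] - merged['unix_time_df1']).abs()
print(merged[['ts', 'val', 'val2', 'gap_seconds']])
```

Keeping both integer timestamps makes it easy to confirm that every match falls within the 1800-second tolerance.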
