How to merge two dataframes based on the closest (or most recent) timestamp

Suppose I have a dataframe df1 with columns 'A' and 'B'. 'A' is a column of timestamps (e.g. unixtime) and 'B' is a column of some value.

Suppose I also have a dataframe df2 with columns 'C' and 'D'. 'C' is also a unixtime column and 'D' is a column containing some other values.

I would like to fuzzy merge the dataframes with a join on the timestamp. However, if the timestamps don't match (which they most likely won't), I would like each row to merge on the closest entry in 'C' that comes before the timestamp in 'A'.

pd.merge does not support this, and I find myself converting away from dataframes using to_dict() and iterating to solve it. Is there a way to do this in pandas?

numpy.searchsorted() finds the appropriate index positions to merge on (see docs). Hopefully the example below gets you closer to what you're looking for:

from datetime import datetime, timedelta
from random import randrange
import numpy as np
import pandas as pd
start = datetime(2015, 12, 1)
df1 = pd.DataFrame({'A': [start + timedelta(minutes=randrange(60)) for i in range(10)], 'B': [1] * 10}).sort_values('A').reset_index(drop=True)
df2 = pd.DataFrame({'C': [start + timedelta(minutes=randrange(60)) for i in range(10)], 'D': [2] * 10}).sort_values('C').reset_index(drop=True)
# Re-label each df2 row with the position of the first df1.A that is at or after its C.
df2.index = np.searchsorted(df1.A.values, df2.C.values)
print(pd.merge(left=df1, right=df2, left_index=True, right_index=True, how='left'))

                    A  B                   C   D
0 2015-12-01 00:01:00  1                 NaT NaN
1 2015-12-01 00:02:00  1 2015-12-01 00:02:00   2
2 2015-12-01 00:02:00  1                 NaT NaN
3 2015-12-01 00:12:00  1 2015-12-01 00:05:00   2
4 2015-12-01 00:16:00  1 2015-12-01 00:14:00   2
4 2015-12-01 00:16:00  1 2015-12-01 00:14:00   2
5 2015-12-01 00:28:00  1 2015-12-01 00:22:00   2
6 2015-12-01 00:30:00  1                 NaT NaN
7 2015-12-01 00:39:00  1 2015-12-01 00:31:00   2
7 2015-12-01 00:39:00  1 2015-12-01 00:39:00   2
8 2015-12-01 00:55:00  1 2015-12-01 00:40:00   2
8 2015-12-01 00:55:00  1 2015-12-01 00:46:00   2
8 2015-12-01 00:55:00  1 2015-12-01 00:54:00   2
9 2015-12-01 00:57:00  1                 NaT NaN
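
The merge above attaches each row of df2 to the first row of df1 whose timestamp is at or after it, so a single 'A' can pick up several 'C' rows, or none at all. If exactly one match per 'A' is wanted (the most recent 'C' at or before it), a variation on the same searchsorted idea might look like the sketch below. The helper name `latest_before` and the small fixed sample frames are assumptions made up for illustration; they are not part of the original answer.

```python
import numpy as np
import pandas as pd


def latest_before(df1, df2, left_on="A", right_on="C"):
    """Hypothetical helper: for each row of df1, attach the df2 row whose
    right_on value is the latest one <= left_on (missing if there is none).
    Both frames are assumed to be sorted by their key column."""
    # side="right" counts the df2 keys that are <= each df1 key,
    # so subtracting 1 gives the position of the latest such key.
    pos = np.searchsorted(df2[right_on].values, df1[left_on].values, side="right") - 1
    # A position of -1 means no earlier key exists; reindexing on a
    # 0..n-1 index turns that label into a row of missing values.
    matched = df2.reset_index(drop=True).reindex(pos)
    matched.index = df1.index
    return pd.concat([df1, matched], axis=1)


df1 = pd.DataFrame({
    "A": pd.to_datetime(["2015-12-01 00:01", "2015-12-01 00:12", "2015-12-01 00:30"]),
    "B": [1, 1, 1],
})
df2 = pd.DataFrame({
    "C": pd.to_datetime(["2015-12-01 00:05", "2015-12-01 00:14", "2015-12-01 00:22"]),
    "D": [2, 3, 4],
})

result = latest_before(df1, df2)
print(result)
```

Here the first 'A' (00:01) has no earlier 'C' and stays unmatched, while 00:12 picks up 00:05 and 00:30 picks up 00:22. This is the same backward match that pandas.merge_asof performs by default.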

Building on @Stephan's answer and @JohnE's comment, something similar can be done with pandas.merge_asof for pandas>=0.19.0:

>>> import numpy as np
>>> import pandas as pd
>>> from datetime import datetime, timedelta
>>> start = datetime(2015, 12, 1)
>>> a_timestamps = pd.date_range(start, start + timedelta(hours=4.5), freq='30Min')
>>> c_timestamps = pd.date_range(start, start + timedelta(hours=9), freq='H')
>>> df1 = pd.DataFrame({'A': a_timestamps, 'B': range(10)})
>>> df1

                    A  B
0 2015-12-01 00:00:00  0
1 2015-12-01 00:30:00  1
2 2015-12-01 01:00:00  2
3 2015-12-01 01:30:00  3
4 2015-12-01 02:00:00  4
5 2015-12-01 02:30:00  5
6 2015-12-01 03:00:00  6
7 2015-12-01 03:30:00  7
8 2015-12-01 04:00:00  8
9 2015-12-01 04:30:00  9

>>> df2 = pd.DataFrame({'C': c_timestamps, 'D': range(10, 20)})
>>> df2

                    C   D
0 2015-12-01 00:00:00  10
1 2015-12-01 01:00:00  11
2 2015-12-01 02:00:00  12
3 2015-12-01 03:00:00  13
4 2015-12-01 04:00:00  14
5 2015-12-01 05:00:00  15
6 2015-12-01 06:00:00  16
7 2015-12-01 07:00:00  17
8 2015-12-01 08:00:00  18
9 2015-12-01 09:00:00  19

>>> pd.merge_asof(left=df1, right=df2, left_on='A', right_on='C')

                    A  B                   C   D
0 2015-12-01 00:00:00  0 2015-12-01 00:00:00  10
1 2015-12-01 00:30:00  1 2015-12-01 00:00:00  10
2 2015-12-01 01:00:00  2 2015-12-01 01:00:00  11
3 2015-12-01 01:30:00  3 2015-12-01 01:00:00  11
4 2015-12-01 02:00:00  4 2015-12-01 02:00:00  12
5 2015-12-01 02:30:00  5 2015-12-01 02:00:00  12
6 2015-12-01 03:00:00  6 2015-12-01 03:00:00  13
7 2015-12-01 03:30:00  7 2015-12-01 03:00:00  13
8 2015-12-01 04:00:00  8 2015-12-01 04:00:00  14
9 2015-12-01 04:30:00  9 2015-12-01 04:00:00  14
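
The call above uses merge_asof's default backward search, i.e. the most recent 'C' at or before each 'A'. For the "closest" reading of the question, merge_asof also accepts a direction argument (added in pandas 0.20.0, as far as I can tell) and a tolerance. A minimal sketch follows; the sample frames and the 30-minute tolerance are made up for illustration.

```python
import pandas as pd

# Both frames must be sorted by their key column.
df1 = pd.DataFrame({
    "A": pd.to_datetime(["2015-12-01 00:10", "2015-12-01 00:50", "2015-12-01 03:00"]),
    "B": [0, 1, 2],
})
df2 = pd.DataFrame({
    "C": pd.to_datetime(["2015-12-01 00:00", "2015-12-01 01:00"]),
    "D": [10, 11],
})

# direction="nearest" picks the closest 'C' on either side of each 'A';
# tolerance leaves a row unmatched when the closest 'C' is too far away.
nearest = pd.merge_asof(
    df1, df2, left_on="A", right_on="C",
    direction="nearest", tolerance=pd.Timedelta("30min"),
)
print(nearest)
```

With this data, 00:10 matches 00:00, 00:50 matches 01:00 (a later timestamp, which the backward default would not pick), and 03:00 stays unmatched because its nearest 'C' is two hours away.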
