Merging two pandas data frames where one has less frequent dates, matching on the most recent date

Question

a and b are pandas data frames, and a updates less frequently than b. For example:

import numpy as np
import pandas as pd

a = pd.DataFrame({'id': np.array([1, 3, 4, 9]*2),
                  'date': np.repeat(['2021-01-03', '2021-02-06'], 4),
                  'score': np.linspace(0, 1, 8)})
a['date'] = pd.to_datetime(a['date'])

b = pd.DataFrame({'id': np.array([1, 3, 4, 9]*5),
                  'date': np.repeat(['2021-01-03', '2021-01-15', '2021-01-23', '2021-02-08', '2021-02-17'], 4),
                  'value': np.linspace(0, 1, 20)})
b['date'] = pd.to_datetime(b['date'])

I want to merge the two frames by matching on id and pairing each date in b with the most recent date in a that is not later than it. In this example I want the following date pairings for the merge:

b          -> a
2021-01-03 -> 2021-01-03
2021-01-15 -> 2021-01-03
2021-01-23 -> 2021-01-03
2021-02-08 -> 2021-02-06
2021-02-17 -> 2021-02-06

I can do this with a for-loop over the dates in a: select the rows of b that lie between each pair of adjacent dates in a, add the score from a as a new column, and then concatenate these frames together. Is there a faster way to do this?
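For reference, the loop described above might look roughly like the sketch below. This is only one reading of the description, not the asker's actual code; the names a_dates, bounds, pieces, looped and the a_date column are made up for illustration.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'id': np.array([1, 3, 4, 9]*2),
                  'date': np.repeat(['2021-01-03', '2021-02-06'], 4),
                  'score': np.linspace(0, 1, 8)})
a['date'] = pd.to_datetime(a['date'])

b = pd.DataFrame({'id': np.array([1, 3, 4, 9]*5),
                  'date': np.repeat(['2021-01-03', '2021-01-15', '2021-01-23',
                                     '2021-02-08', '2021-02-17'], 4),
                  'value': np.linspace(0, 1, 20)})
b['date'] = pd.to_datetime(b['date'])

# Sorted unique dates in a; each one opens a window that runs until the next date.
a_dates = a['date'].drop_duplicates().sort_values().tolist()
# The last window has no upper bound (None).
bounds = a_dates[1:] + [None]

pieces = []
for start, end in zip(a_dates, bounds):
    # Rows of b whose date falls in [start, end)
    mask = b['date'] >= start
    if end is not None:
        mask &= b['date'] < end
    # Scores recorded in a on the window's opening date, one row per id
    scores = a.loc[a['date'] == start, ['id', 'score']]
    piece = b[mask].merge(scores, on='id', how='left')
    # Hypothetical helper column recording which date in a was matched
    pieces.append(piece.assign(a_date=start))

looped = pd.concat(pieces, ignore_index=True)
print(looped[['date', 'a_date']].drop_duplicates())
```

Note that in this sketch, rows of b dated before the first date in a fall into no window and are dropped.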

Answer 1

Use merge_asof with the on and by parameters:

df = pd.merge_asof(b, a, on='date', by='id')

For testing, the date column in b is renamed to date1, so that the output shows both the date from b and the matched date from a:

import numpy as np
import pandas as pd

a = pd.DataFrame({'id': np.array([1, 3, 4, 9]*2),
                  'date': np.repeat(['2021-01-03', '2021-02-06'], 4),
                  'score': np.linspace(0, 1, 8)})
a['date'] = pd.to_datetime(a['date'])

b = pd.DataFrame({'id': np.array([1, 3, 4, 9]*5),
                  'date1': np.repeat(['2021-01-03', '2021-01-15', '2021-01-23', '2021-02-08', '2021-02-17'], 4),
                  'value': np.linspace(0, 1, 20)})
b['date1'] = pd.to_datetime(b['date1'])

df = pd.merge_asof(b, a, left_on='date1', right_on='date', by='id')
print(df)
    id      date1     value       date     score
0    1 2021-01-03  0.000000 2021-01-03  0.000000
1    3 2021-01-03  0.052632 2021-01-03  0.142857
2    4 2021-01-03  0.105263 2021-01-03  0.285714
3    9 2021-01-03  0.157895 2021-01-03  0.428571
4    1 2021-01-15  0.210526 2021-01-03  0.000000
5    3 2021-01-15  0.263158 2021-01-03  0.142857
6    4 2021-01-15  0.315789 2021-01-03  0.285714
7    9 2021-01-15  0.368421 2021-01-03  0.428571
8    1 2021-01-23  0.421053 2021-01-03  0.000000
9    3 2021-01-23  0.473684 2021-01-03  0.142857
10   4 2021-01-23  0.526316 2021-01-03  0.285714
11   9 2021-01-23  0.578947 2021-01-03  0.428571
12   1 2021-02-08  0.631579 2021-02-06  0.571429
13   3 2021-02-08  0.684211 2021-02-06  0.714286
14   4 2021-02-08  0.736842 2021-02-06  0.857143
15   9 2021-02-08  0.789474 2021-02-06  1.000000
16   1 2021-02-17  0.842105 2021-02-06  0.571429
17   3 2021-02-17  0.894737 2021-02-06  0.714286
18   4 2021-02-17  0.947368 2021-02-06  0.857143
19   9 2021-02-17  1.000000 2021-02-06  1.000000
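As a quick sanity check, the date pairings produced by merge_asof can be compared with the table in the question. The sketch below assumes the same a and b as above (with date1 in b); the explicit sort_values calls and the pairs variable are additions for illustration. merge_asof requires both frames to be sorted by their date keys, which the sample data already is, and direction='backward' is the default, written out here only to make the "most recent earlier or equal date" behaviour explicit.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'id': np.array([1, 3, 4, 9]*2),
                  'date': np.repeat(['2021-01-03', '2021-02-06'], 4),
                  'score': np.linspace(0, 1, 8)})
a['date'] = pd.to_datetime(a['date'])

b = pd.DataFrame({'id': np.array([1, 3, 4, 9]*5),
                  'date1': np.repeat(['2021-01-03', '2021-01-15', '2021-01-23',
                                      '2021-02-08', '2021-02-17'], 4),
                  'value': np.linspace(0, 1, 20)})
b['date1'] = pd.to_datetime(b['date1'])

# merge_asof expects both frames sorted by the date keys;
# a stable sort keeps the original order within each date.
a_sorted = a.sort_values('date', kind='stable')
b_sorted = b.sort_values('date1', kind='stable')

df = pd.merge_asof(b_sorted, a_sorted,
                   left_on='date1', right_on='date',
                   by='id', direction='backward')

# One row per distinct (date in b, matched date in a) pairing
pairs = df[['date1', 'date']].drop_duplicates().reset_index(drop=True)
print(pairs)
```

With the backward direction, a row in b that is dated before every date in a for its id has no match and gets missing values in the columns coming from a.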

Answer 1: accepted, 2021-02-19 12:45:41
