Merge two pandas data frames, one of which has infrequent dates, by the most recent date
a and b are pandas data frames, and a updates less frequently than b. For example:
import numpy as np
import pandas as pd

a = pd.DataFrame({'id': np.array([1, 3, 4, 9]*2),
                  'date': np.repeat(['2021-01-03', '2021-02-06'], 4),
                  'score': np.linspace(0, 1, 8)})
a['date'] = pd.to_datetime(a['date'])
b = pd.DataFrame({'id': np.array([1, 3, 4, 9]*5),
                  'date': np.repeat(['2021-01-03', '2021-01-15', '2021-01-23', '2021-02-08', '2021-02-17'], 4),
                  'value': np.linspace(0, 1, 20)})
b['date'] = pd.to_datetime(b['date'])
I want to merge the two frames by matching the ids and pairing each date in b with the most recent date in a that is on or before it, so in this example I want the following pairings of the dates for the merge:
b -> a
2021-01-03 -> 2021-01-03
2021-01-15 -> 2021-01-03
2021-01-23 -> 2021-01-03
2021-02-08 -> 2021-02-06
2021-02-17 -> 2021-02-06
I can do this with a for-loop over the dates in a, selecting the data in b that lies between each pair of adjacent dates in a, adding the score from a as a new column, and then concatenating these frames together. Is there a faster way to do this?
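For reference, the loop-based approach described in the question might look roughly like the sketch below. This is only one reading of that description, not the asker's actual code; the names a_dates, pieces and looped are made up for illustration. Note that any rows of b dated before the first date in a would be silently dropped by this version.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'id': np.array([1, 3, 4, 9] * 2),
                  'date': np.repeat(['2021-01-03', '2021-02-06'], 4),
                  'score': np.linspace(0, 1, 8)})
a['date'] = pd.to_datetime(a['date'])
b = pd.DataFrame({'id': np.array([1, 3, 4, 9] * 5),
                  'date': np.repeat(['2021-01-03', '2021-01-15', '2021-01-23',
                                     '2021-02-08', '2021-02-17'], 4),
                  'value': np.linspace(0, 1, 20)})
b['date'] = pd.to_datetime(b['date'])

# Sorted unique dates in a (hypothetical helper name).
a_dates = sorted(a['date'].drop_duplicates())

pieces = []
for i, start in enumerate(a_dates):
    # Rows of b dated on or after this date in a ...
    mask = b['date'] >= start
    # ... and strictly before the next date in a, if there is one.
    if i + 1 < len(a_dates):
        mask &= b['date'] < a_dates[i + 1]
    # Scores recorded in a on the window's start date, one row per id.
    scores = a.loc[a['date'] == start, ['id', 'score']]
    pieces.append(b[mask].merge(scores, on='id', how='left'))

# Stack the per-window results back into one frame.
looped = pd.concat(pieces, ignore_index=True)
print(looped)
```

This does one filter and one merge per distinct date in a, which is why it gets slow as the number of dates grows.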
Use merge_asof with the on and by parameters:
df = pd.merge_asof(b, a, on='date', by='id')
For testing, the date column in b was renamed to date1, so the output shows both dates:
a = pd.DataFrame({'id': np.array([1, 3, 4, 9]*2),
                  'date': np.repeat(['2021-01-03', '2021-02-06'], 4),
                  'score': np.linspace(0, 1, 8)})
a['date'] = pd.to_datetime(a['date'])
b = pd.DataFrame({'id': np.array([1, 3, 4, 9]*5),
                  'date1': np.repeat(['2021-01-03', '2021-01-15', '2021-01-23', '2021-02-08', '2021-02-17'], 4),
                  'value': np.linspace(0, 1, 20)})
b['date1'] = pd.to_datetime(b['date1'])
df = pd.merge_asof(b, a, left_on='date1', right_on='date', by='id')
print(df)
id date1 value date score
0 1 2021-01-03 0.000000 2021-01-03 0.000000
1 3 2021-01-03 0.052632 2021-01-03 0.142857
2 4 2021-01-03 0.105263 2021-01-03 0.285714
3 9 2021-01-03 0.157895 2021-01-03 0.428571
4 1 2021-01-15 0.210526 2021-01-03 0.000000
5 3 2021-01-15 0.263158 2021-01-03 0.142857
6 4 2021-01-15 0.315789 2021-01-03 0.285714
7 9 2021-01-15 0.368421 2021-01-03 0.428571
8 1 2021-01-23 0.421053 2021-01-03 0.000000
9 3 2021-01-23 0.473684 2021-01-03 0.142857
10 4 2021-01-23 0.526316 2021-01-03 0.285714
11 9 2021-01-23 0.578947 2021-01-03 0.428571
12 1 2021-02-08 0.631579 2021-02-06 0.571429
13 3 2021-02-08 0.684211 2021-02-06 0.714286
14 4 2021-02-08 0.736842 2021-02-06 0.857143
15 9 2021-02-08 0.789474 2021-02-06 1.000000
16 1 2021-02-17 0.842105 2021-02-06 0.571429
17 3 2021-02-17 0.894737 2021-02-06 0.714286
18 4 2021-02-17 0.947368 2021-02-06 0.857143
19 9 2021-02-17 1.000000 2021-02-06 1.000000
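A few practical points about merge_asof are worth checking on real data. Both frames have to be sorted by the date key (the example frames already are), the default direction='backward' is what gives the "most recent date on or before" behaviour, and tolerance can cap how old a match may be. The sketch below assumes the original column names from the question. The a_date column and the 14-day cutoff are illustrative choices only, not part of the answer above.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'id': np.array([1, 3, 4, 9] * 2),
                  'date': np.repeat(['2021-01-03', '2021-02-06'], 4),
                  'score': np.linspace(0, 1, 8)})
a['date'] = pd.to_datetime(a['date'])
b = pd.DataFrame({'id': np.array([1, 3, 4, 9] * 5),
                  'date': np.repeat(['2021-01-03', '2021-01-15', '2021-01-23',
                                     '2021-02-08', '2021-02-17'], 4),
                  'value': np.linspace(0, 1, 20)})
b['date'] = pd.to_datetime(b['date'])

# merge_asof raises an error if either frame is not sorted by the "on" key.
a_sorted = a.sort_values('date', kind='stable')
b_sorted = b.sort_values('date', kind='stable')

# Keep a copy of a's date under another name (a_date is a made-up column name),
# because with on='date' only the left frame's date survives the merge.
a_sorted = a_sorted.assign(a_date=a_sorted['date'])

# direction='backward' is the default: take the last row of a, per id,
# whose date is less than or equal to the date in b.
df = pd.merge_asof(b_sorted, a_sorted, on='date', by='id', direction='backward')

# Optional: refuse matches older than 14 days (illustrative cutoff).
# Rows of b without a recent enough match get NaN in the columns from a.
df_recent = pd.merge_asof(b_sorted, a_sorted, on='date', by='id',
                          tolerance=pd.Timedelta('14D'))
print(df_recent[df_recent['score'].isna()])
```

With the 14-day cutoff, only the 2021-01-23 rows lose their match, since they are 20 days after the nearest earlier date in a.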