a and b are pandas data frames, and a updates less frequently than b. E.g.:
import numpy as np
import pandas as pd

a = pd.DataFrame({'id': np.array([1, 3, 4, 9]*2),
                  'date': np.repeat(['2021-01-03', '2021-02-06'], 4),
                  'score': np.linspace(0, 1, 8)})
a['date'] = pd.to_datetime(a['date'])
b = pd.DataFrame({'id': np.array([1, 3, 4, 9]*5),
                  'date': np.repeat(['2021-01-03', '2021-01-15', '2021-01-23', '2021-02-08', '2021-02-17'], 4),
                  'value': np.linspace(0, 1, 20)})
b['date'] = pd.to_datetime(b['date'])
I want to merge the two frames by matching the ids and pairing each date in b with the most recent date in a that is not after it, so in this example I want the following pairings of the dates for the merge:
b -> a
2021-01-03 -> 2021-01-03
2021-01-15 -> 2021-01-03
2021-01-23 -> 2021-01-03
2021-02-08 -> 2021-02-06
2021-02-17 -> 2021-02-06
I can do this with a for-loop over each of the dates in a, selecting the data in b that lies between each pair of adjacent dates in a, adding the score from a as a new column, and then concatenating these frames together, but is there a faster way to do this?
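For reference, the loop described above might look roughly like the following. This is only a sketch of one way to write it, since the original loop is not shown; the names a_dates, pieces, chunk and looped are made up for illustration.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'id': np.array([1, 3, 4, 9]*2),
                  'date': np.repeat(['2021-01-03', '2021-02-06'], 4),
                  'score': np.linspace(0, 1, 8)})
a['date'] = pd.to_datetime(a['date'])
b = pd.DataFrame({'id': np.array([1, 3, 4, 9]*5),
                  'date': np.repeat(['2021-01-03', '2021-01-15', '2021-01-23',
                                     '2021-02-08', '2021-02-17'], 4),
                  'value': np.linspace(0, 1, 20)})
b['date'] = pd.to_datetime(b['date'])

# Distinct update dates in a, oldest first (hypothetical helper name).
a_dates = a['date'].drop_duplicates().sort_values().tolist()

pieces = []
for i, start in enumerate(a_dates):
    # Rows of b on or after this a-date and strictly before the next one.
    mask = b['date'] >= start
    if i + 1 < len(a_dates):
        mask &= b['date'] < a_dates[i + 1]
    # Attach the score that a holds for each id on this date.
    scores = a.loc[a['date'] == start, ['id', 'score']]
    chunk = b[mask].merge(scores, on='id', how='left')
    pieces.append(chunk)

looped = pd.concat(pieces, ignore_index=True)
print(looped)
```

Note that rows of b dated before the first date in a would be dropped by this version; there are none in the example data.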
Use merge_asof with the on and by parameters:
df = pd.merge_asof(b, a, on='date', by='id')
For testing, the date column in b was renamed to date1, so that both the date from b and the matched date from a appear in the output:
a = pd.DataFrame({'id': np.array([1, 3, 4, 9]*2),
                  'date': np.repeat(['2021-01-03', '2021-02-06'], 4),
                  'score': np.linspace(0, 1, 8)})
a['date'] = pd.to_datetime(a['date'])
b = pd.DataFrame({'id': np.array([1, 3, 4, 9]*5),
                  'date1': np.repeat(['2021-01-03', '2021-01-15', '2021-01-23', '2021-02-08', '2021-02-17'], 4),
                  'value': np.linspace(0, 1, 20)})
b['date1'] = pd.to_datetime(b['date1'])
df = pd.merge_asof(b, a, left_on='date1', right_on='date', by='id')
print(df)
    id      date1     value       date     score
0    1 2021-01-03  0.000000 2021-01-03  0.000000
1    3 2021-01-03  0.052632 2021-01-03  0.142857
2    4 2021-01-03  0.105263 2021-01-03  0.285714
3    9 2021-01-03  0.157895 2021-01-03  0.428571
4    1 2021-01-15  0.210526 2021-01-03  0.000000
5    3 2021-01-15  0.263158 2021-01-03  0.142857
6    4 2021-01-15  0.315789 2021-01-03  0.285714
7    9 2021-01-15  0.368421 2021-01-03  0.428571
8    1 2021-01-23  0.421053 2021-01-03  0.000000
9    3 2021-01-23  0.473684 2021-01-03  0.142857
10   4 2021-01-23  0.526316 2021-01-03  0.285714
11   9 2021-01-23  0.578947 2021-01-03  0.428571
12   1 2021-02-08  0.631579 2021-02-06  0.571429
13   3 2021-02-08  0.684211 2021-02-06  0.714286
14   4 2021-02-08  0.736842 2021-02-06  0.857143
15   9 2021-02-08  0.789474 2021-02-06  1.000000
16   1 2021-02-17  0.842105 2021-02-06  0.571429
17   3 2021-02-17  0.894737 2021-02-06  0.714286
18   4 2021-02-17  0.947368 2021-02-06  0.857143
19   9 2021-02-17  1.000000 2021-02-06  1.000000
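Two details are worth keeping in mind. merge_asof expects both frames to be sorted by the key being matched on, and its default direction='backward' (with exact matches allowed) is what gives the "most recent date in a that is not after the date in b" behaviour. The sketch below is a hedged check rather than part of the answer: it shuffles b to simulate unsorted input, sorts before merging, and lists the distinct date pairings. The names b_shuffled, merged, pairs and a_date are chosen only for illustration.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'id': np.array([1, 3, 4, 9]*2),
                  'date': np.repeat(['2021-01-03', '2021-02-06'], 4),
                  'score': np.linspace(0, 1, 8)})
a['date'] = pd.to_datetime(a['date'])
b = pd.DataFrame({'id': np.array([1, 3, 4, 9]*5),
                  'date': np.repeat(['2021-01-03', '2021-01-15', '2021-01-23',
                                     '2021-02-08', '2021-02-17'], 4),
                  'value': np.linspace(0, 1, 20)})
b['date'] = pd.to_datetime(b['date'])

# Simulate input that does not arrive in date order.
b_shuffled = b.sample(frac=1, random_state=0)

merged = pd.merge_asof(
    b_shuffled.sort_values('date'),                            # left key must be sorted
    a.sort_values('date').rename(columns={'date': 'a_date'}),  # right key must be sorted
    left_on='date', right_on='a_date', by='id',
    direction='backward',  # the default, spelled out here for clarity
)

# One row per distinct (date in b, matched date in a) pairing.
pairs = merged[['date', 'a_date']].drop_duplicates().reset_index(drop=True)
print(pairs)
```

If the pairings printed here match the b -> a list in the question, the merge is doing what was asked. Rows in b dated earlier than every date in a for the same id would come back with missing values for a_date and score.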