I have two tables I need to join using an id + date combo key.
Table A
ID | DateA |
---|---|
123 | 2020-11-19 17:54:42.253000 |
123 | 2020-11-19 15:54:09.434000 |
456 | 2020-11-18 16:32:24.653000 |
456 | 2020-11-18 15:54:11.816000 |
Table B
ID | DateB |
---|---|
123 | 2020-11-20 00:02:14.324400 |
123 | 2020-11-20 08:22:39.472900 |
456 | 2020-11-18 17:11:41.572900 |
456 | 2020-11-18 16:13:55.928000 |
But as you can see, the dates aren't exactly the same. To know which date is the correct one, I need to find which DateA is closest to each DateB (of the same ID) without going over it ("The Price Is Right" rules). For example, the first row in TableA would match the first row in TableB, because the IDs match and DateA's value is the closest to that DateB without going over it.
I'm working on an `.apply()` function for TableA grouped by ID, but the only way to do this seems to be two `.loc` lookups and a nested loop to find the results. Are there any built-in methods I'm missing that might make this more efficient?
You can try `merge_asof` with `direction='nearest'`:
pd.merge_asof(df1.sort_values('DateA'), df2.sort_values('DateB'),
left_on='DateA', right_on='DateB', by='ID', direction='nearest')\
.sort_values('ID')
ID DateA DateB
2 123 2020-11-19 15:54:09.434 2020-11-20 00:02:14.324400
3 123 2020-11-19 17:54:42.253 2020-11-20 00:02:14.324400
0 456 2020-11-18 15:54:11.816 2020-11-18 16:13:55.928000
1 456 2020-11-18 16:32:24.653 2020-11-18 16:13:55.928000
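Note that `direction='nearest'` picks the closest DateB on either side, so in the output above the 456 / 16:32:24.653 row is paired with a DateB that is earlier than its DateA. If the "without going over" rule has to hold strictly, one option is `direction='forward'`, which only considers a DateB at or after each DateA. The following is a minimal sketch under that assumption; the frame names `df_a`, `df_b`, and `matched` are illustrative, and the data is copied from the tables in the question.

```python
import pandas as pd

# Sample data copied from Table A and Table B in the question.
df_a = pd.DataFrame({
    "ID": [123, 123, 456, 456],
    "DateA": pd.to_datetime([
        "2020-11-19 17:54:42.253000",
        "2020-11-19 15:54:09.434000",
        "2020-11-18 16:32:24.653000",
        "2020-11-18 15:54:11.816000",
    ]),
})
df_b = pd.DataFrame({
    "ID": [123, 123, 456, 456],
    "DateB": pd.to_datetime([
        "2020-11-20 00:02:14.324400",
        "2020-11-20 08:22:39.472900",
        "2020-11-18 17:11:41.572900",
        "2020-11-18 16:13:55.928000",
    ]),
})

# merge_asof requires both frames to be sorted on their join keys.
# direction="forward" matches each DateA to the earliest DateB that is
# greater than or equal to it, within the same ID.
matched = (
    pd.merge_asof(
        df_a.sort_values("DateA"),
        df_b.sort_values("DateB"),
        left_on="DateA",
        right_on="DateB",
        by="ID",
        direction="forward",
    )
    .sort_values(["ID", "DateA"])
    .reset_index(drop=True)
)

print(matched)
```

If the lookup should instead start from each DateB and find the latest DateA that does not exceed it, swapping the two frames and using `direction='backward'` would be the mirror-image approach.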