
How do I find the closest date in a group to another date, without going over, across two pandas DataFrames?

I have two tables I need to join using an id + date combo key.

Table A

ID DateA
123 2020-11-19 17:54:42.253000
123 2020-11-19 15:54:09.434000
456 2020-11-18 16:32:24.653000
456 2020-11-18 15:54:11.816000

Table B

ID DateB
123 2020-11-20 00:02:14.324400
123 2020-11-20 08:22:39.472900
456 2020-11-18 17:11:41.572900
456 2020-11-18 16:13:55.928000
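To experiment with the answer below, the two tables can be loaded as DataFrames. This is a minimal sketch that assumes the names df1 (Table A) and df2 (Table B) used in the answer. It converts the date strings to datetime64 with pd.to_datetime so the date columns are compared and merged as timestamps rather than as text.

```python
import pandas as pd

# Table A, copied from the question
df1 = pd.DataFrame({
    "ID": [123, 123, 456, 456],
    "DateA": pd.to_datetime([
        "2020-11-19 17:54:42.253000",
        "2020-11-19 15:54:09.434000",
        "2020-11-18 16:32:24.653000",
        "2020-11-18 15:54:11.816000",
    ]),
})

# Table B, copied from the question
df2 = pd.DataFrame({
    "ID": [123, 123, 456, 456],
    "DateB": pd.to_datetime([
        "2020-11-20 00:02:14.324400",
        "2020-11-20 08:22:39.472900",
        "2020-11-18 17:11:41.572900",
        "2020-11-18 16:13:55.928000",
    ]),
})
```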

But as you can see, the dates aren't exactly the same. To know which dates belong together, I need to find, for each DateB, the DateA (of the same ID) that is closest to it without going over (The Price Is Right rules). For example, the first row in Table A would match the first row in Table B because the IDs match and its DateA is the closest one to that DateB without going over it.
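Stated as code, the rule is: for each Table B row, keep the Table A rows with the same ID and DateA <= DateB, then take the latest of those. The brute-force sketch below encodes that definition directly, which makes it a useful reference for checking faster approaches. The helper closest_without_going_over and the column MatchedDateA are hypothetical names.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [123, 123, 456, 456],
                    "DateA": pd.to_datetime(["2020-11-19 17:54:42.253000", "2020-11-19 15:54:09.434000",
                                             "2020-11-18 16:32:24.653000", "2020-11-18 15:54:11.816000"])})
df2 = pd.DataFrame({"ID": [123, 123, 456, 456],
                    "DateB": pd.to_datetime(["2020-11-20 00:02:14.324400", "2020-11-20 08:22:39.472900",
                                             "2020-11-18 17:11:41.572900", "2020-11-18 16:13:55.928000"])})


def closest_without_going_over(id_, date_b, table_a):
    """Hypothetical helper: latest DateA for this ID that is <= date_b (NaT if there is none)."""
    candidates = table_a.loc[(table_a["ID"] == id_) & (table_a["DateA"] <= date_b), "DateA"]
    return candidates.max()  # max of an empty datetime Series is NaT


# One full scan of Table A per Table B row: fine as a reference, slow on large tables
df2["MatchedDateA"] = [closest_without_going_over(i, d, df1)
                       for i, d in zip(df2["ID"], df2["DateB"])]
```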

I'm working on an .apply() function for Table A grouped by ID, but the only way I've found requires two .loc lookups and a nested loop. Are there any built-in methods I'm missing that would make this more efficient?
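To keep the per-ID structure described here but drop the nested loop, one option is a binary search per group with NumPy's searchsorted. This is not a pandas built-in and does not come from the original thread. The idea is to sort each ID's DateA values once, then for every DateB find the last DateA that is less than or equal to it. The column name MatchedDateA is hypothetical.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"ID": [123, 123, 456, 456],
                    "DateA": pd.to_datetime(["2020-11-19 17:54:42.253000", "2020-11-19 15:54:09.434000",
                                             "2020-11-18 16:32:24.653000", "2020-11-18 15:54:11.816000"])})
df2 = pd.DataFrame({"ID": [123, 123, 456, 456],
                    "DateB": pd.to_datetime(["2020-11-20 00:02:14.324400", "2020-11-20 08:22:39.472900",
                                             "2020-11-18 17:11:41.572900", "2020-11-18 16:13:55.928000"])})

# Sort the DateA values once per ID so every lookup is a binary search
a_by_id = {id_: np.sort(group["DateA"].to_numpy()) for id_, group in df1.groupby("ID")}

matched = pd.Series(pd.NaT, index=df2.index, dtype="datetime64[ns]")
for id_, b_group in df2.groupby("ID"):
    a_dates = a_by_id.get(id_)
    if a_dates is None:
        continue  # no Table A rows for this ID, leave NaT
    # side="right" gives the insertion point after any equal DateA,
    # so subtracting 1 yields the last DateA <= DateB (-1 means none exists)
    pos = np.searchsorted(a_dates, b_group["DateB"].to_numpy(), side="right") - 1
    found = pos >= 0
    matched.loc[b_group.index[found]] = a_dates[pos[found]]

df2["MatchedDateA"] = matched
```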

You can try merge_asof with direction='nearest':

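import pandas as pd

# merge_asof needs both frames sorted on their date keys; by='ID' only pairs rows with the same ID
pd.merge_asof(df1.sort_values('DateA'), df2.sort_values('DateB'),
              left_on='DateA', right_on='DateB', by='ID', direction='nearest')\
  .sort_values('ID')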

    ID                   DateA                      DateB
2  123 2020-11-19 15:54:09.434 2020-11-20 00:02:14.324400
3  123 2020-11-19 17:54:42.253 2020-11-20 00:02:14.324400
0  456 2020-11-18 15:54:11.816 2020-11-18 16:13:55.928000
1  456 2020-11-18 16:32:24.653 2020-11-18 16:13:55.928000
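Note that direction='nearest' searches both directions, so it can pair a DateB with a later DateA. In the output above, the 456 row with DateA 16:32:24.653 is matched to DateB 16:13:55.928, which goes over. The sketch below enforces the "without going over" rule, assuming the goal is one match per Table B row. It puts Table B on the left and uses direction='backward', which takes the last DateA at or before each DateB within the same ID. With the default allow_exact_matches=True, an identical timestamp counts as a match, and a DateB with no earlier DateA gets NaT.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [123, 123, 456, 456],
                    "DateA": pd.to_datetime(["2020-11-19 17:54:42.253000", "2020-11-19 15:54:09.434000",
                                             "2020-11-18 16:32:24.653000", "2020-11-18 15:54:11.816000"])})
df2 = pd.DataFrame({"ID": [123, 123, 456, 456],
                    "DateB": pd.to_datetime(["2020-11-20 00:02:14.324400", "2020-11-20 08:22:39.472900",
                                             "2020-11-18 17:11:41.572900", "2020-11-18 16:13:55.928000"])})

# For every DateB, take the last DateA of the same ID that is <= DateB.
# Both frames must be sorted by their own "on" key before merge_asof.
matched = pd.merge_asof(
    df2.sort_values("DateB"),
    df1.sort_values("DateA"),
    left_on="DateB",
    right_on="DateA",
    by="ID",
    direction="backward",  # never look past DateB ("without going over")
).sort_values(["ID", "DateB"])
```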

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM