How do you identify the closest date in a group to another date, without going over, between two Pandas DataFrames?

Question

I have two tables that I need to join using an ID + date combination key.

Table A

ID	DateA
123	2020-11-19 17:54:42.253000
123	2020-11-19 15:54:09.434000
456	2020-11-18 16:32:24.653000
456	2020-11-18 15:54:11.816000

Table B

ID	DateB
123	2020-11-20 00:02:14.324400
123	2020-11-20 08:22:39.472900
456	2020-11-18 17:11:41.572900
456	2020-11-18 16:13:55.928000

But as you can see, the dates aren't exactly the same. To know which date is the correct one, I need to find which DateA is closest to each DateB (of the same ID) without going over it (The Price Is Right rules), i.e. the latest DateA that is not after that DateB. For example, the first row in Table A would match the first row in Table B because the IDs match and its DateA is the closest to that DateB without going over it.

I'm working on an .apply() function for Table A grouped by ID, but the only way I've found to do this requires two .loc lookups and a nested loop to find the results. Are there any built-in methods I'm missing that would make this more efficient?
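For reference, the matching rule itself can be written without .apply() or a nested loop. This is a minimal sketch, assuming the two tables are loaded as DataFrames named df_a and df_b (hypothetical names) with DateA/DateB parsed as datetimes: pair every row of Table B with the rows of Table A that share its ID, drop pairs where DateA goes over DateB, and keep the latest remaining DateA for each DateB. It still builds every same-ID pair, so its cost grows quadratically with group size.

```python
import pandas as pd

# Table A and Table B from the question (df_a / df_b are illustrative names)
df_a = pd.DataFrame({
    "ID": [123, 123, 456, 456],
    "DateA": pd.to_datetime([
        "2020-11-19 17:54:42.253000",
        "2020-11-19 15:54:09.434000",
        "2020-11-18 16:32:24.653000",
        "2020-11-18 15:54:11.816000",
    ]),
})
df_b = pd.DataFrame({
    "ID": [123, 123, 456, 456],
    "DateB": pd.to_datetime([
        "2020-11-20 00:02:14.324400",
        "2020-11-20 08:22:39.472900",
        "2020-11-18 17:11:41.572900",
        "2020-11-18 16:13:55.928000",
    ]),
})

# Pair every DateB with every DateA that has the same ID.
pairs = df_b.merge(df_a, on="ID")
# "Without going over": DateA must not be later than DateB.
# (A DateB with no qualifying DateA drops out at this step.)
pairs = pairs[pairs["DateA"] <= pairs["DateB"]]
# For each (ID, DateB), keep the row holding the latest qualifying DateA.
best = pairs.loc[pairs.groupby(["ID", "DateB"])["DateA"].idxmax()]
best = best.sort_values(["ID", "DateB"], ignore_index=True)
print(best)
```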

Answer 1

You can try pd.merge_asof with direction='nearest' (here df1 is Table A and df2 is Table B):

import pandas as pd

# merge_asof needs both frames sorted on their date keys (datetime dtype)
pd.merge_asof(df1.sort_values('DateA'), df2.sort_values('DateB'),
              left_on='DateA', right_on='DateB', by='ID', direction='nearest')\
  .sort_values('ID')

    ID                   DateA                      DateB
2  123 2020-11-19 15:54:09.434 2020-11-20 00:02:14.324400
3  123 2020-11-19 17:54:42.253 2020-11-20 00:02:14.324400
0  456 2020-11-18 15:54:11.816 2020-11-18 16:13:55.928000
1  456 2020-11-18 16:32:24.653 2020-11-18 16:13:55.928000
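Note that direction='nearest' can pick a DateB that is earlier than DateA (the ID 456 row with DateA 16:32:24 above is matched to 16:13:55), so it does not strictly follow the "without going over" rule. A minimal sketch that does, assuming the tables are loaded as df_a and df_b (names chosen here for illustration): put Table B on the left and use direction='backward', so each DateB is paired with the latest DateA of the same ID at or before it.

```python
import pandas as pd

# Table A and Table B from the question (df_a / df_b are illustrative names)
df_a = pd.DataFrame({
    "ID": [123, 123, 456, 456],
    "DateA": pd.to_datetime([
        "2020-11-19 17:54:42.253000",
        "2020-11-19 15:54:09.434000",
        "2020-11-18 16:32:24.653000",
        "2020-11-18 15:54:11.816000",
    ]),
})
df_b = pd.DataFrame({
    "ID": [123, 123, 456, 456],
    "DateB": pd.to_datetime([
        "2020-11-20 00:02:14.324400",
        "2020-11-20 08:22:39.472900",
        "2020-11-18 17:11:41.572900",
        "2020-11-18 16:13:55.928000",
    ]),
})

# For each DateB, take the latest DateA with the same ID that is <= DateB.
# merge_asof requires both frames to be sorted by their "on" keys.
matched = pd.merge_asof(
    df_b.sort_values("DateB"),
    df_a.sort_values("DateA"),
    left_on="DateB",
    right_on="DateA",
    by="ID",
    direction="backward",
).sort_values(["ID", "DateB"], ignore_index=True)

print(matched)
```

With the default allow_exact_matches=True, a DateA equal to DateB counts as not going over, and any DateB with no earlier DateA for its ID gets NaT. If you would rather keep Table A on the left, direction='forward' pairs each DateA with the first DateB at or after it.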

Accepted: 2020-12-12 17:11:55