
Merging two DataFrames where a date falls within a range, then averaging the values

    df_A
    start_date  end_date
0   2017-03-01  2017-04-20
1   2017-03-20  2017-04-27
2   2017-04-10  2017-05-25
3   2017-04-17  2017-05-22

    df_B
    event_date  price
0   2017-03-15  100
1   2017-02-22  200
2   2017-04-30  100
3   2017-05-20  150
4   2017-05-23  150

Expected result:

    start_date  end_date        avg.price
0   2017-03-01  2017-04-20      100.0
1   2017-03-20  2017-04-27      
2   2017-04-10  2017-05-25      133.3
3   2017-04-17  2017-05-22      125

One way, if your DataFrames aren't big, is to build the Cartesian product of the two and then filter it.

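# Cross join via a constant key, keep rows where the event falls in the range,
# then average the price per start_date (assumes start_date values are unique).
mapper = (df_A.assign(key=1).merge(df_B.assign(key=1), on='key')
              .query('start_date <= event_date <= end_date')
              .groupby('start_date')['price'].mean())
df_A['avg.price'] = df_A['start_date'].map(mapper)
print(df_A)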

Output:

  start_date   end_date   avg.price
0 2017-03-01 2017-04-20  100.000000
1 2017-03-20 2017-04-27         NaN
2 2017-04-10 2017-05-25  133.333333
3 2017-04-17 2017-05-22  125.000000
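For readers who want to run this end to end, here is a minimal self-contained sketch of the same cross-join idea. It rebuilds the sample frames from the question and makes a few assumptions that are not in the original answer: it uses `merge(how="cross")` (available in pandas 1.2 and later) instead of the constant-key trick, it groups on the row index of `df_A` rather than on `start_date` so that duplicate start dates would not collide, and it names the result column `avg_price`.

```python
import pandas as pd

# Sample data copied from the question, parsed as real datetimes.
df_A = pd.DataFrame({
    "start_date": pd.to_datetime(["2017-03-01", "2017-03-20", "2017-04-10", "2017-04-17"]),
    "end_date": pd.to_datetime(["2017-04-20", "2017-04-27", "2017-05-25", "2017-05-22"]),
})
df_B = pd.DataFrame({
    "event_date": pd.to_datetime(
        ["2017-03-15", "2017-02-22", "2017-04-30", "2017-05-20", "2017-05-23"]
    ),
    "price": [100, 200, 100, 150, 150],
})

# Pair every interval with every event; reset_index() keeps df_A's row label
# in a column named "index" so we can group on it afterwards.
pairs = df_A.reset_index().merge(df_B, how="cross")

# Keep only the pairs where the event lies inside the interval (inclusive).
in_range = pairs[pairs["event_date"].between(pairs["start_date"], pairs["end_date"])]

# Mean price per original df_A row; rows with no matching event are absent here.
avg = in_range.groupby("index")["price"].mean()

# assign() aligns on the index, so intervals without events become NaN.
result = df_A.assign(avg_price=avg)
print(result)
```

The intermediate frame has `len(df_A) * len(df_B)` rows, so this only suits small inputs.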

Otherwise, see this SO post.

conditional_join from pyjanitor may help as a convenient abstraction; at the time of writing, the function is only available in the development version:

# pip install git+https://github.com/pyjanitor-devs/pyjanitor.git
import pandas as pd
import janitor
(df_B.conditional_join(
         df_A, 
         ('event_date', 'start_date', '>='), 
         ('event_date', 'end_date', '<='), 
         how = 'right')
    .droplevel(level = 0, axis = 1)
    .loc[:, ['price', 'start_date', 'end_date']]
    .groupby(['start_date', 'end_date'])
    .agg(avg_price = ('price', 'mean'))
)

Output:

                        avg_price
start_date end_date
2017-03-01 2017-04-20  100.000000
2017-03-20 2017-04-27         NaN
2017-04-10 2017-05-25  133.333333
2017-04-17 2017-05-22  125.000000

Under the hood it uses a binary search (np.searchsorted) to avoid the Cartesian product. If your intervals were not overlapping, a pd.IntervalIndex would be a more efficient option.
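To illustrate how a binary search can sidestep the Cartesian product, here is a rough sketch that calls `np.searchsorted` directly. This is not pyjanitor's implementation: it adds a prefix-sum trick of my own, so it only works for aggregations such as sum, count, and mean, and it assumes inclusive bounds on both ends. The sample frames are rebuilt from the question and the `avg_price` column name is illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question, parsed as real datetimes.
df_A = pd.DataFrame({
    "start_date": pd.to_datetime(["2017-03-01", "2017-03-20", "2017-04-10", "2017-04-17"]),
    "end_date": pd.to_datetime(["2017-04-20", "2017-04-27", "2017-05-25", "2017-05-22"]),
})
df_B = pd.DataFrame({
    "event_date": pd.to_datetime(
        ["2017-03-15", "2017-02-22", "2017-04-30", "2017-05-20", "2017-05-23"]
    ),
    "price": [100, 200, 100, 150, 150],
})

# Sort events once so that binary search is valid.
events = df_B.sort_values("event_date")
dates = events["event_date"].to_numpy()

# Prefix sums with a leading zero: csum[k] is the total price of the first k events.
csum = np.concatenate(([0.0], events["price"].to_numpy(dtype=float).cumsum()))

# lo: position of the first event >= start_date.
# hi: position just past the last event <= end_date.
lo = np.searchsorted(dates, df_A["start_date"].to_numpy(), side="left")
hi = np.searchsorted(dates, df_A["end_date"].to_numpy(), side="right")

counts = hi - lo                 # number of events inside each interval
totals = csum[hi] - csum[lo]     # sum of their prices

# Divide only where at least one event matched; the rest stay NaN.
avg = np.full(len(df_A), np.nan)
np.divide(totals, counts, out=avg, where=counts > 0)

result = df_A.assign(avg_price=avg)
print(result)
```

The cost is one sort of `df_B` plus two binary searches per interval, so memory stays linear in the input sizes even when the intervals overlap.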
