I have two dataframes.
The first is a dataframe of customers with an accompanying month within which a shipment must be fulfilled.
The second is a dataframe that contains all possible combinations of dates within a horizon and customers. For example, a three-day horizon with one customer, 'ABC', starting on '2020-01-01' would look like this:
Date Customer
2020-01-01 'ABC'
2020-01-02 'ABC'
2020-01-03 'ABC'
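A horizon frame like this can be generated instead of typed out, by taking the Cartesian product of a date range and a customer list. This is a minimal sketch that assumes pandas 1.2 or later (for `how='cross'`). The names `horizon_start`, `horizon_days` and `customers` are illustrative and do not come from the question:

```python
import pandas as pd

# Illustrative (hypothetical) inputs: horizon start, length in days, customer list
horizon_start = "2020-01-01"
horizon_days = 3
customers = ["ABC"]

# One row per day in the horizon
dates = pd.DataFrame({"Date": pd.date_range(horizon_start, periods=horizon_days, freq="D")})
# One row per customer
clients = pd.DataFrame({"Customer": customers})

# Cartesian product: every date paired with every customer (needs pandas >= 1.2)
horizon = dates.merge(clients, how="cross")
print(horizon)
```

Adding more customers to the list multiplies the row count, which matches the "all possible combinations" layout of the second dataframe.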
I am trying to join the two dataframes below so that I get every customer:date combination where the date falls WITHIN that customer's delivery month.
df_a.head(5)
>>> month client
2020-01 'ABC'
'DEF'
2020-02 'GHI'
'JKL'
'MNO'
2020-03 'PQR'
df_b.head(5)
>>> dates client
'2020-01-01' 'ABC'
'2020-01-01' 'DEF'
'2020-01-02' 'ABC'
'2020-01-02' 'DEF'
'2020-01-03' 'ABC'
'2020-01-03' 'DEF'
Desired output:
df_joined.head(5)
customer dates
'ABC' 2020-01-01
'ABC' 2020-01-02
'ABC' 2020-01-03
'DEF' 2020-01-01
'DEF' 2020-01-02
'DEF' 2020-01-03
'GHI' 2020-02-01
'GHI' 2020-02-02
'GHI' 2020-02-03
'JKL' 2020-02-01
'JKL' 2020-02-02
'JKL' 2020-02-03
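The condition "the date falls within the delivery month" comes down to comparing each date's calendar month with the delivery month. A minimal sketch of that test on a few made-up rows, using `dt.to_period('M')` (a swapped-in choice for illustration, not something from the question). The `pairs` frame is hypothetical:

```python
import pandas as pd

# Hypothetical rows after pairing each daily date with the client's delivery month
pairs = pd.DataFrame({
    "client": ["ABC", "ABC", "GHI", "GHI"],
    "dates": ["2020-01-03", "2020-02-01", "2020-01-31", "2020-02-02"],
    "month": ["2020-01", "2020-01", "2020-02", "2020-02"],
})

# Calendar month of each date and the delivery month, both as monthly Periods
date_month = pd.to_datetime(pairs["dates"]).dt.to_period("M")
delivery_month = pd.to_datetime(pairs["month"]).dt.to_period("M")

# Keep only rows whose date lies inside the delivery month
inside = pairs[date_month == delivery_month]
print(inside)
```

Comparing Periods avoids depending on how month names are formatted as strings.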
I have attempted to accomplish this with merge and query, i.e.:
ship_dates = df1.merge(df2, left_on='dates', right_on='client')\
    .query('dates >= month')\
    .set_index(['customer','dates'])
but I am receiving a KeyError for 'dates'.
All help greatly appreciated!
Managed to find a solution.
I created a month:year column in each dataframe:
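import pandas as pd
# Month-year key for each daily date and for each delivery month, e.g. 'January-2020'
df1['mnth_year'] = pd.to_datetime(df1['dates']).dt.strftime('%B-%Y')
df2['month_year'] = pd.to_datetime(df2['month']).dt.strftime('%B-%Y')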
then merged, using .query() to keep only the rows where mnth_year equals month_year:
# Join on customer, then keep only dates inside the delivery month
dates = df1.merge(df2, how='inner', left_on='customers',
                  right_on='customer')\
    .query('mnth_year == month_year')\
    .set_index(['customer', 'dates'])
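For reference, here is the same month-year approach run end to end on a cut-down version of the sample data. It is a sketch under some assumptions: both frames use a `client` column, as in the head() output of the question (rather than `customers`/`customer` as in the snippet), the blank month cells in df_a are taken to repeat the month above them, the horizon is a hypothetical six days, and `client` is renamed to `customer` at the end to match the desired output:

```python
import pandas as pd

# Delivery month per client (blank month cells assumed to repeat the one above)
df_a = pd.DataFrame({
    "month": ["2020-01", "2020-01", "2020-02", "2020-02"],
    "client": ["ABC", "DEF", "GHI", "JKL"],
})

# Every (date, client) pair over a small hypothetical horizon spanning two months
days = ["2020-01-01", "2020-01-02", "2020-01-03",
        "2020-02-01", "2020-02-02", "2020-02-03"]
df_b = pd.DataFrame(
    [(d, c) for d in days for c in df_a["client"]],
    columns=["dates", "client"],
)

# Month-year key on each side (%B gives the full month name, which is locale-dependent)
df_b["mnth_year"] = pd.to_datetime(df_b["dates"]).dt.strftime("%B-%Y")
df_a["month_year"] = pd.to_datetime(df_a["month"]).dt.strftime("%B-%Y")

# Join on the shared client column, then keep only dates inside the delivery month
df_joined = (
    df_b.merge(df_a, how="inner", on="client")
        .query("mnth_year == month_year")
        .rename(columns={"client": "customer"})
        .sort_values(["customer", "dates"])
        .set_index(["customer", "dates"])
)
print(df_joined.index.tolist())
```

Because both frames share the `client` column here, a plain `on="client"` replaces the separate `left_on`/`right_on` arguments.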