I'm creating two separate pandas pivot tables with the code below:
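import numpy as np
import pandas as pd

# Assumption: `today` is midnight of the current date (its definition is not shown in the post)
today = pd.Timestamp.today().normalize()

df = pd.read_excel('Report.xlsx')
df_past = df[df['Delivery'] < today]
df_future = df[df['Delivery'] > today]

# Sum Order and Prod per (Sales Order, Name, Delivery) for past deliveries
table_past = pd.pivot_table(
    data=df_past,
    values=['Order', 'Prod'],
    index=['Sales Order', 'Name', 'Delivery'],
    aggfunc={'Order': np.sum, 'Prod': np.sum}
)

# Same aggregation for future deliveries
table_future = pd.pivot_table(
    data=df_future,
    values=['Order', 'Prod'],
    index=['Sales Order', 'Name', 'Delivery'],
    aggfunc={'Order': np.sum, 'Prod': np.sum}
)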
This produces the following two tables:
                                   Order  Prod
Sales Order  Name      Delivery
B11156456    Amazon    2020-02-18     19     2
B11164868    Google    2020-02-19     10     3
B11165869    Facebook  2020-02-15    130     0

                                   Order  Prod
Sales Order  Name      Delivery
B11164868    Google    2020-02-27     15     9
B11165869    Facebook  2020-02-24     94    15
B11167123    Tesla     2020-02-27    365    69
B11168132    Samsung   2020-02-28    285    57
B11169563    Lenovo    2020-03-01    105     7
I then want to merge these two pivot tables on Sales Order and Name with the following code:
final_table = table_past.merge(
    table_future,
    on=['Sales Order', 'Name'],
    suffixes=('_past', '_future'),
    how='inner'
)
This works, but the Delivery column is lost because it is an index level rather than a value. I can't use it as a value field either, because I'm not applying any aggfunc to it. The result below has everything I need except Delivery:
                       Order Balance_past  Prod Balance_past  Order Balance_future  Prod Balance_future
Sales Order  Name
B11156456    Amazon                    19                  2                   NaN                  NaN
B11164868    Google                    10                  3                  15.0                  9.0
B11165869    Facebook                 130                  0                  94.0                 15.0
My goal is to have the information displayed as close as possible to the following:
                                   Order  Prod
Sales Order  Name      Delivery
B11156456    Amazon    2020-02-18     19     2
B11164868    Google    2020-02-19     10     3
                       2020-02-27     15     9
B11165869    Facebook  2020-02-15    130     0
                       2020-02-24     94    15
B11167123    Tesla     2020-02-27    365    69
B11168132    Samsung   2020-02-28    285    57
B11169563    Lenovo    2020-03-01    105     7
However, I can't build this directly, because the second Delivery date for a Sales Order should be included ONLY if its first Delivery date is today or earlier. For example, I don't want to include Sales Order B11169563, because its only date is in the future, but I do want to include B11164868, because its first date is before today and its second date is in the future. I also want to include B11156456, because it has a single date and that date is in the past.

To put it more clearly: if there is one delivery date, it must be in the past. If there are two delivery dates, one must be in the past and one must be in the future.
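The rule described above can be sketched without a merge at all: keep every row of a Sales Order whose earliest Delivery date is today or earlier, then pivot once. This is only a minimal sketch under stated assumptions. The sample rows are copied from the tables in the question, `today` is fixed to a hypothetical cutoff of 2020-02-20 so the result is reproducible, and the variable names `first_delivery` and `eligible` are made up for illustration. It also assumes an order never has more than one past date.

```python
import pandas as pd

# Assumed cutoff date, chosen so the sample rows split as in the question
today = pd.Timestamp("2020-02-20")

# Sample rows taken from the two tables shown in the question
df = pd.DataFrame({
    "Sales Order": ["B11156456", "B11164868", "B11165869", "B11164868",
                    "B11165869", "B11167123", "B11168132", "B11169563"],
    "Name": ["Amazon", "Google", "Facebook", "Google",
             "Facebook", "Tesla", "Samsung", "Lenovo"],
    "Delivery": pd.to_datetime(["2020-02-18", "2020-02-19", "2020-02-15", "2020-02-27",
                                "2020-02-24", "2020-02-27", "2020-02-28", "2020-03-01"]),
    "Order": [19, 10, 130, 15, 94, 365, 285, 105],
    "Prod": [2, 3, 0, 9, 15, 69, 57, 7],
})

# Earliest delivery date of each Sales Order, broadcast back onto every row
first_delivery = df.groupby("Sales Order")["Delivery"].transform("min")

# Keep all rows (past and future) of orders whose first delivery is today or earlier
eligible = df[first_delivery <= today]

# Delivery stays in the index, so past and future dates appear as stacked rows
table = eligible.pivot_table(
    values=["Order", "Prod"],
    index=["Sales Order", "Name", "Delivery"],
    aggfunc="sum",
)
print(table)
```

With this sample data, the orders that only have future dates (Tesla, Samsung, Lenovo) drop out, while Google and Facebook keep both of their dates.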
merge() does not carry over index levels that are not listed in on=['Sales Order', 'Name'], so the Delivery level is dropped. Call reset_index() on both tables before the merge and set_index() on the result afterwards. Please consider the changes to your code below.
df = pd.read_excel('Report.xlsx')
df_past = df[df['Delivery'] < today]
df_future = df[df['Delivery'] > today]

table_past = pd.pivot_table(
    data=df_past,
    values=['Order', 'Prod'],
    index=['Sales Order', 'Name', 'Delivery'],
    aggfunc={'Order': np.sum, 'Prod': np.sum}
)
table_future = pd.pivot_table(
    data=df_future,
    values=['Order', 'Prod'],
    index=['Sales Order', 'Name', 'Delivery'],
    aggfunc={'Order': np.sum, 'Prod': np.sum}
)

# Changed: move the index levels into ordinary columns so Delivery survives the merge
table_past.reset_index(inplace=True)
table_future.reset_index(inplace=True)

final_table = table_past.merge(
    table_future,
    on=['Sales Order', 'Name'],
    suffixes=('_past', '_future'),
    how='inner'
)

# Changed: Delivery exists in both tables, so the merge renames it with the suffixes.
# set_index() returns a new frame, so the result has to be assigned.
final_table = final_table.set_index(['Sales Order', 'Name', 'Delivery_past', 'Delivery_future'])
Hope this works.
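To see what the reset_index() approach produces, here is a small self-contained sketch. The sample rows are copied from the question, `today` is a hypothetical fixed cutoff (2020-02-20), and the helper `summarize` is an illustrative name that is not part of the original code. It uses how='left' instead of how='inner' as an assumption, because the question asks to keep orders that only have a past date.

```python
import pandas as pd

# Assumed cutoff date, chosen so the sample rows split as in the question
today = pd.Timestamp("2020-02-20")

# Sample rows taken from the two tables shown in the question
df = pd.DataFrame({
    "Sales Order": ["B11156456", "B11164868", "B11165869", "B11164868",
                    "B11165869", "B11167123", "B11168132", "B11169563"],
    "Name": ["Amazon", "Google", "Facebook", "Google",
             "Facebook", "Tesla", "Samsung", "Lenovo"],
    "Delivery": pd.to_datetime(["2020-02-18", "2020-02-19", "2020-02-15", "2020-02-27",
                                "2020-02-24", "2020-02-27", "2020-02-28", "2020-03-01"]),
    "Order": [19, 10, 130, 15, 94, 365, 285, 105],
    "Prod": [2, 3, 0, 9, 15, 69, 57, 7],
})


def summarize(frame):
    """Hypothetical helper: pivot, then turn the index levels back into columns."""
    pivoted = frame.pivot_table(
        values=["Order", "Prod"],
        index=["Sales Order", "Name", "Delivery"],
        aggfunc="sum",
    )
    return pivoted.reset_index()


table_past = summarize(df[df["Delivery"] < today])
table_future = summarize(df[df["Delivery"] > today])

# Delivery is now an ordinary column in both tables, so it is kept and suffixed
final_table = table_past.merge(
    table_future,
    on=["Sales Order", "Name"],
    suffixes=("_past", "_future"),
    how="left",  # assumption: keep orders that have no future delivery
)
print(final_table)
```

Because Delivery appears in both inputs, the merged frame carries it as Delivery_past and Delivery_future. Orders without a future delivery get missing values in the _future columns.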