
I want to merge two pandas pivot tables but am having trouble maintaining columns

I'm creating two separate pandas pivot tables with the code below:

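import numpy as np
import pandas as pd

df = pd.read_excel('Report.xlsx')

# `today` was not defined in the original snippet; assumed here to be midnight of the current date
today = pd.Timestamp.today().normalize()

df_past = df[df['Delivery'] < today]
df_future = df[df['Delivery'] > today]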

table_past = pd.pivot_table(
    data=df_past,
    values=['Order', 'Prod'],
    index=['Sales Order', 'Name', 'Delivery'],
    aggfunc={'Order':np.sum, 'Prod':np.sum}
    )

table_future = pd.pivot_table(
    data=df_future,
    values=['Order', 'Prod'],
    index=['Sales Order', 'Name', 'Delivery'],
    aggfunc={'Order':np.sum, 'Prod':np.sum}
    )

This produces the following two tables.

                                            Order          Prod
Sales Order Name     Delivery                             
B11156456   Amazon   2020-02-18                19             2
B11164868   Google   2020-02-19                10             3
B11165869   Facebook 2020-02-15               130             0

                                            Order          Prod
Sales Order Name     Delivery                             
B11164868   Google   2020-02-27                15             9
B11165869   Facebook 2020-02-24                94            15
B11167123   Tesla    2020-02-27               365            69
B11168132   Samsung  2020-02-28               285            57
B11169563   Lenovo   2020-03-01               105             7

I then want to merge these two pivot tables on Sales Order and Name with the following code:

final_table = table_past.merge(table_future, 
            on=['Sales Order', 'Name'],
            suffixes=('_past', '_future'),
            how='inner'
            )

This works, but I can't keep the Delivery column, because it is an index level rather than a value. I can't use it as a value field either, because I'm not applying any aggfunc to it. The result below has everything I need except Delivery.

                      Order Balance_past  Prod Balance_past  Order Balance_future  Prod Balance_future
Sales Order Name                                                                                      
B11156456   Amazon                    19                  2                   NaN                  NaN
B11164868   Google                    10                  3                  15.0                  9.0
B11165869   Facebook                 130                  0                  94.0                 15.0

My goal is to have the information displayed as close as possible to the below

                                            Order         Prod 
Sales Order Name     Delivery                             
B11156456   Amazon   2020-02-18                19             2
B11164868   Google   2020-02-19                10             3
                     2020-02-27                15             9
B11165869   Facebook 2020-02-15               130             0
                     2020-02-24                94            15
B11167123   Tesla    2020-02-27               365            69
B11168132   Samsung  2020-02-28               285            57
B11169563   Lenovo   2020-03-01               105             7

But I can't do this directly, because I need to include the second Delivery date for a Sales Order ONLY if its first Delivery date is today or earlier. For example, I don't want to include Sales Order B11169563 because its only date is in the future, but I do want to include Sales Order B11164868 because its first date is before today and its second date is in the future. I also want to include B11156456 because it has a single date and that date is in the past. To put it more clearly: if there is one delivery date, it must be in the past; if there are two delivery dates, one must be in the past and one must be in the future.

Your requirement becomes easy if you call DataFrame.reset_index() and DataFrame.set_index() on the data frames returned by pivot_table.

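When you merge on only some levels of a MultiIndex, the levels that are not listed in on=['Sales Order', 'Name'] are dropped from the result, which is why Delivery disappears. Call reset_index() on both tables before the merge so that Delivery becomes an ordinary column, then call set_index() after the merge. Please consider the changes below to your code.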

import numpy as np
import pandas as pd

df = pd.read_excel('Report.xlsx')
# `today` is the reference date used in the question
df_past = df[df['Delivery'] < today]
df_future = df[df['Delivery'] > today]

table_past = pd.pivot_table(
    data=df_past,
    values=['Order', 'Prod'],
    index=['Sales Order', 'Name', 'Delivery'],
    aggfunc={'Order':np.sum, 'Prod':np.sum}
    )

table_future = pd.pivot_table(
    data=df_future,
    values=['Order', 'Prod'],
    index=['Sales Order', 'Name', 'Delivery'],
    aggfunc={'Order':np.sum, 'Prod':np.sum}
    )

# Move the index levels (including Delivery) back into ordinary columns
table_past.reset_index(inplace=True)
table_future.reset_index(inplace=True)

final_table = table_past.merge(table_future, 
            on=['Sales Order', 'Name'],
            suffixes=('_past', '_future'),
            how='inner'
            )
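# After the merge, Delivery exists only as 'Delivery_past' and 'Delivery_future',
# so there is no plain 'Delivery' column to index on. set_index() also returns
# a new frame, so the result has to be assigned.
final_table = final_table.set_index(['Sales Order', 'Name'])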
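
To see what the reset_index() step changes, here is a small self-contained sketch. The two frames are built by hand from some of the rows shown in the question rather than read from Report.xlsx, so the data is only a stand-in. It shows that both delivery dates survive the merge as suffixed columns.

```python
import pandas as pd

# Stand-in for table_past, using rows from the question
past = pd.DataFrame({
    'Sales Order': ['B11156456', 'B11164868', 'B11165869'],
    'Name': ['Amazon', 'Google', 'Facebook'],
    'Delivery': pd.to_datetime(['2020-02-18', '2020-02-19', '2020-02-15']),
    'Order': [19, 10, 130],
    'Prod': [2, 3, 0],
}).set_index(['Sales Order', 'Name', 'Delivery'])

# Stand-in for table_future, using rows from the question
future = pd.DataFrame({
    'Sales Order': ['B11164868', 'B11165869', 'B11167123'],
    'Name': ['Google', 'Facebook', 'Tesla'],
    'Delivery': pd.to_datetime(['2020-02-27', '2020-02-24', '2020-02-27']),
    'Order': [15, 94, 365],
    'Prod': [9, 15, 69],
}).set_index(['Sales Order', 'Name', 'Delivery'])

# reset_index() turns Delivery into a normal column, so merge keeps it
# (suffixed, because it appears on both sides)
merged = (
    past.reset_index()
        .merge(future.reset_index(),
               on=['Sales Order', 'Name'],
               suffixes=('_past', '_future'),
               how='inner')
        .set_index(['Sales Order', 'Name'])
)

print(merged.columns.tolist())
```

Note that how='inner' keeps only the orders present in both tables, so an order with a single past delivery (Amazon here) is dropped. Using how='left' would keep it, with NaN in the future columns.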

Hope this works.
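
If the long layout from the question is the real goal, a different route may be simpler than merging: filter the raw rows first and build a single pivot table. The sketch below is not part of the approach above. It assumes the rule can be reduced to "keep every row of a Sales Order whose earliest Delivery is before today", and it uses hand-made rows and a fixed `today` so that it can run on its own.

```python
import pandas as pd

# Hypothetical rows standing in for Report.xlsx (values taken from the question)
df = pd.DataFrame({
    'Sales Order': ['B11156456', 'B11164868', 'B11164868',
                    'B11165869', 'B11165869', 'B11167123'],
    'Name': ['Amazon', 'Google', 'Google', 'Facebook', 'Facebook', 'Tesla'],
    'Delivery': pd.to_datetime(['2020-02-18', '2020-02-19', '2020-02-27',
                                '2020-02-15', '2020-02-24', '2020-02-27']),
    'Order': [19, 10, 15, 130, 94, 365],
    'Prod': [2, 3, 9, 0, 15, 69],
})

# Assumed reference date, fixed so the example is reproducible
today = pd.Timestamp('2020-02-20')

# Earliest delivery per sales order, repeated on every row of that order
first_delivery = df.groupby('Sales Order')['Delivery'].transform('min')

# Keep all rows of an order whose earliest delivery is already in the past
kept = df[first_delivery < today]

final_table = pd.pivot_table(
    data=kept,
    values=['Order', 'Prod'],
    index=['Sales Order', 'Name', 'Delivery'],
    aggfunc='sum',
)

print(final_table)
```

With these rows, the Tesla order is left out because its only delivery is in the future, while Google and Facebook keep both their past and future dates. An order with two past dates would also be kept under this simplified rule, so adjust the filter if that case should be excluded.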
