I want to merge two pandas pivot tables but am having trouble maintaining columns

So I'm creating two separate pandas pivot tables with the code below.

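import numpy as np
import pandas as pd

df = pd.read_excel('Report.xlsx')
today = pd.Timestamp.today().normalize()  # assumption: `today` is not defined in the original snippet

df_past = df[df['Delivery'] < today]
df_future = df[df['Delivery'] > today]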

table_past = pd.pivot_table(
    data=df_past,
    values=['Order', 'Prod'],
    index=['Sales Order', 'Name', 'Delivery'],
    aggfunc={'Order':np.sum, 'Prod':np.sum}
    )

table_future = pd.pivot_table(
    data=df_future,
    values=['Order', 'Prod'],
    index=['Sales Order', 'Name', 'Delivery'],
    aggfunc={'Order':np.sum, 'Prod':np.sum}
    )

Which produces the following two tables.

                                            Order          Prod
Sales Order Name     Delivery                             
B11156456   Amazon   2020-02-18                19             2
B11164868   Google   2020-02-19                10             3
B11165869   Facebook 2020-02-15               130             0

                                            Order          Prod
Sales Order Name     Delivery                             
B11164868   Google   2020-02-27                15             9
B11165869   Facebook 2020-02-24                94            15
B11167123   Tesla    2020-02-27               365            69
B11168132   Samsung  2020-02-28               285            57
B11169563   Lenovo   2020-03-01               105             7

I then want to merge these two pivot tables on the sales order number with the following code:

final_table = table_past.merge(table_future, 
            on=['Sales Order', 'Name'],
            suffixes=('_past', '_future'),
            how='inner'
            )

This works, but I am not able to keep the Delivery column, because it is an index level rather than a value. I can't use it as a value field either, because I'm not applying any aggfunc to it. So it gives me the output below, which has everything I need except Delivery.

                      Order Balance_past  Prod Balance_past  Order Balance_future  Prod Balance_future
Sales Order Name                                                                                      
B11156456   Amazon                    19                  2                   NaN                  NaN
B11164868   Google                    10                  3                  15.0                  9.0
B11165869   Facebook                 130                  0                  94.0                 15.0
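
The missing Delivery level can be reproduced with a small sketch. The rows are copied from the two tables above; building them by hand instead of reading Report.xlsx, and using aggfunc="sum" in place of np.sum, are assumptions made only to keep the example self-contained. When `on` names only some levels of a MultiIndex, merge keeps those levels and drops the rest.

```python
import pandas as pd

# Sample rows copied from the two pivot tables shown in the question.
past = pd.DataFrame({
    "Sales Order": ["B11156456", "B11164868", "B11165869"],
    "Name": ["Amazon", "Google", "Facebook"],
    "Delivery": pd.to_datetime(["2020-02-18", "2020-02-19", "2020-02-15"]),
    "Order": [19, 10, 130],
    "Prod": [2, 3, 0],
})
future = pd.DataFrame({
    "Sales Order": ["B11164868", "B11165869", "B11167123", "B11168132", "B11169563"],
    "Name": ["Google", "Facebook", "Tesla", "Samsung", "Lenovo"],
    "Delivery": pd.to_datetime(
        ["2020-02-27", "2020-02-24", "2020-02-27", "2020-02-28", "2020-03-01"]
    ),
    "Order": [15, 94, 365, 285, 105],
    "Prod": [9, 15, 69, 57, 7],
})

keys = ["Sales Order", "Name", "Delivery"]
table_past = past.pivot_table(values=["Order", "Prod"], index=keys, aggfunc="sum")
table_future = future.pivot_table(values=["Order", "Prod"], index=keys, aggfunc="sum")

# Merge on two of the three index levels, as in the question.
merged = table_past.merge(
    table_future,
    on=["Sales Order", "Name"],
    suffixes=("_past", "_future"),
    how="inner",
)

print(merged.index.names)    # only the levels named in `on` remain in the index
print(list(merged.columns))  # Delivery is not among the columns either
```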

My goal is to have the information displayed as close as possible to the layout below.

                                            Order         Prod 
Sales Order Name     Delivery                             
B11156456   Amazon   2020-02-18                19             2
B11164868   Google   2020-02-19                10             3
                     2020-02-27                15             9
B11165869   Facebook 2020-02-15               130             0
                     2020-02-24                94            15
B11167123   Tesla    2020-02-27               365            69
B11168132   Samsung  2020-02-28               285            57
B11169563   Lenovo   2020-03-01               105             7

But I can't do this directly, because I need to include the second Delivery date for a Sales Order ONLY if the first Delivery date is today or earlier. For example, I don't want to include Sales Order number B11169563, because its one date is in the future, but I do want to include Sales Order number B11164868, because the first date is prior to today and the second date is in the future. I also want to include B11156456, because it has one date and that date is in the past. Or, for more clarity: if there is one delivery date, it must be in the past. If there are two delivery dates, then one must be in the past and one must be in the future.
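
The rule above can be sketched as a filter on the unpivoted rows, applied before pivoting. This is one possible reading of the rule, not a definitive solution: it keeps every sales order that has at least one delivery on or before today, together with all of that order's rows. The fixed `today` value and the hand-built sample rows are assumptions made for reproducibility.

```python
import pandas as pd

# Assumption: a fixed "today" that falls between the past and future sample dates.
today = pd.Timestamp("2020-02-20")

# Sample rows copied from the tables in the question.
df = pd.DataFrame({
    "Sales Order": ["B11156456", "B11164868", "B11164868", "B11165869",
                    "B11165869", "B11167123", "B11168132", "B11169563"],
    "Name": ["Amazon", "Google", "Google", "Facebook",
             "Facebook", "Tesla", "Samsung", "Lenovo"],
    "Delivery": pd.to_datetime(
        ["2020-02-18", "2020-02-19", "2020-02-27", "2020-02-15",
         "2020-02-24", "2020-02-27", "2020-02-28", "2020-03-01"]
    ),
    "Order": [19, 10, 15, 130, 94, 365, 285, 105],
    "Prod": [2, 3, 9, 0, 15, 69, 57, 7],
})

# Flag each row whose sales order has at least one delivery on or before today.
is_past = df["Delivery"] <= today
has_past = is_past.groupby(df["Sales Order"]).transform("any")

# Keep only those orders, then pivot so Delivery stays an index level.
result = df[has_past].pivot_table(
    values=["Order", "Prod"],
    index=["Sales Order", "Name", "Delivery"],
    aggfunc="sum",
)
print(result)
```

Orders whose only delivery is in the future (Tesla, Samsung, Lenovo in the sample) are dropped, while the future rows of Google and Facebook are kept because those orders also have a past delivery.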

Your requirement becomes easy if you call DataFrame.reset_index() and DataFrame.set_index() on the data frames returned by pivot_table.

The merge statement does not keep index levels that are not passed in on=['Sales Order', 'Name']. Use reset_index() before the merge statement and set_index() after it. Please consider the changes below to your code.

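import numpy as np
import pandas as pd

df = pd.read_excel('Report.xlsx')
today = pd.Timestamp.today().normalize()  # assumption: `today` is not defined in the original snippet
df_past = df[df['Delivery'] < today]
df_future = df[df['Delivery'] > today]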

table_past = pd.pivot_table(
    data=df_past,
    values=['Order', 'Prod'],
    index=['Sales Order', 'Name', 'Delivery'],
    aggfunc={'Order':np.sum, 'Prod':np.sum}
    )

table_future = pd.pivot_table(
    data=df_future,
    values=['Order', 'Prod'],
    index=['Sales Order', 'Name', 'Delivery'],
    aggfunc={'Order':np.sum, 'Prod':np.sum}
    )
# Move the index levels (including Delivery) back into ordinary columns before merging
table_past.reset_index(inplace=True)
table_future.reset_index(inplace=True)

final_table = table_past.merge(table_future, 
            on=['Sales Order', 'Name'],
            suffixes=('_past', '_future'),
            how='inner'
            )
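# After the merge, Delivery exists as Delivery_past and Delivery_future (ordinary columns);
# set_index returns a new frame, so assign the result
final_table = final_table.set_index(['Sales Order', 'Name'])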
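
For reference, here is a self-contained sketch of the same reset_index / merge / set_index sequence on the sample rows from the question. The hand-built data and aggfunc="sum" are assumptions for illustration. It also uses how="left" rather than how='inner', which is a deliberate change so that orders with only a past delivery (Amazon) are kept, as the question asks.

```python
import pandas as pd

# Sample rows copied from the two pivot tables shown in the question.
past = pd.DataFrame({
    "Sales Order": ["B11156456", "B11164868", "B11165869"],
    "Name": ["Amazon", "Google", "Facebook"],
    "Delivery": pd.to_datetime(["2020-02-18", "2020-02-19", "2020-02-15"]),
    "Order": [19, 10, 130],
    "Prod": [2, 3, 0],
})
future = pd.DataFrame({
    "Sales Order": ["B11164868", "B11165869", "B11167123", "B11168132", "B11169563"],
    "Name": ["Google", "Facebook", "Tesla", "Samsung", "Lenovo"],
    "Delivery": pd.to_datetime(
        ["2020-02-27", "2020-02-24", "2020-02-27", "2020-02-28", "2020-03-01"]
    ),
    "Order": [15, 94, 365, 285, 105],
    "Prod": [9, 15, 69, 57, 7],
})

keys = ["Sales Order", "Name", "Delivery"]

# reset_index() turns all three index levels into ordinary columns,
# so Delivery survives the merge as a regular column.
table_past = past.pivot_table(values=["Order", "Prod"], index=keys, aggfunc="sum").reset_index()
table_future = future.pivot_table(values=["Order", "Prod"], index=keys, aggfunc="sum").reset_index()

final_table = table_past.merge(
    table_future,
    on=["Sales Order", "Name"],
    suffixes=("_past", "_future"),
    how="left",  # assumption: keep past-only orders; the answer above uses 'inner'
).set_index(["Sales Order", "Name"])

print(final_table)
```

Delivery_past and Delivery_future appear side by side; orders with no future delivery get missing values in the `_future` columns.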

Hope this works.
