I want to merge two Pandas pivot tables but am having trouble keeping a column

Question

I created two separate Pandas pivot tables with the code below.

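import numpy as np
import pandas as pd

df = pd.read_excel('Report.xlsx')
today = pd.Timestamp.today().normalize()  # assumed definition; not shown in the original post

df_past = df[df['Delivery'] < today]
df_future = df[df['Delivery'] > today]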

table_past = pd.pivot_table(
    data=df_past,
    values=['Order', 'Prod'],
    index=['Sales Order', 'Name', 'Delivery'],
    aggfunc={'Order':np.sum, 'Prod':np.sum}
    )

table_future = pd.pivot_table(
    data=df_future,
    values=['Order', 'Prod'],
    index=['Sales Order', 'Name', 'Delivery'],
    aggfunc={'Order':np.sum, 'Prod':np.sum}
    )

This produces the following two tables.

                                            Order          Prod
Sales Order Name     Delivery                             
B11156456   Amazon   2020-02-18                19             2
B11164868   Google   2020-02-19                10             3
B11165869   Facebook 2020-02-15               130             0

                                            Order          Prod
Sales Order Name     Delivery                             
B11164868   Google   2020-02-27                15             9
B11165869   Facebook 2020-02-24                94            15
B11167123   Tesla    2020-02-27               365            69
B11168132   Samsung  2020-02-28               285            57
B11169563   Lenovo   2020-03-01               105             7

I then want to merge these two pivot tables on the sales order number with the following code.

final_table = table_past.merge(table_future, 
            on=['Sales Order', 'Name'],
            suffixes=('_past', '_future'),
            how='inner'
            )

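This works, but I cannot keep the Delivery column, because it is an index level rather than a value. I also cannot use it as a values field, because I am not applying any aggfunc to it. So I get the output below, which contains everything I need except Delivery.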

                      Order Balance_past  Prod Balance_past  Order Balance_future  Prod Balance_future
Sales Order Name                                                                                      
B11156456   Amazon                    19                  2                   NaN                  NaN
B11164868   Google                    10                  3                  15.0                  9.0
B11165869   Facebook                 130                  0                  94.0                 15.0

My goal is to get the output as close as possible to the layout below.

                                            Order         Prod 
Sales Order Name     Delivery                             
B11156456   Amazon   2020-02-18                19             2
B11164868   Google   2020-02-19                10             3
                     2020-02-27                15             9
B11165869   Facebook 2020-02-15               130             0
                     2020-02-24                94            15
B11167123   Tesla    2020-02-27               365            69
B11168132   Samsung  2020-02-28               285            57
B11169563   Lenovo   2020-03-01               105             7

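However, I cannot simply do this, because I only need to include the second delivery date for a sales order if its first delivery date is today or earlier. For example, I do not want to include sales order B11169563, because it only has a future date, but I do want to include sales order B11164868, because its first date is before today and its second date is in the future. I also want to include B11156456, because it has a single date and that date is in the past. To put it more clearly: if an order has one delivery date, it must be in the past. If it has two delivery dates, one must be in the past and one in the future.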

Answer 1

Your requirement becomes simple if you call DataFrame.reset_index() and DataFrame.set_index() on the DataFrames returned by pivot_table.

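The merge does not carry over index levels that are not passed in on=['Sales Order', 'Name'], which is why Delivery is lost. Call reset_index() before the merge and set_index() after it. Please consider the following changes to your code.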

df = pd.read_excel('Report.xlsx')
df_past = df[df['Delivery'] < today]
df_future=df[df['Delivery'] > today]

table_past = pd.pivot_table(
    data=df_past,
    values=['Order', 'Prod'],
    index=['Sales Order', 'Name', 'Delivery'],
    aggfunc={'Order':np.sum, 'Prod':np.sum}
    )

table_future = pd.pivot_table(
    data=df_future,
    values=['Order', 'Prod'],
    index=['Sales Order', 'Name', 'Delivery'],
    aggfunc={'Order':np.sum, 'Prod':np.sum}
    )
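
# reset_index() turns the index levels (including Delivery) into ordinary columns
table_past.reset_index(inplace=True)
table_future.reset_index(inplace=True)

final_table = table_past.merge(table_future, 
            on=['Sales Order', 'Name'],
            suffixes=('_past', '_future'),
            how='inner'
            )
# Delivery now exists in both frames, so the merge renames it to Delivery_past and
# Delivery_future. set_index() is not in-place, so assign the result.
final_table = final_table.set_index(['Sales Order', 'Name'])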
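
To see what the reset_index() then merge() pattern produces, here is a minimal self-contained sketch. The sample rows are copied from the tables in the question. The DataFrame names `past` and `future`, and the use of how='left' (so that orders with only a past delivery are kept), are assumptions for illustration and not part of the original post.

```python
import pandas as pd

# Hypothetical sample rows mirroring the two tables shown in the question.
past = pd.DataFrame({
    "Sales Order": ["B11156456", "B11164868", "B11165869"],
    "Name": ["Amazon", "Google", "Facebook"],
    "Delivery": pd.to_datetime(["2020-02-18", "2020-02-19", "2020-02-15"]),
    "Order": [19, 10, 130],
    "Prod": [2, 3, 0],
})
future = pd.DataFrame({
    "Sales Order": ["B11164868", "B11165869", "B11167123", "B11168132", "B11169563"],
    "Name": ["Google", "Facebook", "Tesla", "Samsung", "Lenovo"],
    "Delivery": pd.to_datetime(
        ["2020-02-27", "2020-02-24", "2020-02-27", "2020-02-28", "2020-03-01"]
    ),
    "Order": [15, 94, 365, 285, 105],
    "Prod": [9, 15, 69, 57, 7],
})

keys = ["Sales Order", "Name", "Delivery"]
table_past = past.pivot_table(values=["Order", "Prod"], index=keys, aggfunc="sum")
table_future = future.pivot_table(values=["Order", "Prod"], index=keys, aggfunc="sum")

# After reset_index(), Delivery is a regular column in both frames, so it
# survives the merge and picks up the suffixes.
merged = table_past.reset_index().merge(
    table_future.reset_index(),
    on=["Sales Order", "Name"],
    suffixes=("_past", "_future"),
    how="left",  # assumption: keep orders that only have a past delivery
)
print(merged)
```

With how='left', an order that has no future delivery keeps its row and gets missing values in the `_future` columns.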

Hope this works.
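
If the stacked layout from the question (one row per delivery date) is the goal, a different route that avoids the merge may be worth trying: keep only the sales orders that have at least one past delivery, then pivot once. This is a sketch under assumptions, not the method from the answer above. The fixed `today` value and the sample rows are hypothetical, and the filter keeps every delivery date of a qualifying order rather than only the second one.

```python
import pandas as pd

today = pd.Timestamp("2020-02-20")  # assumed cutoff date for the sample data

# Hypothetical rows combining the past and future tables from the question.
df = pd.DataFrame({
    "Sales Order": ["B11156456", "B11164868", "B11164868", "B11165869",
                    "B11165869", "B11167123", "B11168132", "B11169563"],
    "Name": ["Amazon", "Google", "Google", "Facebook",
             "Facebook", "Tesla", "Samsung", "Lenovo"],
    "Delivery": pd.to_datetime([
        "2020-02-18", "2020-02-19", "2020-02-27", "2020-02-15",
        "2020-02-24", "2020-02-27", "2020-02-28", "2020-03-01",
    ]),
    "Order": [19, 10, 15, 130, 94, 365, 285, 105],
    "Prod": [2, 3, 9, 0, 15, 69, 57, 7],
})

# True for every row of a sales order that has at least one delivery before today.
has_past = df["Delivery"].lt(today).groupby(df["Sales Order"]).transform("any")

# Pivot only the qualifying orders; Delivery stays in the index.
result = df[has_past].pivot_table(
    values=["Order", "Prod"],
    index=["Sales Order", "Name", "Delivery"],
    aggfunc="sum",
)
print(result)
```

Orders whose deliveries are all in the future (Tesla, Samsung and Lenovo in this sample) are dropped, while Google and Facebook keep both their past and future rows.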

Answer 1 (0) 2020-02-21 03:32:49
