Pandas: keeping a column after aggregation

Question

My data looks like this:

The code used to build it is as follows:

import pandas as pd

Data = pd.DataFrame({'Customer_ID':[1,2,3,4,5,1,2,3,4,5],
                     'Product_ID':['A','D','C','A','E','B','D','C','B','E'],
                     'SalesAmount':[12.34,13.55,34.00,19.15,13.22,12.34,13.55,34.00,19.15,13.22],
                     'ProductCost':[12.34,13.55,34.00,19.15,13.22,12.34,13.55,34.00,19.15,13.22]})

My question is: how do I keep a column that is not itself being aggregated once the other columns have been aggregated?

In my case I want the column Product_ID to still be present in the data after aggregation. The code I used to aggregate, and its result, are as follows:

 Data_aggr = Data.groupby('Customer_ID').agg({'SalesAmount':[min,sum],
                                         'ProductCost':['mean',max] },'Product_ID')

 Data_aggr.columns = ["_".join(x) for x in Data_aggr.columns.ravel()]

 Data_aggr.index.name='Customer_ID'

 Data_aggr.reset_index(inplace=True)
 Data_aggr

Result:

My desired output is:

Answer 1

You need to aggregate every column you want to keep, including Product_ID, e.g. with 'first', which takes the first value in each group:

Data_aggr = Data.groupby('Customer_ID').agg({'SalesAmount':['min','sum'],
                                         'ProductCost':['mean','max'],'Product_ID':'first' })

# flatten the two-level column index, e.g. ('SalesAmount', 'min') -> 'SalesAmount_min'
Data_aggr.columns = ["_".join(x) for x in Data_aggr.columns]

Data_aggr = Data_aggr.rename(columns={'Product_ID_first':'Product_ID'}).reset_index()
print (Data_aggr)
   Customer_ID  SalesAmount_min  SalesAmount_sum  ProductCost_mean  \
0            1            12.34            24.68             12.34   
1            2            13.55            27.10             13.55   
2            3            34.00            68.00             34.00   
3            4            19.15            38.30             19.15   
4            5            13.22            26.44             13.22   

   ProductCost_max Product_ID  
0            12.34          A  
1            13.55          D  
2            34.00          C  
3            19.15          A  
4            13.22          E
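
As a side note, the flatten-and-rename steps above can probably be skipped with named aggregation, which exists in pandas 0.25 and later. The following is only a sketch under that version assumption; it rebuilds the question's Data so it runs on its own, and the output column names are chosen here to match the ones produced above.

```python
import pandas as pd

# Same sample data as in the question
Data = pd.DataFrame({
    'Customer_ID': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    'Product_ID': ['A', 'D', 'C', 'A', 'E', 'B', 'D', 'C', 'B', 'E'],
    'SalesAmount': [12.34, 13.55, 34.00, 19.15, 13.22, 12.34, 13.55, 34.00, 19.15, 13.22],
    'ProductCost': [12.34, 13.55, 34.00, 19.15, 13.22, 12.34, 13.55, 34.00, 19.15, 13.22],
})

# Named aggregation: output_name=(input_column, aggregation).
# The result already has flat column names, so no join/rename is needed.
Data_aggr = (
    Data.groupby('Customer_ID')
        .agg(
            SalesAmount_min=('SalesAmount', 'min'),
            SalesAmount_sum=('SalesAmount', 'sum'),
            ProductCost_mean=('ProductCost', 'mean'),
            ProductCost_max=('ProductCost', 'max'),
            Product_ID=('Product_ID', 'first'),  # keep the first product per customer
        )
        .reset_index()
)
print(Data_aggr)
```

This should give the same table as the dict-based version, with Product_ID as the last column.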

Or group by both columns. Note that the output is different: you get one row per (Customer_ID, Product_ID) pair rather than one row per customer:

Data_aggr = Data.groupby(['Customer_ID','Product_ID']).agg({'SalesAmount':['min','sum'],
                                         'ProductCost':['mean','max']})

# flatten the two-level column index, e.g. ('SalesAmount', 'min') -> 'SalesAmount_min'
Data_aggr.columns = ["_".join(x) for x in Data_aggr.columns]

Data_aggr = Data_aggr.reset_index()
print (Data_aggr)
   Customer_ID Product_ID  SalesAmount_min  SalesAmount_sum  ProductCost_mean  \
0            1          A            12.34            12.34             12.34   
1            1          B            12.34            12.34             12.34   
2            2          D            13.55            27.10             13.55   
3            3          C            34.00            68.00             34.00   
4            4          A            19.15            19.15             19.15   
5            4          B            19.15            19.15             19.15   
6            5          E            13.22            26.44             13.22   

   ProductCost_max  
0            12.34  
1            12.34  
2            13.55  
3            34.00  
4            19.15  
5            19.15  
6            13.22
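
The two outputs above differ because 'first' keeps only one product per customer (customer 1 bought both A and B, but only A survives), while grouping by both columns splits such customers into several rows. If one row per customer with all of their products is wanted, one possible middle ground is to join the distinct product IDs into a string. This is only a sketch, not part of the original answer: the column name Product_IDs and the comma separator are arbitrary choices made here, and named aggregation requires pandas 0.25 or later.

```python
import pandas as pd

# Same sample data as in the question
Data = pd.DataFrame({
    'Customer_ID': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    'Product_ID': ['A', 'D', 'C', 'A', 'E', 'B', 'D', 'C', 'B', 'E'],
    'SalesAmount': [12.34, 13.55, 34.00, 19.15, 13.22, 12.34, 13.55, 34.00, 19.15, 13.22],
    'ProductCost': [12.34, 13.55, 34.00, 19.15, 13.22, 12.34, 13.55, 34.00, 19.15, 13.22],
})

Data_aggr = (
    Data.groupby('Customer_ID')
        .agg(
            SalesAmount_sum=('SalesAmount', 'sum'),
            # Product_IDs is a hypothetical column name: the sorted,
            # de-duplicated product IDs of each customer, comma-separated
            Product_IDs=('Product_ID', lambda s: ','.join(sorted(s.unique()))),
        )
        .reset_index()
)
print(Data_aggr)
```

Whether this is preferable to 'first' depends on what the Product_ID column is needed for afterwards; a joined string is convenient to display but harder to filter on.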

Answer 1
2 | accepted | 2018-07-12 10:26:27
