

Pandas: Maintain a column after aggregation

I have data that looks like this:

[image: sample data]

The code used to build it is as follows:

    import pandas as pd

    Data = pd.DataFrame({'Customer_ID': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
                         'Product_ID': ['A', 'D', 'C', 'A', 'E', 'B', 'D', 'C', 'B', 'E'],
                         'SalesAmount': [12.34, 13.55, 34.00, 19.15, 13.22, 12.34, 13.55, 34.00, 19.15, 13.22],
                         'ProductCost': [12.34, 13.55, 34.00, 19.15, 13.22, 12.34, 13.55, 34.00, 19.15, 13.22]})

My question is: how do I keep a column after aggregating the columns I need?

In my case, I want the Product_ID column to remain in the data after aggregation. The code I used to aggregate, and its result, are as follows:

    Data_aggr = Data.groupby('Customer_ID').agg({'SalesAmount': [min, sum],
                                                 'ProductCost': ['mean', max]}, 'Product_ID')

    Data_aggr.columns = ["_".join(x) for x in Data_aggr.columns.ravel()]

    Data_aggr.index.name = 'Customer_ID'

    Data_aggr.reset_index(inplace=True)
    Data_aggr

Result:

[image: result of the aggregation above]

My desired output is:

[image: desired output]

You need to aggregate every column you want to keep, for example Product_ID with 'first' (the first non-null Product_ID in each group):

    Data_aggr = Data.groupby('Customer_ID').agg({'SalesAmount': ['min', 'sum'],
                                                 'ProductCost': ['mean', 'max'],
                                                 'Product_ID': 'first'})

    # flatten the MultiIndex columns, e.g. ('SalesAmount', 'min') -> 'SalesAmount_min'
    Data_aggr.columns = ["_".join(x) for x in Data_aggr.columns]

    Data_aggr = Data_aggr.rename(columns={'Product_ID_first': 'Product_ID'}).reset_index()
    print(Data_aggr)
   Customer_ID  SalesAmount_min  SalesAmount_sum  ProductCost_mean  \
0            1            12.34            24.68             12.34   
1            2            13.55            27.10             13.55   
2            3            34.00            68.00             34.00   
3            4            19.15            38.30             19.15   
4            5            13.22            26.44             13.22   

   ProductCost_max Product_ID  
0            12.34          A  
1            13.55          D  
2            34.00          C  
3            19.15          A  
4            13.22          E  
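On pandas 0.25 or later, the same table can also be built with named aggregation, which names each output column directly, so there is no MultiIndex to flatten and no Product_ID_first to rename. This is a minimal sketch, assuming pandas >= 0.25 and the Data frame from the question; the output column names are simply chosen to match the result above.

```python
import pandas as pd

# Same sample data as in the question
Data = pd.DataFrame({'Customer_ID': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
                     'Product_ID': ['A', 'D', 'C', 'A', 'E', 'B', 'D', 'C', 'B', 'E'],
                     'SalesAmount': [12.34, 13.55, 34.00, 19.15, 13.22, 12.34, 13.55, 34.00, 19.15, 13.22],
                     'ProductCost': [12.34, 13.55, 34.00, 19.15, 13.22, 12.34, 13.55, 34.00, 19.15, 13.22]})

# Named aggregation (assumes pandas >= 0.25): each keyword is the output
# column name, each value is a (source column, aggregation) pair.
Data_aggr = Data.groupby('Customer_ID').agg(
    SalesAmount_min=('SalesAmount', 'min'),
    SalesAmount_sum=('SalesAmount', 'sum'),
    ProductCost_mean=('ProductCost', 'mean'),
    ProductCost_max=('ProductCost', 'max'),
    Product_ID=('Product_ID', 'first'),  # keep the first product per customer
).reset_index()  # turn the Customer_ID index back into a column

print(Data_aggr)
```

The column order follows the order of the keyword arguments, so the table has the same layout as the dict-based version.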

Alternatively, group by both Customer_ID and Product_ID, but the output is different: you get one row per customer/product pair instead of one row per customer:

    Data_aggr = Data.groupby(['Customer_ID', 'Product_ID']).agg({'SalesAmount': ['min', 'sum'],
                                                                 'ProductCost': ['mean', 'max']})

    # flatten the MultiIndex columns
    Data_aggr.columns = ["_".join(x) for x in Data_aggr.columns]

    Data_aggr = Data_aggr.reset_index()
    print(Data_aggr)
   Customer_ID Product_ID  SalesAmount_min  SalesAmount_sum  ProductCost_mean  \
0            1          A            12.34            12.34             12.34   
1            1          B            12.34            12.34             12.34   
2            2          D            13.55            27.10             13.55   
3            3          C            34.00            68.00             34.00   
4            4          A            19.15            19.15             19.15   
5            4          B            19.15            19.15             19.15   
6            5          E            13.22            26.44             13.22   

   ProductCost_max  
0            12.34  
1            12.34  
2            13.55  
3            34.00  
4            19.15  
5            19.15  
6            13.22  
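Both versions above trade something away: 'first' keeps one row per customer but drops product B for customers 1 and 4, while grouping by both keys keeps every product at the cost of extra rows. If you need one row per customer and every product, one option (not part of the original answer) is to aggregate Product_ID into a joined string. A sketch, assuming pandas >= 0.25 (named aggregation) and that a comma-separated string such as 'A,B' is an acceptable representation:

```python
import pandas as pd

# Same sample data as in the question
Data = pd.DataFrame({'Customer_ID': [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
                     'Product_ID': ['A', 'D', 'C', 'A', 'E', 'B', 'D', 'C', 'B', 'E'],
                     'SalesAmount': [12.34, 13.55, 34.00, 19.15, 13.22, 12.34, 13.55, 34.00, 19.15, 13.22],
                     'ProductCost': [12.34, 13.55, 34.00, 19.15, 13.22, 12.34, 13.55, 34.00, 19.15, 13.22]})

Data_aggr = Data.groupby('Customer_ID').agg(
    SalesAmount_sum=('SalesAmount', 'sum'),
    ProductCost_mean=('ProductCost', 'mean'),
    # unique() keeps distinct products in order of first appearance,
    # so customer 2 (D, D) becomes 'D' and customer 1 (A, B) becomes 'A,B'
    Product_ID=('Product_ID', lambda s: ','.join(s.unique())),
).reset_index()

print(Data_aggr)
```

Only two numeric aggregations are shown for brevity; any of the others from the answer can be added the same way.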
