Python pandas groupby aggregation on multiple columns, then pivot

Question

In Python, I have a pandas DataFrame similar to the following:

Item  | shop1 | shop2 | shop3 | Category
----------------------------------------
Shoes | 45    | 50    | 53    | Clothes
TV    | 200   | 300   | 250   | Technology
Book  | 20    | 17    | 21    | Books
phone | 300   | 350   | 400   | Technology

Where shop1, shop2 and shop3 are the cost of each item in three different shops. Now, after some data cleaning, I need to return a DataFrame like this one:

Category (index)| size| sum| mean | std
----------------------------------------

where size is the number of items in each Category, and sum, mean and std are the same functions applied to the prices in the 3 shops. How can I do these operations with the split-apply-combine pattern (groupby, aggregate, apply, ...)?

Can someone help me out? I'm going crazy with this one... thank you!
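To try the answers below, the sample DataFrame from the question can be built as follows. This is a sketch that assumes Item and Category are ordinary columns rather than the index, which is how all three answers treat them.

```python
import pandas as pd

# Sample data from the question: one row per item, one price column per shop
df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})
print(df)
```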

Answer 1

Edited for pandas 0.22+, since passing a dictionary to a groupby aggregation to rename the output columns is deprecated.

We set up a very similar dictionary, using its keys to specify the functions and the dictionary itself to rename the resulting columns.

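import pandas as pd  # df is the DataFrame from the question

rnm_cols = dict(size='Size', sum='Sum', mean='Mean', std='Std')
# Stack the shop columns into one Series, group by Category, then rename the output columns
df.set_index(['Category', 'Item']).stack().groupby('Category') \
  .agg(list(rnm_cols)).rename(columns=rnm_cols)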

            Size   Sum        Mean        Std
Category                                     
Books          3    58   19.333333   2.081666
Clothes        3   148   49.333333   4.041452
Technology     6  1800  300.000000  70.710678
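The key step shared by all three answers is set_index(['Category', 'Item']).stack(): it moves the label columns into the index and folds the three shop columns into one long Series, so each Category group sees every price from every shop. That is also why Size is 6 for Technology (2 items x 3 shops). A minimal sketch of that intermediate result, rebuilding the question's data (the names stacked and counts are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

# One value per (Category, Item, shop): 4 items x 3 shops = 12 prices
stacked = df.set_index(['Category', 'Item']).stack()
print(stacked.head(6))

# Grouping on the Category level counts prices, not distinct items
counts = stacked.groupby(level='Category').size()
print(counts)
```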

Option 1
Use agg (see the agg documentation)

agg_funcs = dict(Size='size', Sum='sum', Mean='mean', Std='std')
# pandas 0.25+: named aggregation; passing the dict itself to agg is no longer supported
df.set_index(['Category', 'Item']).stack().groupby(level=0).agg(**agg_funcs)

            Size   Sum        Mean        Std
Category                                     
Books          3    58   19.333333   2.081666
Clothes        3   148   49.333333   4.041452
Technology     6  1800  300.000000  70.710678
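The Std column is the sample standard deviation: pandas' std() divides by n - 1 by default (ddof=1). As a cross-check of the Technology row without pandas, Python's statistics.stdev uses the same n - 1 convention:

```python
import statistics

# The six Technology prices: TV and phone in shop1, shop2 and shop3
technology = [200, 300, 250, 300, 350, 400]

total = sum(technology)               # 1800
mean = statistics.mean(technology)    # 300
std = statistics.stdev(technology)    # sqrt(25000 / 5), about 70.710678

print(total, mean, round(std, 6))
```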

Option 2
More for less: more statistics for less code
Use describe (see the describe documentation)

# pandas 0.20+: groupby(...).describe() already puts the statistics in columns, so no unstack() is needed
df.set_index(['Category', 'Item']).stack().groupby(level=0).describe()

            count        mean        std    min    25%    50%    75%    max
Category                                                                   
Books         3.0   19.333333   2.081666   17.0   18.5   20.0   20.5   21.0
Clothes       3.0   49.333333   4.041452   45.0   47.5   50.0   51.5   53.0
Technology    6.0  300.000000  70.710678  200.0  262.5  300.0  337.5  400.0
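describe() reports count, mean and std (plus min, max and quartiles) but no sum. If you want exactly the four columns from the question, one option, sketched here as an addition rather than part of the original answer, is to keep those three columns and add the group sums separately:

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

stacked = df.set_index(['Category', 'Item']).stack()
grouped = stacked.groupby(level='Category')

# describe() returns one column per statistic; keep the ones we need
summary = grouped.describe()[['count', 'mean', 'std']]
# describe() has no sum, so add it from a separate aggregation (aligned on Category)
summary = summary.assign(sum=grouped.sum())[['count', 'sum', 'mean', 'std']]
print(summary)
```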

Answer 2

df.groupby('Category').agg({'Item':'size','shop1':['sum','mean','std'],'shop2':['sum','mean','std'],'shop3':['sum','mean','std']})
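This first version aggregates each shop separately, so the result has two-level columns such as ('shop1', 'sum'), and ('Item', 'size') counts items per category (1 book, 2 technology items) rather than prices. A quick sketch to inspect that shape, rebuilding the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

# Per-shop statistics: one (column, function) pair per output column
per_shop = df.groupby('Category').agg({
    'Item': 'size',
    'shop1': ['sum', 'mean', 'std'],
    'shop2': ['sum', 'mean', 'std'],
    'shop3': ['sum', 'mean', 'std'],
})
print(per_shop.columns.tolist())
print(per_shop[('shop1', 'sum')])
```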

Or, if you want the statistics across all shops:

df1 = df.set_index(['Item','Category']).stack().reset_index().rename(columns={'level_2':'Shops',0:'costs'})
df1.groupby('Category').agg({'Item':'size','costs':['sum','mean','std']})
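The rename in this version works because stack() leaves the new level of shop names unnamed, so reset_index() calls it level_2 and names the value column 0. A sketch that checks those labels and runs the aggregation end to end (long_df and result are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

long_df = df.set_index(['Item', 'Category']).stack().reset_index()
# Unnamed index level becomes 'level_2'; unnamed Series values become column 0
print(long_df.columns.tolist())

df1 = long_df.rename(columns={'level_2': 'Shops', 0: 'costs'})
result = df1.groupby('Category').agg({'Item': 'size', 'costs': ['sum', 'mean', 'std']})
print(result)
```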

Answer 3

If I understand correctly, you want to calculate aggregate metrics over all shops, not for each shop individually. To do that, you can first stack your DataFrame and then group by Category:

stacked = df.set_index(['Item', 'Category']).stack().reset_index()
stacked.columns = ['Item', 'Category', 'Shop', 'Price']
stacked.groupby('Category').agg({'Price':['count','sum','mean','std']})

Which results in:

           Price                             
           count   sum        mean        std
Category                                     
Books          3    58   19.333333   2.081666
Clothes        3   148   49.333333   4.041452
Technology     6  1800  300.000000  70.710678
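If you would rather have flat column names (count, sum, mean, std) than the two-level Price header shown above, a small variation is to select the Price column before calling agg with a list of functions. A sketch, rebuilding the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['Shoes', 'TV', 'Book', 'phone'],
    'shop1': [45, 200, 20, 300],
    'shop2': [50, 300, 17, 350],
    'shop3': [53, 250, 21, 400],
    'Category': ['Clothes', 'Technology', 'Books', 'Technology'],
})

stacked = df.set_index(['Item', 'Category']).stack().reset_index()
stacked.columns = ['Item', 'Category', 'Shop', 'Price']

# Selecting one column first gives a SeriesGroupBy, so agg returns flat columns
flat = stacked.groupby('Category')['Price'].agg(['count', 'sum', 'mean', 'std'])
print(flat)
```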


3 answers

Answer 1: score 17, accepted, 2017-04-02 23:27:18
Answer 2: score 10, 2017-04-02 20:30:13
Answer 3: score 0, 2017-04-02 20:40:58
