
Pandas pivot Table Multi-Layer Sorting

I have the following df (updated):

import pandas as pd
import numpy as np
df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar","zz","zz"],
                  "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two","xy","zz"],
                   "Name":["Peter", "Amy", "Brian", "Amy", "Amy",
                         "Peter", "Brian", "Peter", "Brian","Brian","Brian"],
                  "Year": [2019, 2019, 2019, 2019,
                         2019, 2019, 2020, 2020,
                          2020,2019,2020],
                  "Values": [20, 4, 20, 5, 6, 6, 8, 9, 9,10,5]})
df_pivot = pd.pivot_table(df, values='Values', index=['Name', 'A', 'B'],
                          columns=['Year'], aggfunc='sum', fill_value=0,
                          margins=True, margins_name="Totals")

After pivoting it the way I want, it looks like this:

Year            2019  2020  Totals
Name   A   B                      
Amy    foo one     4     0       4
           two    11     0      11
Brian  bar one     0     8       8
           two     0     9       9
       foo one    20     0      20
       zz  xy     10     0      10
           zz      0     5       5
Peter  bar one     6     0       6
           two     0     9       9
       foo one    20     0      20
Totals            71    31     102

Now the "fun" part begins..

I would like this pivot table to be sorted on all index levels, from left to right, based on the sum of values.

Let me explain.

First, I would like to sort the pivot table by the "Name" level, in descending order of each name's "Totals". The sums are Amy = 15, Brian = 52, Peter = 35, so the first level should be ordered Brian, Peter, Amy.

Next I do the same for the second level, "A", while keeping the order of the first level, "Name", fixed.

For example, for Brian (who is on top) I calculate the totals per value of "A" to see whether foo, bar or zz should come first: Brian-foo is 20, Brian-bar is 8 + 9 = 17 and Brian-zz is 10 + 5 = 15, so foo comes first for Brian in the second level, followed by bar and zz. The same applies to the remaining index levels.
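The per-level sums described above can be checked directly on the raw df. This is only a sketch of the arithmetic, not the sort itself; the variable names name_totals, name_a_totals and brian_order are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar", "zz", "zz"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two", "xy", "zz"],
                   "Name": ["Peter", "Amy", "Brian", "Amy", "Amy",
                            "Peter", "Brian", "Peter", "Brian", "Brian", "Brian"],
                   "Year": [2019, 2019, 2019, 2019, 2019, 2019,
                            2020, 2020, 2020, 2019, 2020],
                   "Values": [20, 4, 20, 5, 6, 6, 8, 9, 9, 10, 5]})

# Level 1: total per Name, largest first (Brian 52, Peter 35, Amy 15).
name_totals = df.groupby("Name")["Values"].sum().sort_values(ascending=False)
print(name_totals)

# Level 2: total per (Name, A); inspect Brian only (foo 20, bar 17, zz 15).
name_a_totals = df.groupby(["Name", "A"])["Values"].sum()
brian_order = name_a_totals.loc["Brian"].sort_values(ascending=False)
print(brian_order)
```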

The output should look like this:

Year            2019  2020  Totals
Name   A   B                      
Brian  foo one    20     0      20 
       bar two     0     9       9
           one     0     8       8
       zz  xy     10     0      10
           zz      0     5       5
Peter  foo one    20     0      20
       bar two     0     9       9
           one     6     0       6
Amy    foo two    11     0      11
           one     4     0       4
Totals            71    31     102

Long story short: first I want to sort the first level by the totals of its items and fix that order; then I want to sort the second level by the totals of its items within each first-level group, and so on.

Can you advise how to do this, please? I'd appreciate the help a lot!

Thanks, Pawel

You can use groupby.transform to get the sum of "Totals" within each name, then sort by it:

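df_pivot = pd.concat([
    df_pivot.iloc[:-1]    # all rows except the "Totals" margin row
            .assign(sort=lambda x: x['Totals'].groupby(level=0).transform('sum'))
            .sort_values(['sort', 'Name', 'Totals'],
                         ascending=[False, True, False], kind='mergesort')
            .drop('sort', axis=1),
    df_pivot.iloc[[-1]],  # re-attach the margin row (DataFrame.append was removed in pandas 2.0)
])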

Output (for the original data, before the "zz" rows were added in the update):

Year            2019  2020  Totals
Name   A   B                      
Brian  foo one    20     0      20
       bar two     0     9       9
           one     0     8       8
Peter  foo one    20     0      20
       bar two     0     9       9
           one     6     0       6
Amy    foo two    11     0      11
           one     4     0       4
Totals            61    26      87
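The approach above orders the names by their sum and then the individual rows by "Totals". With the updated data that would put Brian's zz/xy row (10) ahead of his bar rows, because the "A" groups are not summed. A possible extension, sketched below under the assumption that the index always has exactly the three levels Name, A and B, adds a second helper key for the (Name, A) sum. The helper column names name_sum, a_sum and row_sum are made up for illustration, and the level names are included in the sort as tie-breakers so that groups with equal sums stay together.

```python
import pandas as pd

df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
                         "bar", "bar", "bar", "bar", "zz", "zz"],
                   "B": ["one", "one", "one", "two", "two",
                         "one", "one", "two", "two", "xy", "zz"],
                   "Name": ["Peter", "Amy", "Brian", "Amy", "Amy",
                            "Peter", "Brian", "Peter", "Brian", "Brian", "Brian"],
                   "Year": [2019, 2019, 2019, 2019, 2019, 2019,
                            2020, 2020, 2020, 2019, 2020],
                   "Values": [20, 4, 20, 5, 6, 6, 8, 9, 9, 10, 5]})

df_pivot = pd.pivot_table(df, values="Values", index=["Name", "A", "B"],
                          columns=["Year"], aggfunc="sum", fill_value=0,
                          margins=True, margins_name="Totals")

body = df_pivot.iloc[:-1]      # everything except the margin row
totals = body["Totals"]

# One sort key per index depth, broadcast back to every row.
keys = pd.DataFrame({
    "name_sum": totals.groupby(level="Name").transform("sum"),
    "a_sum": totals.groupby(level=["Name", "A"]).transform("sum"),
    "row_sum": totals,
})

# Sums descending; the level names break ties between equal sums.
order = keys.sort_values(
    ["name_sum", "Name", "a_sum", "A", "row_sum"],
    ascending=[False, True, False, True, False],
).index

# Reorder the body and put the margin row back at the bottom.
result = pd.concat([body.reindex(order), df_pivot.iloc[[-1]]])
print(result)
```

For more index levels the same pattern would need one additional key per level, which could be built in a loop over the level names.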

