简体   繁体   中英

Pandas Sort Multiindex by Group Sum

Given the following data frame:

import pandas as pd
df=pd.DataFrame({'County':['A','B','C','D','A','B','C','D','A','B','C','D','A','B','C','D','A','B'],
                'Hospital':['a','b','c','d','e','a','b','c','e','a','b','c','d','e','a','b','c','e'],
                'Enrollment':[44,55,42,57,95,54,27,55,81,54,65,23,89,76,34,12,1,67],
                'Year':['2012','2012','2012','2012','2012','2012','2012','2012','2012','2013',
                        '2013','2013','2013','2013','2013','2013','2013','2013']})
d2=pd.pivot_table(df,index=['County','Hospital'],columns=['Year'])#.sort_columns

d2
        Enrollment
        Year   2012     2013
County  Hospital        
A       a      44.0     NaN
        c      NaN      1.0
        d      NaN      89.0
        e      88.0     NaN
B       a      54.0     54.0
        b      55.0     NaN
        e      NaN      71.5
C       a      NaN      34.0
        b      27.0     65.0
        c      42.0     NaN
D       b      NaN      12.0
        c      55.0     23.0
        d      57.0     NaN

I need to sort the data frame such that County is sorted descendingly by the sum of Enrollment for the most recent year (I want to avoid using '2013' directly) like this:

        Enrollment  
    Year          2012  2013
County  Hospital        
B       a         54    54
        b         55    NaN
        e         NaN   71.5
C       a         NaN   34
        b         27    65
        c         42    NaN
A       a         44    NaN
        c         NaN   1
        d         NaN   89
        e         88    NaN
D       b         NaN   12
        c         55    23
        d         57    NaN

Then, I'd like each hospital sorted within each county, descendingly, but 2013 enrollments like this:

        Enrollment  
        Year    2012    2013
County  Hospital        
B       e       NaN 71.5
        a       54  54
        b       55  NaN
C       b       27  65
        a       NaN 34
        c       42  NaN
A       d       NaN 89
        c       NaN 1
        a       44  NaN
        e       88  NaN
D       c       55  23
        b       NaN 12
        d       57  NaN

So far, I've tried using groupby to get the sums and merge the back but have not had any luck:

d2.groupby('County').sum()

Thanks in advance!

You could:

max_col = max(d2.columns.get_level_values(1)) # get column 2013
d2['sum'] = d2.groupby(level='County').transform('sum').loc[:, ('Enrollment', max_col)]
d2 = d2.sort_values(['sum', ('Enrollment', max_col)], ascending=[False, False])

to get:

                Enrollment          sum
Year                  2012  2013       
County Hospital                        
B      e               NaN  71.5  125.5
       a              54.0  54.0  125.5
       b              55.0   NaN  125.5
C      b              27.0  65.0   99.0
       a               NaN  34.0   99.0
       c              42.0   NaN   99.0
A      d               NaN  89.0   90.0
       c               NaN   1.0   90.0
       a              44.0   NaN   90.0
       e              88.0   NaN   90.0
D      c              55.0  23.0   35.0
       b               NaN  12.0   35.0
       d              57.0   NaN   35.0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM