繁体   English   中英

熊猫按组和排序多索引

[英]Pandas Sort Multiindex by Group Sum

给定以下数据框:

import pandas as pd
df=pd.DataFrame({'County':['A','B','C','D','A','B','C','D','A','B','C','D','A','B','C','D','A','B'],
                'Hospital':['a','b','c','d','e','a','b','c','e','a','b','c','d','e','a','b','c','e'],
                'Enrollment':[44,55,42,57,95,54,27,55,81,54,65,23,89,76,34,12,1,67],
                'Year':['2012','2012','2012','2012','2012','2012','2012','2012','2012','2013',
                        '2013','2013','2013','2013','2013','2013','2013','2013']})
d2=pd.pivot_table(df,index=['County','Hospital'],columns=['Year'])#.sort_columns

d2
        Enrollment
        Year   2012     2013
County  Hospital        
A       a      44.0     NaN
        c      NaN      1.0
        d      NaN      89.0
        e      88.0     NaN
B       a      54.0     54.0
        b      55.0     NaN
        e      NaN      71.5
C       a      NaN      34.0
        b      27.0     65.0
        c      42.0     NaN
D       b      NaN      12.0
        c      55.0     23.0
        d      57.0     NaN

我需要对数据框进行排序,以使County按最近一年的注册总数进行降序排序(我想避免直接使用“ 2013”​​),如下所示:

        Enrollment  
    Year          2012  2013
County  Hospital        
B       a         54    54
        b         55    NaN
        e         NaN   71.5
C       a         NaN   34
        b         27    65
        c         42    NaN
A       a         44    NaN
        c         NaN   1
        d         NaN   89
        e         88    NaN
D       b         NaN   12
        c         55    23
        d         57    NaN

然后,我希望每个县的各医院按降序排列,但2013年的入学人数是这样的:

        Enrollment  
        Year    2012    2013
County  Hospital        
B       e       NaN 71.5
        a       54  54
        b       55  NaN
C       b       27  65
        a       NaN 34
        c       42  NaN
A       d       NaN 89
        c       NaN 1
        a       44  NaN
        e       88  NaN
D       c       55  23
        b       NaN 12
        d       57  NaN

到目前为止,我已经尝试使用groupby来获取总和并合并后面的内容,但是还没有运气:

d2.groupby('County').sum()

提前致谢!

你可以:

max_col = max(d2.columns.get_level_values(1)) # get column 2013
d2['sum'] = d2.groupby(level='County').transform('sum').loc[:, ('Enrollment', max_col)]
d2 = d2.sort_values(['sum', ('Enrollment', max_col)], ascending=[False, False])

要得到:

                Enrollment          sum
Year                  2012  2013       
County Hospital                        
B      e               NaN  71.5  125.5
       a              54.0  54.0  125.5
       b              55.0   NaN  125.5
C      b              27.0  65.0   99.0
       a               NaN  34.0   99.0
       c              42.0   NaN   99.0
A      d               NaN  89.0   90.0
       c               NaN   1.0   90.0
       a              44.0   NaN   90.0
       e              88.0   NaN   90.0
D      c              55.0  23.0   35.0
       b               NaN  12.0   35.0
       d              57.0   NaN   35.0

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM