简体   繁体   中英

How to sort Pandas DataFrame both by MultiIndex and by value?

Sample data:

mdf = pd.DataFrame([[1,2,50],[1,2,20],
                [1,5,10],[2,8,80],
                [2,5,65],[2,8,10]
               ], columns=['src','dst','n']); mdf

    src dst n
0   1   2   50
1   1   2   20
2   1   5   10
3   2   8   80
4   2   5   65
5   2   8   10

groupby() gives a two-level multi-index:

test = mdf.groupby(['src','dst'])['n'].agg(['sum','count']); test

        sum count
src dst 
1   2   70  2
    5   10  1
2   5   65  1
    8   90  2

Question: how to sort this DataFrame by src ascending and then by sum descending?

I'm a beginner with pandas, learned about sort_index() and sort_values(), but in this task it seems that I need both simultaneously.

Expected result , under each "src" sorting is determined by the "sum":

        sum count
src dst 
1   2   70  2
    5   10  1
2   8   90  2
    5   65  1

IIUC:

In [29]: test.sort_values('sum', ascending=False).sort_index(level=0)
Out[29]:
         sum  count
src dst
1   2     80      2
    5     10      1
2   8     80      1

UPDATE: very similar to @anonyXmous's solution :

In [47]: (test.reset_index()
              .sort_values(['src','sum'], ascending=[1,0])
              .set_index(['src','dst']))
Out[47]:
         sum  count
src dst
1   2     70      2
    5     10      1
2   8     90      2
    5     65      1

You can reset the index then sort them by chosen columns. Hope this helps.

import pandas as pd

mdf = pd.DataFrame([[1,2,50],[1,2,20],
                [1,5,10],[2,8,80],
                [2,5,65],[2,8,10]
               ], columns=['src','dst','n']); 
mdf = mdf.groupby(['src','dst'])['n'].agg(['sum','count']); 
mdf.reset_index(inplace=True)
mdf.sort_values(['src', 'sum'], ascending=[True, False], inplace=True)
print(mdf)

Result:
       src dst sum  count
    0   1   2   70   2
    1   1   5   10   1
    3   2   8   90   2
    2   2   5   65   1

In case anyone else comes across this using google as well. Since pandas version 0.23, you can pass the name of the level as an argument to sort_values :

test.sort_values(['src','sum'], ascending=[1,0])

Result:
         sum  count
src dst            
1   2     70      2
    5     10      1
2   8     90      2
    5     65      1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM