Sample data:
import pandas as pd

mdf = pd.DataFrame([[1,2,50],[1,2,20],
                    [1,5,10],[2,8,80],
                    [2,5,65],[2,8,10]
                   ], columns=['src','dst','n'])
mdf
   src  dst   n
0    1    2  50
1    1    2  20
2    1    5  10
3    2    8  80
4    2    5  65
5    2    8  10
groupby() gives a two-level multi-index:
test = mdf.groupby(['src','dst'])['n'].agg(['sum','count'])
test
         sum  count
src dst
1   2     70      2
    5     10      1
2   5     65      1
    8     90      2
Question: how can I sort this DataFrame by src ascending and then by sum descending?
I'm a beginner with pandas. I have learned about sort_index() and sort_values(), but for this task it seems that I need both at the same time.
Expected result, where the order within each "src" is determined by "sum":
         sum  count
src dst
1   2     70      2
    5     10      1
2   8     90      2
    5     65      1
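IIUC (sort_remaining=False is needed here; with the default sort_remaining=True, sort_index() also re-sorts by dst and the order by sum is lost):
In [29]: test.sort_values('sum', ascending=False).sort_index(level=0, sort_remaining=False)
Out[29]:
         sum  count
src dst
1   2     70      2
    5     10      1
2   8     90      2
    5     65      1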
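To make the role of the second step concrete, here is a minimal, self-contained sketch comparing sort_index(level=0) with and without sort_remaining=False. The names by_sum, default_sorted and kept_order are illustrative and not part of the original answer. It assumes, as appears to be the case in current pandas releases, that sorting on a single index level is stable, so rows within each src keep the order established by sort_values().

```python
import pandas as pd

mdf = pd.DataFrame(
    [[1, 2, 50], [1, 2, 20], [1, 5, 10], [2, 8, 80], [2, 5, 65], [2, 8, 10]],
    columns=["src", "dst", "n"],
)
test = mdf.groupby(["src", "dst"])["n"].agg(["sum", "count"])

# Step 1: order all rows by 'sum', largest first.
by_sum = test.sort_values("sum", ascending=False)

# Default (sort_remaining=True): 'dst' is sorted as well,
# so the ordering by 'sum' from step 1 is lost.
default_sorted = by_sum.sort_index(level=0)

# sort_remaining=False: only 'src' is sorted (assumed stable),
# so rows within each 'src' stay in descending 'sum' order.
kept_order = by_sum.sort_index(level=0, sort_remaining=False)

print(default_sorted)
print(kept_order)
```

If relying on sort stability feels fragile, the reset_index() / sort_values() / set_index() route sorts on both keys explicitly.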
UPDATE: very similar to @anonyXmous's solution:
In [47]: (test.reset_index()
              .sort_values(['src','sum'], ascending=[1,0])
              .set_index(['src','dst']))
Out[47]:
         sum  count
src dst
1   2     70      2
    5     10      1
2   8     90      2
    5     65      1
You can reset the index and then sort by the chosen columns. Hope this helps.
import pandas as pd

mdf = pd.DataFrame([[1,2,50],[1,2,20],
                    [1,5,10],[2,8,80],
                    [2,5,65],[2,8,10]
                   ], columns=['src','dst','n'])
# Aggregate; 'src' and 'dst' become the two index levels
mdf = mdf.groupby(['src','dst'])['n'].agg(['sum','count'])
# Turn the index levels back into ordinary columns so they can be sorted on
mdf.reset_index(inplace=True)
# Sort by 'src' ascending, then by 'sum' descending
mdf.sort_values(['src', 'sum'], ascending=[True, False], inplace=True)
print(mdf)
Result:
   src  dst  sum  count
0    1    2   70      2
1    1    5   10      1
3    2    8   90      2
2    2    5   65      1
In case anyone else comes across this via Google as well: since pandas version 0.23, you can pass the name of an index level to sort_values(), alongside column names:
test.sort_values(['src','sum'], ascending=[1,0])
Result:
         sum  count
src dst
1   2     70      2
    5     10      1
2   8     90      2
    5     65      1
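For completeness, here is a self-contained sketch of the same approach that can be run as-is (it assumes pandas 0.23 or later). The variable name result is illustrative. Because 'src' is resolved as an index level and 'sum' as a column, the multi-index is kept and no reset_index() / set_index() round trip is needed.

```python
import pandas as pd

mdf = pd.DataFrame(
    [[1, 2, 50], [1, 2, 20], [1, 5, 10], [2, 8, 80], [2, 5, 65], [2, 8, 10]],
    columns=["src", "dst", "n"],
)
test = mdf.groupby(["src", "dst"])["n"].agg(["sum", "count"])

# 'src' is an index level name, 'sum' is a column name;
# sort_values() accepts a mix of both (pandas >= 0.23).
result = test.sort_values(["src", "sum"], ascending=[True, False])

print(result)
```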