
How to sort/group a Pandas DataFrame by class label or any specific column

class col2 col3 col4 col5
1     4    5    5    5
4     4    4.5  5.5  6
1     3.5  5    6    4.5
3     3    4    4    4
2     3    3.5  3.8  6.1

I have used hypothetical data in the example. The shape of the real DataFrame is 6680x1900. I have clustered these data into 50 labeled classes (1 to 50). How can I sort this data in ascending order of class labels?

I have tried:

df.groupby([column_name_lst])["class"]

But it fails with this error:

TypeError: You have to supply one of 'by' and 'level'

How can I solve this problem? The expected output is:

class col2 col3 col4 col5
1     4    5    5    5
1     3.5  5    6    4.5
2     3    3.5  3.8  6.1
3     3    4    4    4
4     4    4.5  5.5  6

I think you can use DataFrame.sort_values if class is a column (a Series):

print (type(df['class']))
<class 'pandas.core.series.Series'>


print (df.sort_values(by='class'))
   class  col2  col3  col4  col5
0      1   4.0   5.0   5.0   5.0
2      1   3.5   5.0   6.0   4.5
4      2   3.0   3.5   3.8   6.1
3      3   3.0   4.0   4.0   4.0
1      4   4.0   4.5   5.5   6.0
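For reference, here is a minimal self-contained sketch of the sort above, rebuilding the sample data from the question. The kind="mergesort" argument and the reset_index(drop=True) call are additions that are not part of the original answer: the first requests a stable sort so that rows with the same class keep their original relative order, and the second renumbers the rows after sorting.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "class": [1, 4, 1, 3, 2],
    "col2": [4, 4, 3.5, 3, 3],
    "col3": [5, 4.5, 5, 4, 3.5],
    "col4": [5, 5.5, 6, 4, 3.8],
    "col5": [5, 6, 4.5, 4, 6.1],
})

# Stable sort: rows sharing a class label keep their original order
sorted_df = df.sort_values(by="class", kind="mergesort")

# Optional: discard the old row labels and renumber from 0
renumbered = sorted_df.reset_index(drop=True)

print(sorted_df)
print(renumbered)
```

sort_values returns a new DataFrame and leaves df unchanged, so assign the result if you want to keep it.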

If you also need groupby, use the by parameter:

print (df.groupby(by='class').sum())
       col2  col3  col4  col5
class                        
1       7.5  10.0  11.0   9.5
2       3.0   3.5   3.8   6.1
3       3.0   4.0   4.0   4.0
4       4.0   4.5   5.5   6.0
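If the goal is to work with each class separately rather than to aggregate, the grouped object can also be iterated. The sketch below is an illustration under that assumption and is not part of the original answer; the variable names are hypothetical.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "class": [1, 4, 1, 3, 2],
    "col2": [4, 4, 3.5, 3, 3],
    "col3": [5, 4.5, 5, 4, 3.5],
    "col4": [5, 5.5, 6, 4, 3.8],
    "col5": [5, 6, 4.5, 4, 6.1],
})

# Iterating a groupby yields (label, sub-DataFrame) pairs,
# with labels in ascending order by default (sort=True)
labels = []
rows_per_class = {}
for label, group in df.groupby(by="class"):
    labels.append(int(label))
    rows_per_class[int(label)] = len(group)

print(labels)
print(rows_per_class)
```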

And if class is the index, use Kartik's solution:

print (df.index)
Int64Index([1, 4, 1, 3, 2], dtype='int64', name='class')

print (df.sort_index())
       col2  col3  col4  col5
class                        
1       4.0   5.0   5.0   5.0
1       3.5   5.0   6.0   4.5
2       3.0   3.5   3.8   6.1
3       3.0   4.0   4.0   4.0
4       4.0   4.5   5.5   6.0
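A minimal sketch of the index case, assuming the data starts out with class as an ordinary column and is moved into the index with set_index. That setup step and the kind="mergesort" argument are additions for illustration only.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "class": [1, 4, 1, 3, 2],
    "col2": [4, 4, 3.5, 3, 3],
    "col3": [5, 4.5, 5, 4, 3.5],
    "col4": [5, 5.5, 6, 4, 3.8],
    "col5": [5, 6, 4.5, 4, 6.1],
})

# Move the class column into the index, then sort by the index labels
indexed = df.set_index("class")
sorted_by_index = indexed.sort_index(kind="mergesort")

print(sorted_by_index)
```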

If you also need groupby, use the level parameter:

print (df.groupby(level='class').sum())
       col2  col3  col4  col5
class                        
1       7.5  10.0  11.0   9.5
2       3.0   3.5   3.8   6.1
3       3.0   4.0   4.0   4.0
4       4.0   4.5   5.5   6.0

or pass the index itself, but the first solution is better because it is more general:

print (df.groupby(df.index).sum())
       col2  col3  col4  col5
class                        
1       7.5  10.0  11.0   9.5
2       3.0   3.5   3.8   6.1
3       3.0   4.0   4.0   4.0
4       4.0   4.5   5.5   6.0
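As a quick check, the sketch below compares the groupby spellings shown in this answer on the sample data. It is only an illustration: the setup with set_index is an assumption about how the index version of the frame was built.

```python
import pandas as pd

# Sample data copied from the question, with class as a column
df_col = pd.DataFrame({
    "class": [1, 4, 1, 3, 2],
    "col2": [4, 4, 3.5, 3, 3],
    "col3": [5, 4.5, 5, 4, 3.5],
    "col4": [5, 5.5, 6, 4, 3.8],
    "col5": [5, 6, 4.5, 4, 6.1],
})
# The same data with class as the index
df_idx = df_col.set_index("class")

by_column = df_col.groupby(by="class").sum()       # class is a column
by_level = df_idx.groupby(level="class").sum()     # class is an index level
by_index = df_idx.groupby(df_idx.index).sum()      # pass the index object itself

# All three should produce the same per-class sums
print(by_level.equals(by_index))
print(by_level.equals(by_column))
```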

If you are starting with the data in your question:

class col2 col3 col4 col5
1     4    5    5    5
4     4    4.5  5.5  6
1     3.5  5    6    4.5
3     3    4    4    4
2     3    3.5  3.8  6.1

and want to sort it, then it depends on whether 'class' is the index or a column. If it is the index:

df.sort_index()

should give you the answer. If it is a column, follow the answer by @jezarael.


 