
Sort all columns of a pandas DataFrame independently using sort_values()

I have a DataFrame and want to sort all of its columns independently, in descending or ascending order.

import pandas as pd

data = {'a': [5, 2, 3, 6],
        'b': [7, 9, 1, 4],
        'c': [1, 5, 4, 2]}
df = pd.DataFrame.from_dict(data)
   a  b  c
0  5  7  1
1  2  9  5
2  3  1  4
3  6  4  2

When I use sort_values() for this, it does not work as I expected and only sorts one column:

foo = df.sort_values(by=['a', 'b', 'c'], ascending=[False, False, False])
   a  b  c
3  6  4  2
0  5  7  1
2  3  1  4
1  2  9  5

I can get the desired result with the solution from this answer, which applies a lambda function:

bar = df.apply(lambda x: x.sort_values().values)
print(bar)

   a  b  c
0  2  1  1
1  3  4  2
2  5  7  4
3  6  9  5

But this looks a bit heavy-handed to me.

What is actually happening in the sort_values() example above, and how can I sort all columns of my DataFrame in a pandas way without the lambda function?

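You can use numpy.sort with the DataFrame constructor:

import numpy as np

# sort every column (axis=0) independently, then rebuild the frame with the original labels
df1 = pd.DataFrame(np.sort(df.values, axis=0), index=df.index, columns=df.columns)
print (df1)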
   a  b  c
0  2  1  1
1  3  4  2
2  5  7  4
3  6  9  5

EDIT:

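For descending order:

import numpy as np

# np.sort returns a sorted copy; calling df.values.sort() in place could modify df itself
arr = np.sort(df.values, axis=0)
# reverse the row order so every column is descending
arr = arr[::-1]
print (arr)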
[[6 9 5]
 [5 7 4]
 [3 4 2]
 [2 1 1]]

df1 = pd.DataFrame(arr, index=df.index, columns=df.columns)
print (df1)
   a  b  c
0  6  9  5
1  5  7  4
2  3  4  2
3  2  1  1
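
Both variants can be wrapped in a small helper. The name `sort_columns` and its `ascending` parameter are hypothetical, not part of pandas. This sketch relies on `np.sort` returning a new array, so the input frame is never modified, and it assumes all columns share one comparable dtype (for example all numeric), because `to_numpy()` packs the frame into a single 2-D array.

```python
import numpy as np
import pandas as pd

def sort_columns(df: pd.DataFrame, ascending: bool = True) -> pd.DataFrame:
    """Hypothetical helper: sort every column of df independently."""
    # np.sort returns a sorted copy, so df itself is left untouched
    arr = np.sort(df.to_numpy(), axis=0)
    if not ascending:
        # Reverse the row order of the sorted array to get descending columns
        arr = arr[::-1]
    # Reattach the original row and column labels
    return pd.DataFrame(arr, index=df.index, columns=df.columns)

df = pd.DataFrame({'a': [5, 2, 3, 6], 'b': [7, 9, 1, 4], 'c': [1, 5, 4, 2]})
print(sort_columns(df))
print(sort_columns(df, ascending=False))
```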

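sort_values sorts the rows of the whole DataFrame by the columns you pass, in the order you pass them. In your first example you sort with ['a', 'b', 'c']: the frame is ordered by 'a', then 'b' is used only to break ties in 'a', and 'c' only to break ties in both 'a' and 'b'. Since 'a' has no repeated values in your data, 'b' and 'c' never come into play.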

Notice that each row stays intact after sorting by 'a': rows are moved as whole units, so the values in 'b' and 'c' simply follow their row. This is the expected behavior.
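
To see that the later keys only act as tie-breakers, here is a small sketch with made-up data in which 'a' contains a repeated value, so 'b' is needed to order the tied rows.

```python
import pandas as pd

# Made-up data: 'a' has the value 5 twice, so 'b' decides the order of those two rows
df = pd.DataFrame({'a': [5, 2, 5, 6],
                   'b': [7, 9, 1, 4],
                   'c': [1, 5, 4, 2]})

out = df.sort_values(by=['a', 'b', 'c'], ascending=[False, False, False])
print(out)
# Rows move as whole units; column 'b' is not sorted on its own,
# it only orders the two rows where a == 5 (b=7 before b=1)
```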

With apply and the lambda, each column is passed to the function separately, so sort_values operates on a single column (a Series) at a time. That is why this second approach sorts the columns the way you expected. In this case the rows are not kept intact.
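
The .values in that lambda matters. In this minimal sketch on the question's data, returning the sorted Series itself keeps each value attached to its original index label, and apply aligns the columns on those labels when it assembles the result, so the values end up back in their original rows. Returning a plain array drops the labels and the sorted order survives.

```python
import pandas as pd

df = pd.DataFrame({'a': [5, 2, 3, 6], 'b': [7, 9, 1, 4], 'c': [1, 5, 4, 2]})

# Returning a Series: values keep their index labels and are realigned by apply
realigned = df.apply(lambda x: x.sort_values())

# Returning a NumPy array: labels are dropped, so each column stays sorted
independent = df.apply(lambda x: x.sort_values().values)

print(realigned.sort_index())  # same values as df
print(independent)
```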

If you want to avoid both the lambda and NumPy, you can build the frame with a dict comprehension instead:

pd.DataFrame({x: df[x].sort_values().values for x in df.columns.values})

Output:

   a  b  c
0  2  1  1
1  3  4  2
2  5  7  4
3  6  9  5
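
One practical advantage of the per-column dict comprehension is that each column is sorted on its own and keeps its own dtype, whereas going through a single NumPy array first (as with np.sort on df.values) converts a mixed frame to one common dtype, object in this case. The sketch below uses a made-up frame with an integer and a string column; passing ascending=False to sort_values gives descending order.

```python
import pandas as pd

# Made-up frame mixing an integer column and a string column
df = pd.DataFrame({'n': [3, 1, 2], 's': ['b', 'c', 'a']})

# Each column is sorted separately, so its dtype is preserved
asc = pd.DataFrame({x: df[x].sort_values().values for x in df.columns})
desc = pd.DataFrame({x: df[x].sort_values(ascending=False).values for x in df.columns})

print(asc)
print(desc)
print(asc.dtypes)
```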
