Sort all columns of a pandas DataFrame independently using sort_values()

Question

I have a dataframe and want to sort all columns independently in descending or ascending order.

import pandas as pd

data = {'a': [5, 2, 3, 6],
        'b': [7, 9, 1, 4],
        'c': [1, 5, 4, 2]}
df = pd.DataFrame.from_dict(data)
print(df)

   a  b  c
0  5  7  1
1  2  9  5
2  3  1  4
3  6  4  2

When I use sort_values() for this, it does not behave the way I expected: only one column ends up sorted.

foo = df.sort_values(by=['a', 'b', 'c'], ascending=[False, False, False])
print(foo)

   a  b  c
3  6  4  2
0  5  7  1
2  3  1  4
1  2  9  5

I can get the desired result with the solution from this answer, which applies a lambda function:

bar = df.apply(lambda x: x.sort_values().values)
print(bar)

   a  b  c
0  2  1  1
1  3  4  2
2  5  7  4
3  6  9  5

But this looks a bit heavy-handed to me.

What is actually happening in the sort_values() example above, and how can I sort all columns of my dataframe in an idiomatic pandas way, without the lambda function?

Answer 1

You can use numpy.sort with the DataFrame constructor:

import numpy as np

# axis=0 sorts down each column independently
df1 = pd.DataFrame(np.sort(df.values, axis=0), index=df.index, columns=df.columns)
print(df1)
   a  b  c
0  2  1  1
1  3  4  2
2  5  7  4
3  6  9  5

EDIT:

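Answer with descending order:

# np.sort returns a sorted copy, so df itself is left unchanged
# (an in-place arr.sort() on df.values could modify df or fail on a read-only array)
arr = np.sort(df.values, axis=0)[::-1]
print(arr)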
[[6 9 5]
 [5 7 4]
 [3 4 2]
 [2 1 1]]

df1 = pd.DataFrame(arr, index=df.index, columns=df.columns)
print(df1)
   a  b  c
0  6  9  5
1  5  7  4
2  3  4  2
3  2  1  1
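
Both directions can be wrapped in one small function. This is only a sketch: the name sort_columns_independently and its ascending parameter are made up for illustration, and it assumes every column has the same numeric dtype (with mixed dtypes the underlying array becomes an object array and sorting may fail or behave differently).

```python
import numpy as np
import pandas as pd


def sort_columns_independently(df, ascending=True):
    """Hypothetical helper: sort every column on its own, ignoring row alignment."""
    # np.sort returns a new array, so the input DataFrame is not modified
    arr = np.sort(df.to_numpy(), axis=0)
    if not ascending:
        # reversing the row order of a column-wise ascending sort gives descending
        arr = arr[::-1]
    return pd.DataFrame(arr, index=df.index, columns=df.columns)


df = pd.DataFrame({'a': [5, 2, 3, 6],
                   'b': [7, 9, 1, 4],
                   'c': [1, 5, 4, 2]})

asc = sort_columns_independently(df)
desc = sort_columns_independently(df, ascending=False)
print(asc)
print(desc)
```

The original df keeps its row order, because nothing is sorted in place.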

Answer 2

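sort_values sorts the rows of the entire data frame by the columns you pass to it, in the order you pass them. In your first example you are sorting the data frame by ['a', 'b', 'c']. This sorts first by 'a'; 'b' is only used to break ties in 'a', and 'c' to break ties that remain after 'b'.

Notice how, after sorting by 'a', each row stays intact: the values in 'b' and 'c' move together with their 'a' value. Since 'a' has no duplicate values, 'b' and 'c' never influence the order. This is the expected result.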
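
To see the later keys do something, the first key needs duplicates. Here is a small sketch with made-up data (the frame tied is not from the question; it is the question's frame with one value of 'a' changed to create a tie):

```python
import pandas as pd

# Hypothetical data: 'a' contains the value 5 twice, so 'b' has to break the tie
tied = pd.DataFrame({'a': [5, 2, 5, 6],
                     'b': [7, 9, 1, 4],
                     'c': [1, 5, 4, 2]})

# 'a' descending, ties in 'a' resolved by 'b' descending
both_desc = tied.sort_values(by=['a', 'b'], ascending=[False, False])

# 'a' descending, ties in 'a' resolved by 'b' ascending
mixed = tied.sort_values(by=['a', 'b'], ascending=[False, True])

print(both_desc)
print(mixed)
```

Only the order of the two rows with a == 5 differs between the two results, and in both cases every row keeps its original a, b, c values together.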

With apply, each column is passed to the lambda separately, so sort_values is applied to a single column at a time; that is why the second approach sorts every column as you would expect. In this case the rows do not stay intact.
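
The .values in the lambda matters. As a sketch of why, using the question's data: without it, each sorted column is a Series that still carries its original index labels, and pandas aligns the columns on those labels when it reassembles the frame, which puts every value back in its original row.

```python
import pandas as pd

df = pd.DataFrame({'a': [5, 2, 3, 6],
                   'b': [7, 9, 1, 4],
                   'c': [1, 5, 4, 2]})

# Without .values: each result is a Series with its own reordered index,
# and the pieces are re-aligned by index label when combined
aligned = df.apply(lambda x: x.sort_values())

# With .values: plain arrays carry no labels, so they are placed by position
positional = df.apply(lambda x: x.sort_values().values)

print(aligned)
print(positional)
```

Dropping the index (via .values or to_numpy()) is what turns the per-column sort into a positional one.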

If you want to avoid both lambda and numpy, you can use a dict comprehension instead:

pd.DataFrame({x: df[x].sort_values().values for x in df.columns.values})

Output:

   a  b  c
0  2  1  1
1  3  4  2
2  5  7  4
3  6  9  5
