How to sort a pandas DataFrame by one column

Question

I have a data frame like this:

print(df)

        0          1     2
0   354.7      April   4.0
1    55.4     August   8.0
2   176.5   December  12.0
3    95.5   February   2.0
4    85.6    January   1.0
5     152       July   7.0
6   238.7       June   6.0
7   104.8      March   3.0
8   283.5        May   5.0
9   278.8   November  11.0
10  249.6    October  10.0
11  212.7  September   9.0

As you can see, the months are not in calendar order. So I created a second column to get the month number corresponding to each month (1-12). From there, how can I sort this data frame in calendar-month order?

Answer 1

Use sort_values to sort the df by a specific column's values:

In [18]:
df.sort_values('2')

Out[18]:
        0          1     2
4    85.6    January   1.0
3    95.5   February   2.0
7   104.8      March   3.0
0   354.7      April   4.0
8   283.5        May   5.0
6   238.7       June   6.0
5   152.0       July   7.0
1    55.4     August   8.0
11  212.7  September   9.0
10  249.6    October  10.0
9   278.8   November  11.0
2   176.5   December  12.0

If you want to sort by two columns, pass a list of column labels to sort_values, with the labels ordered by sort priority. If you use df.sort_values(['2', '0']), the result is sorted by column 2, then by column 0. Granted, this does not really make sense for this example because each value in df['2'] is unique.
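As a minimal sketch of the multi-column case, assume a small frame in which the first sort column has repeated values, so the second key actually matters. The frame and its column labels 'group' and 'value' are made up for illustration; the list passed to ascending gives one direction per column.

```python
import pandas as pd

# Hypothetical data: 'group' has ties, so the second key decides the order.
df = pd.DataFrame({
    'group': [2, 1, 2, 1],
    'value': [10.0, 30.0, 5.0, 20.0],
})

# Sort by 'group' ascending, then by 'value' descending within each group.
result = df.sort_values(['group', 'value'], ascending=[True, False])
print(result)
```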

Answer 2

I tried the solutions above without success, so I found a different solution that works for me. ascending=False sorts the dataframe in descending order; the default is True. I am using Python 3.6.6 and pandas 0.23.4.

final_df = df.sort_values(by=['2'], ascending=False)

You can find more details in the pandas documentation.

Answer 3

Using the column name worked for me.

sorted_df = df.sort_values(by=['Column_name'], ascending=True)

Answer 4

Just as another solution:

Instead of creating the second column, you can convert your string data (the month names) to a categorical and sort by that, like this:

df.rename(columns={1:'month'},inplace=True)
df['month'] = pd.Categorical(df['month'],categories=['December','November','October','September','August','July','June','May','April','March','February','January'],ordered=True)
df = df.sort_values('month',ascending=False)

It will give you the data ordered by month name, following the order you specified when creating the Categorical object.
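A variant of the same idea, as a sketch: list the categories in calendar order and sort ascending, which avoids the reversed list plus ascending=False. The column names 'value' and 'month' and the four sample rows are assumptions for illustration.

```python
import pandas as pd

# A few rows in the shape of the question's data (column names are assumed).
df = pd.DataFrame({
    'value': [354.7, 55.4, 176.5, 85.6],
    'month': ['April', 'August', 'December', 'January'],
})

# Categories listed in calendar order define the sort order.
months = ['January', 'February', 'March', 'April', 'May', 'June',
          'July', 'August', 'September', 'October', 'November', 'December']

df['month'] = pd.Categorical(df['month'], categories=months, ordered=True)
result = df.sort_values('month')
print(result)
```

Any month name missing from the categories list would become NaN, so the list should cover every value in the column.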

Answer 5

pandas' sort_values does the job.

If you intend to keep the same variable name, don't forget inplace=True (this performs the operation in place):

df.sort_values(by=['2'], inplace=True)

Alternatively, assign the sorted result to a variable, which may have the same name, such as df:

df = df.sort_values(by=['2'])

Forgetting either of the steps above may leave you (like this user) without the expected result.

Note that to sort in descending order, you need to pass ascending=False, such as:

df = df.sort_values(by=['2'], ascending=False)

Answer 6

Just adding some more operations on the data. Suppose we have a dataframe df; we can chain several operations to get the desired output:

ID         cost      tax    label
1       216590      1600    test      
2       523213      1800    test 
3          250      1500    experiment

(df['label'].value_counts().to_frame().reset_index()).sort_values('label', ascending=False)

This gives the sorted label counts as a dataframe:

    index   label
0   test        2
1   experiment  1
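The column names produced by value_counts().to_frame().reset_index() depend on the pandas version: the output above matches older releases, while in pandas 2.x the columns come out as 'label' and 'count', as far as I know. A sketch that names both columns explicitly, so the sort key is unambiguous on either version (the count column name 'n' is an arbitrary choice):

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3],
    'cost': [216590, 523213, 250],
    'tax': [1600, 1800, 1500],
    'label': ['test', 'test', 'experiment'],
})

counts = (
    df['label']
    .value_counts()
    .rename_axis('label')       # name the index explicitly
    .reset_index(name='n')      # 'n' is an arbitrary name for the counts
    .sort_values('n', ascending=False)
)
print(counts)
```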

Answer 7

This worked for me:

df.sort_values(by='Column_name', inplace=True, ascending=False)

Answer 8

You probably need to reset the index after sorting:

df = df.sort_values('2')
df = df.reset_index(drop=True)
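Assuming pandas 1.0 or later, the ignore_index parameter of sort_values should give the same result in one call. A small sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({'2': [4.0, 8.0, 1.0]})

# ignore_index=True relabels the sorted rows 0, 1, ..., n-1.
result = df.sort_values('2', ignore_index=True)
print(result)
```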

Answer 9

Here is the signature of sort_values according to the pandas documentation.

DataFrame.sort_values(by, axis=0,
                      ascending=True,
                      inplace=False,
                      kind='quicksort',
                      na_position='last',
                      ignore_index=False, key=None)

In this case it would look like this:

df.sort_values(by=['2'])

API reference: pandas.DataFrame.sort_values
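The key parameter in that signature (available in pandas 1.1 and later, to my knowledge) can sort the month names directly, without a helper column. This is only a sketch: the month_number dictionary and the three sample rows are assumptions for illustration.

```python
import pandas as pd

months = ['January', 'February', 'March', 'April', 'May', 'June',
          'July', 'August', 'September', 'October', 'November', 'December']
# Hypothetical lookup table: month name -> 1..12.
month_number = {name: i for i, name in enumerate(months, start=1)}

df = pd.DataFrame({
    '0': [354.7, 55.4, 85.6],
    '1': ['April', 'August', 'January'],
})

# key receives the column as a Series and must return a Series of sort keys.
result = df.sort_values(by='1', key=lambda s: s.map(month_number))
print(result)
```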

Answer 10

If you want to sort a column in a custom order rather than alphabetically, and sorting directly on that column with sort_values() does not give you that order, you can try the solution below.

Problem: sort column "col1" in this sequence: ['A', 'C', 'D', 'B']

import pandas as pd
import numpy as np

## Sample DataFrame ##
df = pd.DataFrame({'col1': ['A', 'B', 'D', 'C', 'A']})

>>> df
   col1
0    A
1    B
2    D
3    C
4    A
## Solution ##

conditions = []
values = []

for i,j in enumerate(['A','C','D','B']):
    conditions.append((df['col1'] == j))
    values.append(i)

df['col1_Num'] = np.select(conditions, values)

df.sort_values(by='col1_Num',inplace = True)

>>> df

    col1  col1_Num
0    A         0
4    A         0
3    C         1
2    D         2
1    B         3
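One caveat with np.select is that rows matching none of the conditions get its default value of 0, which ties them with the first item in the sequence. A sketch of an alternative that builds the helper column with Series.map, so unmatched values become NaN and sort last (the extra 'Z' row is added here only to show that case):

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'B', 'D', 'C', 'A', 'Z']})

# Rank of each value in the desired sequence.
order = {value: rank for rank, value in enumerate(['A', 'C', 'D', 'B'])}

# Values not in `order` (here 'Z') map to NaN.
df['col1_Num'] = df['col1'].map(order)

# mergesort is stable, so tied rows keep their original relative order;
# NaN keys go last by default (na_position='last').
result = df.sort_values(by='col1_Num', kind='mergesort')
print(result)
```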

Answer 11

Just adding a few more insights:

df = raw_df['2'].sort_values()  # returns only column '2', sorted, as a Series

but

df = raw_df.sort_values(by=['2'], ascending=False)  # sorts the whole df in descending order by column '2'
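To make the difference concrete, here is a small sketch with made-up rows; the first call returns a Series holding a single column, the second a DataFrame with every column reordered.

```python
import pandas as pd

raw_df = pd.DataFrame({
    '1': ['April', 'August', 'January'],
    '2': [4.0, 8.0, 1.0],
})

only_col = raw_df['2'].sort_values()                    # Series, column '2' only
whole = raw_df.sort_values(by=['2'], ascending=False)   # full DataFrame

print(type(only_col).__name__)
print(whole)
```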

Answer 12

This one worked for me:

df=df.sort_values(by=[2])

Whereas:

df=df.sort_values(by=['2'])

does not work.
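A likely explanation is that the column labels are integers rather than strings, which is what pandas assigns by default when no column names are given. The sketch below, using two made-up rows, shows the integer label working and the string label raising a KeyError.

```python
import pandas as pd

# No column names given, so the labels default to the integers 0, 1, 2.
df = pd.DataFrame([[354.7, 'April', 4.0], [85.6, 'January', 1.0]])
print(df.columns.tolist())

result = df.sort_values(by=[2])   # integer label: works

string_label_failed = False
try:
    df.sort_values(by=['2'])      # string label: no such column
except KeyError as exc:
    string_label_failed = True
    print('KeyError:', exc)
```

Checking df.columns first tells you which form to pass to by.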

Answer 13

Example: Assume you have a column with values 1 and 0 and you want to separate them and use only one value, then:

import pandas as pan

# furniture is one of the columns in the csv file (loaded into `data`).
allrooms = data.groupby('furniture')['furniture'].agg('count')
allrooms

# Keep only the count for furniture == 1, then only for furniture == 0.
myrooms1 = pan.DataFrame(allrooms, columns=['furniture'], index=[1])
myrooms2 = pan.DataFrame(allrooms, columns=['furniture'], index=[0])

print(myrooms1)
print(myrooms2)

13 answers

Answer 1   642  accepted  2016-06-13 10:45:15
Answer 2   222  2018-11-14 14:42:16
Answer 3   47   2020-08-27 09:57:38
Answer 4   21   2019-06-30 05:34:03
Answer 5   20   2021-02-05 14:10:55
Answer 6   11   2018-07-17 16:19:57
Answer 7   5    2020-12-18 04:16:00
Answer 8   4    2021-12-26 06:47:31
Answer 9   3    2020-08-10 12:20:45
Answer 10  2    2022-11-24 14:44:28
Answer 11  1    2022-07-03 08:08:42
Answer 12  0    2021-02-15 07:55:47
Answer 13  -1   2021-07-18 16:20:18
