How to sort a pandas DataFrame by one column
I have a data frame like this:
print(df)
0 1 2
0 354.7 April 4.0
1 55.4 August 8.0
2 176.5 December 12.0
3 95.5 February 2.0
4 85.6 January 1.0
5 152 July 7.0
6 238.7 June 6.0
7 104.8 March 3.0
8 283.5 May 5.0
9 278.8 November 11.0
10 249.6 October 10.0
11 212.7 September 9.0
As you can see, the months are not in calendar order. So I created a second column to get the month number corresponding to each month (1-12). From there, how can I sort this data frame according to calendar month order?
Use sort_values to sort the df by a specific column's values:
In [18]:
df.sort_values('2')
Out[18]:
0 1 2
4 85.6 January 1.0
3 95.5 February 2.0
7 104.8 March 3.0
0 354.7 April 4.0
8 283.5 May 5.0
6 238.7 June 6.0
5 152.0 July 7.0
1 55.4 August 8.0
11 212.7 September 9.0
10 249.6 October 10.0
9 278.8 November 11.0
2 176.5 December 12.0
If you want to sort by two columns, pass a list of column labels to sort_values, with the column labels ordered according to sort priority. If you use df.sort_values(['2', '0']), the result would be sorted by column 2 then column 0. Granted, this does not really make sense for this example because each value in df['2'] is unique.
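Since the month example has no ties, here is a minimal sketch of a two-column sort on made-up data where the second key actually matters. The column names group and value are hypothetical, not from the question.

```python
import pandas as pd

# Hypothetical data: 'group' contains ties, so the second sort key matters.
df = pd.DataFrame({
    "group": [2, 1, 2, 1],
    "value": [10.0, 30.0, 5.0, 20.0],
})

# Sort by 'group' first, then by 'value' within each group.
both_ascending = df.sort_values(["group", "value"])

# A list for `ascending` sets the direction per column:
# 'group' ascending, 'value' descending within each group.
mixed = df.sort_values(["group", "value"], ascending=[True, False])

print(both_ascending)
print(mixed)
```

The original row labels are kept, which makes it easy to see how the rows were reordered.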
I tried the solutions above and did not get results, so I found a different solution that works for me. ascending=False sorts the dataframe in descending order; by default it is True. I am using Python 3.6.6 and pandas 0.23.4.
final_df = df.sort_values(by=['2'], ascending=False)
You can see more details in the pandas documentation for sort_values.
Using the column name worked for me.
sorted_df = df.sort_values(by=['Column_name'], ascending=True)
Just as another solution:
Instead of creating the second column, you can convert your string data (the month names) to an ordered categorical and sort by that, like this:
df.rename(columns={1:'month'},inplace=True)
df['month'] = pd.Categorical(df['month'],categories=['December','November','October','September','August','July','June','May','April','March','February','January'],ordered=True)
df = df.sort_values('month',ascending=False)
It will give you the data ordered by month name, following the order you specified while creating the Categorical object.
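As a variation on the same idea, the categories can be listed in calendar order so that a plain ascending sort does the job. This is only a sketch on a small made-up frame; the column names value and month are assumptions.

```python
import pandas as pd

# Small made-up frame; 'value' and 'month' are assumed column names.
df = pd.DataFrame({
    "value": [354.7, 55.4, 85.6, 104.8],
    "month": ["April", "August", "January", "March"],
})

# Categories listed in calendar order define the sort order.
month_order = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
df["month"] = pd.Categorical(df["month"], categories=month_order, ordered=True)

# An ordered categorical sorts by category position, not alphabetically.
by_month = df.sort_values("month")
print(by_month)
```

Any value that is not in the category list becomes NaN when the column is converted, so misspelled month names are worth checking for first.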
pandas' sort_values does the work.
If one intends to keep the same variable name, don't forget inplace=True (this performs the operation in-place):
df.sort_values(by=['2'], inplace=True)
Alternatively, one can assign the sorted result to a variable, which may have the same name, such as df:
df = df.sort_values(by=['2'])
Forgetting both of the steps mentioned above may lead one (as happened to this user) to not get the expected result, because sort_values returns a new object and leaves the original untouched by default.
Note that if one wants descending order, one needs to pass ascending=False, such as:
df = df.sort_values(by=['2'], ascending=False)
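The difference between discarding the result, reassigning it, and sorting in place can be checked with a tiny made-up frame. This is only a sketch; the data values are invented for illustration.

```python
import pandas as pd

# Tiny made-up frame with a single column labelled '2'.
df = pd.DataFrame({"2": [3.0, 1.0, 2.0]})

# The sorted copy is discarded here, so df keeps its original order.
df.sort_values(by=["2"])
unchanged = df["2"].tolist()

# Rebinding the name keeps the sorted copy.
df = df.sort_values(by=["2"])
reassigned = df["2"].tolist()

# inplace=True modifies the object itself (and returns None).
df.sort_values(by=["2"], ascending=False, inplace=True)
in_place = df["2"].tolist()

print(unchanged, reassigned, in_place)
```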
Just adding some more operations on data. Suppose we have a dataframe df; we can chain several operations to get the desired output:
ID cost tax label
1 216590 1600 test
2 523213 1800 test
3 250 1500 experiment
(df['label'].value_counts().to_frame().reset_index()).sort_values('label', ascending=False)
will give the label counts, sorted, as a dataframe:
index label
0 test 2
1 experiment 1
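The column names produced by value_counts().reset_index() differ between pandas versions, so the column to sort on may not always be called label. A sketch that names both columns explicitly (the name count is my choice, not from the answer) could look like this:

```python
import pandas as pd

# The sample frame from the answer above.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "cost": [216590, 523213, 250],
    "tax": [1600, 1800, 1500],
    "label": ["test", "test", "experiment"],
})

counts = (
    df["label"]
    .value_counts()               # Series of counts indexed by label
    .rename_axis("label")         # name the index explicitly
    .reset_index(name="count")    # columns: 'label' and 'count'
    .sort_values("count", ascending=False)
)
print(counts)
```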
This worked for me:
df.sort_values(by='Column_name', inplace=True, ascending=False)
You probably need to reset the index after sorting:
df = df.sort_values('2')
df = df.reset_index(drop=True)
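In newer pandas versions (1.0 and later, as far as I know) the same effect is available through the ignore_index parameter of sort_values. A minimal sketch on made-up data comparing the two:

```python
import pandas as pd

# Made-up single-column frame labelled '2'.
df = pd.DataFrame({"2": [4.0, 8.0, 1.0]})

# Two steps: sort, then drop the old row labels.
two_step = df.sort_values("2").reset_index(drop=True)

# One step: ask sort_values to relabel the rows 0..n-1.
one_step = df.sort_values("2", ignore_index=True)

print(one_step)
```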
Here is the signature of sort_values according to the pandas documentation.
DataFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)
In this case it would look like this:
df.sort_values(by=['2'])
API Reference: pandas.DataFrame.sort_values
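The key parameter in that signature (available from pandas 1.1, to my knowledge) allows sorting month names in calendar order without a helper column. The sketch below assumes string column labels '0' and '1' and a hand-built month_number mapping, neither of which comes from the answer.

```python
import pandas as pd

months = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]
# Hypothetical helper mapping: month name -> 1..12.
month_number = {name: i for i, name in enumerate(months, start=1)}

df = pd.DataFrame({
    "0": [354.7, 55.4, 85.6],
    "1": ["April", "August", "January"],
})

# `key` receives the column as a Series and must return a Series of
# the same length; rows are ordered by the returned values.
result = df.sort_values(by="1", key=lambda s: s.map(month_number))
print(result)
```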
If you want to sort a column in a custom order rather than alphabetically, and don't want to call sort_values() on the column directly, you can try the solution below.
Problem: sort column "col1" in this sequence: ['A', 'C', 'D', 'B']
import pandas as pd
import numpy as np
## Sample DataFrame ##
df = pd.DataFrame({'col1': ['A', 'B', 'D', 'C', 'A']})
>>> df
col1
0 A
1 B
2 D
3 C
4 A
## Solution ##
conditions = []
values = []
for i,j in enumerate(['A','C','D','B']):
conditions.append((df['col1'] == j))
values.append(i)
df['col1_Num'] = np.select(conditions, values)
df.sort_values(by='col1_Num',inplace = True)
>>> df
col1 col1_Num
0 A 0
4 A 0
3 C 1
2 D 2
1 B 3
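The same helper-column idea can be written more compactly with a dictionary and Series.map instead of np.select. This is only a sketch of an alternative; the names order and rank are mine.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["A", "B", "D", "C", "A"]})

# Desired order; the position in this list becomes the sort rank.
order = ["A", "C", "D", "B"]
rank = {label: position for position, label in enumerate(order)}

# Values missing from `order` would map to NaN and sort last.
df["col1_Num"] = df["col1"].map(rank)

# mergesort is stable, so tied rows keep their original relative order.
df = df.sort_values(by="col1_Num", kind="mergesort")
print(df)
```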
Just adding a few more insights:
df = raw_df['2'].sort_values()  # sorts only one column (i.e. '2') and returns a Series
but
df = raw_df.sort_values(by=["2"], ascending=False)  # sorts the whole df in descending order on the basis of column "2"
This one worked for me:
df=df.sort_values(by=[2])
Whereas:
df=df.sort_values(by=['2'])
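did not work. This is presumably because the column label is the integer 2 rather than the string '2'; sort_values raises a KeyError when the label does not match any column.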
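Whether by=[2] or by=['2'] is right depends on the type of the column labels, which can be checked with df.columns. A minimal sketch, assuming a frame built without explicit column names so that the labels are integers:

```python
import pandas as pd

# Built without column names, so the labels are the integers 0, 1, 2.
df = pd.DataFrame([[354.7, "April", 4.0], [85.6, "January", 1.0]])
labels = df.columns.tolist()

# The integer label matches a column.
sorted_df = df.sort_values(by=[2])

# The string label does not match any column and raises KeyError.
try:
    df.sort_values(by=["2"])
    string_label_works = True
except KeyError:
    string_label_works = False

print(labels, string_label_works)
```

If the frame was read from a file with a header row, the labels are usually strings and the situation is reversed.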
Example: Assume you have a column with values 1 and 0 and you want to separate them and use only one value, then:
# furniture is one of the columns in the csv file; pan is assumed to be the pandas alias (import pandas as pan).
allrooms = data.groupby('furniture')['furniture'].agg('count')
allrooms
myrooms1 = pan.DataFrame(allrooms, columns = ['furniture'], index = [1])
myrooms2 = pan.DataFrame(allrooms, columns = ['furniture'], index = [0])
print(myrooms1);print(myrooms2)