sort_values() in Pandas behaves contrary to documentation
I am puzzled by the behavior of sort_values() in Pandas, which does not seem to respond appropriately to the axis argument.
For a toy example:
toy.to_json()
'{"labels":{"0":7,"1":4,"2":7,"3":1,"4":5,"5":0,"6":3,"7":1,"8":4,"9":9},"companies":{"0":"Apple","1":"AIG","2":"Amazon","3":"American express","4":"Boeing","5":"Bank of America","6":"British American Tobacco","7":"Canon","8":"Caterpillar","9":"Colgate-Palmolive"}}'
toy.sort_values('labels') # this works alright
labels companies
5 0 Bank of America
3 1 American express
7 1 Canon
6 3 British American Tobacco
1 4 AIG
8 4 Caterpillar
4 5 Boeing
0 7 Apple
2 7 Amazon
9 9 Colgate-Palmolive
toy.sort_values(by = 'labels', axis = 1) # Returns an exception
KeyError: 'labels'
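This is because, in your example, axis 0 is "down" and axis 1 is "to the right" (i.e. across the columns). If you look at the documentation for sort_values, you will see that the first argument is indeed by, and that axis defaults to 0. With axis=1, by is looked up among the index labels rather than the column names, which is why 'labels' raises a KeyError. So, to repeat your first example explicitly, you need toy.sort_values(by='labels', axis=0).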
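To make the difference concrete, here is a minimal sketch. It rebuilds only the first four rows of the toy frame from the question (the trimming is mine, for brevity) and shows where by is looked up for each axis:

```python
import pandas as pd

# First four rows of the toy frame from the question (shortened for brevity).
toy = pd.DataFrame({
    "labels": [7, 4, 7, 1],
    "companies": ["Apple", "AIG", "Amazon", "American express"],
})

# axis=0 (the default): `by` is looked up among the column labels,
# and the rows are reordered.
rows_sorted = toy.sort_values(by="labels", axis=0)
print(rows_sorted)

# axis=1: `by` is looked up among the index labels (0, 1, 2, 3 here),
# so the column name 'labels' is not found.
try:
    toy.sort_values(by="labels", axis=1)
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```

The try/except is only there so the snippet runs to the end; in an interactive session you would simply see the traceback from the question.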
Adding an example to the above comments and answers:
Let's assume you have a DataFrame as below:
df = pd.DataFrame(data={"labels":{"0":7,"1":4,"2":7,"3":1,"4":5},"companies":{"0":9,"1":1,"2":6,"3":1,"4":8}})
>>> df
labels companies
0 7 9
1 4 1
2 7 6
3 1 1
4 5 8
For axis=0, by may contain index levels and/or column labels, for example:
df.sort_values(by='labels')
which gives you a sorted labels column (ascending by default):
labels companies
3 1 1
1 4 1
4 5 8
0 7 9
2 7 6
Coming to axis=1, refer to the code below:
df.sort_values('4',axis=1)
This sorts the columns by the values in the row with index label '4'. Here it changes nothing, because in that row the labels value (5) is less than the companies value (8) and sorting is ascending by default. However, if you execute df.sort_values('1',axis=1), where the value under labels (4) is greater than the value under companies (1), you will see that the positions of labels and companies are swapped:
companies labels
0 9 7
1 1 4
2 6 7
3 1 1
4 8 5
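The two axis=1 calls can be checked side by side. This is a small sketch using the same numeric frame; the variable names by_row_4 and by_row_1 are mine:

```python
import pandas as pd

# Same numeric frame as above, with string index labels '0'..'4'.
df = pd.DataFrame({
    "labels": {"0": 7, "1": 4, "2": 7, "3": 1, "4": 5},
    "companies": {"0": 9, "1": 1, "2": 6, "3": 1, "4": 8},
})

# Row '4' holds labels=5, companies=8: already ascending, so no change.
by_row_4 = df.sort_values(by="4", axis=1)
print(list(by_row_4.columns))

# Row '1' holds labels=4, companies=1: the two columns are swapped.
by_row_1 = df.sort_values(by="1", axis=1)
print(list(by_row_1.columns))
```

Only the column order changes; the rows stay where they were. This works here because every row holds values that can be compared with each other. A frame that mixes numbers and strings in one row, as the question's toy frame does, is not a good candidate for axis=1 sorting.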
Hope this clarifies.
Just to make clear how rows and columns relate to the choice of axis=1 or axis=0:
df.shape[0] # gives the number of rows
df.shape[1] # gives the number of columns
Let's assume a DataFrame as follows:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({
... 'col1' : ['A', 'A', 'B', np.nan, 'D', 'C'],
... 'col2' : [2, 1, 9, 8, 7, 4],
... 'col3': [0, 1, 9, 4, 2, 3],
... })
>>> df
col1 col2 col3
0 A 2 0
1 A 1 1
2 B 9 9
3 NaN 8 4
4 D 7 2
5 C 4 3
So, applying df.shape shows how it maps onto the rows and columns:
>>> df.shape[0]
6 <-- Here, we have six rows in the DataFrame
>>> df.shape[1]
3 <-- Here, we have three columns in the DataFrame
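As a quick sanity check of that mapping, the following sketch compares df.shape against the two axes of the same frame (n_rows and n_cols are just local names I picked):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["A", "A", "B", np.nan, "D", "C"],
    "col2": [2, 1, 9, 8, 7, 4],
    "col3": [0, 1, 9, 4, 2, 3],
})

n_rows, n_cols = df.shape

# Axis 0 is the row index; axis 1 is the columns.
print(n_rows == len(df.index))
print(n_cols == len(df.columns))

# df.axes lists the two axes in that order: [index, columns].
print(df.axes[0].equals(df.index), df.axes[1].equals(df.columns))
```

So axis=0 in sort_values reorders entries along the index (the rows), while axis=1 reorders entries along the columns.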
Now, if you are just sorting the rows by a column name, you don't need to specify axis at all: the default axis=0 already treats by as a column label, so you can simply do:
>>> df.sort_values(by=['col1'])
col1 col2 col3
0 A 2 0
1 A 1 1
2 B 9 9
5 C 4 3
4 D 7 2
3 NaN 8 4
or, you can pass multiple column names as a list with by:
>>> df.sort_values(by=['col1', 'col2'])
col1 col2 col3
1 A 1 1
0 A 2 0
2 B 9 9
5 C 4 3
4 D 7 2
3 NaN 8 4
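Two related knobs are worth a short sketch on the same frame: ascending accepts one flag per column in by, and na_position controls where the NaN row ends up. The variable names mixed and nan_first are mine:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["A", "A", "B", np.nan, "D", "C"],
    "col2": [2, 1, 9, 8, 7, 4],
    "col3": [0, 1, 9, 4, 2, 3],
})

# col1 ascending, then col2 descending within ties of col1.
mixed = df.sort_values(by=["col1", "col2"], ascending=[True, False])
print(mixed)

# NaN in col1 goes last by default; na_position='first' moves it to the top.
nan_first = df.sort_values(by="col1", na_position="first")
print(nan_first)
```

With the mixed ordering, the two "A" rows come out with col2 equal to 2 and then 1, the reverse of the plain by=['col1', 'col2'] result above.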