
Getting unwanted order when sorting Categorical data in a pandas dataframe

When sorting columns in a pandas dataframe that contain text (and thus have dtype 'object'), df.sort works and sorts values such as apple, orange, banana into the correct order. However, if I convert the fruit column to the Categorical data type and then try to sort, it doesn't work.

I want to sort first by a datetime column, then by a Categorical column, and then by some numerical ones (float/int).

Code (where account is not Categorical) sorts correctly by month_date, which is a datetime object, and by account (A to Z):

# Categorical conversion left commented out: account stays dtype 'object'.
#data['month_name'] = pd.Categorical(data['month_name'],
#    categories=data.month_name.unique().tolist())
#data['account'] = pd.Categorical(data['account'],
#    categories=data.account.unique().tolist())

column_list = data.columns.values.tolist()
# DataFrame.sort was removed in pandas 0.20; sort_values is its replacement.
sorted_data = data.sort_values(["month_date", "account"], ascending=True)
display(sorted_data)

Example:

  • Apple
  • Banana
  • Carrot

Code (where account is Categorical) does not sort correctly (note that the pd.Categorical lines are no longer commented out):

# Categories are taken from unique(), i.e. in order of first appearance.
data['month_name'] = pd.Categorical(data['month_name'],
    categories=data.month_name.unique().tolist())
data['account'] = pd.Categorical(data['account'],
    categories=data.account.unique().tolist())
column_list = data.columns.values.tolist()
# DataFrame.sort was removed in pandas 0.20; sort_values is its replacement.
sorted_data = data.sort_values(["month_date", "account"], ascending=True)
display(sorted_data)

Example:

  • Apple
  • Carrot
  • Banana

Your categories are themselves not in a sorted order. unique does not sort; it returns values in the order they first appear, so the categories will be in the order listed, and a Categorical column sorts by category order rather than alphabetically. (It is not clear what values you have in your example.)

In [7]: df = pd.DataFrame({'A': pd.Categorical(list('bbeebbaa'), categories=['e', 'a', 'b']), 'B': np.arange(8)})

In [8]: df
Out[8]: 
   A  B
0  b  0
1  b  1
2  e  2
3  e  3
4  b  4
5  b  5
6  a  6
7  a  7

In [9]: df.dtypes
Out[9]: 
A    category
B       int64
dtype: object

In [10]: df.sort_values(['A','B'])  # df.sort in older pandas
Out[10]: 
   A  B
2  e  2
3  e  3
6  a  6
7  a  7
0  b  0
1  b  1
4  b  4
5  b  5

In [11]: df.sort_values(['A','B'],ascending=False)  # df.sort in older pandas
Out[11]: 
   A  B
5  b  5
4  b  4
1  b  1
0  b  0
7  a  7
6  a  6
3  e  3
2  e  2
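
If alphabetical order is what you want, one option is to sort the labels before passing them as categories. The following is only a minimal sketch: the data is made up for illustration (only the column names month_date and account come from the question, and amount is a hypothetical numeric column), and it assumes a pandas version that has sort_values.

```python
import pandas as pd

# Hypothetical data standing in for the question's frame; only the column
# names month_date and account come from the original post.
data = pd.DataFrame({
    "month_date": pd.to_datetime(
        ["2015-02-01", "2015-01-01", "2015-01-01", "2015-01-01"]
    ),
    "account": ["Apple", "Carrot", "Apple", "Banana"],
    "amount": [4.0, 3.0, 1.0, 2.0],  # hypothetical numeric column
})

# unique() keeps order of first appearance: Apple, Carrot, Banana.
appearance_order = data["account"].unique().tolist()

# Sorting the labels first makes the category order alphabetical,
# and a Categorical column sorts by its category order.
data["account"] = pd.Categorical(
    data["account"], categories=sorted(appearance_order)
)

# Sort by the datetime column, then the Categorical, then the numeric column.
sorted_data = data.sort_values(
    ["month_date", "account", "amount"], ascending=True
)
print(sorted_data)
```

Within each month_date the accounts should now come out as Apple, Banana, Carrot. If you need a custom order instead, pass that list explicitly as categories rather than relying on unique().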


 