Getting unwanted order when sorting Categorical data in a pandas dataframe
When sorting columns in a pandas dataframe that contain text (and thus have datatype 'object'), the df.sort syntax works, and sorts apple, orange, banana in the correct order. However, if I convert the fruit column to the Categorical data type and then try to sort, it doesn't work.
I want to sort first by a datetime column, then by a Categorical column, then by some numerical ones (float/int).
Code (where account is not categorical) sorts correctly by month_date, which is a datetime object, and then by account (A-Z):
#data['month_name'] = pd.Categorical(data['month_name'],
# categories=data.month_name.unique().tolist())
#data['account'] = pd.Categorical(data['account'],
# categories=data.account.unique().tolist())
column_list = data.columns.values.tolist()
sorted_data = data.sort(["month_date","account"], ascending=True)
display(sorted_data)
Example:
Code (where account is Categorical) does not sort correctly (note that the pd.Categorical lines are no longer commented out):
data['month_name'] = pd.Categorical(data['month_name'],
categories=data.month_name.unique().tolist())
data['account'] = pd.Categorical(data['account'],
categories=data.account.unique().tolist())
column_list = data.columns.values.tolist()
sorted_data = data.sort(["month_date","account"], ascending=True)
display(sorted_data)
Example
Your categories are themselves not in a guaranteed order. unique does not sort its result, so the categories will be in the order listed (it is not clear what values they have in your example). A Categorical column then sorts by that category order rather than alphabetically.
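To make the point about unique concrete, here is a minimal sketch. The accounts series is hypothetical data standing in for the question's account column, which is not shown in the post.

```python
import pandas as pd

# Hypothetical data standing in for the question's account column.
accounts = pd.Series(["orange", "apple", "banana", "apple"])

# unique() keeps the order of first appearance; it does not sort.
categories = accounts.unique().tolist()
print(categories)

# A Categorical built from that list sorts in the same,
# non-alphabetical order.
as_category = pd.Series(pd.Categorical(accounts, categories=categories))
print(as_category.sort_values().tolist())
```

With this input, orange sorts before apple because it appears first in the data, which is probably the kind of "unwanted order" described in the question.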
In [7]: df = DataFrame({'A' : pd.Categorical(list('bbeebbaa'),categories=['e','a','b']), 'B' : np.arange(8) })
In [8]: df
Out[8]:
A B
0 b 0
1 b 1
2 e 2
3 e 3
4 b 4
5 b 5
6 a 6
7 a 7
In [9]: df.dtypes
Out[9]:
A category
B int64
dtype: object
In [10]: df.sort(['A','B'])
Out[10]:
A B
2 e 2
3 e 3
6 a 6
7 a 7
0 b 0
1 b 1
4 b 4
5 b 5
In [11]: df.sort(['A','B'],ascending=False)
Out[11]:
A B
5 b 5
4 b 4
1 b 1
0 b 0
7 a 7
6 a 6
3 e 3
2 e 2
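One possible fix, sketched under assumptions: pass the categories in the order you actually want (here alphabetical, via sorted) instead of the raw output of unique. The column names month_date and account come from the question; the amount column and all the values are made up for illustration. The sketch also uses sort_values, since df.sort was removed in later pandas releases (0.20 and up).

```python
import pandas as pd

# Made-up rows; only the column names month_date and account
# come from the question, amount is a hypothetical numeric column.
data = pd.DataFrame({
    "month_date": pd.to_datetime(
        ["2015-02-01", "2015-01-01", "2015-01-01", "2015-02-01"]),
    "account": ["banana", "orange", "apple", "apple"],
    "amount": [4.0, 3.0, 2.0, 1.0],
})

# sorted() pins the category order to alphabetical, so it no longer
# depends on the order in which values happen to appear.
data["account"] = pd.Categorical(
    data["account"],
    categories=sorted(data["account"].unique()),
    ordered=True,
)

# Sort by datetime first, then the categorical, then the numeric column.
sorted_data = data.sort_values(
    ["month_date", "account", "amount"], ascending=True)
print(sorted_data)
```

If a custom order is wanted instead (for example, a fixed business ordering of accounts), the same approach should work with an explicit list passed as categories.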