Comparison between ordered categorical type in Pandas not working as expected

The following code:

import pandas as pd

s2 = pd.Series(['m', 'l', 's', 'xl', 'xs'])

size_type = pd.api.types.CategoricalDtype(categories=['xs', 's', 'm', 'l', 'xl'], ordered=True)

s3 = s2.astype(size_type)

print(s3)

Yields this result:

0     m
1     l
2     s
3    xl
4    xs
dtype: category
Categories (5, object): ['xs' < 's' < 'm' < 'l' < 'xl']

So I expect the "m" size to be bigger than the "s" size, according to the order I set when I created the category. But when I check this with a comparison, the result is the opposite:

s3[0] > s3[2]

Yields this result:

False

Why is this happening?

s3[0] and s3[2] return plain Python strings, so the comparison is lexicographic ('m' sorts before 's') and ignores the category order. You can use .cat.codes to compare the internally stored integer codes instead:

s3.cat.codes[0] > s3.cat.codes[2]
# True

To see .cat.codes in detail:

s3.cat.codes
#0    2
#1    3
#2    1
#3    4
#4    0
#dtype: int8

s3.cat.codes[0]
#2

s3.cat.codes[2]
#1
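As a possible alternative to indexing into .cat.codes, the sketch below shows two other ways to get an order-aware comparison: comparing the whole Series against a category label, and looking up each label's position in the dtype's categories. The helper name rank and the variable names bigger_than_s and m_bigger_than_s are made up for this illustration and are not part of pandas.

```python
import pandas as pd

size_type = pd.api.types.CategoricalDtype(
    categories=['xs', 's', 'm', 'l', 'xl'], ordered=True
)
s3 = pd.Series(['m', 'l', 's', 'xl', 'xs']).astype(size_type)

# Comparing the Series itself against a category label uses the category
# order, because the comparison is done by the ordered categorical, not by str.
bigger_than_s = s3 > 's'


def rank(label):
    # Hypothetical helper: position of a label in the ordered categories.
    return size_type.categories.get_loc(label)


# Comparing two extracted elements by their position in the categories.
m_bigger_than_s = rank(s3.iloc[0]) > rank(s3.iloc[2])
```

Here bigger_than_s is a boolean Series with one entry per row, and m_bigger_than_s is a single boolean for the "m" versus "s" comparison from the question.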
