
Fail to filter pandas dataframe by categorical column

I'm using pandas 0.16.1. I converted all columns in a dataframe to categoricals so that it takes much less space when dumped to disk. Now I want to filter the dataframe. This works with == and .isin, but fails on <, <=, and similar operations with "Unordered Categoricals can only compare equality or not".

data[data["MONTH COLUMN"]<=3]

If I comment out the following lines in categorical.py, everything works fine. Is it a bug in pandas?

if not self.ordered:
    if op in ['__lt__', '__gt__','__le__','__ge__']:
        raise TypeError("Unordered Categoricals can only compare equality or not")

(I think it was a good idea to use the Categorical datatype on a column that has only 12 unique values in ~1,400,000 rows.)

The documentation states:

Note: New categorical data are NOT automatically ordered. You must explicitly pass ordered=True to indicate an ordered Categorical.
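To see what that note means in practice, here is a minimal sketch that reproduces the behaviour from the question. The series values are made up for illustration; they only stand in for a month-like column.

```python
import pandas as pd

# Illustrative month-like values (not the asker's data).
s = pd.Series([1, 2, 3, 12, 1], dtype="category")

# A freshly created categorical is unordered.
print(s.cat.ordered)

# Equality and membership checks work on unordered categoricals.
equal_mask = s == 3
isin_mask = s.isin([1, 2])

# Ordering comparisons are rejected with a TypeError.
try:
    s <= 3
    raised = False
except TypeError as exc:
    raised = True
    print(exc)
```

So the TypeError is the documented behaviour for an unordered categorical, not a bug.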

When you first create a categorical that you want to be ordered, just specify this:

In [1]: import pandas as pd

In [3]: s = pd.Series(["a","b","c","a"]).astype('category', ordered=True)

In [5]: s
Out[5]: 
0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): [a < b < c]

In [4]: s > 'a'
Out[4]: 
0    False
1     True
2     True
3    False
dtype: bool
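The session above reflects the pandas version in the question. In more recent pandas releases, as far as I know, astype no longer accepts ordered=True as a keyword, and the ordering is declared through a CategoricalDtype or by calling .cat.as_ordered() on an existing column. The following is a sketch under that assumption. The frame contents are hypothetical and only the column name is taken from the question.

```python
import pandas as pd

# Hypothetical stand-in for the asker's frame; only the column name
# "MONTH COLUMN" comes from the question.
data = pd.DataFrame({"MONTH COLUMN": [1, 5, 3, 12, 2, 3]})
data["MONTH COLUMN"] = data["MONTH COLUMN"].astype("category")

# Option 1: mark an existing categorical column as ordered.
# The existing category order (here the sorted unique values) is kept.
data["MONTH COLUMN"] = data["MONTH COLUMN"].cat.as_ordered()
filtered = data[data["MONTH COLUMN"] <= 3]
print(filtered)

# Option 2: declare the ordering up front with an explicit dtype.
month_dtype = pd.CategoricalDtype(categories=list(range(1, 13)), ordered=True)
s = pd.Series([1, 5, 3, 12]).astype(month_dtype)
mask = s > 3
print(mask)
```

Note that when comparing an ordered categorical with a scalar, the scalar generally has to be one of the categories, which is one reason to list all twelve months explicitly as in option 2.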
