
Why am I not able to select some elements from a list filtered by elements in another list?

I have a dataset with 3 columns, called CLASS, DURATION and GENDER:

import pandas as pd
data = pd.read_csv('dataset.csv')
CLASS = data['CLASS']
DURATION = data['DURATION']
GENDER = data['GENDER']

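CLASS contains 5 kinds of entries: blank, 1, 2, 3, 4. DURATION contains either -1 (which stands for a particular semantic value) or some positive integer. GENDER contains M or F. I am able to select entries of CLASS by GENDER, like this: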

CLASS[GENDER=='M']
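The reason this works is that `GENDER=='M'` produces a boolean Series that shares CLASS's index, and indexing CLASS with it keeps only the rows where the mask is True. A minimal sketch, assuming a small hypothetical in-memory CSV in place of the real dataset.csv (the values are made up for illustration). Note that a blank CLASS entry is read as NaN, which makes CLASS a float64 column.

```python
import io
import pandas as pd

# Hypothetical sample standing in for dataset.csv (the real file is not shown)
csv_text = """CLASS,DURATION,GENDER
1,-1,M
2,15,F
,-1,F
3,7,M
4,-1,M
"""
data = pd.read_csv(io.StringIO(csv_text))
CLASS = data['CLASS']
GENDER = data['GENDER']

# GENDER == 'M' is a boolean Series aligned on the same index as CLASS
mask = GENDER == 'M'
print(mask.tolist())

# Indexing CLASS with the mask keeps only the rows where the mask is True
male_classes = CLASS[mask]
print(male_classes)
```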

But I am not able to select the entries of OCCUP_CLASS whose DURATION is -1, as shown here:

CLASS[DURATION=='-1']

Why is that? This is the error I get:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-56-604aed5ebca4> in <module>()
----> 1 CLASS[DURATION=='-1']

c:\users\h473\appdata\local\programs\python\python35\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    621         key = com._apply_if_callable(key, self)
    622         try:
--> 623             result = self.index.get_value(self, key)
    624 
    625             if not is_scalar(result):

c:\users\h473\appdata\local\programs\python\python35\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
   2558         try:
   2559             return self._engine.get_value(s, k,
-> 2560                                           tz=getattr(series.dtype, 'tz', None))
   2561         except KeyError as e1:
   2562             if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()

pandas\_libs\index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()

KeyError: False
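One likely explanation (an interpretation, not confirmed in the question): since DURATION holds only integers, read_csv probably parsed it as int64, so `DURATION=='-1'` compares integers with the string '-1'. With the older pandas/NumPy versions visible in the traceback (Python 3.5), such a mismatched comparison could collapse into a single scalar `False` instead of an element-wise mask; `CLASS[False]` is then looked up as a label in CLASS's integer index, which is exactly the `KeyError: False` raised from `Int64Engine`. Current pandas returns an all-False mask instead, so the selection silently comes back empty. The sketch below uses a hypothetical in-memory sample to show the dtype and the fix of comparing with the integer -1.

```python
import io
import pandas as pd

# Hypothetical sample standing in for dataset.csv (the real file is not shown)
csv_text = """CLASS,DURATION,GENDER
1,-1,M
2,15,F
,-1,F
3,7,M
4,-1,M
"""
data = pd.read_csv(io.StringIO(csv_text))
CLASS = data['CLASS']
DURATION = data['DURATION']

# read_csv infers an integer dtype because every DURATION value is an integer
print(DURATION.dtype)

# Comparing integer values with the string '-1' never matches
string_mask = DURATION == '-1'
print(string_mask.any())

# Comparing with the integer -1 gives the intended mask
int_mask = DURATION == -1
print(CLASS[int_mask])
```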

I couldn't reproduce the problem, but you can try:

import pandas as pd
data = pd.read_csv('dataset.csv')
CLASS = data['CLASS']
DURATION = data['DURATION']
GENDER = data['GENDER']

# fill NaN values with 0 (assign the result instead of using inplace=True)
DURATION = DURATION.fillna(0)
# use astype to convert the values to int, then compare against the integer -1
CLASS[DURATION.astype(int) == -1]
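The fill-then-cast step matters mainly when DURATION is not already an integer column, for example when some values are missing: a single blank makes read_csv store the column as float64 with NaN. A minimal sketch on a hypothetical sample with one blank DURATION; using 0 as the fill value is an assumption that only works because the question says durations are either -1 or positive, so 0 cannot be mistaken for a real value.

```python
import io
import pandas as pd

# Hypothetical sample: one DURATION value is blank, so read_csv
# parses the column as float64 and stores the blank as NaN
csv_text = """CLASS,DURATION,GENDER
1,-1,M
2,,F
3,-1,F
4,7,M
"""
data = pd.read_csv(io.StringIO(csv_text))
CLASS = data['CLASS']
DURATION = data['DURATION']
print(DURATION.dtype)  # float64 because of the NaN

# Replace NaN with 0 (a placeholder that cannot be confused with -1),
# then cast to int so the comparison is integer-to-integer
duration_int = DURATION.fillna(0).astype(int)

selected = CLASS[duration_int == -1]
print(selected)
```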

It may be better not to split the columns into separate Series in the first place, and to work on the DataFrame instead:

import pandas as pd
data = pd.read_csv('dataset.csv')
data.loc[data['GENDER'] == 'M', 'CLASS']
data.loc[data['DURATION'] == -1, 'CLASS']
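`data.loc[row_mask, column_label]` filters the rows and picks a column in one step, so the mask and the selected column always come from the same DataFrame. A minimal sketch on a hypothetical in-memory sample; the combined condition at the end is an illustrative extra, not part of the original answer. Each condition needs its own parentheses because `&` binds more tightly than `==`.

```python
import io
import pandas as pd

# Hypothetical sample standing in for dataset.csv (the real file is not shown)
csv_text = """CLASS,DURATION,GENDER
1,-1,M
2,15,F
,-1,F
3,7,M
4,-1,M
"""
data = pd.read_csv(io.StringIO(csv_text))

# .loc[row_mask, column_label] filters rows and selects one column
male_classes = data.loc[data['GENDER'] == 'M', 'CLASS']
unknown_duration_classes = data.loc[data['DURATION'] == -1, 'CLASS']

# Conditions can be combined with & (and) or | (or), each in parentheses
male_unknown = data.loc[(data['GENDER'] == 'M') & (data['DURATION'] == -1), 'CLASS']

print(male_classes)
print(unknown_duration_classes)
print(male_unknown)
```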


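Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.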

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM