

Filtering a DataFrame string column - argument of type 'int' is not iterable / cannot index with vector containing NA / NaN values

I'm working with the Online Retail dataset.

There is a column called InvoiceNo which holds the invoice code. If this code starts with the letter 'C', it indicates a cancellation.

I want to group by InvoiceNo for the rows where InvoiceNo contains 'C':

import pandas as pd
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail.xlsx'
retail_df = pd.read_excel(url)
temp_df = retail_df[retail_df['InvoiceNo'].str.contains('c')]

I got an error:

ValueError                                Traceback (most recent call last)
<ipython-input-29-e1f6cb12695b> in <module>()
----> 1 temp_df = retail_df[retail_df['InvoiceNo'].str.contains('c')]

~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   1956         if isinstance(key, (Series, np.ndarray, Index, list)):
   1957             # either boolean or fancy integer index
-> 1958             return self._getitem_array(key)
   1959         elif isinstance(key, DataFrame):
   1960             return self._getitem_frame(key)

~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in _getitem_array(self, key)
   1983     def _getitem_array(self, key):
   1984         # also raises Exception if object array with NA values
-> 1985         if com.is_bool_indexer(key):
   1986             # warning here just in case -- previously __setitem__ was
   1987             # reindexing but __getitem__ was not; it seems more reasonable to

~/anaconda3/lib/python3.6/site-packages/pandas/core/common.py in is_bool_indexer(key)
    187             if not lib.is_bool_array(key):
    188                 if isnull(key).any():
--> 189                     raise ValueError('cannot index with vector containing '
    190                                      'NA / NaN values')
    191                 return False

ValueError: cannot index with vector containing NA / NaN values

However, the InvoiceNo column doesn't contain any NA values:

retail_df['InvoiceNo'].isnull().sum()

Output: 0

So I don't understand why it doesn't work.
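The puzzle can be reproduced with a few hypothetical values modelled on the dataset (illustrative, not read from the file). The column has no missing values, but it mixes Python ints and strs, and the .str accessor yields NaN for every element that is not a string. The mask therefore contains NaN and cannot be used for boolean indexing. A minimal sketch, assuming a recent pandas; the exact ValueError wording differs between pandas versions.

```python
import pandas as pd

# Hypothetical stand-in for retail_df: plain invoice numbers parsed as int,
# cancellation codes kept as str (an object column can hold both).
df = pd.DataFrame({
    "InvoiceNo": pd.Series([536365, "C536379", 536366, "C536383"], dtype=object),
    "Quantity": [6, -1, 8, -1],
})

# The column itself has no missing values...
print(df["InvoiceNo"].isnull().sum())  # 0

# ...but its elements have different Python types.
print(df["InvoiceNo"].map(type).value_counts())

# .str.contains returns NaN for the int elements, so the mask holds NaN.
mask = df["InvoiceNo"].str.contains("C")
print(mask.isna().sum())

# Boolean indexing with a mask that contains NaN raises ValueError.
raised = False
try:
    df[mask]
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

In other words, the NaN values in the error message come from the mask, not from the InvoiceNo column.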

I also tried:

retail_df['order_canceled'] = retail_df['InvoiceNo'].apply(lambda x:int('C' in x))

and got a different error:

TypeError                                 Traceback (most recent call last)
<ipython-input-28-e82a12535b70> in <module>()
----> 1 retail_df['order_canceled'] = retail_df['InvoiceNo'].apply(lambda x:int('C' in x))

~/anaconda3/lib/python3.6/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   2353             else:
   2354                 values = self.asobject
-> 2355                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   2356 
   2357         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()

<ipython-input-28-e82a12535b70> in <lambda>(x)
----> 1 retail_df['order_canceled'] = retail_df['InvoiceNo'].apply(lambda x:int('C' in x))

TypeError: argument of type 'int' is not iterable
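The second error has the same root cause. The in operator needs a container on its right-hand side, and for the integer invoice numbers x is an int. A minimal sketch with hypothetical values, showing the failure and the small change (wrapping x in str()) that makes the apply version work for both types:

```python
import pandas as pd

# Hypothetical mixed-type InvoiceNo values (int for normal invoices, str for cancellations).
invoice_no = pd.Series([536365, "C536379", 536366], dtype=object)

# 'C' in x fails when x is an int: an int is not a container.
caught = False
try:
    "C" in 536365
except TypeError as exc:
    caught = True
    print("TypeError:", exc)

# Converting each element to str first makes the membership test valid for every row.
order_canceled = invoice_no.apply(lambda x: int("C" in str(x)))
print(order_canceled.tolist())  # [0, 1, 0]
```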

How can I do this?

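You have both numbers and strings in the InvoiceNo column: regular invoice numbers are read from the Excel file as integers, while cancellation codes such as C536379 stay strings. The .str accessor returns NaN for the non-string elements, which is where the NaN values in the boolean mask come from, and 'C' in x fails on the integers for the same reason. Convert the column to str first: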

In [22]: retail_df[retail_df['InvoiceNo'].astype(str).str.contains('C')]
Out[22]:
       InvoiceNo StockCode                          Description  Quantity         InvoiceDate  UnitPrice  CustomerID  \
141      C536379         D                             Discount        -1 2010-12-01 09:41:00      27.50     14527.0
154      C536383    35004C      SET OF 3 COLOURED  FLYING DUCKS        -1 2010-12-01 09:49:00       4.65     15311.0
235      C536391     22556       PLASTERS IN TIN CIRCUS PARADE        -12 2010-12-01 10:24:00       1.65     17548.0
236      C536391     21984     PACK OF 12 PINK PAISLEY TISSUES        -24 2010-12-01 10:24:00       0.29     17548.0
237      C536391     21983     PACK OF 12 BLUE PAISLEY TISSUES        -24 2010-12-01 10:24:00       0.29     17548.0
238      C536391     21980    PACK OF 12 RED RETROSPOT TISSUES        -24 2010-12-01 10:24:00       0.29     17548.0
239      C536391     21484          CHICK GREY HOT WATER BOTTLE       -12 2010-12-01 10:24:00       3.45     17548.0
240      C536391     22557     PLASTERS IN TIN VINTAGE PAISLEY        -12 2010-12-01 10:24:00       1.65     17548.0
241      C536391     22553               PLASTERS IN TIN SKULLS       -24 2010-12-01 10:24:00       1.65     17548.0
939      C536506     22960             JAM MAKING SET WITH JARS        -6 2010-12-01 12:38:00       4.25     17897.0

...
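A sketch of two ways to build a clean boolean mask, followed by the grouping the question was after. The data is a hypothetical miniature of retail_df. Using str.startswith instead of the answer's str.contains is a substitution: the dataset description defines a cancellation by the leading letter, not by a 'C' anywhere in the code. Summing Quantity per invoice is only an illustrative aggregation.

```python
import pandas as pd

# Hypothetical miniature of retail_df with mixed int/str invoice codes.
retail_df = pd.DataFrame({
    "InvoiceNo": pd.Series(
        [536365, 536365, "C536379", "C536383", "C536383"], dtype=object
    ),
    "Quantity": [6, 8, -1, -1, -2],
})

# Option 1: cast every element to str (as in the answer), then test the first character.
mask_cast = retail_df["InvoiceNo"].astype(str).str.startswith("C")

# Option 2: leave the column as is and count non-string elements as "no match".
mask_na = retail_df["InvoiceNo"].str.contains("C", na=False)

# Both masks are purely boolean, so either works as a row filter.
cancelled = retail_df[mask_cast]

# Group the cancelled rows by invoice code.
per_invoice = cancelled.groupby("InvoiceNo")["Quantity"].sum()
print(per_invoice)
```

Note that str.contains is case-sensitive by default, so the question's lowercase 'c' would not match codes like C536379 even after the type fix; use 'C' or pass case=False. Passing dtype={'InvoiceNo': str} to pd.read_excel should also load every invoice code as a string from the start. That variant is not exercised here because it needs the remote file.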
