Filtering a DataFrame string column: argument of type 'int' is not iterable / cannot index with vector containing NA / NaN values
I'm working with the Online Retail dataset.
There is a column called InvoiceNo, which holds the invoice code. If this code starts with the letter 'C', it indicates a cancellation.
I want to group by InvoiceNo for the rows where InvoiceNo contains 'C'.
import pandas as pd
import numpy as np
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail.xlsx'
retail_df = pd.read_excel(url)
temp_df = retail_df[retail_df['InvoiceNo'].str.contains('c')]
I got an error:
ValueError Traceback (most recent call last)
<ipython-input-29-e1f6cb12695b> in <module>()
----> 1 temp_df = retail_df[retail_df['InvoiceNo'].str.contains('c')]
~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
1956 if isinstance(key, (Series, np.ndarray, Index, list)):
1957 # either boolean or fancy integer index
-> 1958 return self._getitem_array(key)
1959 elif isinstance(key, DataFrame):
1960 return self._getitem_frame(key)
~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in _getitem_array(self, key)
1983 def _getitem_array(self, key):
1984 # also raises Exception if object array with NA values
-> 1985 if com.is_bool_indexer(key):
1986 # warning here just in case -- previously __setitem__ was
1987 # reindexing but __getitem__ was not; it seems more reasonable to
~/anaconda3/lib/python3.6/site-packages/pandas/core/common.py in is_bool_indexer(key)
187 if not lib.is_bool_array(key):
188 if isnull(key).any():
--> 189 raise ValueError('cannot index with vector containing '
190 'NA / NaN values')
191 return False
ValueError: cannot index with vector containing NA / NaN values
However, the InvoiceNo column doesn't contain any NA values:
retail_df['InvoiceNo'].isnull().sum()
Output: 0
So I don't understand why it doesn't work.
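A minimal reproduction helps explain the first error. The sketch below uses a small hand-built Series standing in for InvoiceNo (the values are illustrative, not read from the real file), assuming the column mixes ints and strings. The pandas .str methods return NaN for every element that is not a string, so the boolean mask itself contains NaN even though isnull() on the original column reports nothing missing.

```python
import pandas as pd

# Hypothetical stand-in for InvoiceNo: plain invoice numbers as ints,
# cancellation codes such as 'C536379' as strings (object dtype).
invoice_no = pd.Series([536365, "C536379", 536366, "C536383"], dtype=object)

# The column itself has no missing values.
print(invoice_no.isnull().sum())  # 0

# .str.contains yields NaN for the int elements, so the mask has NaN in it.
mask = invoice_no.str.contains("C")
print(mask.isnull().sum())  # 2

# Using that mask for boolean indexing raises a ValueError about NA / NaN.
indexing_failed = False
try:
    invoice_no[mask]
except ValueError as err:
    indexing_failed = True
    print("ValueError:", err)
```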
I also tried:
retail_df['order_canceled'] = retail_df['InvoiceNo'].apply(lambda x:int('C' in x))
and got this error:
TypeError Traceback (most recent call last)
<ipython-input-28-e82a12535b70> in <module>()
----> 1 retail_df['order_canceled'] = retail_df['InvoiceNo'].apply(lambda x:int('C' in x))
~/anaconda3/lib/python3.6/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
2353 else:
2354 values = self.asobject
-> 2355 mapped = lib.map_infer(values, f, convert=convert_dtype)
2356
2357 if len(mapped) and isinstance(mapped[0], Series):
pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()
<ipython-input-28-e82a12535b70> in <lambda>(x)
----> 1 retail_df['order_canceled'] = retail_df['InvoiceNo'].apply(lambda x:int('C' in x))
TypeError: argument of type 'int' is not iterable
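The second error has the same root cause: apply passes each raw element to the lambda, and 'C' in x only works when x is a string, so an int element raises TypeError. One possible workaround, sketched here on the same kind of hypothetical mixed-type Series, is to convert each value with str() before the membership test.

```python
import pandas as pd

# Hypothetical mixed-type stand-in for InvoiceNo (ints and strings).
invoice_no = pd.Series([536365, "C536379", 536366, "C536383"], dtype=object)

# 'C' in 536365 raises: argument of type 'int' is not iterable.
apply_failed = False
try:
    invoice_no.apply(lambda x: int("C" in x))
except TypeError as err:
    apply_failed = True
    print("TypeError:", err)

# Converting each element to str first makes the membership test valid.
order_canceled = invoice_no.apply(lambda x: int("C" in str(x)))
print(order_canceled.tolist())  # [0, 1, 0, 1]
```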
How can I do this?
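Your InvoiceNo column contains both numbers and strings. The .str methods return NaN for every non-string element, so str.contains produces a mask with NaN values even though the column itself has none, and 'C' in x fails on the integer values because an int is not iterable. Convert the column to strings first: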
In [22]: retail_df[retail_df['InvoiceNo'].astype(str).str.contains('C')]
Out[22]:
InvoiceNo StockCode Description Quantity InvoiceDate UnitPrice CustomerID \
141 C536379 D Discount -1 2010-12-01 09:41:00 27.50 14527.0
154 C536383 35004C SET OF 3 COLOURED FLYING DUCKS -1 2010-12-01 09:49:00 4.65 15311.0
235 C536391 22556 PLASTERS IN TIN CIRCUS PARADE -12 2010-12-01 10:24:00 1.65 17548.0
236 C536391 21984 PACK OF 12 PINK PAISLEY TISSUES -24 2010-12-01 10:24:00 0.29 17548.0
237 C536391 21983 PACK OF 12 BLUE PAISLEY TISSUES -24 2010-12-01 10:24:00 0.29 17548.0
238 C536391 21980 PACK OF 12 RED RETROSPOT TISSUES -24 2010-12-01 10:24:00 0.29 17548.0
239 C536391 21484 CHICK GREY HOT WATER BOTTLE -12 2010-12-01 10:24:00 3.45 17548.0
240 C536391 22557 PLASTERS IN TIN VINTAGE PAISLEY -12 2010-12-01 10:24:00 1.65 17548.0
241 C536391 22553 PLASTERS IN TIN SKULLS -24 2010-12-01 10:24:00 1.65 17548.0
939 C536506 22960 JAM MAKING SET WITH JARS -6 2010-12-01 12:38:00 4.25 17897.0
...
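Two variations on the answer may also be useful, sketched on a small hypothetical frame rather than the real file. Since a cancellation code starts with 'C', str.startswith is a tighter test than contains, and passing na=False makes non-string elements count as non-matches, so the mask is purely boolean without the astype(str) conversion. The filtered rows can then be grouped by InvoiceNo, which is what the question set out to do; summing Quantity per invoice is only an illustrative aggregation.

```python
import pandas as pd

# Hypothetical miniature of the retail data: InvoiceNo mixes ints and strings.
retail_df = pd.DataFrame({
    "InvoiceNo": [536365, "C536379", 536366, "C536391", "C536391"],
    "Quantity": [6, -1, 8, -12, -24],
})

# na=False turns the result for non-string (int) elements into False,
# so the mask is boolean and can be used for indexing directly.
canceled_mask = retail_df["InvoiceNo"].str.startswith("C", na=False)
canceled_df = retail_df[canceled_mask]

# Group the cancellations by invoice; summing Quantity is just an example.
canceled_qty = canceled_df.groupby("InvoiceNo")["Quantity"].sum()
print(canceled_qty)
```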