CountVectorizer: AttributeError: 'numpy.ndarray' object has no attribute 'lower'

I have a one-dimensional array with a large string in each element. I am trying to use a CountVectorizer to convert the text data into numerical vectors. However, I am getting this error:

AttributeError: 'numpy.ndarray' object has no attribute 'lower'

mealarray contains a large string in each element, and there are 5000 such samples. I am trying to vectorize it as shown below:

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(
    stop_words='english',
    ngram_range=(1, 1),  # (1, 1) is the default: unigrams only
    dtype='double',
)
data = vectorizer.fit_transform(mealarray)

The full stack trace:

File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 817, in fit_transform
    self.fixed_vocabulary_)
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 748, in _count_vocab
    for feature in analyze(doc):
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 234, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 200, in <lambda>
    return lambda x: strip_accents(x.lower())
AttributeError: 'numpy.ndarray' object has no attribute 'lower'

Check the shape of mealarray. If the argument to fit_transform is an array of strings, it must be a one-dimensional array. (That is, mealarray.shape must be of the form (n,).) For example, you'll get the "no attribute" error if mealarray has a shape such as (n, 1), because each item the vectorizer iterates over is then a one-element array rather than a string.

You could try something like:

data = vectorizer.fit_transform(mealarray.ravel())
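
To make the shape issue concrete, here is a minimal sketch. The three short documents are a hypothetical stand-in for mealarray (the original data is not shown), and the vectorizer uses default settings rather than the ones from the question:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for mealarray: three documents stored as a column.
mealarray = np.array([
    ["rice with beans"],
    ["chicken and rice"],
    ["beans on toast"],
])
print(mealarray.shape)  # (3, 1)

vectorizer = CountVectorizer()

# With shape (3, 1), each "document" the vectorizer sees is a one-element
# ndarray, so calling .lower() on it fails.
try:
    vectorizer.fit_transform(mealarray)
    raised = False
except AttributeError as exc:
    raised = True
    print("AttributeError:", exc)

# ravel() gives shape (3,), so each document is a plain string.
data = vectorizer.fit_transform(mealarray.ravel())
print(data.shape[0])  # 3 rows, one per document
```

ravel() returns a view when it can, while flatten() always copies; for this purpose either one should do.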

Got the answer to my question. Basically, CountVectorizer expects a flat sequence of strings (for example, a list of strings) rather than the array I was passing. Converting to a list solved my problem.

I got the same error:

AttributeError: 'numpy.ndarray' object has no attribute 'lower'

To solve this problem, I did the following:

  1. Verify the dimensions of the array with: name_of_array1.shape
  2. If the output is (n, 1), use flatten() to convert the two-dimensional array to a one-dimensional one: flat_array = name_of_array1.flatten()
  3. Now CountVectorizer() works, because it expects a one-dimensional sequence in which each element is a string.
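
The steps above can be sketched with NumPy alone. The helper as_documents and the sample strings are hypothetical, introduced only for illustration; the point is that indexing an (n, 1) array yields an ndarray, which has no lower() method, while the flattened array yields strings:

```python
import numpy as np

def as_documents(array):
    """Return a 1-D array of documents, flattening an (n, 1) column if needed."""
    array = np.asarray(array)
    if array.ndim == 2 and array.shape[1] == 1:
        return array.flatten()
    if array.ndim != 1:
        raise ValueError(f"expected shape (n,) or (n, 1), got {array.shape}")
    return array

# Hypothetical sample data with shape (2, 1).
name_of_array1 = np.array([["First text"], ["Second text"]])

# Step 1: each item of the 2-D array is itself an ndarray.
first_item = name_of_array1[0]
print(type(first_item).__name__, hasattr(first_item, "lower"))  # ndarray False

# Step 2: flatten to one dimension.
flat_array = as_documents(name_of_array1)
print(flat_array.shape)  # (2,)

# Step 3: each element now behaves like a string.
print(flat_array[0].lower())  # first text
```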

A better solution is to explicitly select a pandas Series and pass it to CountVectorizer():

>>> tex = df4['Text']
>>> type(tex)
<class 'pandas.core.series.Series'>
>>> X_train_counts = count_vect.fit_transform(tex)

The next one won't work, because it is a DataFrame and not a Series:

>>> tex2 = df4.iloc[:, [11]]  # .ix was removed in pandas 1.0; a list selector returns a DataFrame
>>> type(tex2)
<class 'pandas.core.frame.DataFrame'>
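
As a rough illustration of the difference, the sketch below uses a small hypothetical frame in place of df4 (whose contents are not shown). Iterating over a DataFrame yields its column labels rather than its rows, so the vectorizer never sees the documents; selecting a single column as a Series avoids that:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical frame standing in for df4.
df4 = pd.DataFrame({"Text": ["red apple", "green apple", "red grape"]})

series = df4["Text"]    # single label  -> Series
frame = df4[["Text"]]   # list of labels -> one-column DataFrame

print(list(series))  # ['red apple', 'green apple', 'red grape']
print(list(frame))   # ['Text'] (iteration yields column labels)

# Collapse the one-column DataFrame back into a Series before vectorizing.
docs = frame.iloc[:, 0]

count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(docs)
print(X_train_counts.shape)  # (3, 4)
```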
