CountVectorizer: AttributeError: 'numpy.ndarray' object has no attribute 'lower'

Question

I have a one-dimensional array with a large string in each element. I am trying to use a CountVectorizer to convert the text data into numerical vectors. However, I am getting an error saying:

AttributeError: 'numpy.ndarray' object has no attribute 'lower'

mealarray contains a large string in each element. There are 5000 such samples. I am trying to vectorize it as shown below:

from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(
    stop_words='english',
    ngram_range=(1, 1),  # ngram_range=(1, 1) is the default
    dtype='double',
)
data = vectorizer.fit_transform(mealarray)

The full stack trace:

File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 817, in fit_transform
    self.fixed_vocabulary_)
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 748, in _count_vocab
    for feature in analyze(doc):
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 234, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 200, in <lambda>
    return lambda x: strip_accents(x.lower())
AttributeError: 'numpy.ndarray' object has no attribute 'lower'

Answer 1

Check the shape of mealarray. If the argument to fit_transform is an array of strings, it must be a one-dimensional array. (That is, mealarray.shape must be of the form (n,).) For example, you'll get the "no attribute" error if mealarray has a shape such as (n, 1), because iterating over a two-dimensional array yields one-row arrays rather than strings, and the vectorizer then calls lower() on an array.

You could try something like:

data = vectorizer.fit_transform(mealarray.ravel())
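
To make the shape issue concrete, here is a minimal sketch. The three short documents are a hypothetical stand-in for mealarray (the real data is not shown in the question), stored as a column of shape (3, 1) to reproduce the problem.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for mealarray: one document per row, shape (3, 1).
mealarray = np.array([
    ["chicken rice and beans"],
    ["rice with fried egg"],
    ["beans on toast"],
])
print(mealarray.shape)  # (3, 1)

vectorizer = CountVectorizer(stop_words="english")

# Iterating over a 2-D array yields 1-D row arrays, not strings,
# so the vectorizer's lowercasing step fails on each "document".
raised = False
try:
    vectorizer.fit_transform(mealarray)
except AttributeError as exc:
    raised = True
    print("AttributeError:", exc)

# ravel() gives a 1-D array whose elements are the strings themselves.
flat = mealarray.ravel()
data = vectorizer.fit_transform(flat)
print(flat.shape)  # (3,)
print(data.shape)  # (number of documents, vocabulary size)
```

After the ravel() call, each element handed to the vectorizer is a plain string, so lowercasing and tokenizing proceed normally.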

Answer 2

Got the answer to my question. Passing CountVectorizer a list of strings instead of the array solved my problem.
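
A rough sketch of that conversion, assuming the array is a column of shape (n, 1) as discussed in the other answers. The two documents are made-up sample data. Note that fit_transform accepts any iterable of strings, so a one-dimensional array works as well; the list is simply one way to get there.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical sample data stored as a column, shape (2, 1).
mealarray = np.array([["chicken rice and beans"], ["rice with fried egg"]])

# Flatten before converting: tolist() on the 2-D array would give a nested
# list, and each inner list would again lack a lower() method.
meal_list = mealarray.ravel().tolist()
print(meal_list)

vectorizer = CountVectorizer(stop_words="english")
data = vectorizer.fit_transform(meal_list)
print(sorted(vectorizer.vocabulary_))
```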

Answer 3

I got the same error:

AttributeError: 'numpy.ndarray' object has no attribute 'lower'

To solve this problem, I did the following:

Check the shape of the array with: name_of_array1.shape
If the output is (n, 1), use flatten() to convert the two-dimensional array to a one-dimensional one: flat_array = name_of_array1.flatten()
Now CountVectorizer() works, because it expects a one-dimensional sequence in which each element is a string.
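
The steps above can be sketched as follows, using a small made-up array in place of name_of_array1. The comparison with ravel() is an addition for illustration: flatten() always returns a copy, while ravel() returns a view when it can, which may matter for large arrays.

```python
import numpy as np

# Hypothetical two-document column array, shape (2, 1).
name_of_array1 = np.array([["first document"], ["second document"]])
print(name_of_array1.shape)  # (2, 1)

# flatten() returns a new 1-D array (always a copy).
flat_array = name_of_array1.flatten()
print(flat_array.shape)  # (2,)

# ravel() returns a view of the same memory for a contiguous array like this one.
view = name_of_array1.ravel()
print(np.shares_memory(name_of_array1, view))        # True
print(np.shares_memory(name_of_array1, flat_array))  # False
```

Either result can be passed to CountVectorizer; the choice only affects whether the data is copied.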

Answer 4

A better solution is to explicitly select a pandas Series and pass it to CountVectorizer():

>>> tex = df4['Text']
>>> type(tex)
<class 'pandas.core.series.Series'>
>>> X_train_counts = count_vect.fit_transform(tex)

The next one won't work, because it is a DataFrame and not a Series:

>>> tex2 = df4.iloc[0:, [11]]  # .ix was removed in pandas 1.0; .iloc selects by position
>>> type(tex2)
<class 'pandas.core.frame.DataFrame'>
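
Here is a small sketch of the difference, using a made-up two-row frame in place of df4. Iterating over a DataFrame yields its column labels rather than the cell values, which is why the vectorizer never sees the documents; selecting the column with single brackets gives a Series that iterates over the strings.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical frame standing in for df4.
df4 = pd.DataFrame({"Text": ["chicken rice and beans", "rice with fried egg"]})

tex = df4["Text"]     # single brackets: a Series of strings
tex2 = df4[["Text"]]  # double brackets: a one-column DataFrame

print(type(tex).__name__)   # Series
print(type(tex2).__name__)  # DataFrame
print(list(tex2))           # ['Text'] (the column labels, not the documents)

count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(tex)
print(X_train_counts.shape)
```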
