CountVectorizer: AttributeError: 'numpy.ndarray' object has no attribute 'lower'
I have a one-dimensional array with a large string in each element. I am trying to use a CountVectorizer to convert the text data into numerical vectors. However, I am getting an error saying:
AttributeError: 'numpy.ndarray' object has no attribute 'lower'
mealarray contains a large string in each of its elements. There are 5000 such samples. I am trying to vectorize this as given below:
vectorizer = CountVectorizer(
    stop_words='english',
    ngram_range=(1, 1),  # ngram_range=(1, 1) is the default
    dtype='double',
)
data = vectorizer.fit_transform(mealarray)
The full stacktrace:
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 817, in fit_transform
    self.fixed_vocabulary_)
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 748, in _count_vocab
    for feature in analyze(doc):
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 234, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "/Library/Python/2.7/site-packages/sklearn/feature_extraction/text.py", line 200, in <lambda>
    return lambda x: strip_accents(x.lower())
AttributeError: 'numpy.ndarray' object has no attribute 'lower'
Check the shape of mealarray. If the argument to fit_transform is an array of strings, it must be a one-dimensional array. (That is, mealarray.shape must be of the form (n,).) For example, you'll get the "no attribute" error if mealarray has a shape such as (n, 1), because iterating over such an array yields one-element arrays rather than strings.
You could try something like:
data = vectorizer.fit_transform(mealarray.ravel())
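To see the shape issue in isolation, here is a minimal sketch that assumes a tiny made-up corpus in place of the 5000-sample mealarray. The strings and the names column, flat and error_message are illustrative only, not from the question.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in for mealarray: 3 documents stored as a column, shape (3, 1).
column = np.array([["rice and beans"], ["beans on toast"], ["toast with jam"]])

vectorizer = CountVectorizer()

# Iterating a 2-D array yields 1-D arrays, not strings, so .lower() fails.
try:
    vectorizer.fit_transform(column)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# ravel() gives a 1-D array of shape (3,), so each element is a string.
flat = column.ravel()
data = vectorizer.fit_transform(flat)
n_docs, n_terms = data.shape
```

After ravel(), data has one row per document and one column per distinct token.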
Got the answer to my question. Basically, CountVectorizer takes a list (with string contents) as its argument; passing a list of strings rather than my array solved my problem.
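If the goal is to hand CountVectorizer a plain list, one way to do it (a sketch, not necessarily what the poster did) is to flatten the array first and then call tolist(), so that every item in the list is a str. The sample strings below are made up.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical data standing in for mealarray, stored with shape (2, 1).
mealarray = np.array([["soup and bread"], ["bread and butter"]])

# Flatten before converting: tolist() on the 2-D array would give a list of
# one-item lists, which are not strings either.
documents = mealarray.ravel().tolist()

vectorizer = CountVectorizer(stop_words="english")
data = vectorizer.fit_transform(documents)
vocabulary = sorted(vectorizer.vocabulary_)
```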
I got the same error:
AttributeError: 'numpy.ndarray' object has no attribute 'lower'
To solve this problem, I did the following:
Check the dimensions of the array with name_of_array1.shape.
Use flatten() to convert the two-dimensional array to a one-dimensional one: flat_array = name_of_array1.flatten()
Pass the flattened array to CountVectorizer(), because it works with a one-dimensional sequence in which each element is a single string.
A better solution is to explicitly select a pandas Series and pass it to CountVectorizer():
>>> tex = df4['Text']
>>> type(tex)
<class 'pandas.core.series.Series'>
X_train_counts = count_vect.fit_transform(tex)
The next one won't work, because it is a DataFrame and not a Series:
>>> tex2 = (df4.ix[0:,[11]])
>>> type(tex2)
<class 'pandas.core.frame.DataFrame'>
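The difference between the two selections can be checked with a small sketch. The frame contents here are invented; only the column name 'Text' and the variable names df4 and count_vect come from the answer above.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical frame; the real df4 is not shown in the answer.
df4 = pd.DataFrame({"Text": ["first short note", "second short note"]})

series = df4["Text"]    # single brackets -> pandas Series of strings
frame = df4[["Text"]]   # double brackets -> one-column DataFrame

count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(series)

# Iterating a DataFrame yields its column labels, not its rows, so the
# vectorizer would never see the actual documents.
frame_items = list(frame)
```

Selecting the column with single brackets, or calling squeeze() on a one-column frame, gives the vectorizer one string per row.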