How do I train GaussianNB in a pipeline without getting the error [AttributeError: 'numpy.ndarray' object has no attribute 'lower']?

Question

This is the data I use. I apply CountVectorizer and TfidfTransformer and then GaussianNB, but I get an error in this code. Please let me know the correct syntax.

train = [('I love this sandwich.','pos'),
     ('This is an amazing place!', 'pos'),
     ('I feel very good about these beers.', 'pos'),
     ('This is my best work.', 'pos'),
     ('What an awesome view', 'pos'),
     ('I do not like this restaurant', 'neg'),
     ('I am tired of this stuff.', 'neg'),
     ("I can't deal with this.", 'neg'),
     ('He is my sworn enemy!.', 'neg'),
     ('My boss is horrible.', 'neg')
    ]
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer()

text_train_cv = cv.fit_transform(list(zip(*train))[0])
print(text_train_cv.toarray())

from sklearn.feature_extraction.text import TfidfTransformer
tfidf_trans = TfidfTransformer()

text_train_tfidf = tfidf_trans.fit_transform(text_train_cv)
print(text_train_tfidf.toarray())

from sklearn.naive_bayes import GaussianNB
clf = GaussianNB().fit(text_train_tfidf.toarray(), list(zip(*train))[1])

from sklearn.pipeline import Pipeline
text_clf = Pipeline([('vect', CountVectorizer(stop_words='english')),
                     ('tfidf', TfidfTransformer()),
                     ('clf', GaussianNB(priors=None))])
text_clf = text_clf.fit(text_train_tfidf.toarray() , list(zip(*train))[1])
print(text_clf)

It gives me the error: AttributeError: 'numpy.ndarray' object has no attribute 'lower'

Answer 1

Do:

clf = GaussianNB().fit(text_train_tfidf.toarray(), list(zip(*train))[1])

GaussianNB doesn't support sparse matrices as input for X, but TfidfTransformer returns a sparse matrix by default. Hence the error.
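A minimal sketch of the behavior described above, assuming a recent scikit-learn and SciPy: fitting GaussianNB on a sparse matrix raises a TypeError, while the same data converted with toarray() is accepted. The small matrix and labels are made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy data: 4 samples, 3 features, stored as a sparse CSR matrix
X_sparse = csr_matrix(np.array([[1.0, 0.0, 0.0],
                                [0.9, 0.1, 0.0],
                                [0.0, 0.0, 1.0],
                                [0.0, 0.2, 0.8]]))
y = ["pos", "pos", "neg", "neg"]

try:
    GaussianNB().fit(X_sparse, y)  # sparse input is rejected by GaussianNB
    sparse_ok = True
except TypeError as exc:
    sparse_ok = False
    print("Sparse input rejected:", exc)

# Converting to a dense ndarray makes the same data acceptable
clf = GaussianNB().fit(X_sparse.toarray(), y)
print(clf.predict(X_sparse.toarray()))
```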

toarray() converts it to a dense array. Note, however, that this can greatly increase memory usage.
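To put a rough number on that increase, here is a sketch comparing the memory used by a CSR matrix with what a dense float64 copy would need. The corpus size and density are hypothetical, chosen only to resemble a mid-sized text dataset; the dense size is computed rather than actually allocated.

```python
from scipy.sparse import random as sparse_random

# Hypothetical corpus size: 10,000 documents, 50,000-word vocabulary,
# about 25 non-zero terms per document (density 0.0005)
n_docs, n_terms = 10_000, 50_000
X = sparse_random(n_docs, n_terms, density=0.0005, format="csr", random_state=0)

# Bytes used by the CSR representation: values + column indices + row pointers
sparse_bytes = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes

# Bytes a dense float64 copy from X.toarray() would need (not allocated here)
dense_bytes = n_docs * n_terms * X.dtype.itemsize

print(f"sparse: {sparse_bytes / 1e6:.1f} MB, dense: {dense_bytes / 1e9:.1f} GB")
```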

Update:

When using a pipeline, you need to supply the data that you would pass to the first transformer in the pipeline (the raw text), not the already-transformed matrix. In this case, that is list(zip(*train))[0].

text_clf = text_clf.fit(list(zip(*train))[0], list(zip(*train))[1])
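The reason for the original AttributeError: a Pipeline passes X unchanged to its first step, so CountVectorizer received the 2-D TF-IDF array, iterated over its rows and called .lower() on each row as if it were a string. A minimal reproduction, assuming a recent scikit-learn; the documents and the numeric array are made up for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

docs = ["I love this sandwich.", "My boss is horrible."]

# Works: CountVectorizer expects an iterable of raw strings
CountVectorizer().fit_transform(docs)

# Fails: a 2-D numeric array is iterated row by row, and each row
# (a numpy.ndarray) is lowercased during preprocessing
already_vectorized = np.array([[0.5, 0.0], [0.0, 0.7]])
try:
    CountVectorizer().fit_transform(already_vectorized)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
```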

That will solve your first error. But you will still get an error, because TfidfTransformer outputs a sparse matrix, which GaussianNB does not accept. See this answer for how to solve that: https://stackoverflow.com/a/28384887/3374996
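One possible fix for the remaining sparse-matrix error, sketched under the assumption of scikit-learn 0.22 or later (where FunctionTransformer passes its input through unvalidated by default); it is not necessarily the approach in the linked answer. A FunctionTransformer step converts the TF-IDF output to a dense array before it reaches GaussianNB. The helper name to_dense is hypothetical, and the memory caveat about toarray() still applies.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

train = [('I love this sandwich.', 'pos'),
         ('This is an amazing place!', 'pos'),
         ('I feel very good about these beers.', 'pos'),
         ('This is my best work.', 'pos'),
         ('What an awesome view', 'pos'),
         ('I do not like this restaurant', 'neg'),
         ('I am tired of this stuff.', 'neg'),
         ("I can't deal with this.", 'neg'),
         ('He is my sworn enemy!.', 'neg'),
         ('My boss is horrible.', 'neg')]
texts, labels = zip(*train)


def to_dense(X):
    """Hypothetical helper: convert the sparse TF-IDF matrix to a dense ndarray."""
    return X.toarray()


text_clf = Pipeline([
    ("vect", CountVectorizer(stop_words="english")),
    ("tfidf", TfidfTransformer()),
    ("to_dense", FunctionTransformer(to_dense)),  # sparse -> dense for GaussianNB
    ("clf", GaussianNB()),
])

# Fit on the raw text, not on an already-vectorized matrix
text_clf.fit(list(texts), list(labels))
print(text_clf.predict(["What a horrible restaurant"]))
```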

Answer 2

MultinomialNB is used very often for text classification tasks, and it does support sparse matrices as input.

P.S. If you use dense matrices with bigger corpora, you might end up with a MemoryError.

So try this:

from sklearn.naive_bayes import MultinomialNB

clf = MultinomialNB().fit(text_train_tfidf, list(zip(*train))[1])
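A sketch of the same idea inside a pipeline, assuming a recent scikit-learn: with MultinomialNB in place of GaussianNB, the sparse TF-IDF output can be passed straight to the classifier, so no densifying step is needed and memory use stays low. The training data is the question's; the two test sentences are made up.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

train = [('I love this sandwich.', 'pos'),
         ('This is an amazing place!', 'pos'),
         ('I feel very good about these beers.', 'pos'),
         ('This is my best work.', 'pos'),
         ('What an awesome view', 'pos'),
         ('I do not like this restaurant', 'neg'),
         ('I am tired of this stuff.', 'neg'),
         ("I can't deal with this.", 'neg'),
         ('He is my sworn enemy!.', 'neg'),
         ('My boss is horrible.', 'neg')]
texts, labels = zip(*train)

text_clf = Pipeline([
    ("vect", CountVectorizer(stop_words="english")),
    ("tfidf", TfidfTransformer()),
    ("clf", MultinomialNB()),  # accepts the sparse TF-IDF matrix directly
])

# Fit on the raw text; the pipeline handles vectorization
text_clf.fit(list(texts), list(labels))
predictions = text_clf.predict(["I love this view", "I am tired of my boss"])
print(predictions)
```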

2 answers

Answer 1: score 1, accepted, 2018-04-16 10:24:30
Answer 2: score 1, 2018-04-16 10:30:49
