
How can I train GaussianNB in a pipeline and get rid of the error [AttributeError: 'numpy.ndarray' object has no attribute 'lower']?

This is my data. I use CountVectorizer and TfidfTransformer, and also GaussianNB, but I get an error in this code. Please let me know the correct syntax.

train = [('I love this sandwich.','pos'),
     ('This is an amazing place!', 'pos'),
     ('I feel very good about these beers.', 'pos'),
     ('This is my best work.', 'pos'),
     ('What an awesome view', 'pos'),
     ('I do not like this restaurant', 'neg'),
     ('I am tired of this stuff.', 'neg'),
     ("I can't deal with this.", 'neg'),
     ('He is my sworn enemy!.', 'neg'),
     ('My boss is horrible.', 'neg')
    ]
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer()

text_train_cv = cv.fit_transform(list(zip(*train))[0])
print(text_train_cv.toarray())

from sklearn.feature_extraction.text import TfidfTransformer
tfidf_trans = TfidfTransformer()

text_train_tfidf = tfidf_trans.fit_transform(text_train_cv)
print(text_train_tfidf.toarray())

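from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

# Fitting GaussianNB directly on the dense TF-IDF array works
clf = GaussianNB().fit(text_train_tfidf.toarray(), list(zip(*train))[1])

text_clf = Pipeline([('vect', CountVectorizer(stop_words='english')),
                     ('tfidf', TfidfTransformer()),
                     ('clf', GaussianNB(priors=None))])
# This line raises: AttributeError: 'numpy.ndarray' object has no attribute 'lower'
text_clf = text_clf.fit(text_train_tfidf.toarray(), list(zip(*train))[1])
print(text_clf)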

It gives me this error: AttributeError: 'numpy.ndarray' object has no attribute 'lower'

Do this:

clf = GaussianNB().fit(text_train_tfidf.toarray(), list(zip(*train))[1])

GaussianNB doesn't support sparse matrices as input for X, but TfidfTransformer returns a sparse matrix by default. Hence the error.
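A minimal sketch of this behavior, assuming a recent scikit-learn and SciPy; the two-sentence corpus is made up for illustration. It checks that the TfidfTransformer output is a SciPy sparse matrix, that GaussianNB rejects it (scikit-learn raises a TypeError when an estimator that needs dense data receives a sparse matrix), and that the dense copy from toarray() is accepted:

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy corpus, for illustration only
texts = ["I love this sandwich.", "My boss is horrible."]
labels = ["pos", "neg"]

counts = CountVectorizer().fit_transform(texts)
tfidf = TfidfTransformer().fit_transform(counts)
print(issparse(tfidf))  # True: TfidfTransformer returns a sparse matrix

# GaussianNB refuses the sparse matrix
sparse_rejected = False
try:
    GaussianNB().fit(tfidf, labels)
except TypeError as exc:
    sparse_rejected = True
    print("Sparse input rejected:", exc)

# The dense copy is accepted
clf = GaussianNB().fit(tfidf.toarray(), labels)
print(clf.predict(tfidf.toarray()))
```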

toarray() converts the sparse matrix to a dense array. But note that this can greatly increase memory usage.
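To get a feel for that cost, the sketch below (toy corpus, assumed for illustration) compares the bytes held by the sparse CSR matrix, which stores only its non-zero values plus the `indices` and `indptr` arrays, with the bytes of the dense float64 array that toarray() allocates. The `dense_size_gb` helper is hypothetical and only does the back-of-the-envelope arithmetic for a larger corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Hypothetical toy corpus, for illustration only
texts = [
    "I love this sandwich.",
    "This is an amazing place!",
    "My boss is horrible.",
]

tfidf = TfidfTransformer().fit_transform(CountVectorizer().fit_transform(texts))

# A CSR matrix stores only the non-zero values plus two index arrays
sparse_bytes = tfidf.data.nbytes + tfidf.indices.nbytes + tfidf.indptr.nbytes
# toarray() allocates every cell, zero or not (8 bytes per float64)
dense = tfidf.toarray()
print(tfidf.shape, "sparse:", sparse_bytes, "bytes, dense:", dense.nbytes, "bytes")


def dense_size_gb(n_docs, n_features, itemsize=8):
    """Hypothetical helper: size of a dense n_docs x n_features array in GB."""
    return n_docs * n_features * itemsize / 1e9


# e.g. 100,000 documents with a 50,000-word vocabulary
print(dense_size_gb(100_000, 50_000))  # 40.0
```

For a tiny corpus the difference is small, but the dense size grows with documents × vocabulary no matter how many entries are zero.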

Update:

When using a pipeline, you need to supply the same raw data that you would pass to the first transformer in the pipeline (here, the CountVectorizer), not the already-transformed matrix, because the pipeline runs the vectorization steps itself. In this case that is list(zip(*train))[0].

text_clf = text_clf.fit(list(zip(*train))[0], list(zip(*train))[1])

That will solve your first error. But you will still get an error, because the TfidfTransformer step passes a sparse matrix to GaussianNB. See this answer for how to solve that: https://stackoverflow.com/a/28384887/3374996
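A minimal sketch of both points, assuming scikit-learn 0.22 or later (where FunctionTransformer does not validate its input by default). First it reproduces the question's error by feeding already-vectorized rows to a CountVectorizer: each row is a numpy array rather than a string, and the vectorizer's default lowercasing calls `.lower()` on it. It then fits the pipeline on the raw sentences, with an extra step that densifies the TF-IDF output before GaussianNB. The `to_dense` helper and the `'dense'` step name are hypothetical, and this is only one way to do the conversion, not necessarily the one in the linked answer:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

train = [('I love this sandwich.', 'pos'), ('This is an amazing place!', 'pos'),
         ('I feel very good about these beers.', 'pos'), ('This is my best work.', 'pos'),
         ('What an awesome view', 'pos'), ('I do not like this restaurant', 'neg'),
         ('I am tired of this stuff.', 'neg'), ("I can't deal with this.", 'neg'),
         ('He is my sworn enemy!.', 'neg'), ('My boss is horrible.', 'neg')]
texts = [text for text, _ in train]
labels = [label for _, label in train]

# 1) Reproduce the error: vectorized rows are numpy arrays, not strings
caught = None
try:
    CountVectorizer().fit(np.array([[0.5, 0.0], [0.0, 0.7]]))
except AttributeError as exc:
    caught = exc
    print(exc)


def to_dense(X):
    """Hypothetical helper: turn the sparse TF-IDF matrix into a dense array."""
    return X.toarray()


# 2) Fit on raw text and densify right before GaussianNB
text_clf = Pipeline([
    ('vect', CountVectorizer(stop_words='english')),
    ('tfidf', TfidfTransformer()),
    ('dense', FunctionTransformer(to_dense, accept_sparse=True)),
    ('clf', GaussianNB()),
])
text_clf.fit(texts, labels)
print(text_clf.predict(["What a horrible sandwich"]))
```

Using a named function instead of a lambda keeps the pipeline picklable.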

MultinomialNB is very often used for text classification tasks, and it does accept sparse matrices as input.

PS: if you use dense matrices with bigger corpora, you might end up with a MemoryError.

So try this:

from sklearn.naive_bayes import MultinomialNB

clf = MultinomialNB().fit(text_train_tfidf, list(zip(*train))[1])
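With MultinomialNB the whole pipeline can stay sparse, so no dense-conversion step is needed and the pipeline can be fitted directly on the raw sentences. A minimal sketch, assuming a recent scikit-learn and the same training sentences as in the question; the test sentences passed to `predict` are made up:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

train = [('I love this sandwich.', 'pos'), ('This is an amazing place!', 'pos'),
         ('I feel very good about these beers.', 'pos'), ('This is my best work.', 'pos'),
         ('What an awesome view', 'pos'), ('I do not like this restaurant', 'neg'),
         ('I am tired of this stuff.', 'neg'), ("I can't deal with this.", 'neg'),
         ('He is my sworn enemy!.', 'neg'), ('My boss is horrible.', 'neg')]
texts = [text for text, _ in train]
labels = [label for _, label in train]

text_clf = Pipeline([
    ('vect', CountVectorizer(stop_words='english')),
    ('tfidf', TfidfTransformer()),
    ('clf', MultinomialNB()),  # accepts the sparse TF-IDF matrix as-is
])
text_clf.fit(texts, labels)  # raw strings go in, not a pre-vectorized matrix
print(text_clf.predict(["I love this view", "My boss is horrible"]))
```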


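Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.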

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM