
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array. (with Naive Bayes classifiers)

The Multinomial Naive Bayes classifier gives the correct result, but the other two, Gaussian NB and Bernoulli NB, do not. The error is:

TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.

But after adding that conversion (train_set.toarray()), the error becomes:

AttributeError: 'list' object has no attribute 'toarray'

The code is:

import random

import nltk
from nltk.corpus import names  # requires the corpus: nltk.download('names')
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.naive_bayes import MultinomialNB, GaussianNB, BernoulliNB

def gender_features(word):
    return {'last_letter': word[-1]}

labeled_names = ([(name, 'male') for name in names.words('male.txt')] + [(name, 'female') for name in names.words('female.txt')])
random.shuffle(labeled_names)

featuresets = [(gender_features(n), gender) for (n, gender) in labeled_names]
train_set, test_set = featuresets[500:], featuresets[:500]
classifier = nltk.NaiveBayesClassifier.train(train_set)

print(nltk.classify.accuracy(classifier, test_set)*100)
classifier.show_most_informative_features(5)

MNB_classifier = SklearnClassifier(MultinomialNB())
MNB_classifier.train(train_set)
print("MNB classifier accuracy: ", nltk.classify.accuracy(MNB_classifier, test_set) * 100)

G_classifier = SklearnClassifier(GaussianNB())
G_classifier.train(train_set)  # raises the TypeError quoted above
print("Gaussian classifier accuracy: ", nltk.classify.accuracy(G_classifier, test_set) * 100)

B_classifier = SklearnClassifier(BernoulliNB())
B_classifier.train(train_set)
print("Bernoulli classifier accuracy: ", nltk.classify.accuracy(B_classifier, test_set) * 100)
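Where the sparse matrix comes from, as far as I can tell: train_set is a plain Python list of (feature dict, label) tuples, which is why train_set.toarray() raises AttributeError. The sparse matrix is created later, when NLTK's SklearnClassifier vectorizes those dicts with scikit-learn's DictVectorizer, which returns a SciPy sparse matrix by default. The sketch below reproduces that step with scikit-learn alone; the four-item train_set is a hypothetical stand-in for the names corpus. It also suggests that BernoulliNB accepts sparse input, so in the script above only the GaussianNB call should fail (the script stops there, so the Bernoulli line never runs).

```python
import scipy.sparse
from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Hypothetical toy data shaped like train_set: a list of (feature dict, label)
train_set = [
    ({'last_letter': 'a'}, 'female'),
    ({'last_letter': 'e'}, 'female'),
    ({'last_letter': 'k'}, 'male'),
    ({'last_letter': 'n'}, 'male'),
]
features, labels = zip(*train_set)

# DictVectorizer one-hot encodes string values ('last_letter=a', ...) and,
# by default, returns a SciPy sparse matrix
vectorizer = DictVectorizer()
X = vectorizer.fit_transform(features)
print(type(train_set).__name__, scipy.sparse.issparse(X))  # the list vs. the matrix

# MultinomialNB and BernoulliNB accept sparse input
MultinomialNB().fit(X, labels)
BernoulliNB().fit(X, labels)

# GaussianNB rejects sparse input with a TypeError
try:
    GaussianNB().fit(X, labels)
    gaussian_accepts_sparse = True
except TypeError:
    gaussian_accepts_sparse = False

# .toarray() exists on the sparse matrix, not on the list
dense_model = GaussianNB().fit(X.toarray(), labels)
query = vectorizer.transform([{'last_letter': 'a'}]).toarray()
print(dense_model.predict(query))
```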

I had the same problem. When training, try using:

train_set.todense()

It worked for me.

Maybe you can do this: numpy.array(train_set), to make the list dense.
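Both suggestions above act on train_set, but the sparse matrix is only built inside SklearnClassifier. A different approach, assuming a reasonably recent NLTK: the SklearnClassifier constructor takes a sparse argument (default True) that it forwards to its internal DictVectorizer, so passing sparse=False should hand GaussianNB a dense NumPy array. This is a minimal sketch; the tiny labeled_names list is a hypothetical stand-in for the names corpus.

```python
import nltk
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.naive_bayes import BernoulliNB, GaussianNB


def gender_features(word):
    return {'last_letter': word[-1]}


# Hypothetical stand-in for the names corpus used in the question
labeled_names = [('Anna', 'female'), ('Grace', 'female'),
                 ('Mark', 'male'), ('John', 'male')]
train_set = [(gender_features(n), g) for (n, g) in labeled_names]
test_set = [(gender_features('Maria'), 'female'),
            (gender_features('Frank'), 'male')]

# sparse=False makes the internal DictVectorizer emit a dense array,
# which GaussianNB requires
G_classifier = SklearnClassifier(GaussianNB(), sparse=False)
G_classifier.train(train_set)
print("Gaussian classifier accuracy: ", nltk.classify.accuracy(G_classifier, test_set) * 100)

# BernoulliNB accepts sparse input, so the default sparse=True is fine here
B_classifier = SklearnClassifier(BernoulliNB())
B_classifier.train(train_set)
print("Bernoulli classifier accuracy: ", nltk.classify.accuracy(B_classifier, test_set) * 100)
```

Densifying costs memory proportional to (number of samples) × (number of distinct feature values), which is negligible for a single last-letter feature but can grow quickly with richer feature dicts.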
