K-Fold Cross Validation for Naive Bayes Classifier
I created a classifier using nltk that classifies reviews into 3 classes: pos, neg, and neu.
def get_feature(word):
    return dict([(word, True)])

def bag_of_words(words):
    return dict([(word, True) for word in words])

def create_training_dict(text, sense):
    ''' returns a dict ready for a classifier's test method '''
    tokens = extract_words(text)
    return [(bag_of_words(tokens), sense)]

def get_train_set(texts):
    train_set = []
    for words, sense in texts:
        train_set = train_set + [(get_feature(word), sense) for word in words]
    return train_set

doc_bow.append((top_tfidf, polarity))
train_set = get_train_set(doc_bow)
classifier = NaiveBayesClassifier.train(train_set)
decision = classifier.classify(tokens)
Now I want to run 10-fold cross-validation to test the classifier. I found an example from sklearn.
from sklearn import cross_validation
from sklearn.naive_bayes import MultinomialNB

target = np.array([x[0] for x in train_set])
train = np.array([x[1:] for x in train_set])
cfr = MultinomialNB()

# Simple K-Fold cross validation. 10 folds.
cv = cross_validation.KFold(len(train_set), k=10, indices=False)
results = []
for traincv, testcv in cv:
    probas = cfr.fit(train[traincv], target[traincv]).predict_proba(train[testcv])
    results.append(myEvaluationFunc(target[testcv], [x[1] for x in probas]))
print "Results: " + str(np.array(results).mean())
I get this error:
raise ValueError("Input X must be non-negative.")
ValueError: Input X must be non-negative.
I'm not sure whether the arguments I'm passing in are correct.
MultinomialNB is designed to be used with non-negative feature values.
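For example, MultinomialNB runs fine when the features are term counts, which are non-negative by construction. A minimal sketch with toy reviews and labels (hypothetical data, not your corpus), assuming a scikit-learn version where cross-validation lives in sklearn.model_selection:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

# Toy corpus standing in for the reviews (hypothetical data).
texts = ["great movie", "loved it", "terrible plot", "awful acting",
         "it was okay", "nothing special"] * 5
labels = ["pos", "pos", "neg", "neg", "neu", "neu"] * 5

# CountVectorizer produces non-negative term counts, which is
# exactly what MultinomialNB expects.
X = CountVectorizer().fit_transform(texts)
scores = cross_val_score(MultinomialNB(), X, labels, cv=10)
print(scores.mean())
```

If your features contain negative values (e.g. centred or signed tf-idf weights), this is the line that would raise "Input X must be non-negative."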
Have you tried GaussianNB?
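GaussianNB models each feature as normally distributed, so real-valued and even negative features are acceptable. A minimal sketch with random dense features (hypothetical data, since GaussianNB needs a dense array, not a sparse matrix):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))              # dense features, may be negative
y = np.repeat(["pos", "neg", "neu"], 10)  # 10 samples per class

# GaussianNB accepts these negative values without complaint.
scores = cross_val_score(GaussianNB(), X, y, cv=10)
print(scores.mean())
```

Note the accuracy here is meaningless (the features are random noise); the point is only that no non-negativity error is raised.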