Getting a 'pos' test on a negative review
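OK, so I trained a NaiveBayes movie review classifier. However, when I give it a negative review (copied from a website and pasted into a txt file), I get "pos". Am I doing something wrong? Here is the code: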
import nltk, random
from nltk.corpus import movie_reviews

# (word list, label) pairs for every review in the corpus
documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
random.shuffle(documents)  # unseeded: a different train/test split on every run

all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words())
word_features = list(all_words)[:2000]

def document_features(document):
    # one boolean feature per vocabulary word: does the document contain it?
    document_words = set(document)
    features = {}
    for word in word_features:
        features['contains({})'.format(word)] = (word in document_words)
    return features

featuresets = [(document_features(d), c) for (d, c) in documents]
train_set, test_set = featuresets[100:], featuresets[:100]
classifier = nltk.NaiveBayesClassifier.train(train_set)
print(nltk.classify.accuracy(classifier, test_set))
classifier.show_most_informative_features(5)
>>> 0.67
>>> Most Informative Features
    contains(thematic) = True     pos : neg = 8.9 : 1.0
    contains(annual) = True       pos : neg = 8.9 : 1.0
    contains(miscast) = True      neg : pos = 8.7 : 1.0
    contains(supports) = True     pos : neg = 6.9 : 1.0
    contains(unbearable) = True   neg : pos = 6.7 : 1.0
with open('negative_review.txt', 'r') as f:  # 'rU' mode was removed in Python 3.11
    fraw = f.read()
review_tokens = nltk.word_tokenize(fraw)
docfts = document_features(review_tokens)
classifier.classify(docfts)
>>> 'pos'
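One detail that may also affect the result: word_features is built from lowercased words (`w.lower()` in the code above), but `nltk.word_tokenize` keeps the original capitalization, so a sentence-initial "Unbearable" never matches the feature `contains(unbearable)`. The sketch below is hypothetical: it uses a made-up three-word vocabulary and a crude `str.split` in place of `nltk.word_tokenize`, and only shows the effect of lowercasing tokens before calling `document_features`.

```python
# Hypothetical three-word vocabulary standing in for word_features
word_features = ["unbearable", "miscast", "thematic"]

def document_features(document):
    # Same idea as in the question: one boolean per vocabulary word
    document_words = set(document)
    return {'contains({})'.format(w): (w in document_words) for w in word_features}

text = "Unbearable. Miscast leads and a thin plot."
# Crude stand-in for nltk.word_tokenize: drop periods, split on whitespace
tokens = text.replace(".", " ").split()

raw = document_features(tokens)                         # case kept
lowered = document_features([t.lower() for t in tokens])  # case normalized

print(raw)      # every feature is False: 'Unbearable' != 'unbearable'
print(lowered)  # contains(unbearable) and contains(miscast) are now True
```

In the question's code the equivalent change would be `review_tokens = [t.lower() for t in nltk.word_tokenize(fraw)]`.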
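UPDATE: After rerunning the program several times, it now correctly classifies my negative review as negative. Can someone help me understand why? Or is this simply magic?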
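The classifier is not 100% accurate. A better test is to look at how the classifier behaves across many movie reviews, not just one. Your classifier's accuracy is 67%, which means roughly 1 in 3 reviews will be misclassified (33 of the 100 reviews in your test set were). The result also changes between runs because random.shuffle(documents) is not seeded: every run uses a different train/test split and therefore trains a slightly different model, so a borderline review can flip between "pos" and "neg". You can try other classifiers or other features (for example n-grams or word2vec) to improve the model.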
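To judge the classifier over many reviews and get the same numbers on every run, one option is seeded k-fold cross-validation. The helper below is a minimal sketch; `k_fold_splits` and its `seed` parameter are hypothetical names, not part of NLTK. It shuffles a copy of the data with a seeded `random.Random`, so the folds are reproducible, and every item lands in exactly one test fold.

```python
import random

def k_fold_splits(items, k, seed=0):
    """Yield (train, test) lists for k-fold cross-validation.

    A seeded random.Random shuffles a copy of items, so the same
    seed always produces the same folds.
    """
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and len(items)")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    fold_size = len(shuffled) // k
    for i in range(k):
        start = i * fold_size
        # The last fold absorbs any remainder items
        end = start + fold_size if i < k - 1 else len(shuffled)
        yield shuffled[:start] + shuffled[end:], shuffled[start:end]
```

With the question's `featuresets`, each fold could be scored with `nltk.classify.accuracy(nltk.NaiveBayesClassifier.train(train), test)` and the k scores averaged; the spread between folds shows how much the 67% figure depends on the particular split.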