Getting 'pos' when testing a negative review

Question

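OK, so I trained a NaiveBayes movie review classifier... but when I run it on a negative review (from a website, which I copied and pasted into a txt file), I get 'pos'... Am I doing something wrong? Here is the code: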

import nltk, random
from nltk.corpus import movie_reviews
documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
random.shuffle(documents)
all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words())
word_features = list(all_words)[:2000]

def document_features(document): 
    document_words = set(document) 
    features = {}
    for word in word_features:
        features['contains({})'.format(word)] = (word in document_words)
    return features

featuresets = [(document_features(d), c) for (d,c) in documents]
train_set, test_set = featuresets[100:], featuresets[:100]
classifier = nltk.NaiveBayesClassifier.train(train_set)

print(nltk.classify.accuracy(classifier, test_set)) 
classifier.show_most_informative_features(5)
>>>0.67
>>>Most Informative Features
      contains(thematic) = True              pos : neg    =      8.9 : 1.0
        contains(annual) = True              pos : neg    =      8.9 : 1.0
       contains(miscast) = True              neg : pos    =      8.7 : 1.0
      contains(supports) = True              pos : neg    =      6.9 : 1.0
    contains(unbearable) = True              neg : pos    =      6.7 : 1.0

with open('negative_review.txt') as f:
    fraw = f.read()
review_tokens = nltk.word_tokenize(fraw)
docfts = document_features(review_tokens)

classifier.classify(docfts)
>>>    'pos'

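UPDATE: After re-running the program several times, it now correctly classifies my negative review as negative... Can someone help me understand why? Or is this simply sorcery?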

Answer 1

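The classifier is not 100% accurate. A better test is to look at how the classifier behaves across many movie reviews rather than a single one. Your output shows an accuracy of 67%, which means roughly 1 in 3 reviews will be misclassified. It also explains the update: random.shuffle(documents) reorders the corpus on every run, so each run trains on a slightly different split, and a borderline review can flip between 'pos' and 'neg'. You can try to improve the model by using a different classifier or different features (try n-grams and word2vec).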
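As a rough illustration of the two suggestions above (measure on several reviews, and try other features), here is a minimal sketch using only the standard library. The helper names `top_words`, `unigram_bigram_features`, `accuracy`, and `toy_classify`, and the four toy reviews, are hypothetical and not part of the question's code. One assumption worth flagging: in NLTK 3, `nltk.FreqDist` subclasses `collections.Counter`, and `list(all_words)[:2000]` does not appear to be ordered by frequency, so picking features with `most_common(2000)` may be worth trying. This is a guess at a contributing factor, not a confirmed diagnosis.

```python
from collections import Counter


def top_words(tokens, n):
    """Return the n most frequent lowercased tokens."""
    counts = Counter(t.lower() for t in tokens)
    return [word for word, _ in counts.most_common(n)]


def unigram_bigram_features(tokens, vocabulary):
    """Presence features for vocabulary words, plus one feature per adjacent word pair."""
    lowered = [t.lower() for t in tokens]
    present = set(lowered)
    features = {'contains({})'.format(w): (w in present) for w in vocabulary}
    for first, second in zip(lowered, lowered[1:]):
        features['bigram({} {})'.format(first, second)] = True
    return features


def accuracy(classify, labeled_featuresets):
    """Fraction of (features, label) pairs where classify(features) == label."""
    correct = sum(1 for feats, label in labeled_featuresets
                  if classify(feats) == label)
    return correct / len(labeled_featuresets)


def toy_classify(features):
    """Hypothetical stand-in for a trained classifier's classify method."""
    return 'neg' if features.get('contains(boring)') else 'pos'


# Hypothetical toy data; the last review is positive but contains "boring".
reviews = [
    ("a wonderful and moving film".split(), 'pos'),
    ("wonderful cast and a great script".split(), 'pos'),
    ("a dull and boring film".split(), 'neg'),
    ("boring start but a great film".split(), 'pos'),
]

all_tokens = [t for tokens, _ in reviews for t in tokens]
vocabulary = top_words(all_tokens, 20)  # small enough to keep every word here
labeled = [(unigram_bigram_features(tokens, vocabulary), label)
           for tokens, label in reviews]

# One review is misclassified, yet the classifier is right on the other three.
print(accuracy(toy_classify, labeled))
```

With the question's code, the same idea would mean passing `classifier.classify` as the `classify` argument and a list of several labeled reviews instead of one pasted file, so that a single wrong label is read against the overall error rate.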

Solution 1
1 2017-03-01 06:14:53
