How to get the precision and recall from a nltk classifier?

import nltk
from nltk.corpus import movie_reviews
from nltk.tokenize import word_tokenize

documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]


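# Count lowercased word frequencies across the whole corpus
all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words())

# Use the 3000 most frequent words as features
# (in NLTK 3, keys() is not ordered by frequency, so use most_common())
word_features = [w for w, _ in all_words.most_common(3000)]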

def find_features(document):
    words = set(document)
    features = {}
    for w in word_features:
        features[w] = (w in words)

    return features

featuresets = [(find_features(rev), category) for (rev, category) in documents]

training_set = featuresets[500:1500]
# Evaluate on the documents that were not used for training
testing_set = featuresets[:500] + featuresets[1500:]

classifier = nltk.DecisionTreeClassifier.train(training_set)

print("Classifier accuracy percent:", nltk.classify.accuracy(classifier, testing_set) * 100, "%")

string = input("Enter the string: ")
print(classifier.classify(find_features(word_tokenize(string))))

This code displays the accuracy of the classifier and then reads input from the user. It returns the polarity of the string the user entered.

But here's my question: since I can obtain the accuracy by using nltk.classify.accuracy(), is it possible to get the precision and recall as well?

If you're using the nltk package, then it appears you can use the recall and precision functions from nltk.metrics.scores (see the docs).

The functions should be available after invoking:

from nltk.metrics.scores import (precision, recall)

Then you need to call them with reference (known labels) and test (the output of your classifier on the test set) sets.

Something like the code below should produce these sets as refsets and testsets:
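
To make the roles of the two arguments concrete, here is a minimal pure-Python sketch of what set-based precision and recall compute. The helper names set_precision and set_recall and the toy index sets are assumptions made up for illustration; they are not part of nltk, although they take their arguments in the same (reference, test) order.

```python
def set_precision(reference, test):
    """Fraction of items in `test` that are also in `reference`."""
    if not test:
        return None  # undefined when nothing was predicted for this label
    return len(reference & test) / len(test)


def set_recall(reference, test):
    """Fraction of items in `reference` that are also in `test`."""
    if not reference:
        return None  # undefined when no document carries this label
    return len(reference & test) / len(reference)


# Toy data (hypothetical): indices of documents in a test set.
reference_pos = {0, 1, 2, 3}   # documents whose known label is 'pos'
predicted_pos = {2, 3, 4}      # documents the classifier labelled 'pos'

print('Precision:', set_precision(reference_pos, predicted_pos))
print('Recall:', set_recall(reference_pos, predicted_pos))
```

Swapping the two arguments swaps precision and recall, so it is worth keeping the known labels first.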

import collections

refsets = collections.defaultdict(set)
testsets = collections.defaultdict(set)

for i, (feats, label) in enumerate(testing_set):
    refsets[label].add(i)
    observed = classifier.classify(feats)
    testsets[observed].add(i)

Then, you can see the precision and recall for positive predictions with something like:

print( 'Precision:', nltk.metrics.precision(refsets['pos'], testsets['pos']) )
print( 'Recall:', nltk.metrics.recall(refsets['pos'], testsets['pos']) )
# `'pos'` is for the "positive" (as opposed to "negative") label 
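
If you want the numbers for every label rather than only 'pos', the same index-set idea can be wrapped in a loop. The sketch below is plain Python with no nltk dependency; the function name per_label_scores and the toy label lists are hypothetical, and the F1 score is added here only as a common companion metric.

```python
import collections


def per_label_scores(gold_labels, predicted_labels):
    """Return {label: (precision, recall, f1)} from two parallel label lists."""
    refsets = collections.defaultdict(set)
    testsets = collections.defaultdict(set)
    for i, (gold, predicted) in enumerate(zip(gold_labels, predicted_labels)):
        refsets[gold].add(i)
        testsets[predicted].add(i)

    scores = {}
    for label in sorted(set(refsets) | set(testsets)):
        ref = refsets.get(label, set())
        test = testsets.get(label, set())
        hits = len(ref & test)
        precision = hits / len(test) if test else None
        recall = hits / len(ref) if ref else None
        if precision and recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = None  # left undefined when precision or recall is missing or zero
        scores[label] = (precision, recall, f1)
    return scores


# Toy labels (hypothetical) standing in for the known and predicted labels.
gold = ['pos', 'pos', 'pos', 'neg', 'neg', 'neg']
predicted = ['pos', 'pos', 'neg', 'neg', 'neg', 'pos']

scores = per_label_scores(gold, predicted)
for label, (precision, recall, f1) in scores.items():
    print(label, 'precision:', precision, 'recall:', recall, 'F1:', f1)
```

With the classifier from the question, the two lists could be built as [label for _, label in testing_set] and [classifier.classify(feats) for feats, _ in testing_set].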
