How to calculate recall, precision and f-measure?
I'm working on a sentiment analysis project and I'm a beginner in Python. I need to calculate recall, precision and f-measure, but I don't know the syntax for my data sets, which look like this:
# Training data format: each item pairs a dict of a text's words and their weights with the text's class label
train_set = [
    ({'adam': 0.05, 'is': 0.0, 'a': 0.0, 'good': 0.02, 'man': 0.0}, 1),
    ({'eve': 0.0, 'is': 0.0, 'a': 0.0, 'good': 0.02, 'woman': 0.0}, 1),
    ({'adam': 0.05, 'is': 0.0, 'evil': 0.0}, 0)]
# The class label is 0 or 1
# The test data has the same format as the training data
This is my current code:
import collections

import nltk
from nltk.classify import apply_features

def naivebyse(finaltfidfVector):
    train_set = []
    j = 0
    for vector in finaltfidfVector:
        if j < 2100:  # take the first 70% of the data for training
            train_set.append(vector)
            j += 1
        else:
            break
    test_set = []
    j = 0
    for vector in finaltfidfVector:
        if j < 3000 and j >= 2100:  # the remaining 30% for testing
            test_set.append(vector)
        if j >= 3000:
            break
        j += 1
    classifier = nltk.NaiveBayesClassifier.train(train_set)
    print("Accuracy of sarcasm classifier : ",
          (nltk.classify.accuracy(classifier, test_set) * 100))
    # Collect the indices of test items by true label and by predicted label
    refsets = collections.defaultdict(set)
    testsets = collections.defaultdict(set)
    for i, (feats, label) in enumerate(test_set):
        refsets[label].add(i)
        observed = classifier.classify(feats)
        testsets[observed].add(i)
    # The labels are the integers 0 and 1 (see the sample data), so the key is 1, not '1'
    print("Precision percentage : ", nltk.metrics.precision(refsets[1],
                                                            testsets[1]) * 100)
    print("Recall Percentage : ", nltk.metrics.recall(refsets[1],
                                                      testsets[1]) * 100)
Exception:
Exception in Tkinter callback
unable to realloc 20234 bytes
Can anyone provide some hints on how to carry out the task?
You could use the scikit-learn library to do so, e.g. with:
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix
# y_test: true labels of the test set; y_pred: labels predicted by the classifier
f1 = f1_score(y_test, y_pred)
prec = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
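In case it helps to see where `y_test` and `y_pred` come from with data in your `(features, label)` format, here is a minimal sketch using only the standard library. The names `binary_metrics` and `toy_classify` are made up for illustration, `toy_classify` is just a stand-in for `classifier.classify`, and the last two test items are invented. For 0/1 labels with 1 as the positive class, the scikit-learn functions should give the same three numbers.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for a single positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Fall back to 0.0 when a denominator would be zero
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def toy_classify(feats):
    # Hypothetical stand-in for classifier.classify(feats)
    return 1 if 'good' in feats else 0


test_set = [
    ({'adam': 0.05, 'is': 0.0, 'a': 0.0, 'good': 0.02, 'man': 0.0}, 1),
    ({'eve': 0.0, 'is': 0.0, 'a': 0.0, 'good': 0.02, 'woman': 0.0}, 1),
    ({'adam': 0.05, 'is': 0.0, 'evil': 0.0}, 0),
    ({'eve': 0.0, 'is': 0.0, 'good': 0.02}, 0),   # invented item
    ({'adam': 0.05, 'is': 0.0, 'kind': 0.0}, 1),  # invented item
]

# True labels come from the data, predicted labels from the classifier
y_test = [label for _, label in test_set]
y_pred = [toy_classify(feats) for feats, _ in test_set]

precision, recall, f1 = binary_metrics(y_test, y_pred)
print("Precision:", precision, "Recall:", recall, "F-measure:", f1)
```

With a trained NLTK classifier you would replace `toy_classify(feats)` with `classifier.classify(feats)`; the two label lists can then be passed to the scikit-learn functions unchanged.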
Not sure if that applies to your dataset, but it is best practice to perform cross-validation as well.
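To make the cross-validation idea concrete, here is a rough standard-library sketch of k-fold splitting. `k_fold_splits` is a hypothetical helper, not part of NLTK or scikit-learn, and it assumes the data fits in a list and has been shuffled beforehand if its order is meaningful.

```python
def k_fold_splits(data, k=5):
    """Yield k (train, test) pairs; every item is in exactly one test fold."""
    if not 2 <= k <= len(data):
        raise ValueError("k must be between 2 and len(data)")
    for fold in range(k):
        # Item i goes to the test fold when i % k == fold
        test = data[fold::k]
        train = [item for i, item in enumerate(data) if i % k != fold]
        yield train, test


# Integers stand in for the (features, label) pairs here
data = list(range(10))
folds = list(k_fold_splits(data, k=5))
for train, test in folds:
    # With real data: train a classifier on `train`, score it on `test`,
    # then average the k scores at the end
    print(len(train), "train items /", len(test), "test items")
```

Each round would train a fresh classifier, for example with `nltk.NaiveBayesClassifier.train(train)`, and the precision, recall and f-measure values would be averaged over the k rounds instead of relying on a single 70/30 split.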