Decision Tree nltk
I am trying different learning methods (Decision Tree, NaiveBayes, MaxEnt) to compare their relative performance and find the best one among them. How do I implement the Decision Tree and get its accuracy?
import string

import nltk.classify.util
# NLTK's own DecisionTreeClassifier provides .train(); sklearn's class does not.
from nltk.classify import DecisionTreeClassifier, MaxentClassifier, NaiveBayesClassifier
from nltk.corpus import movie_reviews, stopwords

stop = stopwords.words('english')

# Tokens per review with stopwords and punctuation removed, paired with the label ('neg' or 'pos').
words = [([w for w in movie_reviews.words(i) if w.lower() not in stop and w.lower() not in string.punctuation], i.split('/')[0]) for i in movie_reviews.fileids()]

def word_feats(words):
    # Bag-of-words features: every token that occurs maps to True.
    return dict([(word, True) for word in words])

negids = movie_reviews.fileids('neg')
posids = movie_reviews.fileids('pos')

negfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'neg') for f in negids]
posfeats = [(word_feats(movie_reviews.words(fileids=[f])), 'pos') for f in posids]

# Integer division, so the cutoffs are valid slice indices in Python 3.
negcutoff = len(negfeats) * 3 // 4
poscutoff = len(posfeats) * 3 // 4

# First three quarters of each class for training, the remainder for testing.
trainfeats = negfeats[:negcutoff] + posfeats[:poscutoff]
testfeats = negfeats[negcutoff:] + posfeats[poscutoff:]

DecisionTree_classifier = DecisionTreeClassifier.train(trainfeats, binary=True, depth_cutoff=20, support_cutoff=20, entropy_cutoff=0.01)
print(nltk.classify.util.accuracy(DecisionTree_classifier, testfeats))
You will have to look at the code (or documentation strings) of nltk3. There is also a chance that the examples given in the nltk book will work without any changes. See http://www.nltk.org/book/ch06.html#DecisionTrees

Or you could just run a test sample and count the false positives and false negatives yourself. The share of test samples that fall into neither group is your accuracy.
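The manual count suggested above can be sketched in plain Python. This is only an illustration: the helper names `error_counts` and `accuracy_from_counts` and the sample labels are made up here, and treating 'pos' as the positive class is an assumption. With NLTK, the predicted labels would typically come from calling `classifier.classify(feats)` on each test feature set.

```python
from collections import Counter


def error_counts(gold, predicted, positive="pos"):
    """Tally true/false positives and negatives for a two-class task."""
    counts = Counter()
    for g, p in zip(gold, predicted):
        if p == positive:
            counts["tp" if g == positive else "fp"] += 1
        else:
            counts["fn" if g == positive else "tn"] += 1
    return counts


def accuracy_from_counts(counts):
    """Accuracy = correctly classified samples / all samples."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total if total else 0.0


# Hypothetical labels for illustration only. With the NLTK data they could be
# built as:
#   gold = [label for _, label in testfeats]
#   predicted = [classifier.classify(feats) for feats, _ in testfeats]
gold = ["pos", "pos", "neg", "neg", "neg"]
predicted = ["pos", "neg", "neg", "pos", "neg"]

counts = error_counts(gold, predicted)
print("false positives:", counts["fp"])
print("false negatives:", counts["fn"])
print("accuracy:", accuracy_from_counts(counts))
```

Keeping the four counts separate, rather than only the final ratio, also shows whether a classifier errs more on one class than on the other, which a single accuracy number hides.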