Python NLTK Classifier.train(trainfeats)... ValueError: need more than 1 value to unpack

Question

def word_feats(words):
     return dict([(word, True) for word in words])

for tweet in negTweets:
     words = re.findall(r"[\w']+|[.,!?;]", tweet) #splits the tweet into words
     negwords = [(word_feats(words), 'neg')] #tag the words with feature
     negfeats.append(negwords) #add the words to the feature list
for tweet in posTweets:
     words = re.findall(r"[\w']+|[.,!?;]", tweet)
     poswords = [(word_feats(words), 'pos')]
     posfeats.append(poswords)

negcutoff = len(negfeats)*3/4 #take 3/4ths of the words
poscutoff = len(posfeats)*3/4

trainfeats = negfeats[:negcutoff] + posfeats[:poscutoff] #assemble the train set
testfeats = negfeats[negcutoff:] + posfeats[poscutoff:]

classifier = NaiveBayesClassifier.train(trainfeats)
print 'accuracy:', nltk.classify.util.accuracy(classifier, testfeats)
classifier.show_most_informative_features()

I am getting the following error when running this code:

File "C:\Python27\lib\nltk\classify\naivebayes.py", line 191, in train
    for featureset, label in labeled_featuresets:
ValueError: need more than 1 value to unpack

The error is coming from the classifier = NaiveBayesClassifier.train(trainfeats) line and I'm not sure why. I have done something like this before, and my trainfeats seems to be in the same format as then. A sample of the format is listed below:

[[({'me': True, 'af': True, 'this': True, 'joy': True, 'high': True, 'hookah': True, 'got': True}, 'pos')]]

What other value does my trainfeats need to create the classifier?

Answer 1

The comment by @Prune is right: your labeled_featuresets should be a sequence of pairs (two-element lists or tuples), with a feature dict and a category for each data point. Instead, each element in your trainfeats is a list containing one element: a tuple of those two things. The training loop shown in your traceback tries to unpack each element into featureset and label, finds only one value, and raises the ValueError. Lose the square brackets in both feature-building loops and this part should work correctly. E.g.,

negwords = (word_feats(words), 'neg')
negfeats.append(negwords)
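
To see why the extra brackets matter, here is a minimal sketch that reproduces the unpacking step without NLTK. The helper unpack_pairs is hypothetical; it only imitates the "for featureset, label in labeled_featuresets" loop quoted in the traceback, and the token list is made up for illustration.

```python
def word_feats(words):
    # Same feature extractor as in the question: every token maps to True.
    return dict((word, True) for word in words)


def unpack_pairs(labeled_featuresets):
    # Hypothetical stand-in for the unpacking loop quoted in the traceback.
    return [(featureset, label) for featureset, label in labeled_featuresets]


words = ['got', 'this', 'joy']  # made-up tokens for illustration

# Shape from the question: each element is a one-item list wrapping the pair.
nested = [[(word_feats(words), 'pos')]]
try:
    unpack_pairs(nested)
    nested_ok = True
except ValueError:
    # One value (the tuple) cannot be unpacked into two names.
    nested_ok = False

# Shape after the fix: each element is the (featureset, label) pair itself.
flat = [(word_feats(words), 'pos')]
pairs = unpack_pairs(flat)
```

With the nested shape the unpacking fails and nested_ok ends up False; with the flat shape every element splits cleanly into a feature dict and a label.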

Two more things: consider using nltk.word_tokenize() instead of doing your own tokenization. And you should randomize the order of your training data, e.g. with random.shuffle(trainfeats), which shuffles the list in place.
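
As a rough sketch of the shuffling advice, the standard library's random.shuffle reorders a list in place. The helper split_train_test below is hypothetical (it is not part of NLTK), and shuffling the combined data before taking the 3/4 split is an assumption of this sketch rather than something the question's code does. The toy data stands in for negfeats + posfeats.

```python
import random


def split_train_test(labeled_featuresets, train_fraction=0.75, seed=0):
    # Hypothetical helper, not part of NLTK.
    items = list(labeled_featuresets)   # copy so the caller's list is untouched
    random.Random(seed).shuffle(items)  # in-place shuffle with a fixed seed
    cutoff = int(len(items) * train_fraction)  # slicing needs an integer index
    return items[:cutoff], items[cutoff:]


# Toy (featureset, label) pairs standing in for negfeats + posfeats.
data = [({'w%d' % i: True}, 'neg' if i % 2 else 'pos') for i in range(8)]
train, test = split_train_test(data)
```

The fixed seed only makes the split repeatable between runs; drop it if a different shuffle each time is preferred. Converting the cutoff with int() also keeps the slice valid on Python versions where / returns a float.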
