Python for sentiment analysis

I have the following sample code, which takes both the training and the testing data from the NLTK subjectivity corpus, trains a classifier and prints out its evaluation results on those sentences. What I'd like to do is replace the testing dataset with arbitrary text.

from nltk.classify import NaiveBayesClassifier
from nltk.corpus import subjectivity
from nltk.sentiment import SentimentAnalyzer
from nltk.sentiment.util import extract_unigram_feats, mark_negation

n_instances = 100

# Each document is represented by a tuple (sentence, label).
# The sentence is tokenized, so it is represented by a list of strings:
subj_docs = [(sent, 'subj') for sent in subjectivity.sents(categories='subj')[:n_instances]]
obj_docs = [(sent, 'obj') for sent in subjectivity.sents(categories='obj')[:n_instances]]

# split subjective and objective instances to keep a balanced uniform class distribution
# in both train and test sets
train_subj_docs = subj_docs[:80]
test_subj_docs = subj_docs[80:100]
train_obj_docs = obj_docs[:80]
test_obj_docs = obj_docs[80:100]
training_docs = train_subj_docs + train_obj_docs
testing_docs = test_subj_docs + test_obj_docs


sentim_analyzer = SentimentAnalyzer()
all_words_neg = sentim_analyzer.all_words([mark_negation(doc) for doc in training_docs])

# simple unigram word features, handling negation
unigram_feats = sentim_analyzer.unigram_word_feats(all_words_neg, min_freq=4)
sentim_analyzer.add_feat_extractor(extract_unigram_feats, unigrams=unigram_feats)

# apply features to obtain a feature-value representation of our datasets
training_set = sentim_analyzer.apply_features(training_docs)
test_set = sentim_analyzer.apply_features(testing_docs)

# train the Naive Bayes classifier on the training set
trainer = NaiveBayesClassifier.train
classifier = sentim_analyzer.train(trainer, training_set)

# output evaluation results
for key, value in sorted(sentim_analyzer.evaluate(test_set).items()):
    print('{0}: {1}'.format(key, value))

So when I tried to replace testing_docs with a variable that stores plain text, something like paragraph = "Hello World, this is a test dataset", I got this error message: ValueError: too many values to unpack (expected 2).

Does anyone know how I can fix this error? Thank you.

This is because testing_docs is not a string but a list of tuples, each holding a tokenized sentence and its label. Print out the value of testing_docs from the example, and if you want to replace it with your own paragraphs, make sure they use the same format.
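A minimal sketch of converting a raw paragraph into that (tokens, label) format. It assumes a simple regex tokenizer; the helper name to_labeled_docs, the sentence-splitting rule and the lowercasing are my own assumptions, not part of NLTK. Note that evaluate() compares predictions against the labels you supply, so the label must be the real one for the text; the 'subj' below is only a placeholder.

```python
import re


def to_labeled_docs(text, label):
    """Hypothetical helper: turn raw text into (tokens, label) tuples,
    the same shape as the entries of testing_docs."""
    # Naive sentence split after '.', '!' or '?' followed by whitespace
    # (an assumption; it does not handle abbreviations such as "e.g.").
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
    docs = []
    for sentence in sentences:
        # Lowercase (assumed to roughly match the corpus) and keep words and
        # punctuation marks as separate tokens.
        tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
        docs.append((tokens, label))
    return docs


paragraph = "Hello World, this is a test dataset"
# 'subj' is only a placeholder; use the text's real label for evaluation.
testing_docs = to_labeled_docs(paragraph, 'subj')
print(testing_docs)
# [(['hello', 'world', ',', 'this', 'is', 'a', 'test', 'dataset'], 'subj')]
```

The resulting list has the same shape as the corpus test set, so it can be passed to sentim_analyzer.apply_features(testing_docs) in its place.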

If you want to understand the error you are getting, you should first read about tuple unpacking.

This simple example replicates it:

>>> a = 'abc'
>>> b,c=a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: too many values to unpack (expected 2)

This is because my string above has three values (its three characters), so to unpack it I would have to assign it to three variables (i.e. b, c, d = a works).
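A small sketch contrasting successful and failed unpacking. The helper try_unpack_two is hypothetical and only wraps the assignment so the error message can be inspected; the exact wording after "too many values to unpack" may differ slightly between Python versions, so only its beginning is shown.

```python
def try_unpack_two(value):
    """Hypothetical helper: unpack value into exactly two names and return
    the pair, or return the ValueError message if that is impossible."""
    try:
        first, second = value
    except ValueError as err:
        return str(err)
    return (first, second)


a = 'abc'
b, c, d = a  # three characters, three names: works
print(b, c, d)                                  # a b c
print(try_unpack_two('ab'))                     # ('a', 'b')
print(try_unpack_two(a))                        # too many values to unpack ...
print(try_unpack_two(('Hello World', 'subj')))  # ('Hello World', 'subj')
```

The last call shows why a two-element (sentence, label) tuple is exactly what a two-name unpack expects.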

However, testing_docs is actually more similar to this:

a = [
    ('a','subj'),
    ('b','subj'),
    ('c','obj')
]

(Although in the real data the first element of each tuple is a list of word tokens, not a single character.)

My guess is that somewhere in the code there is a loop that tries to unpack each element of testing_docs into two variables, something like:

for val, category in testing_docs:
    ...
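To see why input in the wrong shape breaks such a loop, here is a hypothetical stand-in (count_labels is not an NLTK function) that unpacks each item the same way. The sample tokens are invented for illustration. Tokenized sentences without labels fail with the same kind of "too many values to unpack" error, because each token list has more than two elements.

```python
def count_labels(docs):
    """Hypothetical stand-in for a loop such as
    `for val, category in testing_docs`: every item must unpack into
    exactly two values (tokens, label)."""
    counts = {}
    for tokens, label in docs:
        counts[label] = counts.get(label, 0) + 1
    return counts


# Correct shape: a list of (tokens, label) tuples.
testing_docs = [
    (['a', 'smart', 'film'], 'subj'),
    (['the', 'film', 'opens', 'in', 'paris'], 'obj'),
]
print(count_labels(testing_docs))  # {'subj': 1, 'obj': 1}

# Wrong shape: tokenized sentences with no labels. Each item is a list of
# eight tokens, so unpacking it into two names raises ValueError.
unlabeled = [['hello', 'world', ',', 'this', 'is', 'a', 'test', 'dataset']]
try:
    count_labels(unlabeled)
except ValueError as err:
    print(err)  # too many values to unpack ...
```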
