I am working on code for sentiment analysis. I would like to apply a stemmer to my data, but when I print the result, the text has not been stemmed. Do you have any idea what I am doing wrong? Here is my code snippet:
pos_data = []
with open('Positive.txt') as f:
    for line in f:
        pos_data.append([format_sentence(line), 'pos'])
    for line in f:
        stemmer.stem(pos_data)
print (pos_data)
You need to split the file into lines and then split each line into words (tokens), because the stemmer works on one word at a time:
>>> import nltk
>>> from nltk import PorterStemmer
>>> test = 'this sentence is just a tester set of words'
>>> test_tokenize = nltk.word_tokenize(test)
>>> test_tokenize
['this', 'sentence', 'is', 'just', 'a', 'tester', 'set', 'of', 'words']
>>> port = PorterStemmer()
>>> for word in test_tokenize:
...     print(port.stem(word))
...
thi
sentenc
is
just
a
tester
set
of
word
Applied to your file, that becomes:
with open('Positive.txt') as f:  # text mode, so each line is a str
    for line in f:
        words = nltk.word_tokenize(line)
        for word in words:
            print(port.stem(word))
It seems that you are not calling the stemmer API properly: stem() takes a single token at a time and returns the stemmed string, so you should tokenize your sentence first and keep the returned value. Check out the docs here: http://www.nltk.org/howto/stem.html
Also, for future reference, you should include full working code, with imports and the stack trace of your error.
with open('Positive.txt') as f:
    for line in f:
        tokens = format_sentence(line).split()  # tokenize using spaces
        stem_sentence = ' '.join([stemmer.stem(token) for token in tokens])
        pos_data.append([stem_sentence, 'pos'])
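The snippet above relies on names defined elsewhere (stemmer, format_sentence, pos_data), so here is a minimal self-contained sketch of the same idea. It makes some assumptions: format_sentence is a hypothetical stand-in for the helper in the question (its real body was not posted), and stem_labeled_lines and load_positive are names made up for illustration. The stemmer is passed in as a callable, for example PorterStemmer().stem, so the core loop can be tried without NLTK installed.

```python
def format_sentence(line):
    # Hypothetical stand-in for the question's helper (its real body was not
    # posted): strip surrounding whitespace and lowercase the line.
    return line.strip().lower()


def stem_labeled_lines(lines, stem, label):
    """Return [stemmed_sentence, label] pairs, one per non-empty line.

    `stem` is any callable mapping one token to its stem,
    e.g. nltk.stem.PorterStemmer().stem.
    """
    data = []
    for line in lines:
        tokens = format_sentence(line).split()  # tokenize on whitespace
        if not tokens:
            continue  # skip blank lines
        # stem() returns a new string and changes nothing in place,
        # so the returned value has to be kept.
        stemmed = ' '.join(stem(token) for token in tokens)
        data.append([stemmed, label])
    return data


def load_positive(path='Positive.txt'):
    # Imported here so stem_labeled_lines has no hard dependency on NLTK.
    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()
    with open(path) as f:
        return stem_labeled_lines(f, stemmer.stem, 'pos')
```

Calling load_positive() would then give a list in the same shape as pos_data in the question, with every word stemmed. Splitting on whitespace leaves punctuation attached to words; nltk.word_tokenize could be swapped in if that matters for your data.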