I'm developing an algorithm to remove stopwords. I read a .txt file into a list and then pass that list to the removal function.
Example of file lines:
'mora vai nascer viver cair falar','positivo'
'deixa ver entendi vai crescer vai passar ve','positivo'
'so deveria ter foi agradeco de passei passei fez','positivo'
'nunca nao nao muito nao mais','negativo'
'a nao ate infelizmente ai ate quando','negativo'
'nao perto nao quanto menos nao sim nao nem simplesmente','negativo'
Code
with open('BasePalavras.txt') as arquivo:
    baseTeste = [linha.strip() for linha in arquivo]

stopwords = ['a', 'agora', 'algum', 'alguma', 'aquele', 'aqueles', 'de', 'deu', 'do', 'e', 'estou', 'esta', 'esta',
             'ir', 'meu', 'muito', 'mesmo', 'no', 'nossa', 'o', 'outro', 'para', 'que', 'sem', 'talvez', 'tem', 'tendo',
             'tenha', 'teve', 'tive', 'todo', 'um', 'uma', 'umas', 'uns', 'vou']

def removestopword(texto):
    frases = []
    for (palavras, emocao) in texto:
        semstopwords = [p for p in palavras.splits() if p not in stopwords]
        frases.append((semstopwords, emocao))
    return frases

print(removestopword(baseTeste))
ERROR
Traceback (most recent call last):
File "C:/Users/Rivaldo/PycharmProjects/Mineracao/Principal.py", line 22, in <module>
print (removestopword(baseTeste))
File "C:/Users/Rivaldo/PycharmProjects/Mineracao/Principal.py", line 17, in removestopword
for(palavras, emocao) in texto:
ValueError: too many values to unpack
Try this:
with open('BasePalavras.txt') as arquivo:
    baseTeste = [linha.strip().split(',') for linha in arquivo]

stopwords = ['a', 'agora', 'algum', 'alguma', 'aquele', 'aqueles', 'de', 'deu', 'do', 'e', 'estou', 'esta', 'esta',
             'ir', 'meu', 'muito', 'mesmo', 'no', 'nossa', 'o', 'outro', 'para', 'que', 'sem', 'talvez', 'tem', 'tendo',
             'tenha', 'teve', 'tive', 'todo', 'um', 'uma', 'umas', 'uns', 'vou']

def removestopword(texto):
    frases = []
    for (palavras, emocao) in texto:
        semstopwords = [p for p in palavras.split() if p not in stopwords]
        frases.append((semstopwords, emocao))
    return frases

print(removestopword(baseTeste))
I changed

    baseTeste = [linha.strip() for linha in arquivo]

to

    baseTeste = [linha.strip().split(',') for linha in arquivo]

so that each line is split into a (sentence, sentiment) pair that the loop can unpack, and fixed the typo

    semstopwords = [p for p in palavras.splits() if p not in stopwords]

to

    semstopwords = [p for p in palavras.split() if p not in stopwords]

since the string method is called split(), not splits().
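To see the effect of the first change, here's a minimal illustration using one of the example lines from the question:

```python
linha = "'mora vai nascer viver cair falar','positivo'\n"

# Before: strip() alone leaves the whole line as one string,
# so there is nothing to unpack into (palavras, emocao).
antes = linha.strip()

# After: strip().split(',') yields a two-element list that
# unpacks cleanly into a (sentence, sentiment) pair.
depois = linha.strip().split(',')

palavras, emocao = depois
print(palavras)  # 'mora vai nascer viver cair falar' (quotes still attached)
print(emocao)    # 'positivo'
```

Note the surrounding single quotes are still part of each string; they would need a separate step to remove.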
Here's how I would do it.
stopwords = ['a', 'agora', 'algum', 'alguma', 'aquele', 'aqueles', 'de', 'deu', 'do', 'e', 'estou', 'esta', 'esta',
             'ir', 'meu', 'muito', 'mesmo', 'no', 'nossa', 'o', 'outro', 'para', 'que', 'sem', 'talvez', 'tem', 'tendo',
             'tenha', 'teve', 'tive', 'todo', 'um', 'uma', 'umas', 'uns', 'vou']

def remove_stopwords(text):
    phrases = []
    for (sentence, _) in text:  # sentiment is unused here
        sentence_without_stopwords = [word for word in sentence.split() if word not in stopwords]
        phrases.append(sentence_without_stopwords)
    return phrases

with open('input.txt') as raw_text:
    sentence_sentiments = []
    for line in raw_text:
        # strip() removes the trailing newline so the [1:-1] slices
        # below drop only the surrounding quotes
        sentence, sentiment = line.strip().split(',')
        sentence_sentiments.append((sentence[1:-1], sentiment[1:-1]))

print(remove_stopwords(sentence_sentiments))
Notice how, in your provided code, baseTeste is a list of plain strings, one per line of your input file. This is not what you want: your loop, for (palavras, emocao) in texto:, expects to iterate over (sentence, sentiment) pairs. You are therefore missing the intermediate step of splitting each line into such a pair.
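A minimal sketch of why that raises the ValueError: unpacking a plain string into two names iterates over its individual characters, so any line longer than two characters fails.

```python
# Each element of baseTeste is still one whole line as a string.
linha = "'nunca nao nao muito nao mais','negativo'"

try:
    palavras, emocao = linha  # tries to unpack character by character
    failed = False
except ValueError:
    failed = True  # "too many values to unpack"

print(failed)
```

Splitting the line on the comma first, as shown in the answers above, is what produces the two-element pair the loop expects.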