What data structure to use to store the sentiment count of corresponding word during sentiment analysis in python?
How do you count a negative or positive word prior to a specific word - Sentiment Analysis in Python?
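I am trying to count how many times a negative word from a list appears immediately before a specific word. For example, take "This terrible laptop". The specified word is "laptop", and I want the Python output to be "terrible 1".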

    def run(path):
        negWords = {}  # dictionary to return the counts

        # load the negative lexicon
        negLex = loadLexicon('negative-words.txt')

        fin = open(path)
        for line in fin:  # for every line in the file (1 review per line)
            line = line.lower().strip().split(' ')
            review_set = set()  # holds every distinct word in the review
            for word in line:
                review_set.add(word)  # as it is a set, each word is added only once
            for word in review_set:
                if word in negLex:
                    if word in negWords:
                        negWords[word] = negWords[word] + 1
                    else:
                        negWords[word] = 1
        fin.close()
        return negWords

    if __name__ == "__main__":
        print(run('textfile'))

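It looks like you want to check a function against consecutive words. Here is one way to do it: `condition` is checked against every pair of consecutive words.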

    text = 'Do you like bananas? Not only do I like bananas, I love bananas!'
    trigger_words = {'bananas'}
    positive_words = {'like', 'love'}

    def condition(w):
        # w is a pair of consecutive words: (previous word, current word)
        return w[0] in positive_words and w[1] in trigger_words

    # strip punctuation before splitting into words
    for c in '.,?!':
        text = text.replace(c, '')
    words = text.lower().split()

    # list() so that the matches can still be reused after this loop
    matches = list(filter(condition, zip(words, words[1:])))
    n_positives = 0
    for w1, w2 in matches:
        print(f'{w1.upper()} {w2} => That\'s positive !')
        n_positives += 1
    print(f'This text had a score of {n_positives}')

Output:

    LIKE bananas => That's positive !
    LIKE bananas => That's positive !
    LOVE bananas => That's positive !
    This text had a score of 3

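To search over 3 consecutive words instead, you only need to change `zip(words, words[1:])` to `zip(words, words[1:], words[2:])` and write a condition that checks 3 words.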
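As a rough illustration of the three-word variant, the sketch below skips a positive word when the word before it is a negation. The `negations` set, the `condition3` name, and the sample sentence are assumptions made up for this example; they are not part of the original answer.

```python
# Hypothetical sample text, already lower-cased and free of punctuation.
words = "i do not like bananas but i really love bananas".split()

trigger_words = {"bananas"}
positive_words = {"like", "love"}
negations = {"not", "never"}  # assumed list, for illustration only

def condition3(w):
    """True when a non-negated positive word directly precedes a trigger word."""
    return (w[0] not in negations
            and w[1] in positive_words
            and w[2] in trigger_words)

# zip over three staggered views of the list yields every 3-word window.
matches3 = list(filter(condition3, zip(words, words[1:], words[2:])))
print(matches3)
```

One limitation of this sketch: a positive word that is the very first word of the text never appears in the middle of a window, so it would need separate handling.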
You can get a counter dictionary by doing the following:
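
    from collections import Counter

    # matches must still hold the pairs at this point: a filter object is a
    # one-shot iterator, so wrap it in list() if it has already been looped over
    counter = Counter(i[0] for i in matches)  # counter = {'like': 2, 'love': 1}
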
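This should do what you need. It uses `set` and the `&` intersection operator to avoid some of the loops. The steps are described in the code comments.
Note that this only identifies the first occurrence of each negative word in a line, so, for example, "terrible terrible laptop" would not match.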

    from collections import defaultdict

    def run(path):
        negWords = defaultdict(int)  # a defaultdict(int) starts at 0, so we can just add

        # load the negative lexicon
        negLex = loadLexicon('negative-words.txt')
        # Is the above a list or a set? If it is a list, convert it to a set.
        negLex = set(negLex)

        with open(path) as fin:
            for line in fin:  # for every line in the file (1 review per line)
                line = line.lower().strip().split(' ')

                # Passing a list to set() makes a set of its items.
                review_set = set(line)

                # Compare the review set against the negLex set. We want words
                # that are in *both* sets, so we can use intersection.
                neg_words_used = review_set & negLex

                # Is the bad word followed by the word laptop?
                for word in neg_words_used:
                    # Find the (first) position of the word in the line list.
                    ix = line.index(word)
                    if ix > len(line) - 2:
                        # Can't have laptop after it, it's the last word.
                        continue
                    # Check whether the word after this index is laptop.
                    if line[ix + 1] == 'laptop':
                        negWords[word] += 1
        return negWords

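To get around the first-occurrence limitation noted above, one option is to walk over every pair of adjacent words instead of calling `line.index`. The following is only a sketch under stated assumptions: the function name `count_negatives_before`, the in-memory `reviews` list, and the tiny `neg_lex` set are hypothetical stand-ins for the review file and the lexicon, and punctuation is assumed to have been stripped already.

```python
from collections import Counter

def count_negatives_before(lines, neg_lex, target="laptop"):
    """Count negative words that sit immediately before `target`."""
    counts = Counter()
    for line in lines:
        words = line.lower().split()
        # zip(words, words[1:]) yields every (previous, current) pair.
        for prev, cur in zip(words, words[1:]):
            if cur == target and prev in neg_lex:
                counts[prev] += 1
    return counts

# Hypothetical data, for illustration only.
reviews = [
    "This terrible laptop",
    "A terrible terrible laptop and an awful laptop",
    "Great laptop",
]
neg_lex = {"terrible", "awful"}

result = count_negatives_before(reviews, neg_lex)
print(dict(result))
```

Because every adjacent pair is inspected, repeated negative words and several mentions of the target in one review are all taken into account.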
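If you are only interested in the word before "laptop", a more sensible approach is to look for the word "laptop" and then check whether the word before it is a negative word. The example below does this.
This avoids doing lookups on words that have nothing to do with laptop.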

    from collections import defaultdict

    def run(path):
        negWords = defaultdict(int)  # a defaultdict(int) starts at 0, so we can just add

        # load the negative lexicon
        negLex = loadLexicon('negative-words.txt')
        # Is the above a list or a set? If it is a list, convert it to a set.
        negLex = set(negLex)

        with open(path) as fin:
            for line in fin:  # for every line in the file (1 review per line)
                line = line.lower().strip().split(' ')

                try:
                    # Position of the first 'laptop' in the line.
                    ix = line.index('laptop')
                except ValueError:
                    # If we don't find laptop, continue to the next line.
                    continue

                if ix == 0:
                    # Laptop is the first word of the line, can't check prior word.
                    continue

                previous_word = line[ix - 1]
                if previous_word in negLex:
                    # Negative word directly before laptop.
                    negWords[previous_word] += 1
        return negWords

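All of the versions above call `loadLexicon`, which is never shown in the question. A minimal sketch of what such a helper might look like follows. It assumes the lexicon file holds one word per line and that lines starting with `;` are comments; both are assumptions about the file format, not something stated in the post.

```python
def loadLexicon(path):
    """Hypothetical helper: read one word per line into a set.

    Blank lines and lines starting with ';' (assumed to be comments)
    are skipped. Words are lower-cased to match the lower-cased reviews.
    """
    words = set()
    with open(path, encoding="utf-8") as fin:
        for raw in fin:
            word = raw.strip().lower()
            if word and not word.startswith(";"):
                words.add(word)
    return words
```

Returning a set keeps the `word in negLex` checks fast and makes the later `set(negLex)` conversion a harmless no-op.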