Usage of NLTK SentiWordNet with Python

I am doing sentiment analysis on Twitter data using Python and NLTK. I need a dictionary that contains the positive and negative polarities of words. I have read a lot about SentiWordNet, but when I use it in my project the results are neither efficient nor fast. I think I am not using it correctly. Can anyone tell me the correct way to use it? Here are the steps I have done so far:

  1. tokenization of tweets
  2. POS tagging of tokens
  3. passing each tagged token to SentiWordNet

I am using the nltk package for tokenization and tagging. See a part of my code below:

import nltk
from nltk.stem import *
from nltk.corpus import sentiwordnet as swn

# Penn Treebank tag fragments mapped to the WordNet POS codes SentiWordNet expects
TAG_TO_POS = {'NN': 'n', 'VB': 'v', 'JJ': 'a', 'RB': 'r'}

tokens = nltk.word_tokenize(row)  # tokenization; row is a line of the file in which tweets are saved
tagged = nltk.pos_tag(tokens)     # POS tagging

pscore = 0.0  # accumulated positive score
nscore = 0.0  # accumulated negative score

for word, tag in tagged:
    # same substring test as before ('NN' in tag, 'VB' in tag, ...)
    pos = next((p for fragment, p in TAG_TO_POS.items() if fragment in tag), None)
    if pos is None:
        continue
    # look the word up once; senti_synsets may return an iterator, so len() cannot be called on it directly
    synsets = list(swn.senti_synsets(word, pos))
    if synsets:
        pscore += synsets[0].pos_score()  # positive score of the word (first sense)
        nscore += synsets[0].neg_score()  # negative score of the word (first sense)

At the end I will count how many tweets are positive and how many are negative. Where am I going wrong? How should I use it? And is there any other similar dictionary that is easy to use?

Yes, there are other lexicons that you can use. You can find a small list of lexicons here: http://sentiment.christopherpotts.net/lexicons.html#resources (Bing Liu's Opinion Lexicon seems quite easy to use).

Apart from linking to those lexicons, that website is also a very nice tutorial on sentiment analysis.
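For what it's worth, NLTK ships Bing Liu's Opinion Lexicon as the nltk.corpus.opinion_lexicon corpus (after nltk.download('opinion_lexicon')). Below is a minimal sketch of a hit-counting scorer built on it. The function names load_opinion_lexicon and lexicon_polarity, the 'Pos'/'Neg'/'Neu' labels and the tiny demo word sets are all made up for illustration, and plain hit counting ignores negation and intensity, so treat it as a starting point rather than a recommended method.

```python
def load_opinion_lexicon():
    """Load Bing Liu's Opinion Lexicon through NLTK as two sets of words.

    Requires the corpus to be installed: nltk.download('opinion_lexicon').
    """
    # Imported lazily so the scorer below can be used without NLTK installed.
    from nltk.corpus import opinion_lexicon
    return set(opinion_lexicon.positive()), set(opinion_lexicon.negative())


def lexicon_polarity(tokens, positive_words, negative_words):
    """Label a token list by comparing positive and negative lexicon hits.

    The labels 'Pos', 'Neg' and 'Neu' are an illustrative choice.
    """
    pos_hits = sum(1 for t in tokens if t.lower() in positive_words)
    neg_hits = sum(1 for t in tokens if t.lower() in negative_words)
    if pos_hits > neg_hits:
        return 'Pos'
    if neg_hits > pos_hits:
        return 'Neg'
    return 'Neu'


if __name__ == '__main__':
    # Hand-made word sets for the demo; use load_opinion_lexicon() for real data.
    positive_words = {'good', 'great', 'love'}
    negative_words = {'bad', 'awful', 'hate'}
    tweet = 'I love this great phone'.split()
    print(lexicon_polarity(tweet, positive_words, negative_words))  # prints Pos
```

Because the lexicon is loaded once into Python sets, each token costs only two set membership checks, which should be noticeably cheaper than a WordNet lookup per token.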

calculate the sentiment 计算情绪

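from nltk.corpus import sentiwordnet as swn

# Wrapped in a function because the snippet ends with a return; the function name is illustrative.
def calculate_sentiment(all_words_in_comment):
    # all_words_in_comment: (word, pos) pairs, where pos is a WordNet POS code ('n', 'v', 'a' or 'r')
    totalScore = 0
    count_words_included = 0
    for word in all_words_in_comment:
        synset_forms = list(swn.senti_synsets(word[0], word[1]))
        if not synset_forms:
            continue  # no SentiWordNet entry for this word and POS
        synset = synset_forms[0]  # use the first sense
        totalScore = totalScore + synset.pos_score() - synset.neg_score()
        count_words_included = count_words_included + 1

    final_dec = ''
    if count_words_included == 0:
        final_dec = 'N/A'   # nothing could be scored
    elif totalScore == 0:
        final_dec = 'Neu'
    elif totalScore / count_words_included < 0:
        final_dec = 'Neg'
    elif totalScore / count_words_included > 0:
        final_dec = 'Pos'
    return final_dec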
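The scoring loop above expects (word, pos) pairs whose second element is a WordNet POS code, while nltk.pos_tag returns Penn Treebank tags. Here is a rough sketch of the glue code, plus a tally of per-tweet labels as asked in the question. The helper names penn_to_wordnet, to_word_pos_pairs and tally are invented for this example, and the prefix-based mapping is an assumption that only covers nouns, verbs, adjectives and adverbs.

```python
from collections import Counter


def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet POS code, or None if unsupported."""
    if tag.startswith('NN'):
        return 'n'  # nouns: NN, NNS, NNP, NNPS
    if tag.startswith('VB'):
        return 'v'  # verbs: VB, VBD, VBG, ...
    if tag.startswith('JJ'):
        return 'a'  # adjectives: JJ, JJR, JJS
    if tag.startswith('RB'):
        return 'r'  # adverbs: RB, RBR, RBS
    return None


def to_word_pos_pairs(tagged):
    """Convert (word, penn_tag) pairs to (word, wordnet_pos), dropping other tags."""
    pairs = []
    for word, tag in tagged:
        pos = penn_to_wordnet(tag)
        if pos is not None:
            pairs.append((word, pos))
    return pairs


def tally(labels):
    """Count how many tweets received each label (e.g. 'Pos', 'Neg', 'Neu', 'N/A')."""
    return Counter(labels)


if __name__ == '__main__':
    # Hand-written tags standing in for the output of nltk.pos_tag.
    tagged = [('good', 'JJ'), ('the', 'DT'), ('movie', 'NN')]
    print(to_word_pos_pairs(tagged))
    print(tally(['Pos', 'Neg', 'Pos', 'N/A']))
```

Filtering out unsupported tags before the lookup also skips SentiWordNet queries for determiners, prepositions and the like, which may help a little with speed.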
