
Sentiwordnet scoring with Python

I have been working on research related to Twitter sentiment analysis, and I have a little knowledge of how to code in Python. Since my research involves coding, I looked into how to analyze sentiment using Python, and this is how far I have come:

1. Tokenization of tweets
2. POS tagging of the tokens

What remains is calculating the positive and negative scores of the sentiment, which is the issue I am facing now and need your help with.

Below is my code example:

import nltk
sentence = "Iphone6 camera is awesome for low light "
token = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(token)

Could anybody show me, or guide me through, an example of using Sentiwordnet from Python to calculate the positive and negative scores of tweets that have already been POS tagged? Thanks in advance.

It's a little unclear what exactly your question is. Do you need a guide to using Sentiwordnet? If so, check out this link:

http://www.nltk.org/howto/sentiwordnet.html

Since you've already tokenized and POS tagged the words, all you need to do now is use this syntax:

swn.senti_synset('breakdown.n.03')

Breaking down the argument:

  • 'breakdown' = the word you need scores for
  • 'n' = part of speech
  • '03' = usage (01 is the most common usage; higher numbers indicate less common usages)

So for each tuple in your tagged array, create a string as above and pass it to the senti_synset function to get the positive, negative and objective scores for that word.

Caveat: The POS tagger gives you different tags from the ones senti_synset accepts. Use the following to convert to synset notation.

n - NOUN 
v - VERB 
a - ADJECTIVE 
s - ADJECTIVE SATELLITE 
r - ADVERB 

(Credits to Using Sentiwordnet 3.0 for the above notation)
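
The conversion and string-building steps described above can be sketched as follows. This is only an illustration: the helper name `synset_name`, the prefix table, and the hand-written tagged list are assumptions made for this example, not part of NLTK, and the default sense number 1 simply picks the most common usage.

```python
# Map the first letter of a Penn Treebank tag to the synset POS letter.
# (Hypothetical helper table; 's' is omitted because the tagger never emits it.)
PENN_PREFIX_TO_SYNSET_POS = {"J": "a", "N": "n", "R": "r", "V": "v"}


def synset_name(word, penn_tag, sense=1):
    """Build a 'word.pos.NN' string, or return None for unsupported tags."""
    pos = PENN_PREFIX_TO_SYNSET_POS.get(penn_tag[:1])
    if pos is None:
        return None
    return f"{word.lower()}.{pos}.{sense:02d}"


# Tags written by hand for illustration, not actual tagger output.
tagged = [("camera", "NN"), ("is", "VBZ"), ("awesome", "JJ"), ("for", "IN")]

names = [synset_name(word, tag) for word, tag in tagged]
print(names)
```

Each non-None name has the shape that senti_synset expects, but a name built this way is not guaranteed to exist in WordNet (an inflected form such as "is" will not be found under that spelling), so lemmatizing first and handling lookup failures is advisable.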

That being said, it is generally not a great idea to use Sentiwordnet for Twitter sentiment analysis, and here's why:

Tweets are filled with typos and non-dictionary words, which Sentiwordnet often does not recognize. To counter this problem, either lemmatize/stem your tweets before you POS tag them, or use a machine learning classifier such as Naive Bayes, for which NLTK has built-in functions. As for the training dataset for the classifier, either manually annotate a dataset or use a pre-labelled set such as the Sentiment140 corpus.

If you are uninterested in actually performing the sentiment analysis but need a sentiment tag for a given tweet, you can always use the Sentiment140 API for this purpose.
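
As a rough sketch of the classifier route, NLTK's `NaiveBayesClassifier` can be trained on bag-of-words features. The four training tweets below are invented for this example and far too few for real use; the `bag_of_words` helper and its whitespace tokenization are also assumptions, chosen only to keep the sketch free of extra downloads.

```python
from nltk.classify import NaiveBayesClassifier


def bag_of_words(text):
    """Turn a tweet into the {feature: value} dict that NLTK classifiers expect."""
    return {word: True for word in text.lower().split()}


# Hypothetical hand-labelled tweets, made up for this sketch only.
TRAIN = [
    ("camera is awesome", "pos"),
    ("love the battery life", "pos"),
    ("screen is terrible", "neg"),
    ("hate the slow update", "neg"),
]

train_set = [(bag_of_words(text), label) for text, label in TRAIN]
classifier = NaiveBayesClassifier.train(train_set)

print(classifier.classify(bag_of_words("awesome camera")))
```

In practice the training set would come from a manually annotated or pre-labelled corpus, as mentioned above, and the feature extractor would usually do more than split on whitespace.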

@Saravana Kumar has a wonderful answer.

To add detailed code to it, I am writing this. I referred to https://nlpforhackers.io/sentiment-analysis-intro/

import nltk  # needed for nltk.pos_tag below
from nltk.corpus import wordnet as wn
from nltk.corpus import sentiwordnet as swn
from nltk.stem import PorterStemmer

def penn_to_wn(tag):
    """
    Convert between the PennTreebank tags to simple Wordnet tags
    """
    if tag.startswith('J'):
        return wn.ADJ
    elif tag.startswith('N'):
        return wn.NOUN
    elif tag.startswith('R'):
        return wn.ADV
    elif tag.startswith('V'):
        return wn.VERB
    return None

from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()

def get_sentiment(word,tag):
    """ returns list of pos neg and objective score. But returns empty list if not present in senti wordnet. """

    wn_tag = penn_to_wn(tag)
    if wn_tag not in (wn.NOUN, wn.ADJ, wn.ADV):
        return []

    lemma = lemmatizer.lemmatize(word, pos=wn_tag)
    if not lemma:
        return []

    synsets = wn.synsets(lemma, pos=wn_tag)
    if not synsets:
        return []

    # Take the first sense, the most common
    synset = synsets[0]
    swn_synset = swn.senti_synset(synset.name())

    return [swn_synset.pos_score(),swn_synset.neg_score(),swn_synset.obj_score()]


ps = PorterStemmer()
words_data = ['this','movie','is','wonderful']
# words_data = [ps.stem(x) for x in words_data] # if you want to further stem the word

pos_val = nltk.pos_tag(words_data)
senti_val = [get_sentiment(x,y) for (x,y) in pos_val]

print(f"pos_val is {pos_val}")
print(f"senti_val is {senti_val}")

Output:

pos_val is [('this', 'DT'), ('movie', 'NN'), ('is', 'VBZ'), ('wonderful', 'JJ')]
senti_val is [[], [0.0, 0.0, 1.0], [], [0.75, 0.0, 0.25]]
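
The per-word triples can then be rolled up into scores for the whole text. The sketch below assumes plain summation and a simple comparison of the two totals; the answer itself does not prescribe an aggregation rule, and the function names `aggregate` and `label` are made up for this example.

```python
def aggregate(senti_val):
    """Sum positive and negative scores over the words that were scored."""
    scored = [triple for triple in senti_val if triple]  # skip the empty lists
    pos_total = sum(pos for pos, _neg, _obj in scored)
    neg_total = sum(neg for _pos, neg, _obj in scored)
    return pos_total, neg_total


def label(pos_total, neg_total):
    """Name the dominant polarity; equal totals count as neutral."""
    if pos_total > neg_total:
        return "positive"
    if neg_total > pos_total:
        return "negative"
    return "neutral"


# The senti_val list printed in the output above.
senti_val = [[], [0.0, 0.0, 1.0], [], [0.75, 0.0, 0.25]]

pos_total, neg_total = aggregate(senti_val)
print(pos_total, neg_total, label(pos_total, neg_total))  # 0.75 0.0 positive
```

Averaging over the scored words instead of summing would make long and short tweets more comparable; which is better depends on the data.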

Here is my solution:

import nltk  # needed for nltk.pos_tag and nltk.word_tokenize below
from nltk.corpus import sentiwordnet as swn
from nltk.corpus import wordnet
from nltk.tag import pos_tag
from nltk.stem import WordNetLemmatizer

def get_wordnet_pos(word):
    tag = nltk.pos_tag([word])[0][1][0].upper()
    tag_dict = {"J": wordnet.ADJ,
                "N": wordnet.NOUN,
                "V": wordnet.VERB,
                "R": wordnet.ADV}
    return tag_dict.get(tag, wordnet.NOUN)

def get_sentiment_score_of_review(sentence):
    # 1. Tokenize
    tokens = nltk.word_tokenize(sentence)

    lemmatizer = WordNetLemmatizer()

    sentiment_score = 0.0
    for word in tokens:
        tag = get_wordnet_pos(word)
        item_res = lemmatizer.lemmatize(word, tag)
        if not item_res:
            continue
        
        synsets = wordnet.synsets(item_res, pos=tag)
        if len(synsets) == 0:
            print("Nope!", word)
            continue
        
        # Take the first, the most common
        synset = synsets[0]
        swn_synset = swn.senti_synset(synset.name())
        sentiment_score += swn_synset.pos_score() - swn_synset.neg_score()
        
    return sentiment_score

For positive and negative sentiment, you first need to train a model on labelled data. To train the model you can use an SVM; there is an open library called LibSVM that you can use.

