Python VADER lexicon structure for sentiment analysis

I am using the VADER sentiment lexicon in Python's nltk library to analyze text sentiment. This lexicon does not suit my domain well, so I want to add my own sentiment scores for various words. I got hold of the lexicon text file (vader_lexicon.txt) to do just that, but I do not understand the structure of this file well. For example, a word like obliterate has the following data in the text file:

obliterate -2.9 0.83066 [-3, -4, -3, -3, -3, -3, -2, -1, -4, -3]

Clearly the -2.9 is the average of the sentiment scores in the list. But what does the 0.83066 represent?

Thanks!

According to the VADER source code, only the first number on each line is used. The rest of the line is ignored:

for line in self.lexicon_full_filepath.split('\n'):
    (word, measure) = line.strip().split('\t')[0:2] # Here!
    lex_dict[word] = float(measure)
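
To try this outside the library, here is a minimal self-contained sketch of the same parsing logic. The function name, the blank-line check, and the `mydomainword` entry are my own additions for illustration, not part of VADER:

```python
def make_lex_dict(lexicon_text):
    """Map each token to its mean rating, ignoring columns 3 and 4."""
    lex_dict = {}
    for line in lexicon_text.split("\n"):
        if not line.strip():
            continue  # skip blank lines (added here for robustness)
        # Same slicing as the snippet above: keep only the first two fields.
        word, measure = line.strip().split("\t")[0:2]
        lex_dict[word] = float(measure)
    return lex_dict


sample = (
    "obliterate\t-2.9\t0.83066\t[-3, -4, -3, -3, -3, -3, -2, -1, -4, -3]\n"
    "mydomainword\t1.5\n"  # hypothetical custom entry with only two columns
)
lexicon = make_lex_dict(sample)
print(lexicon)
```

Both lines parse the same way, because everything after the second tab-separated field is dropped by the `[0:2]` slice.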

The vader_lexicon.txt file has four tab-delimited columns, as you said.

  1. Column 1: the token
  2. Column 2: the mean of the human sentiment ratings
  3. Column 3: the standard deviation of the ratings, assuming they follow a normal distribution
  4. Column 4: the list of 10 human ratings collected during the experiments
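
As a quick check of the column meanings, the numbers for obliterate can be recomputed from its ratings list. This is only a sketch: it assumes the fields are separated by tabs, and it compares both the population and the sample standard deviation to see which one matches the file:

```python
import ast
import statistics

line = "obliterate\t-2.9\t0.83066\t[-3, -4, -3, -3, -3, -3, -2, -1, -4, -3]"
token, mean_str, sd_str, ratings_str = line.split("\t")
ratings = ast.literal_eval(ratings_str)  # safely parse the list literal

mean = statistics.fmean(ratings)
pop_sd = statistics.pstdev(ratings)    # divides by N
sample_sd = statistics.stdev(ratings)  # divides by N - 1

print(round(mean, 1), round(pop_sd, 5), round(sample_sd, 5))
```

For this entry the population standard deviation (dividing by N) rounds to 0.83066, which matches column 3, while the sample standard deviation does not.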

Neither the actual code nor the sentiment calculation uses the 3rd and 4th columns. So if you want to update the lexicon for your own requirements, you can leave the last two columns blank or fill them with any placeholder number and list.
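
If you would rather keep new entries consistent with the existing ones, a small helper can build the full four-column line from your own ratings. This is a sketch under assumptions: `format_lexicon_line`, the token "bullish", and its ratings are all hypothetical, and the rounding to five decimals simply mimics the obliterate example:

```python
import statistics


def format_lexicon_line(token, ratings):
    """Build one tab-delimited line: token, mean, std dev, raw ratings."""
    mean = round(statistics.fmean(ratings), 5)
    sd = round(statistics.pstdev(ratings), 5)  # population std dev
    return f"{token}\t{mean}\t{sd}\t{list(ratings)}"


# Hypothetical domain word with made-up ratings.
new_line = format_lexicon_line("bullish", [2, 3, 2, 3, 2, 3, 2, 3, 2, 3])
print(new_line)

# Only these first two fields matter to the scoring code.
word, measure = new_line.strip().split("\t")[0:2]
```

Appending such a line to your copy of vader_lexicon.txt should be enough, since only `word` and `measure` are read. If you use NLTK's SentimentIntensityAnalyzer, its `lexicon` attribute is, as far as I know, a plain dict, so updating it in memory may be simpler than editing the file.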
