
Python VADER lexicon structure for sentiment analysis

I am using the VADER sentiment lexicon from Python's nltk library to analyze text sentiment. This lexicon does not suit my domain well, so I wanted to add my own sentiment scores for various words. To do that, I got my hands on the lexicon text file (vader_lexicon.txt). However, I do not understand the structure of this file well. For example, a word like obliterate has the following entry in the text file:

    obliterate	-2.9	0.83066	[-3, -4, -3, -3, -3, -3, -2, -1, -4, -3]

Clearly the -2.9 is the average of sentiment scores in the list. But what does the 0.83066 represent?

Thanks!

According to the VADER source code, only the first two tab-separated fields on each line are used. The rest of the line is ignored:

for line in self.lexicon_full_filepath.split('\n'):
    (word, measure) = line.strip().split('\t')[0:2]  # only the first two columns are read
    lex_dict[word] = float(measure)

The vader_lexicon.txt file has four tab-delimited columns:

  1. Column 1: the token (word)
  2. Column 2: the mean of the human sentiment ratings
  3. Column 3: the standard deviation of those ratings (assuming they follow a normal distribution)
  4. Column 4: the list of 10 individual human ratings collected during the annotation experiments
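So the 0.83066 you asked about is the standard deviation of the ten ratings in column 4. A quick check with the obliterate entry confirms this (the file's value matches the population standard deviation, not the sample one):

```python
import statistics

# The ten human ratings for "obliterate" from column 4 of vader_lexicon.txt
ratings = [-3, -4, -3, -3, -3, -3, -2, -1, -4, -3]

mean = statistics.mean(ratings)    # column 2
std = statistics.pstdev(ratings)   # column 3 (population standard deviation)

print(mean)           # -2.9
print(round(std, 5))  # 0.83066
```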

The sentiment calculation never reads the 3rd and 4th columns. So if you want to extend the lexicon for your domain, only the token and its mean score matter; the last two columns can be left out or filled with placeholder values.
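To illustrate, here is a small sketch that builds custom lexicon lines and parses them with the same `split('\t')[0:2]` logic quoted from the VADER source above. The word "churn" and its score are made-up examples for a hypothetical domain; note that an entry with only two columns parses just as well as a full four-column line:

```python
# Two lexicon lines: a custom two-column entry and a full four-column one.
custom_lines = [
    "churn\t-1.5",  # std dev and ratings list omitted entirely
    "obliterate\t-2.9\t0.83066\t[-3, -4, -3, -3, -3, -3, -2, -1, -4, -3]",
]

lex_dict = {}
for line in "\n".join(custom_lines).split("\n"):
    # Same logic as VADER's lexicon loader: keep only token and mean score
    (word, measure) = line.strip().split("\t")[0:2]
    lex_dict[word] = float(measure)

print(lex_dict)  # {'churn': -1.5, 'obliterate': -2.9}
```

Alternatively, rather than editing the file, you can update the loaded lexicon at runtime: `SentimentIntensityAnalyzer` exposes the parsed dictionary as its `lexicon` attribute, so `analyzer.lexicon.update({"churn": -1.5})` takes effect immediately.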
