How to use my own lexicon dictionary to analyse sentences in R?

Question

I have built a new lexicon dictionary to analyse the sentiment of sentences in R. I have used lexicon dictionaries in R before, but I am unsure how to use my own. I managed to create lists of positive and negative words, count the number of positive and negative words in a sentence, and then provide a sum. This does not take into account the score allocated to each word, as shown in the example below.

I would like to analyse, say, this sentence: "I am happy and kind of sad". Example list of words and scores (the real list would be bigger than this):

happy, 1.3455
sad, -1.0552

I would like to match these words against the sentence and take the sum of their scores, 1.3455 + (-1.0552), which in this case gives an overall score of 0.2903.

How would I go about using the actual score of each word to produce an overall score when analysing the sentiment of each sentence in R, as in the example above?

Many thanks, James

Answer 1

You can start with the magnificent tidytext package:

library(tidytext)
library(tidyverse)

First, your data to analyze, and a small transformation:

# data (tibble() replaces the deprecated data_frame())
df <- tibble(text = c('I am happy and kind of sad', 'sad is sad, happy is good'))

# add an ID column
df <- tibble::rowid_to_column(df, "ID")

# rename the ID column to "line"
colnames(df)[1] <- "line"

> df
# A tibble: 2 x 2
   line text                      
  <int> <chr>                     
1     1 I am happy and kind of sad
2     2 sad is sad, happy is good 

Then you can reshape the data so that there is one word per row. unnest_tokens() acts like a "loop" applied to each sentence (each line ID), and it also lower-cases the words and strips punctuation:

tidy <- df %>% unnest_tokens(word, text)
> tidy
# A tibble: 13 x 2
    line word 
   <int> <chr>
 1     1 i    
 2     1 am   
 3     1 happy
 4     1 and  
 5     1 kind 
 6     1 of   
 7     1 sad  
 8     2 sad  
 9     2 is   
10     2 sad  
11     2 happy
12     2 is   
13     2 good 

Now your brand new lexicon:

lexicon <- tibble(word = c('happy', 'sad'), scores = c(1.3455, -1.0552))
> lexicon
# A tibble: 2 x 2
  word  scores
  <chr>  <dbl>
1 happy   1.35
2 sad    -1.06
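
Since your real list is bigger, you will probably want to read it in rather than type it. Here is a minimal sketch, assuming the lexicon is stored as "word, score" lines with no header row, as in your question. The inline string stands in for a file, and the names lexicon_raw and lexicon_from_text are only for illustration:

```r
# Assumption: one "word, score" pair per line, no header.
# The inline string stands in for a real file; with a file you would pass
# its path as the first argument instead of using `text =`.
lexicon_raw <- "happy, 1.3455
sad, -1.0552"

lexicon_from_text <- read.csv(
  text = lexicon_raw,
  header = FALSE,
  col.names = c("word", "scores"),
  strip.white = TRUE,        # drop the space after the comma
  stringsAsFactors = FALSE   # keep words as character, not factor
)

# Lower-case the words so they match the tokens from unnest_tokens()
lexicon_from_text$word <- tolower(lexicon_from_text$word)
```

The result has the same two columns as the hand-typed lexicon, so it can be used in its place in the following steps.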

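Next, merge the lexicon with the tokenised data so that each matched word carries its score. By default merge() keeps only the words found in both, so words that are not in the lexicon are dropped:

merged <- merge(tidy, lexicon, by = 'word')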

Now for each phrase, the sentiment:

scoredf <- aggregate(scores ~ line, data = merged, sum)
> scoredf
  line  scores
1    1  0.2903
2    2 -0.7649

Lastly, you can merge the initial df with the scores, to have phrases and scores together:

merge(df, scoredf, by = 'line')
  line                       text  scores
1    1 I am happy and kind of sad  0.2903
2    2  sad is sad, happy is good -0.7649
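
One caveat: merge() does an inner join by default, so a sentence that contains no lexicon words disappears from the result. Below is a sketch of the same steps using dplyr joins that keeps such sentences with a score of 0. The third sentence and the names df2, scores_by_line and result are made up for illustration:

```r
library(dplyr)
library(tidytext)

df2 <- tibble(
  line = 1:3,
  text = c("I am happy and kind of sad",
           "sad is sad, happy is good",
           "nothing to see here")   # hypothetical: contains no lexicon words
)
lexicon <- tibble(word = c("happy", "sad"), scores = c(1.3455, -1.0552))

scores_by_line <- df2 %>%
  unnest_tokens(word, text) %>%          # one row per word
  left_join(lexicon, by = "word") %>%    # words not in the lexicon get NA
  group_by(line) %>%
  summarise(scores = sum(scores, na.rm = TRUE))  # NA scores are ignored

# Attach the scores back to the original sentences
result <- left_join(df2, scores_by_line, by = "line")
```

Whether an unmatched sentence should score 0 or NA is a design choice; drop na.rm = TRUE if you would rather see NA for those.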

This gives you the overall sentiment score for each phrase when you have several of them.
Hope it helps!
