
How to use my own lexicon dictionary to analyse sentences in R?

I have created a new lexicon dictionary to analyse the sentiment of sentences in R. I have used existing lexicon dictionaries in R before, but I am unsure how to use my own. I managed to create positive and negative lists of words, which count the number of positive and negative words in a sentence and then provide a sum. However, this does not take into account the score allocated to each word, as shown in the example below.

For example, I would like to analyse the sentence "I am happy and kind of sad". Here is an example list of words and scores (the real list would be much bigger):

happy, 1.3455
sad, -1.0552

I would like to match these words against the sentence and sum their scores, 1.3455 + (-1.0552), which in this case gives an overall score of 0.2903.

How would I go about using the actual score of each word to produce an overall sentiment score for each sentence in R, as in the example above?

Many thanks, James

You can start with the magnificent tidytext package:

library(tidytext)
library(tidyverse)

First, the data to analyze, plus a small transformation that gives each sentence an ID:

# data
df <- tibble(text = c('I am happy and kind of sad', 'sad is sad, happy is good'))

# add an ID column named "line" (one ID per sentence)
df <- tibble::rowid_to_column(df, "line")

> df
# A tibble: 2 x 2
   line text                      
  <int> <chr>                     
1     1 I am happy and kind of sad
2     2 sad is sad, happy is good 

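Then split the text into one word per row with unnest_tokens(). It works sentence by sentence, so every word keeps the ID of the sentence it came from; by default it also lowercases the words and strips punctuation: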

tidy <- df %>% unnest_tokens(word, text)
> tidy
# A tibble: 13 x 2
    line word 
   <int> <chr>
 1     1 i    
 2     1 am   
 3     1 happy
 4     1 and  
 5     1 kind 
 6     1 of   
 7     1 sad  
 8     2 sad  
 9     2 is   
10     2 sad  
11     2 happy
12     2 is   
13     2 good 
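If you want to see what this step does without tidytext, here is a rough base-R approximation. It is only a sketch: it assumes that splitting on anything other than letters, digits and apostrophes is close enough to the default word tokenizer, which in reality (tidytext delegates to the tokenizers package) handles many more edge cases. The name tidy_base is just for illustration.

```r
# Rough base-R stand-in for unnest_tokens(word, text); an approximation,
# not what tidytext does internally.
df <- data.frame(line = 1:2,
                 text = c("I am happy and kind of sad", "sad is sad, happy is good"),
                 stringsAsFactors = FALSE)

# Lowercase each sentence and split on runs of characters that are not
# letters, digits or apostrophes (this also drops the comma).
words <- strsplit(tolower(df$text), "[^a-z0-9']+")

# Repeat each sentence ID once per word so every word keeps its sentence ID.
tidy_base <- data.frame(line = rep(df$line, lengths(words)),
                        word = unlist(words),
                        stringsAsFactors = FALSE)
tidy_base
```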

Now your brand new lexicon:

lexicon <- tibble(word = c('happy', 'sad'), scores = c(1.3455, -1.0552))
> lexicon
# A tibble: 2 x 2
  word  scores
  <chr>  <dbl>
1 happy   1.35
2 sad    -1.06

(The tibble print only shows three significant digits; the full scores are stored.)
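In practice your own lexicon will usually come from a file rather than be typed in. A minimal sketch, assuming the file has no header and one "word, score" pair per line, as in the question. The text = argument is used here so the example runs on its own; my_lexicon.csv in the comment is a hypothetical file name.

```r
# Lexicon in the question's format: no header, "word, score" per line.
# To read from disk instead, replace text = lexicon_txt with
# file = "my_lexicon.csv" (hypothetical path).
lexicon_txt <- "happy, 1.3455
sad, -1.0552"

lexicon <- read.csv(text = lexicon_txt,
                    header = FALSE,
                    col.names = c("word", "scores"),
                    strip.white = TRUE,        # drop the space after each comma
                    stringsAsFactors = FALSE)

# Lowercase the words so they match the lowercased tokens from unnest_tokens().
lexicon$word <- tolower(lexicon$word)
str(lexicon)
```

Naming the score column "scores" keeps it compatible with the merge and aggregate calls in this answer.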

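Next, merge the tokens with the lexicon so that each matched word gets its score. By default merge() performs an inner join, so words that are not in the lexicon are dropped: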

merged <- merge(tidy, lexicon, by = 'word')
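To make the inner-join behaviour concrete, here is a self-contained sketch. For the sake of a standalone example it rebuilds the tokens as a plain data.frame (an assumption; in the real pipeline they come from unnest_tokens()).

```r
# Tokens as produced by the tokenizing step (one row per word).
tidy <- data.frame(
  line = c(rep(1L, 7), rep(2L, 6)),
  word = c("i", "am", "happy", "and", "kind", "of", "sad",
           "sad", "is", "sad", "happy", "is", "good"),
  stringsAsFactors = FALSE
)
lexicon <- data.frame(word = c("happy", "sad"), scores = c(1.3455, -1.0552),
                      stringsAsFactors = FALSE)

# merge() defaults to all = FALSE, i.e. an inner join: only tokens found
# in the lexicon survive, each with its score attached.
merged <- merge(tidy, lexicon, by = "word")
merged[order(merged$line), ]
```

Only the five "happy"/"sad" tokens remain; every other word is discarded before any summing happens.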


Now sum the scores for each sentence:

scoredf <- aggregate(scores ~ line, data = merged, FUN = sum)
> scoredf
  line  scores
1    1  0.2903
2    2 -0.7649


Finally, you can merge the initial df with the scores to have sentences and scores together:

merge(df, scoredf, by = 'line')
  line                       text  scores
1    1 I am happy and kind of sad  0.2903
2    2  sad is sad, happy is good -0.7649
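One caveat: a sentence that contains no lexicon words has no rows in merged, so it disappears from scoredf and from the final merge. A sketch of one way to keep it, using merge() with all.x = TRUE (a left join) and treating such sentences as neutral with a score of 0, which is an assumption about what you want. The third sentence is hypothetical.

```r
df <- data.frame(line = 1:3,
                 text = c("I am happy and kind of sad",
                          "sad is sad, happy is good",
                          "no lexicon words here"),  # hypothetical sentence with no matches
                 stringsAsFactors = FALSE)

# Summed scores per line, as aggregate() returns them; line 3 had no
# lexicon words, so it never appears here.
scoredf <- data.frame(line = 1:2, scores = c(0.2903, -0.7649))

# all.x = TRUE keeps every sentence; unmatched ones get NA.
result <- merge(df, scoredf, by = "line", all.x = TRUE)

# Treat "no sentiment words" as a neutral score of 0.
result$scores[is.na(result$scores)] <- 0
result
```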

This gives the overall sentiment score for several sentences at once.
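If you would rather avoid the tokenize/merge/aggregate round trip, the same per-sentence sum can be computed in base R with match(). This is a sketch, not what tidytext does: the tokenizer is a rough regex stand-in for unnest_tokens(), score_sentences() is a hypothetical helper name, and words missing from the lexicon simply contribute nothing (so a sentence with no matches scores 0).

```r
# Score each sentence as the sum of the lexicon scores of its words.
score_sentences <- function(text, lexicon) {
  # Lowercase and split on anything that is not a letter, digit or apostrophe.
  words <- strsplit(tolower(text), "[^a-z0-9']+")
  vapply(words, function(w) {
    # match() returns NA for words not in the lexicon; na.rm skips them.
    s <- lexicon$scores[match(w, lexicon$word)]
    sum(s, na.rm = TRUE)
  }, numeric(1))
}

lexicon <- data.frame(word = c("happy", "sad"), scores = c(1.3455, -1.0552),
                      stringsAsFactors = FALSE)
text <- c("I am happy and kind of sad",
          "sad is sad, happy is good",
          "no lexicon words here")   # hypothetical sentence with no matches

data.frame(text = text, scores = score_sentences(text, lexicon))
```

The trade-off is that you lose the intermediate one-word-per-row table, which is handy for checking which words actually matched.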
Hope it helps!

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this post, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM