
How to add words manually to nrc sentiment lexicon?

I plan on using the NRC sentiment lexicon with Twitter data, but I realize that many words are missing. Can anybody guide me on how to add some words with their specific sentiment in R? (I have downloaded the NRC lexicon into my environment and have also added the words and sentiments using rbind.)

Now I don't know how to use the NRC lexicon I have modified. Please help.


The NRC word-emotion association lexicon was built in a way that already makes it a reasonably good fit for social media data, so I recommend looking at the details of where it comes from before changing it for your analysis. If you decide that you do need to add words to a sentiment lexicon like this one, the first step is to append them to the dataset as new rows, for example with bind_rows(). Say you think "darcy" is a positive word and "wickham" is a negative word.

library(tidyverse)
library(tidytext)

nrc_lexicon <- get_sentiments("nrc")

custom_lexicon <- nrc_lexicon %>%
  bind_rows(tribble(~word, ~sentiment,
                    "darcy", "positive",
                    "wickham", "negative"))
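
One thing to watch when appending rows is duplicates: if a word/sentiment pair you add is already in the lexicon, a later inner_join() will match it twice and inflate the counts. Below is a minimal sketch of guarding against that with distinct(). The names toy_lexicon, new_words and custom_toy are hypothetical, and the small made-up lexicon only stands in for NRC so that the example runs without downloading anything.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the NRC lexicon (same two columns)
toy_lexicon <- tribble(
  ~word,   ~sentiment,
  "pride", "joy",
  "pride", "positive",
  "truth", "trust"
)

# Words to add; the "pride"/"positive" pair is already present on purpose
new_words <- tribble(
  ~word,     ~sentiment,
  "darcy",   "positive",
  "wickham", "negative",
  "pride",   "positive"
)

# bind_rows() appends the rows; distinct() drops exact word/sentiment repeats
custom_toy <- toy_lexicon %>%
  bind_rows(new_words) %>%
  distinct(word, sentiment)

nrow(custom_toy)
#> [1] 5
```

The same distinct() step could be added to the end of the custom_lexicon pipeline above if you are not sure whether a word is already in NRC.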

Now, when you want to implement sentiment analysis, you can treat either one of these data frames in the same way. If you have text data (say, the text of Pride and Prejudice), you can first tidy it using unnest_tokens() and then implement sentiment analysis using an inner_join().

tidy_PandP <- tibble(text = janeaustenr::prideprejudice) %>%
  unnest_tokens(word, text)

tidy_PandP %>%
  inner_join(nrc_lexicon)
#> Joining, by = "word"
#> # A tibble: 29,651 x 2
#>    word       sentiment
#>    <chr>      <chr>    
#>  1 pride      joy      
#>  2 pride      positive 
#>  3 prejudice  anger    
#>  4 prejudice  negative 
#>  5 truth      positive 
#>  6 truth      trust    
#>  7 possession anger    
#>  8 possession disgust  
#>  9 possession fear     
#> 10 possession negative 
#> # … with 29,641 more rows

tidy_PandP %>%
  inner_join(custom_lexicon)
#> Joining, by = "word"
#> # A tibble: 30,186 x 2
#>    word       sentiment
#>    <chr>      <chr>    
#>  1 pride      joy      
#>  2 pride      positive 
#>  3 prejudice  anger    
#>  4 prejudice  negative 
#>  5 truth      positive 
#>  6 truth      trust    
#>  7 possession anger    
#>  8 possession disgust  
#>  9 possession fear     
#> 10 possession negative 
#> # … with 30,176 more rows

Created on 2019-08-03 by the reprex package (v0.3.0)

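Notice that the sentiment analysis is implemented in exactly the same way for either lexicon (the original one or the one to which we added words). The only difference is the result: the custom lexicon returns 30,186 rows instead of 29,651, and the 535 extra rows are the matches for the two added entries.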
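
After the join, a common next step is to tally the matches per sentiment, and for text like tweets it can also help to list the tokens the lexicon does not cover, since those are the candidates for adding by hand. Here is a rough sketch of both, again on made-up data: toy_tokens and toy_custom are hypothetical stand-ins for tidy_PandP and custom_lexicon, with one sentiment per word to keep the example small.

```r
library(dplyr)
library(tibble)

# Hypothetical tokenized text, one word per row (as unnest_tokens() returns)
toy_tokens <- tibble(word = c("darcy", "pride", "wickham", "darcy", "the"))

# Hypothetical custom lexicon with the same columns as the NRC data frame
toy_custom <- tribble(
  ~word,     ~sentiment,
  "pride",   "positive",
  "truth",   "trust",
  "darcy",   "positive",
  "wickham", "negative"
)

# Tally matched tokens per sentiment category
sentiment_counts <- toy_tokens %>%
  inner_join(toy_custom, by = "word") %>%
  count(sentiment, sort = TRUE)

# Tokens with no lexicon entry; review these when deciding what to add
uncovered <- toy_tokens %>%
  anti_join(toy_custom, by = "word") %>%
  count(word, sort = TRUE)

sentiment_counts
uncovered
```

On real text you would probably remove stop words first, otherwise the uncovered list is dominated by words like "the".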

I do want to note that the license for the NRC lexicon allows it to be used for research purposes for free, but for any commercial use, you must contact the NRC researchers and pay for a commercial license.
