How to add words manually to the NRC sentiment lexicon?

I plan on using the NRC sentiment lexicon with Twitter data, but I realize that many words are missing. Can anybody guide me on how to add some words with their specific sentiment in R? (I have downloaded the NRC lexicon to my environment and have also added the words and sentiments using rbind.)

Now I don't know how to use the NRC lexicon I have modified. Please help me.

The way the NRC word-emotion association lexicon was built already makes it a pretty good fit for social media data, so I recommend taking a look at the details of where it comes from before changing it for your analysis. However, if you decide that, for your purposes, you need to add words to such a sentiment lexicon, the first step is to add the words to the dataset row-wise, for example via bind_rows(). Let's say that you think "darcy" is a positive word and "wickham" is a negative word.

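library(tidyverse)
library(tidytext)

# Fetches the NRC lexicon via the textdata package on first use
nrc_lexicon <- get_sentiments("nrc")

# Append the new word-sentiment pairs as extra rows
custom_lexicon <- nrc_lexicon %>%
  bind_rows(tribble(~word, ~sentiment,
                    "darcy", "positive",
                    "wickham", "negative"))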
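
Before binding new rows, it can be worth checking whether any of the word-sentiment pairs are already in the lexicon, so that nothing gets counted twice later. Here is a minimal sketch of that check. Note that `toy_lexicon` and `new_words` are made-up stand-ins for illustration, not the real NRC data:

```r
library(dplyr)
library(tibble)

# Stand-in for get_sentiments("nrc"); these rows are illustrative only
toy_lexicon <- tribble(
  ~word,       ~sentiment,
  "pride",     "joy",
  "pride",     "positive",
  "prejudice", "negative"
)

# Hypothetical additions; the pair "pride"/"positive" already exists above
new_words <- tribble(
  ~word,     ~sentiment,
  "darcy",   "positive",
  "wickham", "negative",
  "pride",   "positive"
)

# Keep only word-sentiment pairs that are not already in the lexicon
to_add <- anti_join(new_words, toy_lexicon, by = c("word", "sentiment"))

# Append the genuinely new pairs row-wise
custom_toy_lexicon <- bind_rows(toy_lexicon, to_add)
```

The anti_join() on both columns leaves room for a word to carry several sentiments (as "pride" does here) while still dropping exact repeats.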

Now, when you want to implement sentiment analysis, you can treat either of these data frames in the same way. If you have text data (say, the text of Pride and Prejudice), you can first tidy it using unnest_tokens() and then implement sentiment analysis using an inner_join().

# Split the novel into one lowercase word per row
tidy_PandP <- tibble(text = janeaustenr::prideprejudice) %>%
  unnest_tokens(word, text)

tidy_PandP %>%
  inner_join(nrc_lexicon)
#> Joining, by = "word"
#> # A tibble: 29,651 x 2
#>    word       sentiment
#>    <chr>      <chr>    
#>  1 pride      joy      
#>  2 pride      positive 
#>  3 prejudice  anger    
#>  4 prejudice  negative 
#>  5 truth      positive 
#>  6 truth      trust    
#>  7 possession anger    
#>  8 possession disgust  
#>  9 possession fear     
#> 10 possession negative 
#> # … with 29,641 more rows

tidy_PandP %>%
  inner_join(custom_lexicon)
#> Joining, by = "word"
#> # A tibble: 30,186 x 2
#>    word       sentiment
#>    <chr>      <chr>    
#>  1 pride      joy      
#>  2 pride      positive 
#>  3 prejudice  anger    
#>  4 prejudice  negative 
#>  5 truth      positive 
#>  6 truth      trust    
#>  7 possession anger    
#>  8 possession disgust  
#>  9 possession fear     
#> 10 possession negative 
#> # … with 30,176 more rows

Created on 2019-08-03 by the reprex package (v0.3.0)

Notice that you can implement the sentiment analysis for either lexicon (the original one or the one to which we added words) in the same way.
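
To see what the added words actually change, one option is to count the matched words per sentiment after each join. Here is a small sketch, again assuming toy data (`toy_tokens`, `toy_lexicon`, `custom_toy_lexicon` are hypothetical names) rather than the real NRC lexicon or the novel's text:

```r
library(dplyr)
library(tibble)

# Toy tokens standing in for tidy_PandP (one word per row)
toy_tokens <- tibble(word = c("pride", "darcy", "truth", "wickham", "darcy"))

# Toy lexicons standing in for nrc_lexicon and custom_lexicon
toy_lexicon <- tribble(
  ~word,   ~sentiment,
  "pride", "positive",
  "truth", "positive"
)
custom_toy_lexicon <- bind_rows(
  toy_lexicon,
  tribble(~word, ~sentiment,
          "darcy", "positive",
          "wickham", "negative")
)

# Same pipeline for both lexicons: join on word, then count per sentiment
original_counts <- toy_tokens %>%
  inner_join(toy_lexicon, by = "word") %>%
  count(sentiment)

custom_counts <- toy_tokens %>%
  inner_join(custom_toy_lexicon, by = "word") %>%
  count(sentiment)
```

Comparing `original_counts` with `custom_counts` shows how many extra matches come from the words you added, which is a quick sanity check that the modified lexicon is being used.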

I do want to note that the license for the NRC lexicon allows it to be used for research purposes for free, but for any commercial use, you must contact the NRC researchers and pay for a commercial license.
