Delete words in sentiment lexicon in R

I am using the nrc, bing and afinn lexicons for sentiment analysis in R.

Now I would like to remove some specific words from these lexicons, but I don't know how to do that, since the lexicons are not saved in my environment.

My code looks like this (for nrc as an example):

 MyTextFile %>% inner_join(get_sentiments("nrc")) %>% count(sentiment, sort = TRUE) 

Here are two ways to do this (there are undoubtedly more). Note first that there are 13,901 words in the nrc lexicon:

> library(tidytext)
> library(dplyr)
> sentiments <- get_sentiments("nrc")
> sentiments
# A tibble: 13,901 x 2
   word        sentiment
   <chr>       <chr>    
 1 abacus      trust    
 2 abandon     fear     
 3 abandon     negative 
 4 abandon     sadness 
 5 abandoned   anger    
 6 abandoned   fear    
... and so on

You can filter out all words in a particular sentiment category (fewer words are left, at 12,425):

> sentiments <- get_sentiments("nrc") %>% filter(sentiment!="fear")
> sentiments
# A tibble: 12,425 x 2 
   word        sentiment
   <chr>       <chr>    
 1 abacus      trust    
 2 abandon     negative 
 3 abandon     sadness  
 4 abandoned   anger    
 5 abandoned   negative 
 6 abandoned   sadness  
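
If you want to drop several categories at once (a small variation of my own, not from the answer above), the same filter works with %in%:

> # drop every word tagged "fear" or "anger"
> sentiments <- get_sentiments("nrc") %>% filter(!sentiment %in% c("fear","anger"))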

Or you can create your own list of dropwords and remove them from the lexicon (fewer words are left, at 13,884):

> dropwords <- c("abandon","abandoned","abandonment","abduction","aberrant")
> sentiments <- get_sentiments("nrc") %>% filter(!word %in% dropwords)
> sentiments
# A tibble: 13,884 x 2
   word       sentiment
   <chr>      <chr>    
 1 abacus     trust    
 2 abba       positive 
 3 abbot      trust    
 4 aberration disgust  
 5 aberration negative 
 6 abhor      anger    

Then you would just do the sentiment analysis using the sentiments object you have created:

> library(gutenbergr)
> hgwells <- gutenberg_download(35) # loads "The Time Machine"
> hgwells %>% unnest_tokens(word,text) %>% 
      inner_join(sentiments) %>% count(word,sort=TRUE)
Joining, by = "word"
# A tibble: 1,077 x 2
   word         n
   <chr>    <int>
 1 white      236
 2 feeling    200
 3 time       200
 4 sun        145
 5 found      132
 6 darkness   108
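
To get back to the count by sentiment category from your question, you would use this trimmed sentiments in place of get_sentiments("nrc") in your own pipeline (a minimal sketch, assuming MyTextFile already has one word per row in a column named word, as in your snippet):

> MyTextFile %>%
      inner_join(sentiments) %>%   # joins on the shared "word" column
      count(sentiment, sort = TRUE)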

Hope this helps somewhat.

If you can make a data frame of the words you'd like to remove, you can exclude them using an anti_join:

word_list <- c("words","to","remove")
# the column must be named "word" so anti_join can match the lexicon's word column
words_to_remove <- data.frame(word = word_list)

MyTextFile %>%
  inner_join(get_sentiments("nrc")) %>%
  anti_join(words_to_remove) %>%
  count(sentiment, sort = TRUE)
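
A small variation (my addition, not part of the answer above): spelling out the join column stops dplyr from guessing it and silences the Joining, by = "word" messages:

words_to_remove <- data.frame(word = c("words","to","remove"))

MyTextFile %>%
  inner_join(get_sentiments("nrc"), by = "word") %>%
  anti_join(words_to_remove, by = "word") %>%
  count(sentiment, sort = TRUE)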
