Delete words in sentiment lexicon in R
I am using the nrc, bing and afinn lexicons for sentiment analysis in R.
Now I would like to remove some specific words from these lexicons, but I don't know how to do that, since the lexicons are not saved in my environment.
My code looks like this (for nrc as an example):
MyTextFile %>% inner_join(get_sentiments("nrc")) %>% count(sentiment, sort = TRUE)
Here are two ways to do this (there are undoubtedly more). Note first that the nrc lexicon has 13,901 entries (word-sentiment pairs, so a word such as "abandon" appears once per sentiment):
> library(tidytext)
> library(dplyr)
> sentiments <- get_sentiments("nrc")
> sentiments
# A tibble: 13,901 x 2
word sentiment
<chr> <chr>
1 abacus trust
2 abandon fear
3 abandon negative
4 abandon sadness
5 abandoned anger
6 abandoned fear
... and so on
You can filter out all entries in a particular sentiment category (12,425 entries are left):
> sentiments <- get_sentiments("nrc") %>% filter(sentiment!="fear")
> sentiments
# A tibble: 12,425 x 2
word sentiment
<chr> <chr>
1 abacus trust
2 abandon negative
3 abandon sadness
4 abandoned anger
5 abandoned negative
6 abandoned sadness
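If more than one category has to go, the same filter can take a vector of categories. This is a minimal sketch that assumes a small made-up tibble (toy_nrc) in place of the real lexicon, so it runs without downloading nrc; drop_categories is a name chosen only for illustration:

```r
library(dplyr)

# Toy stand-in for get_sentiments("nrc"); these rows are illustrative only.
toy_nrc <- tibble(
  word      = c("abacus", "abandon", "abandon", "abandon", "abandoned"),
  sentiment = c("trust", "fear", "negative", "sadness", "anger")
)

# Hypothetical vector of sentiment categories to exclude.
drop_categories <- c("fear", "anger")

# Keep every row whose sentiment is NOT one of the excluded categories.
kept <- toy_nrc %>% filter(!sentiment %in% drop_categories)
kept
```

With the real lexicon, get_sentiments("nrc") would take the place of toy_nrc.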
Or you can create your own list of dropwords and remove them from the lexicon (13,884 entries are left):
> dropwords <- c("abandon","abandoned","abandonment","abduction","aberrant")
> sentiments <- get_sentiments("nrc") %>% filter(!word %in% dropwords)
> sentiments
# A tibble: 13,884 x 2
word sentiment
<chr> <chr>
1 abacus trust
2 abba positive
3 abbot trust
4 aberration disgust
5 aberration negative
6 abhor anger
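Since the question mentions bing and afinn as well, the same filter could be wrapped in a small helper and reused for any lexicon that has a word column. This is only a sketch: drop_words is a hypothetical helper, not part of tidytext, and toy_lexicon is a made-up stand-in for a real lexicon.

```r
library(dplyr)

# Toy stand-in for a lexicon such as get_sentiments("bing"); rows are made up.
toy_lexicon <- tibble(
  word      = c("abandon", "abba", "abhor", "happy"),
  sentiment = c("negative", "positive", "negative", "positive")
)

# Hypothetical helper: remove the given words from any lexicon
# that stores its terms in a column named `word`.
drop_words <- function(lexicon, words) {
  filter(lexicon, !word %in% words)
}

trimmed <- drop_words(toy_lexicon, c("abandon", "abhor"))
trimmed
```

The helper only touches the word column, so a lexicon's other columns pass through unchanged.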
Then you would just do the sentiment analysis using the sentiments lexicon you have created:
> library(gutenbergr)
> hgwells <- gutenberg_download(35) # loads "The Time Machine"
> hgwells %>% unnest_tokens(word,text) %>%
inner_join(sentiments) %>% count(word,sort=TRUE)
Joining, by = "word"
# A tibble: 1,077 x 2
word n
<chr> <int>
1 white 236
2 feeling 200
3 time 200
4 sun 145
5 found 132
6 darkness 108
Hope this helps somewhat.
If you can make a data frame of words you'd like to remove you can exclude these using an anti_join:
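word_list <- c("words", "to", "remove")
# The column must be named "word" so that it matches the join column of the lexicon
words_to_remove <- data.frame(word = word_list, stringsAsFactors = FALSE)
MyTextFile %>%
  inner_join(get_sentiments("nrc"), by = "word") %>%
  anti_join(words_to_remove, by = "word") %>%
  count(sentiment, sort = TRUE)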
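The anti_join can also be applied to the lexicon itself before joining, which gives a trimmed lexicon that can be reused across several texts. This is a rough sketch under stated assumptions: toy_text and toy_lexicon are made-up stand-ins for MyTextFile and get_sentiments("nrc"), and the sentiment labels are illustrative rather than real nrc entries.

```r
library(dplyr)

# Made-up tokenised text, one word per row (stand-in for MyTextFile).
toy_text <- tibble(word = c("white", "time", "feeling", "white", "sun"))

# Made-up lexicon (stand-in for get_sentiments("nrc")).
toy_lexicon <- tibble(
  word      = c("white", "time", "feeling"),
  sentiment = c("joy", "anticipation", "joy")
)

# Words to exclude; the column is named `word` to match the lexicon.
words_to_remove <- tibble(word = "white")

# Trim the lexicon once, then reuse it for every join.
custom_lexicon <- toy_lexicon %>% anti_join(words_to_remove, by = "word")

result <- toy_text %>%
  inner_join(custom_lexicon, by = "word") %>%
  count(sentiment, sort = TRUE)
result
```

Trimming the lexicon first gives the same counts as removing the words after the join; it simply keeps the exclusion list in one place.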