Delete words in a sentiment lexicon in R
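I am doing sentiment analysis in R with the nrc, bing and afinn lexicons.
Now I would like to remove some specific words from these lexicons, but I don't know how, because the lexicons are not saved in my environment.
My code looks like this (using nrc as an example):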
MyTextFile %>% inner_join(get_sentiments("nrc")) %>% count(sentiment, sort = TRUE)
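Here are two ways to do this (there are certainly more). First, note that the nrc lexicon has 13,901 rows. A word can appear in several rows, one for each sentiment it is associated with: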
> library(tidytext)
> library(dplyr)
> sentiments <- get_sentiments("nrc")
> sentiments
# A tibble: 13,901 x 2
word sentiment
<chr> <chr>
1 abacus trust
2 abandon fear
3 abandon negative
4 abandon sadness
5 abandoned anger
6 abandoned fear
... and so on
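Several rows can share one word, so it can help to compare the row count with the number of distinct words. Below is a minimal sketch. It uses a small stand-in tibble built from the six rows printed above instead of the real lexicon, so it runs without downloading anything:

```r
library(dplyr)

# Stand-in for get_sentiments("nrc"): only the first six rows shown above
toy_nrc <- tibble(
  word      = c("abacus", "abandon", "abandon", "abandon", "abandoned", "abandoned"),
  sentiment = c("trust", "fear", "negative", "sadness", "anger", "fear")
)

nrow(toy_nrc)              # 6 word-sentiment rows
n_distinct(toy_nrc$word)   # 3 distinct words
```

With the real lexicon, the same two calls on get_sentiments("nrc") give the row count and the distinct-word count.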
You can filter out every word in a particular sentiment category, here "fear" (this leaves 12,425 rows):
> sentiments <- get_sentiments("nrc") %>% filter(sentiment!="fear")
> sentiments
# A tibble: 12,425 x 2
word sentiment
<chr> <chr>
1 abacus trust
2 abandon negative
3 abandon sadness
4 abandoned anger
5 abandoned negative
6 abandoned sadness
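The same filter() call can drop several categories at once by using %in%. Here is a sketch on a stand-in tibble. The word/sentiment pairs are copied from the output above, and dropping "fear" and "anger" is only an example choice:

```r
library(dplyr)

# Stand-in for get_sentiments("nrc"), using the first six rows shown earlier
toy_nrc <- tibble(
  word      = c("abacus", "abandon", "abandon", "abandon", "abandoned", "abandoned"),
  sentiment = c("trust", "fear", "negative", "sadness", "anger", "fear")
)

# Remove every row whose sentiment is in the drop list
drop_sentiments <- c("fear", "anger")
kept <- toy_nrc %>% filter(!sentiment %in% drop_sentiments)
kept
```

Note that "abandoned" disappears entirely here, because all of its rows in the stand-in belong to dropped categories.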
Alternatively, you can create your own list of dropwords and remove them from the lexicon (this leaves 13,884 rows):
> dropwords <- c("abandon","abandoned","abandonment","abduction","aberrant")
> sentiments <- get_sentiments("nrc") %>% filter(!word %in% dropwords)
> sentiments
# A tibble: 13,884 x 2
word sentiment
<chr> <chr>
1 abacus trust
2 abba positive
3 abbot trust
4 aberration disgust
5 aberration negative
6 abhor anger
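The question also mentions bing and afinn. Filtering on the word column should work for any of them, because in recent tidytext versions each lexicon has a word column (afinn stores a numeric value column instead of a sentiment label). The sketch below defines a hypothetical helper, drop_words(), which is not part of tidytext. It also reports any dropwords the lexicon does not contain, which can catch typos. The stand-in tibbles and their values are made up so the example runs offline:

```r
library(dplyr)

# Hypothetical helper (not a tidytext function): removes every row whose
# `word` is in `dropwords` and reports dropwords missing from the lexicon.
drop_words <- function(lexicon, dropwords) {
  missing <- setdiff(dropwords, lexicon$word)
  if (length(missing) > 0) {
    message("Not in lexicon: ", paste(missing, collapse = ", "))
  }
  lexicon %>% filter(!word %in% dropwords)
}

# Made-up stand-ins shaped like get_sentiments("nrc") and get_sentiments("afinn")
toy_nrc <- tibble(
  word      = c("abacus", "abandon", "abandon", "abandoned", "abba"),
  sentiment = c("trust", "fear", "negative", "anger", "positive")
)
toy_afinn <- tibble(
  word  = c("abandon", "abandoned", "abba"),
  value = c(-2L, -2L, 1L)
)

dropwords <- c("abandon", "abandoned", "aberrant")
nrc_trimmed   <- drop_words(toy_nrc, dropwords)    # message: Not in lexicon: aberrant
afinn_trimmed <- drop_words(toy_afinn, dropwords)  # message: Not in lexicon: aberrant
```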
Then just run the sentiment analysis with the sentiments lexicon you created:
> library(gutenbergr)
> hgwells <- gutenberg_download(35) # loads "The Time Machine"
> hgwells %>% unnest_tokens(word,text) %>%
inner_join(sentiments) %>% count(word,sort=TRUE)
Joining, by = "word"
# A tibble: 1,077 x 2
word n
<chr> <int>
1 white 236
2 feeling 200
3 time 200
4 sun 145
5 found 132
6 darkness 108
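The same pipeline can be run end to end on made-up data, so it works without gutenbergr or a lexicon download. In this sketch, tokens stands in for the one-word-per-row output of unnest_tokens(). The lexicon rows are invented for illustration, and the final count() mirrors the question's code:

```r
library(dplyr)

# Invented lexicon rows, shaped like get_sentiments("nrc")
toy_nrc <- tibble(
  word      = c("abandon", "abandon", "white", "darkness", "darkness", "time"),
  sentiment = c("fear", "negative", "trust", "fear", "sadness", "anticipation")
)
# Stand-in for a tokenized text (one word per row)
tokens <- tibble(word = c("the", "white", "darkness", "abandon", "time", "white"))

# Remove unwanted words from the lexicon, then join and count as usual
dropwords  <- c("white")
my_lexicon <- toy_nrc %>% filter(!word %in% dropwords)

result <- tokens %>%
  inner_join(my_lexicon, by = "word") %>%
  count(sentiment, sort = TRUE)
result
```

Because "white" was dropped from the lexicon, its "trust" row never reaches the count.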
Hope this helps.
If you put the words you want to remove into a data frame, you can use anti_join to exclude them:
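library(dplyr)
library(tidytext)
word_list <- c("words", "to", "remove")
# The column must be called `word` so it matches the lexicon's join column
words_to_remove <- data.frame(word = word_list, stringsAsFactors = FALSE)
MyTextFile %>%
  inner_join(get_sentiments("nrc"), by = "word") %>%
  anti_join(words_to_remove, by = "word") %>%
  count(sentiment, sort = TRUE)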
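The anti_join() can also be applied once to the lexicon itself, before joining. The trimmed lexicon can then be reused across texts, and the result matches anti-joining after the inner_join(). This sketch uses a made-up, afinn-shaped stand-in (word and value columns) and made-up tokens:

```r
library(dplyr)

# Made-up stand-in shaped like get_sentiments("afinn") (word, value)
toy_afinn <- tibble(
  word  = c("good", "bad", "great", "awful"),
  value = c(3L, -3L, 3L, -3L)
)
# Made-up tokens, one word per row
tokens <- tibble(word = c("good", "bad", "great", "plot", "good"))
words_to_remove <- tibble(word = "bad")

# Option 1: trim the lexicon once, then reuse it
afinn_trimmed <- toy_afinn %>% anti_join(words_to_remove, by = "word")
score_trimmed_lexicon <- tokens %>%
  inner_join(afinn_trimmed, by = "word") %>%
  summarise(total = sum(value))

# Option 2: anti_join() after the inner_join(), as in the answer above
score_anti_join_after <- tokens %>%
  inner_join(toy_afinn, by = "word") %>%
  anti_join(words_to_remove, by = "word") %>%
  summarise(total = sum(value))
```

Both totals come out the same (9 here: "good" twice plus "great"), because "bad" is excluded either way.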