R sentiment analysis; 'lexicon' not found; 'sentiments' corrupted?

I am trying to follow this on-line tutorial on sentiment analysis. The code:

new_sentiments <- sentiments %>% #From the tidytext package
  filter(lexicon != "loughran") %>% #Remove the finance lexicon
  mutate( sentiment = ifelse(lexicon == "AFINN" & score >= 0, "positive",
                         ifelse(lexicon == "AFINN" & score < 0,
                                "negative", sentiment))) %>%
  group_by(lexicon) %>%
  mutate(words_in_lexicon = n_distinct(word)) %>%
  ungroup()

Generates the error:

>Error in filter_impl(.data, quo) : 
>Evaluation error: object 'lexicon' not found.

Possibly related: the 'sentiments' table itself looks strange to me (corrupted?). Here is the head of 'sentiments':

> head(sentiments,3)
>   element_id sentence_id word_count sentiment                                  chapter
> 1          1           1          7         0 The First Book of Moses:  Called Genesis
> 2          2           1         NA         0 The First Book of Moses:  Called Genesis
> 3          3           1         NA         0 The First Book of Moses:  Called Genesis
>                                  category
> 1 The First Book of Moses:  Called Genesis
> 2 The First Book of Moses:  Called Genesis
> 3 The First Book of Moses:  Called Genesis
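
A quick way to confirm this kind of mismatch is to compare the columns a piece of code expects with the columns the table actually has. The sketch below uses a toy data frame in place of the real 'sentiments' object, and `missing_columns` is a hypothetical helper written for illustration, not part of tidytext.

```r
# Hypothetical helper: list the required columns that a data frame lacks.
missing_columns <- function(df, required) {
  setdiff(required, names(df))
}

# Toy stand-in with the same two columns that get_sentiments("bing") returns.
toy_sentiments <- data.frame(
  word = c("2-faced", "a+"),
  sentiment = c("negative", "positive"),
  stringsAsFactors = FALSE
)

# Columns the tutorial code refers to.
required <- c("word", "sentiment", "lexicon", "score")
missing <- missing_columns(toy_sentiments, required)
print(missing)

# In a real session, these calls show which attached packages define
# an object called `sentiments` and what the tidytext copy contains:
# find("sentiments")
# names(tidytext::sentiments)
```

If `find("sentiments")` lists more than one entry, another package or a workspace object may be masking the tidytext dataset.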

If I use get_sentiments for bing, AFINN or NRC, though, I get what looks like an appropriate response:

>  get_sentiments("bing")
> # A tibble: 6,788 x 2
>   word        sentiment
>   <chr>       <chr>
> 1 2-faced     negative 
> 2 2-faces     negative 
> 3 a+          positive 
> 4 abnormal    negative 

I tried removing (remove.packages) and reinstalling tidytext; no change in behavior. I am running R 3.5.

Even if I am completely misunderstanding the problem, I would appreciate any insights anyone can give me.

The following code rebuilds the new_sentiments dataset as shown in the DataCamp tutorial.

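library(dplyr)       # %>%, mutate, bind_rows, group_by, summarise
library(tidyr)       # spread
library(tidytext)    # get_sentiments
library(formattable) # color_tile, color_bar

bing <- get_sentiments("bing") %>% 
     mutate(lexicon = "bing", 
            words_in_lexicon = n_distinct(word))    

nrc <- get_sentiments("nrc") %>% 
     mutate(lexicon = "nrc", 
            words_in_lexicon = n_distinct(word))

# AFINN has a numeric value instead of a sentiment label, so derive the
# label from the sign, as the original tutorial code did
afinn <- get_sentiments("afinn") %>% 
     mutate(lexicon = "afinn", 
            sentiment = ifelse(value >= 0, "positive", "negative"),
            words_in_lexicon = n_distinct(word))

new_sentiments <- bind_rows(bing, nrc, afinn)
# The tutorial expects the AFINN column to be called 'score'
names(new_sentiments)[names(new_sentiments) == 'value'] <- 'score'

# my_kable_styling() is a helper defined earlier in the tutorial
new_sentiments %>% 
     group_by(lexicon, sentiment, words_in_lexicon) %>% 
     summarise(distinct_words = n_distinct(word)) %>% 
     ungroup() %>% 
     spread(sentiment, distinct_words) %>% 
     mutate(lexicon = color_tile("lightblue", "lightblue")(lexicon), 
            words_in_lexicon = color_bar("lightpink")(words_in_lexicon)) %>% 
     my_kable_styling(caption = "Word Counts per Lexicon")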

The subsequent graphs will work too!
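
To see what the combined table looks like without downloading any lexicon, here is a minimal sketch on toy data. The words and values are made up for illustration, and `tag_lexicon` is a hypothetical helper; only the column names (word, sentiment, value) follow what get_sentiments returns in the code above.

```r
library(dplyr)

# Toy stand-ins for two lexicons (contents invented for illustration).
toy_bing <- tibble(word = c("abnormal", "a+"),
                   sentiment = c("negative", "positive"))
toy_afinn <- tibble(word = c("abandon", "ability", "admire"),
                    value = c(-2, 2, 3))

# Hypothetical helper: label rows with their lexicon and its vocabulary size.
tag_lexicon <- function(df, name) {
  df %>% mutate(lexicon = name, words_in_lexicon = n_distinct(word))
}

toy_combined <- bind_rows(
  tag_lexicon(toy_bing, "bing"),
  toy_afinn %>%
    rename(score = value) %>%
    # Turn the numeric score into a positive/negative label by its sign.
    mutate(sentiment = ifelse(score >= 0, "positive", "negative")) %>%
    tag_lexicon("afinn")
)

# Rows per lexicon and sentiment; bing rows have NA in the score column
# because bind_rows() fills columns that a table lacks with NA.
print(count(toy_combined, lexicon, sentiment))
```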

It appears tidytext had to be changed, which broke some of the code in the tutorial.

To make the code run, replace

new_sentiments <- sentiments %>% #From the tidytext package
  filter(lexicon != "loughran") %>% #Remove the finance lexicon
  mutate( sentiment = ifelse(lexicon == "AFINN" & score >= 0, "positive",
                              ifelse(lexicon == "AFINN" & score < 0,
                                     "negative", sentiment))) %>%
  group_by(lexicon) %>%
  mutate(words_in_lexicon = n_distinct(word)) %>%
  ungroup()

with

new_sentiments <- get_sentiments("afinn")
names(new_sentiments)[names(new_sentiments) == 'value'] <- 'score'
new_sentiments <- new_sentiments %>%
  mutate(lexicon = "afinn",
         sentiment = ifelse(score >= 0, "positive", "negative"),
         words_in_lexicon = n_distinct(word))

The next few graphs won't make as much sense (since we now use only one lexicon), but the rest of the tutorial will work.

UPDATE: here's an excellent explanation from the tidytext package author of what happened.

I ran into a similar problem. I tried the code below; I hope it helps.

library(tm)
library(tidyr)
library(ggthemes)
library(ggplot2)
library(dplyr)
library(tidytext)
library(textdata)

# Look at each of the three lexicons
get_sentiments("bing")
get_sentiments("afinn")
get_sentiments("nrc")

#store each lexicon in its own data frame
afinn=get_sentiments("afinn")
bing=get_sentiments("bing")
nrc=get_sentiments("nrc")

#check
head(afinn)
head(bing)
head(nrc)
head(sentiments) #from tidytext packages

#merging dataframe
merge_sentiments=rbind(sentiments,get_sentiments('bing'),get_sentiments('nrc'))
head(merge_sentiments) #check

merge2_sentiments=merge(merge_sentiments,afinn,by=1,all=T)
head(merge2_sentiments) #check

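#make new data frame with column lexicon added
#is.na() is tested first: sentiment=='NA' compares against the string 'NA' and never matches a missing value
#note: nrc rows whose sentiment is 'positive' or 'negative' are also labelled 'bing' by this rule
new_sentiments <- merge2_sentiments
new_sentiments <- new_sentiments %>% 
  mutate(lexicon=ifelse(is.na(sentiment),'afinn',ifelse(sentiment %in% c('positive','negative'),'bing','nrc')))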

colnames(new_sentiments)[colnames(new_sentiments)=='value']='score'

#check
head(new_sentiments)
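
One detail worth checking in any rule that labels rows by a missing sentiment: in R, comparing a value with the string 'NA' does not detect missing values, so the missing case has to be tested with is.na(). A small sketch on a toy vector (the values are made up for illustration):

```r
# Toy sentiment column: a labelled row, a missing value, and an emotion label.
sentiment <- c("positive", NA, "joy")

# Comparing with the string "NA" yields NA (not TRUE) for the missing entry.
string_test <- sentiment == "NA"
print(string_test)

# is.na() is the reliable test for missing values.
na_test <- is.na(sentiment)
print(na_test)

# Testing is.na() first keeps the nested ifelse() from returning NA.
lexicon <- ifelse(is.na(sentiment), "afinn",
                  ifelse(sentiment %in% c("positive", "negative"), "bing", "nrc"))
print(lexicon)
```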
