I am trying to reproduce Figure 4.3 from section 4.1.3 of the book "Text Mining with R" (sentiment analysis).
This section groups all bigrams by four negation words ("not", "no", "never" and "without") and, for each group, plots the sentiment contribution of the word that follows the negation word. Because the score ignores the negation, that contribution points in the wrong direction.
I plot the words on the y-axis and the contribution on the x-axis, and to make the plots readable I want the bars in each group arranged in descending order. As in the previous sections, I reorder the levels of the words by their contribution.
The problem is that the same word has a different contribution in each group. For example, in group 1 "happy" appears more often than "hope", so it has the higher contribution, but in group 2 it is the other way round. Worse, I can't use mutate(word2 = reorder(word2, contribution)) when the data frame has been grouped with group_by(word1).
The book produces the plot correctly, so I suppose there is some way to reorder the levels separately for each group.
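To make the issue concrete, here is a minimal sketch with made-up data (the toy data frame and its values are assumptions for illustration, not taken from the book). It shows that reorder() produces a single ordering of the levels for the whole column, so it cannot satisfy two groups that rank the same terms differently.

```r
# Toy data (made up): "happy" outranks "hope" in group A, but not in group B.
toy <- data.frame(
  grp   = c("A", "A", "B", "B"),
  term  = c("happy", "hope", "happy", "hope"),
  score = c(10, 2, 1, 8)
)

# reorder() summarises score per term over the whole column (mean by default),
# so the factor ends up with one global ordering of its levels.
toy$term <- reorder(toy$term, toy$score)

# Per-term means are happy = 5.5 and hope = 5, so the levels are "hope", "happy".
# That matches the ascending order in group A but not in group B.
levels(toy$term)
```

A factor has only one set of levels, so whichever ordering is chosen is shared by every facet.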
The code is below. Everything before #preparing the data for plotting is taken from the book, so it should not have any issues; everything after that is my own.
library(dplyr)
library(tidytext)
library(janeaustenr)
library(tidyr)
library(ggplot2) #needed for the ggplot() calls below
#getting bigrams
austen_bigrams <- austen_books() %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)
bigrams_separated <- austen_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ")
#four negation words to look at
negation_words <- c("not", "no", "never", "without")
#note: newer tidytext releases name the AFINN score column value instead of score
AFINN <- get_sentiments("afinn")
#get the sentiment score of words preceded by the four negation words
negated_words <- bigrams_separated %>%
  filter(word1 %in% negation_words) %>% #word1 as negation words
  inner_join(AFINN, by = c(word2 = "word")) %>% #word2 as the word following negation words
  count(word1, word2, score, sort = TRUE) %>%
  ungroup()
#preparing the data for plotting
bigrams_plot <- bigrams_separated %>%
  filter(word1 %in% negation_words) %>%
  inner_join(AFINN, by = c(word2 = "word")) %>% #getting sentiment score
  count(word1, word2, score, sort = TRUE) %>%
  mutate(contribution = n * score) %>% #defining contribution as n*score
  group_by(word1) %>% #group by negation words
  top_n(12, abs(contribution)) %>%
  arrange(desc(abs(contribution))) %>%
  ungroup() %>%
  mutate(word2 = reorder(word2, contribution))
#plotting sentiment score contribution grouped by the four negation words
ggplot(bigrams_plot, aes(word2, n * score, fill = n * score > 0)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~word1, ncol = 2, scales = "free") +
  coord_flip()
I have created a simpler version below:
set.seed(1) #any seed will do; it only makes the random example reproducible
v1_grp <- c(rep("A", 10), rep("B", 10))
v2_Aterm <- sample(letters[1:10], 10, replace = FALSE)
v2_Bterm <- sample(letters[1:10], 10, replace = FALSE)
v3_score <- sample(-10:10, 20, replace = TRUE)
#tibble() replaces the deprecated data_frame()
data1 <- tibble(grp = v1_grp, term = c(v2_Aterm, v2_Bterm), score = v3_score)
dataplot <- data1 %>%
  arrange(desc(score)) %>%
  mutate(term = reorder(term, score))
ggplot(dataplot, aes(term, score, fill = score > 0)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~grp, ncol = 2, scales = "free") +
  coord_flip()
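One approach (adapted from https://drsimonj.svbtle.com/ordering-categories-within-ggplot2-facets ): sort the rows by group and score, store the row number as a plotting position, plot against that position, and relabel the axis with the terms.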
dataplot <- data1 %>%
  arrange(grp, score) %>%
  mutate(order = row_number())
ggplot(dataplot, aes(order, score, fill = score > 0)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~grp, ncol = 2, scales = "free") +
  coord_flip() +
  scale_x_continuous(
    breaks = dataplot$order,
    labels = dataplot$term,
    expand = c(0, 0)
  )
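As an alternative to the row-number trick, tidytext itself ships reorder_within() and scale_x_reordered() for ordering within facets. The following is a minimal sketch on made-up data (the toy tibble and the column name term_in_grp are assumptions for illustration); it is not necessarily how the book produced its figure.

```r
library(dplyr)
library(ggplot2)
library(tidytext) # provides reorder_within() and scale_x_reordered()

# Toy data (made up): the same terms score differently in each group.
toy <- tibble(
  grp   = rep(c("A", "B"), each = 3),
  term  = rep(c("happy", "hope", "fear"), times = 2),
  score = c(10, 2, -4, 1, 8, -6)
)

toy_plot <- toy %>%
  # Pastes term and grp into one label per pair (separator "___" by default)
  # and orders those labels by score, so each facet gets its own ordering.
  mutate(term_in_grp = reorder_within(term, score, grp))

p <- ggplot(toy_plot, aes(term_in_grp, score, fill = score > 0)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~grp, ncol = 2, scales = "free") +
  coord_flip() +
  scale_x_reordered() # strips the group suffix from the axis labels

print(p)
```

For the negation data above, the equivalent step would presumably be mutate(word2 = reorder_within(word2, contribution, word1)) followed by scale_x_reordered() in the plot, keeping scales = "free" so that each facet only shows its own words.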