
Is there a way to reorder the levels of a variable within each group after using group_by?

I am trying to reproduce Figure 4.3 from section 4.1.3 (sentiment analysis) of the book "Text Mining with R".


This section groups the bigrams by four key negation words, "not", "no", "never" and "without", and for each group it plots the sentiment contribution of the word that follows the negation word. Because that word is negated, this is the contribution counted in the wrong direction.

I plot the words on the y-axis and their contribution on the x-axis, and to make the plots easier to read I also want the bars in each group sorted in descending order. As in the previous sections of the book, I reorder the levels of the words by their contribution.

The problem is that the same word can have a different contribution in each group. For example, in group 1 "happy" appears more often than "hope", so it has the higher contribution, but in group 2 it is the other way round. Worse, I can't do mutate(word2 = reorder(word2, contribution)) once the data frame has been grouped with group_by(word1).
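A minimal sketch of why a single reorder() call cannot satisfy both groups, using made-up numbers (the data frame df and its values are hypothetical): reorder() assigns one level order to the whole column, computed from the mean contribution of each word across all groups.

```r
# Hypothetical toy data: "happy" beats "hope" under "not", but loses under "no"
df <- data.frame(
  word1 = c("not", "not", "no", "no"),
  word2 = c("happy", "hope", "happy", "hope"),
  contribution = c(10, 4, 2, 6)
)

# reorder() builds ONE set of levels for the whole column, sorted by the
# mean contribution of each word over all rows: happy = 6, hope = 5
df$word2 <- reorder(df$word2, df$contribution)
levels(df$word2)
# levels are c("hope", "happy")
```

Within the "no" group, hope (6) should be drawn above happy (2), but a factor has only one level order, so every facet inherits the global order and happy ends up above hope everywhere.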

The book manages to produce the plot correctly, so I assume there is some way to reorder the levels separately for each group.

Below is my code. Everything before #preparing the data for plotting is taken from the book, so it should not have any issues; the code from that point on is mine.

library(dplyr)
library(tidytext)
library(janeaustenr)
library(tidyr)
library(ggplot2)  # needed for ggplot(), geom_col() and facet_wrap() below

#getting bigrams

austen_bigrams <- austen_books() %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)  
bigrams_separated <- austen_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ")  

#four negation words to look at

negation_words <- c("not", "no", "never", "without")
AFINN <- get_sentiments("afinn")  # newer tidytext versions name this column value instead of score

#get the sentiment score of words preceded by the four negation words

negated_words <- bigrams_separated %>%
  filter(word1 %in% negation_words) %>%  #word1 as negation words
  inner_join(AFINN, by = c(word2 = "word")) %>%  #word2 as the word following negation words
  count(word1, word2, score, sort = TRUE) %>%
  ungroup()

#preparing the data for plotting

bigrams_plot <- bigrams_separated %>%
  filter(word1 %in% negation_words) %>% 
  inner_join(AFINN, by = c(word2 = "word")) %>%  #getting sentiment score
  count(word1, word2, score, sort = TRUE) %>%
  mutate(contribution = n * score) %>%  #defining contribution as n*score
  group_by(word1) %>%  #group by negation words
  top_n(12,abs(contribution)) %>%
  arrange(desc(abs(contribution))) %>%
  ungroup() %>%
  mutate(word2 = reorder(word2, contribution)) 

#plotting sentiment score contribution grouped by the four negation words

ggplot(bigrams_plot, aes(word2, n * score, fill = n * score > 0)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~word1, ncol = 2, scales = "free") +
  coord_flip()
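One common workaround is to make each word unique within its group before reordering: paste the group onto the word (for example "happy___not"), call reorder() on that combined value, and strip the suffix again when labelling the axis. The sketch below uses hypothetical toy data in place of bigrams_plot; the column word2_in_group, the helper strip_suffix() and the "___" separator are illustrative choices, not part of the book's code.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for bigrams_plot: "happy" and "hope"
# rank differently under "not" and under "no"
toy <- data.frame(
  word1 = c("not", "not", "not", "no", "no", "no"),
  word2 = c("happy", "hope", "like", "happy", "hope", "like"),
  contribution = c(10, 4, -3, 2, 6, -5)
)

# Turn every (word2, word1) pair into its own level, e.g. "happy___not",
# then order those levels by contribution; "___" is an arbitrary separator
toy_plot <- toy %>%
  mutate(word2_in_group = reorder(paste(word2, word1, sep = "___"), contribution))

# Hypothetical helper: drop the "___group" suffix when drawing axis labels
strip_suffix <- function(x) sub("___.*$", "", x)

p <- ggplot(toy_plot, aes(word2_in_group, contribution, fill = contribution > 0)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~word1, ncol = 2, scales = "free") +
  scale_x_discrete(labels = strip_suffix) +
  coord_flip()
```

With coord_flip() the first level is drawn at the bottom, so the largest contribution ends up at the top of each panel. For the real data, the same mutate() could replace the final mutate(word2 = reorder(word2, contribution)) in the bigrams_plot pipeline, with the plot mapped to the new column.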

I have created a simpler version below:

set.seed(1)  # make the random example reproducible
v1_grp <- c(rep('A',10),rep('B',10))
v2_Aterm <- sample(letters[1:10],10,replace=FALSE)
v2_Bterm <- sample(letters[1:10],10,replace=FALSE)
v3_score <- sample(-10:10,20,replace=TRUE)

data1 <- tibble(grp=v1_grp,term=c(v2_Aterm,v2_Bterm),score=v3_score)  # tibble() replaces the deprecated data_frame()

dataplot <- data1 %>%
  arrange(desc(score)) %>%
  mutate(term=reorder(term,score)) 

ggplot(dataplot, aes(term,score,fill=score>0)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~grp, ncol = 2, scales = "free") +
  coord_flip()

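The following workaround is adapted from https://drsimonj.svbtle.com/ordering-categories-within-ggplot2-facets : sort the rows within each group, plot each bar at its row number, and then relabel the axis breaks with the terms.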

dataplot <- data1 %>%
  arrange(grp, score) %>%
  mutate(order = row_number())

ggplot(dataplot, aes(order,score,fill=score>0)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~grp, ncol = 2, scales = "free") +
  coord_flip() +
  scale_x_continuous(
    breaks = dataplot$order,
    labels = dataplot$term,
    expand = c(0,0)
  )
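Newer versions of tidytext package a similar idea as reorder_within() and scale_x_reordered(): reorder_within() pastes the group onto each term with a "___" separator before calling reorder(), and scale_x_reordered() removes that suffix from the axis labels. A sketch on data shaped like data1, assuming a tidytext version that includes these two functions (the seed and the names data2, dataplot2 and p2 are arbitrary):

```r
library(dplyr)
library(ggplot2)
library(tidytext)  # assumes a version that provides reorder_within()/scale_x_reordered()

set.seed(42)  # arbitrary seed so the random example is reproducible
data2 <- data.frame(
  grp = rep(c("A", "B"), each = 10),
  term = c(sample(letters[1:10]), sample(letters[1:10])),
  score = sample(-10:10, 20, replace = TRUE)
)

# reorder_within() builds levels such as "c___A", ordered by score,
# so each group keeps its own ascending order
dataplot2 <- data2 %>%
  mutate(term = reorder_within(term, score, grp))

# scale_x_reordered() removes the "___grp" suffix from the axis labels;
# scales = "free" lets each panel show only its own terms
p2 <- ggplot(dataplot2, aes(term, score, fill = score > 0)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~grp, ncol = 2, scales = "free") +
  scale_x_reordered() +
  coord_flip()
```

Compared with the row-number approach, this keeps term as a discrete axis, so no manual breaks or labels are needed.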

