Scatter plot or dispersion plot in R

Question

I want a single plot covering "x" novels that shows the dispersion of a particular word across all of them. Every novel has a different length (total number of words), so the "x" axis would have to be the novels and the "y" axis the length of each novel. Right now I can create a separate plot for each novel, but I want all of them together. Here's what I have so far:

input.dir<-("corpus2")
files.v<-dir(input.dir, "\\.txt$")

corpus<-corpus(files.v, input.dir)

tiempo<-tiempo(corpus)

noche<-palabra("día", corpus, tiempo)

dispersion(noche)

#corpus

corpus<-function(files.v, input.dir){
  text.word.vector.l<-list()
  for(i in 1:length(files.v)){
    text.v <- scan(paste(input.dir, files.v[i], sep="/"), what="character", sep="\n")
    Encoding(text.v)<-"UTF-8"
    text.v <- paste(text.v, collapse=" ")
    text.lower.v <- tolower(text.v)
    text.words.v <- strsplit(text.lower.v, "\\W")
    text.words.v <- unlist(text.words.v)
    text.words.v <- text.words.v[which(text.words.v!="")]
    text.word.vector.l[[files.v[i]]] <- text.words.v
  }
  return(text.word.vector.l)
}

#tiempo

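tiempo <- function(argument1){
  tiempo.l<-list()
  for (i in seq_along(argument1)){
    # Word positions 1..n for novel i
    time<-seq_along(argument1[[i]])
    # Name each entry from the input list instead of the global files.v
    tiempo.l[[names(argument1)[i]]]<-time
  }
  return(tiempo.l)
}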

#palabra

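palabra<-function(keyword, argument1, argument2){
  hits.l<-list()
  for (i in seq_along(argument1)) {
    # Positions where the keyword occurs in novel i
    hits.v<-which(argument1[[i]]==keyword)
    # NA everywhere except at the keyword positions, which are set to 1
    hits.keyword.v<-rep(NA, length(argument2[[i]]))
    hits.keyword.v[hits.v]<-1
    # Name each entry from the input list instead of the global files.v
    hits.l[[names(argument1)[i]]]<-hits.keyword.v
  }
  return(hits.l)
}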

#dispersion

dispersion<-function(argument1){
  options(scipen=5)
  for (i in 1:length(argument1)) {
    plot(argument1[[i]], main="Dispersion plot",
         xlab="time", ylab="keyword", type="h", ylim=c(0,1), yaxt='n')
  }
}

How can I plot these together? Here's a picture of what I think it should look like:

What I am trying to do is, more or less, to put all of these plots together:

Answer 1

Your example isn't reproducible, so the code below uses novels by Jane Austen and plots word locations with ggplot2. Hopefully you can adapt this code to your needs.

library(tidyverse)
library(janeaustenr)
library(scales)

# Function to plot dispersion of a given vector of words in novels by Jane Austen
plot.dispersion = function(words) {

  pattern = paste(words, collapse="|")

  # Get locations of each input word in each text
  # Adapted from Text Mining with R (https://www.tidytextmining.com/tfidf.html)
  texts = austen_books() %>% 
    group_by(book) %>% 
    mutate(text = str_split(tolower(text), "\\W")) %>% 
    unnest(text) %>% 
    filter(text != "") %>% 
    mutate(word.num = 1:n(),
           pct = word.num/n()) %>% 
    filter(grepl(pattern, text)) %>% 
    mutate(text = str_extract(text, pattern))

  # Plot the word locations
  ggplot(texts, aes(y=book, x=pct)) +
    geom_point(shape="|", size=5) +
    facet_grid(text ~ .) +
    scale_x_continuous(labels=percent) +
    labs(x="Percent of book", y="") +
    theme_bw() +
    theme(panel.grid.major.x=element_blank(),
          panel.grid.minor.x=element_blank())
}

plot.dispersion(c("independent", "property"))
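
If you would rather stay in base R and keep the data structure from the question, here is a minimal sketch of the same idea. It assumes a named list of word vectors, one per novel, like the list returned by corpus() in the question. The function names keyword_positions and plot_dispersion_base and the toy data are made up for illustration. Each position is divided by the novel's length so that novels of different lengths share one axis, and every novel gets its own row.

```r
# Hypothetical helper: collect keyword positions from a named list of
# word vectors (one vector per novel) into a single data frame.
keyword_positions <- function(corpus.l, keyword) {
  rows <- lapply(names(corpus.l), function(nm) {
    words <- corpus.l[[nm]]
    hits <- which(words == keyword)
    data.frame(novel = rep(nm, length(hits)),
               position = hits,
               pct = hits / length(words),  # relative position, 0 to 1
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Hypothetical helper: draw one row of tick marks per novel in one plot.
plot_dispersion_base <- function(corpus.l, keyword) {
  pos <- keyword_positions(corpus.l, keyword)
  novels <- names(corpus.l)
  row <- match(pos$novel, novels)  # row index of each hit on the y axis

  # Widen the left margin so the novel names fit; restore it on exit
  op <- par(mar = c(5, 8, 4, 2))
  on.exit(par(op))

  plot(pos$pct, row, pch = "|",
       xlim = c(0, 1), ylim = c(0.5, length(novels) + 0.5),
       yaxt = "n", xlab = "Relative position in novel", ylab = "",
       main = paste("Dispersion of", keyword))
  axis(2, at = seq_along(novels), labels = novels, las = 1)
  invisible(pos)
}

# Toy data standing in for real novels (invented for this example)
toy <- list(
  novel_a = c("the", "day", "is", "long", "day"),
  novel_b = c("night", "and", "day", "and", "night", "and", "night", "and")
)
pos <- plot_dispersion_base(toy, "day")
```

With the question's objects, the call would presumably be plot_dispersion_base(corpus, "día"). To show absolute word counts instead of relative positions, plot pos$position and set xlim to the length of the longest novel.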

1 answer

solution1
3 ACCEPTED 2018-09-04 00:02:28
