
Scatter plot or dispersion plot in R

Okay, so I want a single plot that shows the dispersion of a particular word across a number of novels. Every novel has a different length (total number of words), so the x axis would be the novels and the y axis would be the length of each novel. Right now I can create a separate plot for each novel, but I want all of them together. Here's what I have so far:

input.dir<-("corpus2")
files.v<-dir(input.dir, "\\.txt$")

corpus<-corpus(files.v, input.dir)

tiempo<-tiempo(corpus)

noche<-palabra("día", corpus, tiempo)

dispersion(noche)

#corpus

corpus<-function(files.v, input.dir){
  text.word.vector.l<-list()
  for(i in 1:length(files.v)){
    text.v <- scan(paste(input.dir, files.v[i], sep="/"), what="character", sep="\n")
    Encoding(text.v)<-"UTF-8"
    text.v <- paste(text.v, collapse=" ")
    text.lower.v <- tolower(text.v)
    text.words.v <- strsplit(text.lower.v, "\\W")
    text.words.v <- unlist(text.words.v)
    text.words.v <- text.words.v[which(text.words.v!="")]
    text.word.vector.l[[files.v[i]]] <- text.words.v
  }
  return(text.word.vector.l)
}

#tiempo

tiempo <- function(argument1){
  tiempo.l<-list()
  for (i in 1:length(argument1)){
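    # seq_along() gives 1..n directly; name the entry from the input list
    # instead of relying on the global files.v
    time<-seq_along(argument1[[i]])
    tiempo.l[[names(argument1)[i]]]<-time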
  }
  return(tiempo.l)
}

#palabra

palabra<-function(keyword, argument1, argument2){
  hits.l<-list()
  for (i in 1:length(argument1)) {
    hits.v<-which(argument1[[i]]==keyword)
    hits.keyword.v<-rep(NA, length(argument2[[i]]))
    hits.keyword.v[hits.v]<-1
    # name the entry from the input list instead of the global files.v
    hits.l[[names(argument1)[i]]]<-hits.keyword.v
  }
  return(hits.l)
}

#dispersion

dispersion<-function(argument1){
  options(scipen=5)
  for (i in 1:length(argument1)) {
    plot(argument1[[i]], main="Dispersion plot",
         xlab="time", ylab="keyword", type="h", ylim=c(0,1), yaxt='n')
  }
}

How can I plot this together? Here's a picture of what I feel it should look like:

[image: example]

What I am trying to do is more or less having all these plots together: [image]

Your example isn't reproducible, so the code below uses novels by Jane Austen to plot word locations with ggplot2. Hopefully you can adapt this code to your needs.

library(tidyverse)
library(janeaustenr)
library(scales)

# Function to plot dispersion of a given vector of words in novels by Jane Austen
plot.dispersion = function(words) {

  pattern = paste(words, collapse="|")

  # Get locations of each input word in each text
  # Adapted from Text Mining with R (https://www.tidytextmining.com/tfidf.html)
  texts = austen_books() %>% 
    group_by(book) %>% 
    mutate(text = str_split(tolower(text), "\\W")) %>% 
    unnest(text) %>% 
    filter(text != "") %>% 
    mutate(word.num = 1:n(),
           pct = word.num/n()) %>% 
    filter(grepl(pattern, text)) %>% 
    mutate(text = str_extract(text, pattern))

  # Plot the word locations
  ggplot(texts, aes(y=book, x=pct)) +
    geom_point(shape="|", size=5) +
    facet_grid(text ~ .) +
    scale_x_continuous(labels=percent) +
    labs(x="Percent of book", y="") +
    theme_bw() +
    theme(panel.grid.major.x=element_blank(),
          panel.grid.minor.x=element_blank())
}

plot.dispersion(c("independent", "property"))
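
If you would rather stay in base R and work from the named list of word vectors that the question's corpus() function returns, something along these lines might be a starting point. It is only a sketch: keyword_positions() and plot_dispersion_all() are hypothetical helper names made up here, the toy list stands in for the real corpus, and unlike the grepl() call above it matches whole words only.

```r
# Hypothetical helper: relative positions (0-1) of a keyword in each novel.
# word.list is assumed to be a named list of character vectors, one per novel.
keyword_positions <- function(word.list, keyword) {
  rows <- lapply(names(word.list), function(novel) {
    words <- word.list[[novel]]
    hits <- which(words == keyword)          # exact, whole-word matches
    data.frame(novel = rep(novel, length(hits)),
               pct = hits / length(words),   # rescale so lengths are comparable
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Hypothetical helper: one horizontal row per novel, one "|" tick per hit.
plot_dispersion_all <- function(word.list, keyword) {
  pos <- keyword_positions(word.list, keyword)
  novels <- names(word.list)
  row.id <- match(pos$novel, novels)
  plot(pos$pct, row.id, pch = "|", xlim = c(0, 1),
       ylim = c(0.5, length(novels) + 0.5), yaxt = "n",
       xlab = "Relative position in novel", ylab = "",
       main = paste0("Dispersion of \"", keyword, "\""))
  axis(2, at = seq_along(novels), labels = novels, las = 1)
  invisible(pos)
}

# Toy data standing in for the real corpus (not from the question)
toy <- list(a.txt = c("la", "noche", "fue", "larga"),
            b.txt = c("noche", "tras", "noche", "sin", "fin", "ni", "luz", "alguna"))
pos <- plot_dispersion_all(toy, "noche")
```

Dividing each hit index by the novel's word count puts every novel on the same 0 to 1 axis, which is the same idea as the pct column in the ggplot2 version. With the question's objects, the call would presumably be plot_dispersion_all(corpus, "día").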

[image: plot produced by the code above]
