I want a single plot covering several novels that shows the dispersion of a particular word throughout all of them. Every novel has a different length (total number of words), so the x axis would have to be the novels and the y axis would have to be the length of each novel. Right now I can create a separate plot for each novel, but I want all of them together. Here's what I have so far:
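# Run these lines after defining the functions below
input.dir <- "corpus2"
files.v <- dir(input.dir, "\\.txt$")
# Store results under new names so the functions are not overwritten
corpus.l <- corpus(files.v, input.dir)
tiempo.l <- tiempo(corpus.l)
noche <- palabra("día", corpus.l, tiempo.l)
dispersion(noche)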
#corpus
corpus<-function(files.v, input.dir){
text.word.vector.l<-list()
for(i in 1:length(files.v)){
text.v <- scan(paste(input.dir, files.v[i], sep="/"), what="character", sep="\n")
Encoding(text.v)<-"UTF-8"
text.v <- paste(text.v, collapse=" ")
text.lower.v <- tolower(text.v)
text.words.v <- strsplit(text.lower.v, "\\W")
text.words.v <- unlist(text.words.v)
text.words.v <- text.words.v[which(text.words.v!="")]
text.word.vector.l[[files.v[i]]] <- text.words.v
}
return(text.word.vector.l)
}
#tiempo
tiempo <- function(argument1){
tiempo.l<-list()
for (i in 1:length(argument1)){
time<-seq_along(argument1[[i]])
# Use the names of the input list instead of the global files.v
tiempo.l[[names(argument1)[i]]]<-time
}
return(tiempo.l)
}
#palabra
palabra<-function(keyword, argument1, argument2){
hits.l<-list()
for (i in 1:length(argument1)) {
hits.v<-which(argument1[[i]]==keyword)
hits.keyword.v<-rep(NA, length(argument2[[i]]))
hits.keyword.v[hits.v]<-1
# Use the names of the input list instead of the global files.v
hits.l[[names(argument1)[i]]]<-hits.keyword.v
}
return(hits.l)
}
#dispersion
dispersion<-function(argument1){
options(scipen=5)
for (i in 1:length(argument1)) {
plot(argument1[[i]], main="Dispersion plot",
xlab="time", ylab="keyword", type="h", ylim=c(0,1), yaxt='n')
}
}
How can I plot this together? Here's a picture of what I feel it should look like:
What I am trying to do is more or less having all these plots together:
Your example isn't reproducible, so the code below uses novels by Jane Austen to plot word locations with ggplot2. Hopefully you can adapt it to your needs.
library(tidyverse)
library(janeaustenr)
library(scales)
# Function to plot dispersion of a given vector of words in novels by Jane Austen
plot.dispersion = function(words) {
pattern = paste(words, collapse="|")
# Get locations of each input word in each text
# Adapted from Text Mining with R (https://www.tidytextmining.com/tfidf.html)
texts = austen_books() %>%
group_by(book) %>%
mutate(text = str_split(tolower(text), "\\W")) %>%
unnest(text) %>%
filter(text != "") %>%
mutate(word.num = 1:n(),
pct = word.num/n()) %>%
# Note: the pattern also matches longer words that contain a target word
filter(grepl(pattern, text)) %>%
mutate(text = str_extract(text, pattern))
# Plot the word locations
ggplot(texts, aes(y=book, x=pct)) +
geom_point(shape="|", size=5) +
facet_grid(text ~ .) +
scale_x_continuous(labels=percent) +
labs(x="Percent of book", y="") +
theme_bw() +
theme(panel.grid.major.x=element_blank(),
panel.grid.minor.x=element_blank())
}
plot.dispersion(c("independent", "property"))
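To adapt this to your own data, here is a minimal sketch that assumes your corpus is a named list of word vectors, one per novel, like the list your corpus() function returns. The helper name keyword_positions and the toy corpus are made up for illustration; swap in your real list and keyword. It matches whole words only, as your palabra() function does.

```r
library(ggplot2)

# Hypothetical helper: turn a named list of word vectors (one per novel)
# into a data frame with one row per keyword hit.
keyword_positions <- function(corpus.l, keyword) {
  hits <- lapply(names(corpus.l), function(novel) {
    words <- corpus.l[[novel]]
    pos <- which(words == keyword)            # exact, whole-word matches
    data.frame(novel = rep(novel, length(pos)),
               position = pos,                # word index within the novel
               pct = pos / length(words),     # relative position, 0 to 1
               stringsAsFactors = FALSE)
  })
  do.call(rbind, hits)
}

# Toy corpus standing in for the real novels (made-up data)
toy.l <- list(
  "novela1.txt" = c("la", "noche", "era", "larga", "y", "la", "noche", "cae"),
  "novela2.txt" = c("el", "sol", "y", "la", "noche")
)
hits.df <- keyword_positions(toy.l, "noche")

# One row per novel, one tick per occurrence of the keyword
ggplot(hits.df, aes(x = pct, y = novel)) +
  geom_point(shape = "|", size = 5) +
  scale_x_continuous(labels = scales::percent, limits = c(0, 1)) +
  labs(x = "Percent of novel", y = "", title = "Dispersion of 'noche'") +
  theme_bw()
```

Using the relative position (pct) puts novels of different lengths on a common axis. If you would rather see absolute word counts, map x to position instead and drop the percent scale.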