I am a little unclear about some errors I'm getting using the tm package.
I know that the wordcloud function in the wordcloud package takes a corpus as an argument. As stated in the documentation: (the words you give the function) can either be a character vector, or Corpus.
So far so good.
With this in mind, I've got some simple code as follows:
library(tm)
library(wordcloud)
corpus <- Corpus(DirSource("/.../MUSIC"), readerControl = list(language="lat"))
a <- tm_map(corpus, removeWords, c(stopwords("en")), mc.cores=1)
I want this next line to give me a wordcloud:
wordcloud(a)
but instead I get the following error:
Error in simple_triplet_matrix(i = i, j = j, v = as.numeric(v),
nrow = length(allTerms), : 'i, j, v' different lengths
I'm not sure why the corpus is somehow of incorrect dimension. I was under the impression that the corpus was an acceptable input.
Does anyone have any insight into the nature of this error? Has anyone seen it before, and do you have ideas for a workaround?
Thanks in advance.
You have to create a TermDocumentMatrix from the corpus. Try this:
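tdm <- TermDocumentMatrix(corpus)
term.matrix <- as.matrix(tdm)  # terms in rows, documents in columns
v <- sort(rowSums(term.matrix), decreasing = TRUE)  # total frequency per term
d <- data.frame(word = names(v), freq = v)
wordcloud(d$word, d$freq)  # pass the frequencies along with the words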
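To see what the frequency step produces without needing a directory of documents, here is a minimal sketch on a toy matrix. The terms, document names, and counts are made up for illustration and stand in for as.matrix(tdm); the wordcloud call is guarded so the rest still runs if the package is not installed.

```r
# Toy term-by-document count matrix standing in for as.matrix(tdm);
# the terms and counts below are hypothetical.
term.matrix <- matrix(
  c(3, 1,
    0, 2,
    5, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(Terms = c("melody", "rhythm", "chord"),
                  Docs = c("doc1", "doc2"))
)

# Total frequency of each term across all documents, most frequent first
v <- sort(rowSums(term.matrix), decreasing = TRUE)

# One row per term; keep the words as character strings, not factors
d <- data.frame(word = names(v), freq = unname(v), stringsAsFactors = FALSE)
print(d)

# Draw the cloud only if the wordcloud package is available
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = d$word, freq = d$freq, min.freq = 1)
}
```

Passing words and freq explicitly means wordcloud does not have to build a term matrix from a corpus itself, which is the step where the error in the question is raised.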