Errors when creating a wordcloud using the tm and wordcloud packages in R

I am a little unclear about some errors I'm getting using the tm package.

I know that the wordcloud function in the wordcloud package takes a corpus as an argument:

As stated in the documentation, the words you give the function can be either a character vector or a Corpus.

So far so good.

With this in mind, I've got some simple code as follows:

library(tm)
library(wordcloud)

corpus <- Corpus(DirSource("/.../MUSIC"), readerControl = list(language="lat"))

a <- tm_map(corpus, removeWords, c(stopwords("en")), mc.cores=1) 

I want this next line to give me a wordcloud:

wordcloud(a)

but instead I get the following error:

Error in simple_triplet_matrix(i = i, j = j, v = as.numeric(v),  
nrow = length(allTerms),  :  'i, j, v' different lengths  

I'm not sure why the corpus is somehow of incorrect dimension. I was under the impression that the corpus was an acceptable input.

Does anyone have any insight into the nature of this error? Has anyone seen it before and perhaps has some ideas for workarounds?

Thanks in advance.

You have to create a TermDocumentMatrix from the corpus first. Try this:

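tdm <- TermDocumentMatrix(corpus)
term.matrix <- as.matrix(tdm)  # renamed so it does not mask base::matrix
v <- sort(rowSums(term.matrix), decreasing=TRUE)  # total count per term
d <- data.frame(word = names(v), freq = v)
wordcloud(d$word, d$freq)  # pass the frequencies along with the words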

Result: [image: the resulting word cloud]
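
For anyone who wants to try the approach without the original files, here is a minimal sketch of the same steps on a tiny in-memory corpus. The three sample strings and the use of VCorpus(VectorSource(...)) are my own assumptions standing in for the MUSIC directory; min.freq = 1 is set only because the sample is so small that the default threshold of 3 would hide most terms.

```r
library(tm)
library(wordcloud)

# Hypothetical sample documents standing in for the files under MUSIC
docs <- c("jazz blues jazz rock",
          "rock opera and jazz fusion",
          "blues rock ballad")
corpus <- VCorpus(VectorSource(docs))

# Drop English stop words, as in the question
corpus <- tm_map(corpus, removeWords, stopwords("en"))

# Terms in rows, documents in columns
tdm <- TermDocumentMatrix(corpus)
term.matrix <- as.matrix(tdm)

# Total count of each term across all documents, most frequent first
v <- sort(rowSums(term.matrix), decreasing = TRUE)
d <- data.frame(word = names(v), freq = v)

# Draw the cloud from explicit words and frequencies
wordcloud(d$word, d$freq, min.freq = 1)
```

Passing d$freq explicitly means wordcloud sizes each word by the counts taken from the term-document matrix instead of working the frequencies out again from the words alone.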
