having issues with freq = argument in wordcloud()

Question

I'm working with a dataset that contains the constitutional preambles of all of the countries in the world (minus one or two). Generating a word cloud for an individual country isn't challenging, but I'm struggling to generate one that visualizes the 20 most common words across all of them.

I suspect the issue lies in the freq argument of wordcloud().

## create a document-term matrix
dtm.all.df <- DocumentTermMatrix(all.pream)

## coerce dtm.all.df into a regular matrix
dtm.all.df.rm <- as.matrix(dtm.all.df)

## create the word cloud
wordcloud(colnames(dtm.all.df.rm), freq = dtm.usa.df.rm[1:718580], min.freq = 2, max.words = 20)

Output: Error in if (min.freq > max(freq)) min.freq <- 0 : missing value where TRUE/FALSE needed

dput(dtm.all.df.rm)

structure(c(1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 2, 1, 1, 2, 1), dim = c(1L, 25L), dimnames = list(Docs = "1", 
    Terms = c("america", "blessings", "common", "constitution", 
    "defense", "domestic", "establish", "form", "general", "insure", 
    "justice", "liberty", "ordain", "order", "people", "perfect", 
    "posterity", "promote", "provide", "secure", "states", "tranquility", 
    "union", "united", "welfare")))

EDIT: Okay, now I'm really confused. I've rewritten the code in a separate chunk; as far as I can tell it's the same thing packaged in another object, but I'm getting different outputs. The plot is blank when I try to incorporate the entire corpus [1:155,], whereas accessing a single document works fine. Including multiple documents also generates a word cloud, but I'm suspicious of the output: if I include more than ~10 documents the word cloud shrinks (in word count), until at ~90 documents I only get one term.

dtm.all.TEST <- DocumentTermMatrix(all.pream)

dtm.all.TEST <- as.matrix(dtm.all.TEST)

## create the word cloud for Afghanistan's constitution
wordcloud(words = colnames(dtm.all.TEST), freq = dtm.all.TEST[1,], min.freq = 3, max.words = 20)

output

## create the word cloud, for all constitutions
wordcloud(words = colnames(dtm.all.TEST), freq = dtm.all.TEST[1:155,], min.freq = 3, max.words = 20)

EDIT2: Perhaps I didn't get enough sleep last night; I've finally noticed that the wordcloud() call in the first code block uses the wrong object. However, I'm still experiencing the other issues reported. I'm sure it's obvious that I'm very new at this...

Answer 1

Your sample data work if we eliminate some of the arguments:

wordcloud(colnames(dtm.all.df), dtm.all.df)

This produces the following plot:

To get the distribution of frequencies for your data:

table(dtm.all.df)
# dtm.all.df
#  1  2 
# 22  3

There are 22 words with frequency 1 and 3 words with frequency 2. Run this on your actual data to see what minimum frequency makes sense. Then use this to see how many words you have:

length(colnames(dtm.all.df))
# [1] 25

The sample data has only 25 words. Strictly speaking, min.freq= and max.words= are different ways of pruning the data. You generally would not need to use both.
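
For the corpus-wide cloud, the freq argument needs one number per word, whereas a slice such as dtm.all.TEST[1:155,] is still a matrix with one row per document. A minimal sketch of one way to collapse it, assuming the matrix holds raw term counts: sum each column with colSums() and keep the largest totals. The small matrix m below is made-up stand-in data, not your preambles, and n.top is a hypothetical name; the wordcloud() call is guarded so the rest runs even without the package installed.

```r
# Hypothetical 3-document x 4-term count matrix, standing in for
# as.matrix(DocumentTermMatrix(all.pream)); the values are invented.
m <- matrix(c(1, 0, 2,    # constitution
              2, 3, 1,    # people
              0, 0, 1,    # justice
              2, 1, 1),   # liberty
            nrow = 3,
            dimnames = list(Docs = c("1", "2", "3"),
                            Terms = c("constitution", "people",
                                      "justice", "liberty")))

# One total per term across all documents (a named numeric vector)
term.freq <- colSums(m)

# Keep the n.top most frequent terms; use 20 for the real corpus
n.top <- 2
top <- head(sort(term.freq, decreasing = TRUE), n.top)
print(top)

# Plot only if the wordcloud package is available
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = names(top), freq = top, min.freq = 1)
}
```

With the real data this would amount to passing colSums(dtm.all.TEST) as freq together with colnames(dtm.all.TEST) as words. Since the terms are already cut down to the top few, max.words= alone would also do the pruning.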
