Having issues with the freq = argument in wordcloud()

I'm working with a dataset that contains the constitutional preambles of all of the countries in the world (minus one or two). Generating a word cloud for an individual country isn't challenging, but I'm struggling to generate one that visualizes the 20 most common words across all of them.

I suspect the issue lies in the freq argument of wordcloud().

## create a document-term matrix
dtm.all.df <- DocumentTermMatrix(all.pream)

## coerce dtm.all.df into a regular matrix

dtm.all.df.rm <- as.matrix(dtm.all.df)

## create the word cloud
wordcloud(colnames(dtm.all.df.rm), freq = dtm.usa.df.rm[1:718580], min.freq = 2, max.words = 20)

Output: Error in if (min.freq > max(freq)) min.freq <- 0 : missing value where TRUE/FALSE needed

dput(dtm.all.df.rm)

structure(c(1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 2, 1, 1, 2, 1), dim = c(1L, 25L), dimnames = list(Docs = "1", 
    Terms = c("america", "blessings", "common", "constitution", 
    "defense", "domestic", "establish", "form", "general", "insure", 
    "justice", "liberty", "ordain", "order", "people", "perfect", 
    "posterity", "promote", "provide", "secure", "states", "tranquility", 
    "union", "united", "welfare"))) 

EDIT: Okay, now I'm really confused. I've rewritten the code in a separate chunk. As far as I can tell it's the same thing packaged in another object, but I'm getting different outputs. The plot is blank when I try to incorporate the entire corpus [1:155,]. However, accessing a single document works fine. Including multiple documents also generates a word cloud, but I'm suspicious of the output: if I include more than ~10 documents, the word cloud becomes smaller (in word count), until at ~90 documents I get only one term.

dtm.all.TEST<- DocumentTermMatrix(all.pream)

dtm.all.TEST <- as.matrix(dtm.all.TEST)

## create the word cloud, for afghanistan constitution
wordcloud(words = colnames(dtm.all.TEST), freq = dtm.all.TEST[1,], min.freq = 3, max.words = 20)

[image: output]

## create the word cloud, for all constitutions
wordcloud(words = colnames(dtm.all.TEST), freq = dtm.all.TEST[1:155,], min.freq = 3, max.words = 20)

EDIT2: Perhaps I didn't get enough sleep last night. I've finally noticed that the wordcloud() call in the first code block uses the wrong object. However, I'm still experiencing the other issues reported. I'm sure it's obvious that I'm very new at this...

Your sample data work if we eliminate some of the arguments:

wordcloud(colnames(dtm.all.df), dtm.all.df)

This produces the following plot:

[image: word cloud plot]
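As for the error in the question, one likely cause (an assumption, since the real `dtm.usa.df.rm` is not shown) is that `[1:718580]` indexes past the end of the matrix. R fills out-of-range positions with `NA`, `max()` of a vector containing `NA` is `NA`, and the comparison inside `if` then has no TRUE/FALSE value. A minimal base-R sketch, using a made-up three-element vector rather than the real data:

```r
# Hypothetical stand-in for a frequency vector; not the asker's data
freq <- c(1, 2, 1)

# Indexing past the end pads the result with NA
too_long <- freq[1:5]
print(too_long)

# max() propagates NA unless na.rm = TRUE is set
print(max(too_long))

# A comparison of the same shape as the one in the error message
# cannot be resolved to TRUE or FALSE, so if() raises an error
result <- tryCatch(
  {
    if (2 > max(too_long)) "reset min.freq" else "keep min.freq"
  },
  error = function(e) conditionMessage(e)
)
print(result)
```

If this is what happened, indexing only positions that exist (or passing the whole named vector) should avoid the `NA` values.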

To get the distribution of frequencies for your data:

table(dtm.all.df)
# dtm.all.df
#  1  2 
# 22  3 

There are 22 words with frequency 1 and 3 words with frequency 2. Use this with your actual data to see what minimum range makes sense. Then use this to see how many words you have:

length(colnames(dtm.all.df))
# [1] 25
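For the full corpus, `freq` needs one number per term, whereas `dtm.all.TEST[1:155,]` is a whole matrix with one row per document. One common approach, sketched here with a hypothetical three-document matrix because the real corpus is not available, is to sum each column with `colSums()` and then inspect the totals the same way:

```r
# Hypothetical 3-document, 4-term matrix standing in for as.matrix(dtm)
m <- matrix(
  c(1, 0, 2, 1,
    0, 3, 1, 0,
    2, 1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(Docs = c("1", "2", "3"),
                  Terms = c("people", "justice", "union", "liberty"))
)

# One total per term, summed across all documents
term_totals <- colSums(m)
print(term_totals)

# Same checks as above, applied to the totals
print(table(term_totals))
print(length(term_totals))
```

The resulting named vector has the shape `wordcloud()` expects: `names(term_totals)` for the words and `term_totals` for `freq`.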

The sample data has only 25 words. Strictly speaking, min.freq= and max.words= are different ways of pruning the data. You generally would not need to use both.
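To see how the two arguments prune differently, the sketch below applies each rule by hand to the sample frequencies from the question. It only approximates the filtering in base R (keep terms at or above a threshold; keep the n most frequent terms) and is not the package's actual code. The threshold of 2 and the limit of 5 are arbitrary illustration values, and the final call runs only if the `wordcloud` package is installed.

```r
# Sample frequencies copied from the dput() output in the question
freq <- c(america = 1, blessings = 1, common = 1, constitution = 1,
          defense = 1, domestic = 1, establish = 2, form = 1, general = 1,
          insure = 1, justice = 1, liberty = 1, ordain = 1, order = 1,
          people = 1, perfect = 1, posterity = 1, promote = 1, provide = 1,
          secure = 1, states = 2, tranquility = 1, union = 1, united = 2,
          welfare = 1)

# Rule 1: keep terms at or above a frequency threshold (like min.freq = 2)
by_threshold <- freq[freq >= 2]
print(names(by_threshold))

# Rule 2: keep the n most frequent terms (like max.words = 5)
by_rank <- head(sort(freq, decreasing = TRUE), 5)
print(names(by_rank))

# Optional plot, only when the wordcloud package is available
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = names(freq), freq = freq, min.freq = 2)
}
```

With this sample, the threshold keeps only the three terms that occur twice, while the rank rule keeps those three plus two of the many terms tied at frequency 1.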
