Text Mining on Twitter

I am trying to follow a tutorial on text mining with Twitter data. My code is:

library(twitteR)
library(NLP)
library(tm)
library(wordcloud)
library(RColorBrewer)

# fetch up to 1000 tweets for the hashtag and extract their text
mh370 <- searchTwitter("#PrayForMH370", since = "2014-03-08", until = "2014-03-20", n = 1000)
mh370_text = sapply(mh370, function(x) x$getText())
mh370_corpus = Corpus(VectorSource(mh370_text))

# build the term-document matrix, dropping punctuation, numbers and stopwords
tdm = TermDocumentMatrix(mh370_corpus,
                         control = list(removePunctuation = TRUE,
                                        stopwords = c("prayformh370", "prayformh", stopwords("english")),
                                        removeNumbers = TRUE, tolower = TRUE))
m = as.matrix(tdm)
# get word counts in decreasing order
word_freqs = sort(rowSums(m), decreasing = TRUE)
# create a data frame with words and their frequencies
dm = data.frame(word = names(word_freqs), freq = word_freqs)
wordcloud(dm$word, dm$freq, random.order = FALSE, colors = brewer.pal(8, "Dark2"))

When I run the last line, I get this error:

Error in strwidth(words[i], cex = size[i], ...) : invalid 'cex' value
In addition: Warning messages:
1: In max(freq) : no non-missing arguments to max; returning -Inf
2: In max(freq) : no non-missing arguments to max; returning -Inf

Please advise.

As Vikram said, you could reduce the number of words in your plot by adding max.words to your wordcloud call.

wordcloud(dm$word, dm$freq, scale=c(8,3), min.freq=2, max.words=120,
          random.order=FALSE, colors=brewer.pal(8,"Dark2"))

I also suggest using min.freq to plot only words that appear at least twice, and scale to control the size of the words. Adjust these until you get a nice plot.
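To see roughly what min.freq and max.words do before plotting, here is a minimal base R sketch. The frequency table is made up and only stands in for dm; the filtering is an approximation of what wordcloud() does internally, not its actual code. It also checks for an empty table first, because max() on a zero-length vector gives the "no non-missing arguments to max" warning shown in the question, which suggests the frequencies reaching wordcloud() may have been empty.

```r
# Toy frequency table standing in for `dm`; words and counts are invented.
dm <- data.frame(
  word = c("pray", "plane", "hope", "family", "news", "search"),
  freq = c(40, 25, 9, 3, 1, 1),
  stringsAsFactors = FALSE
)

# max() on a zero-length vector triggers the warning from the question,
# so make sure there is something to plot before going further.
if (nrow(dm) == 0) {
  stop("No terms to plot: check that the search returned any tweets.")
}

# Rough equivalent of min.freq = 2: drop words seen fewer than two times.
kept <- dm[dm$freq >= 2, ]

# Rough equivalent of max.words = 3: keep the three most frequent words.
kept <- head(kept[order(kept$freq, decreasing = TRUE), ], 3)

print(kept)
```

If kept looks sensible, kept$word and kept$freq can be passed to wordcloud() in place of dm$word and dm$freq.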

You may want to use removeSparseTerms as well. I encountered a similar problem and found this solution some time ago. I had to modify the solution, but removing sparse terms worked. The tm package provides the function.
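A small sketch of removeSparseTerms on a term-document matrix, assuming the tm package is installed. The three documents are made up, and the threshold of 0.5 is chosen only so that the toy example shows an effect.

```r
library(tm)

# Made-up documents standing in for the tweet text.
docs <- c("pray for the passengers",
          "pray for the families",
          "search continues over the ocean")
corpus <- VCorpus(VectorSource(docs))
tdm <- TermDocumentMatrix(corpus)

# Keep only terms whose sparsity is below 0.5, i.e. terms that appear
# in more than half of the documents.
tdm_small <- removeSparseTerms(tdm, sparse = 0.5)

# Compare the number of terms before and after.
nrow(tdm)
nrow(tdm_small)
Terms(tdm_small)
```

On a real set of tweets a value much closer to 1 may be a better starting point, since most terms appear in only a small share of tweets.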
