
tm package: Error in UseMethod("TermDocumentMatrix", x)

I want to plot a term-document matrix like Figure 6 in the JSS article on the tm package. The article link: https://www.jstatsoft.org/article/view/v025i05

My corpus, speach-English.txt, is here: https://github.com/yushu-liu/speach-english.git

The figure should look like this:

[image]

Here is my code:

library(tm)
library(stringr)
library(wordcloud)

text <- paste(readLines("D:/Rdata/speach-English.txt"), collapse = " ")
text_tidy <- gsub(pattern = "\\W",replace=" ",text)
text_tidy2 <- gsub(pattern = "\\d",replace=" ",text_tidy)

text_tidy2 <- tolower(text_tidy2)
text_tidy2 <- removeWords(text_tidy2,stopwords())
text_tidy2 <- gsub(pattern = "\\b[A-z]\\b{1}",replace=" ", text_tidy2 )
text_tidy2 <- stripWhitespace(text_tidy2)

textbag <- str_split(text_tidy2,pattern = "\\s+")
textbag <- unlist(textbag)

tdm <- TermDocumentMatrix(textbag, control = list(removePunctuation = TRUE,
                                                removeNumbers = TRUE,
                                                stopwords = TRUE))

plot(tdm, terms = findFreqTerms(tdm, lowfreq = 6)[1:25], corThreshold = 0.5)

But it throws an error:

Error in UseMethod("TermDocumentMatrix", x) : 
  no applicable method for 'TermDocumentMatrix' applied to an object of class "character"

Why? Thanks!

The problem is that you have not created a corpus object (class Corpus), which is the type of object TermDocumentMatrix() expects. You passed it a plain character vector, and no method is defined for that class. See an example of how you could create the corpus below.

Another point I would like to note is that in your line str_split(text_tidy2, pattern = "\\s+") you split your text into unigrams (individual terms). Hence, every document consists of exactly one term. Creating a tdm from this structure does not make much sense, because no two terms ever occur in the same document. What is the intended purpose of this line? Maybe I can point you to what you want.
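To illustrate the point about unigrams, here is a minimal sketch. The three words are a hypothetical toy input standing in for the result of str_split() followed by unlist(); they are not taken from the speech.

```r
library(tm)

# Hypothetical toy input: one word per element, as str_split() + unlist() would produce
words <- c("peace", "nation", "peace")

# VectorSource() turns each element of the vector into its own document
corp_words <- VCorpus(VectorSource(words))
tdm_words <- TermDocumentMatrix(corp_words)

# Rows are distinct terms, columns are documents: 2 terms, 3 documents
dim(tdm_words)

# Every column holds a single non-zero entry, so terms never co-occur
as.matrix(tdm_words)
```

With one term per document there is nothing for a correlation plot to show, since it draws links between terms that appear together in the same documents.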

library(tm)
text <-  readLines("https://raw.githubusercontent.com/yushu-liu/speach-english/master/speach-English.txt")
#first define the source: VectorSource() treats each element of the vector (here, each line) as one document
x <- VectorSource(text)
#create a corpus object
x <- VCorpus(x)
#feed it to tdm
tdm <- TermDocumentMatrix(x)
tdm
#<<TermDocumentMatrix (terms: 4159, documents: 573)>>
#Non-/sparse entries: 14481/2368626
#Sparsity           : 99%
#Maximal term length: 21
#Weighting          : term frequency (tf)
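
To get from here to the plot in the question, the cleaning that was done by hand with gsub() can be passed to TermDocumentMatrix() through its control list instead. The following is only a sketch: the three documents are hypothetical stand-ins for the lines of the speech, and the plot call assumes the Rgraphviz package (from Bioconductor) is installed, which the plot method for term-document matrices in tm depends on.

```r
library(tm)

# Hypothetical toy documents standing in for the lines of the speech
docs <- c("Freedom and peace for every nation in 2017.",
          "Peace needs freedom; freedom needs courage.",
          "Every nation values peace.")

corp <- VCorpus(VectorSource(docs))

# Let tm remove punctuation, numbers and stopwords while building the matrix
tdm_clean <- TermDocumentMatrix(corp, control = list(removePunctuation = TRUE,
                                                     removeNumbers = TRUE,
                                                     stopwords = TRUE))

# Terms that occur at least twice across all documents
freq_terms <- findFreqTerms(tdm_clean, lowfreq = 2)
freq_terms

# Only attempt the plot when Rgraphviz is available
if (requireNamespace("Rgraphviz", quietly = TRUE)) {
  plot(tdm_clean, terms = freq_terms, corThreshold = 0.5)
}
```

On the real corpus a higher lowfreq is probably more useful. Note also that indexing with [1:25], as in the question, yields NA entries when fewer than 25 terms pass the threshold; head(freq_terms, 25) avoids that.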
