TM Package: Error in UseMethod(“TermDocumentMatrix”, x)

Question

I want to plot a term-document matrix like Figure 6 in the JSS article on the tm package: https://www.jstatsoft.org/article/view/v025i05

My corpus, speach-English.txt, is here: https://github.com/yushu-liu/speach-english.git

The plot should look like that figure.

Here is my code:

library(tm)
library(stringr)
library(wordcloud)

text <- paste(readLines("D:/Rdata/speach-English.txt"), collapse = " ")
text_tidy <- gsub(pattern = "\\W",replace=" ",text)
text_tidy2 <- gsub(pattern = "\\d",replace=" ",text_tidy)

text_tidy2 <- tolower(text_tidy2)
text_tidy2 <- removeWords(text_tidy2,stopwords())
text_tidy2 <- gsub(pattern = "\\b[A-z]\\b{1}",replace=" ", text_tidy2 )
text_tidy2 <- stripWhitespace(text_tidy2)

textbag <- str_split(text_tidy2,pattern = "\\s+")
textbag <- unlist(textbag)

tdm <- TermDocumentMatrix(textbag, control = list(removePunctuation = TRUE,
                                                removeNumbers = TRUE,
                                                stopwords = TRUE))

plot(tdm, terms = findFreqTerms(tdm, lowfreq = 6)[1:25], corThreshold = 0.5)

But I get this error:

Error in UseMethod("TermDocumentMatrix", x) : 
  no applicable method for 'TermDocumentMatrix' applied to an object of class "character"

Why? Thanks!

Answer 1

The problem is that you have not created an object of the Corpus class, which is the type of object you need to pass to TermDocumentMatrix(). See an example of how you could do that below.
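To illustrate where the message comes from: TermDocumentMatrix() is an S3 generic, and UseMethod() looks for a method that matches the class of x. The base-R sketch below reproduces the same kind of error. It uses a hypothetical toy generic, toy_tdm(), plus a fake object that only carries the VCorpus class attribute; neither is part of tm.

```r
# Hypothetical toy generic standing in for tm's TermDocumentMatrix()
toy_tdm <- function(x, ...) UseMethod("toy_tdm", x)

# A method is defined only for the VCorpus class, not for "character"
toy_tdm.VCorpus <- function(x, ...) "dispatched to toy_tdm.VCorpus"

# A fake object that merely carries the class attribute (not a real tm corpus)
fake_corpus <- structure(list(), class = c("VCorpus", "Corpus"))
res <- toy_tdm(fake_corpus)
print(res)

# A plain character vector has no matching method, so dispatch fails
err <- tryCatch(toy_tdm(c("economy", "jobs")), error = conditionMessage)
print(err)
```

Passing a real VCorpus, as in the code in this answer, gives TermDocumentMatrix() a method to dispatch to.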

Another point: your line str_split(text_tidy2, pattern = "\\s+") splits your text into unigrams (individual terms), so every "document" contains exactly one term. A term-document matrix built from that structure does not make much sense. What is this line meant to achieve? If you explain, I may be able to point you to what you want.
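To make the unigram point concrete, here is a base-R sketch that needs no tm. The input sentence is made up, and table() is only a stand-in for a real term-document matrix: each element of the split vector becomes its own column, so every "document" holds a single term.

```r
# Hypothetical cleaned text standing in for text_tidy2
text <- "economy jobs growing economy jobs"

# Split on whitespace, as str_split(text_tidy2, pattern = "\\s+") does
textbag <- unlist(strsplit(text, "\\s+"))

# Treat each element as a "document" and count terms per document;
# table() is only a base-R stand-in for a term-document matrix
tdm_like <- table(term = textbag, doc = seq_along(textbag))
print(tdm_like)

# Every column (document) contains exactly one term occurrence
print(colSums(tdm_like))
```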

library(tm)
text <-  readLines("https://raw.githubusercontent.com/yushu-liu/speach-english/master/speach-English.txt")
#first define the type of source you want to use and how it shall be read
x <- VectorSource(text)
#create a corpus object
x <- VCorpus(x)
#feed it to tdm
tdm <- TermDocumentMatrix(x)
tdm
#<<TermDocumentMatrix (terms: 4159, documents: 573)>>
#Non-/sparse entries: 14481/2368626
#Sparsity           : 99%
#Maximal term length: 21
#Weighting          : term frequency (tf)
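Going one step further toward the plot in the question, the cleaning steps can be applied to the corpus itself with tm_map() instead of to a single collapsed string. This is a sketch under some assumptions: tm 0.6 or later, where base functions such as tolower must be wrapped in content_transformer(); and, for the plot, the Bioconductor package Rgraphviz, which tm's plot method for term-document matrices relies on. The four sample lines are made up; with the real data, use the readLines() call above instead. Keeping one document per line matters because the plot links terms whose frequencies are correlated across documents, so one collapsed document, or one-word documents, leave nothing meaningful to correlate.

```r
library(tm)

# Hypothetical sample: each element stands in for one line of speach-English.txt
text <- c("The economy is growing and jobs are growing.",
          "We will create jobs and grow the economy in 2018.",
          "Education matters for every child.",
          "Every child deserves a good education and good jobs.")

# One document per line
corpus <- VCorpus(VectorSource(text))

# Cleaning steps roughly matching the gsub()/removeWords() calls in the question
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

tdm <- TermDocumentMatrix(corpus)

# Use lowfreq = 6 on the real data; head() avoids NA entries that [1:25]
# produces when fewer than 25 terms qualify
freq_terms <- head(findFreqTerms(tdm, lowfreq = 2), 25)
print(freq_terms)

# The correlation plot needs Rgraphviz (installed from Bioconductor)
if (requireNamespace("Rgraphviz", quietly = TRUE)) {
  plot(tdm, terms = freq_terms, corThreshold = 0.5)
}
```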

solution1
2 2017-12-08 08:50:20