
Add a new document to a term-document matrix in R

I already have a term-document matrix and want to add a new document to it. In other words, I want to join two term-document matrices.

My term-document matrix is:

     Docs
Term   1
eat    7
food   2
run    2
sick   3

The new document is "watch football match and eat food".

After processing, I want my term-document matrix to be:

         Docs
Term     1   2
eat      7   1
food     2   1
run      2   0
sick     3   0
watch    0   1
football 0   1
match    0   1
and      0   1
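
To make the desired result concrete, here is a minimal base R sketch of the merge itself, using plain matrices rather than tm objects. The names tdm1, words, counts, all_terms and merged are illustrative only, and no stemming or stop-word removal is applied, which is why "and" stays in the result as in the table above.

```r
# Existing matrix: one document ("1") with four terms.
tdm1 <- matrix(c(7, 2, 2, 3), ncol = 1,
               dimnames = list(Terms = c("eat", "food", "run", "sick"),
                               Docs = "1"))

# Tokenise the new document on spaces and count each term.
words <- strsplit("watch football match and eat food", " ", fixed = TRUE)[[1]]
counts <- table(words)

# Take the union of terms (existing terms first) and start from all zeros.
all_terms <- union(rownames(tdm1), words)
merged <- matrix(0, nrow = length(all_terms), ncol = 2,
                 dimnames = list(Terms = all_terms, Docs = c("1", "2")))

# Fill each column by term name, so missing terms stay at 0.
merged[rownames(tdm1), "1"] <- tdm1[, "1"]
merged[names(counts), "2"] <- as.numeric(counts)

print(merged)
```

The key point is that each document must end up in its own, uniquely named column; terms missing from a document simply get a count of 0.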

I've tried this:

library("SnowballC")
library("NLP")
library("tm")
library("lsa")

# mytermdm is the term-document matrix I already have

text2 <- "watch football match and eat food"
myCorpus <- Corpus(VectorSource(text2))

tdm2 <- TermDocumentMatrix(myCorpus, control = list(
  removeNumbers = TRUE,
  removePunctuation = TRUE,
  stopwords = stopwords_en,
  stemming = TRUE
))
mytdm3 <- c(mytermdm, tdm2)
inspect(mytdm3)

I get this:

TermDocumentMatrix (terms: 7, documents: 2)

Error in `[.simple_triplet_matrix`(x, terms, doc) :
    Repeated indices currently not allowed.

I have solved it. Before combining the two term-document matrices, I rename the document in tdm2 so that its name does not clash with a document already in mytermdm. The full code:

library("SnowballC")
library("NLP")
library("tm")
library("lsa")

# mytermdm is the term-document matrix I already have

text2 <- "watch football match and eat food"
myCorpus <- Corpus(VectorSource(text2))

tdm2 <- TermDocumentMatrix(myCorpus, control = list(
  removeNumbers = TRUE,
  removePunctuation = TRUE,
  stopwords = stopwords_en,
  stemming = TRUE
))

# My fix: give the new document the next unused numeric ID.
# Convert to numeric before max(), otherwise the names are compared as strings.
colnames(tdm2) <- as.character(max(as.numeric(colnames(mytermdm))) + 1)

mytdm3 <- c(mytermdm, tdm2)
inspect(mytdm3)
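
One detail worth checking when deriving the next document ID from the column names: they are character strings, so the order of max() and as.numeric() matters once there are ten or more documents. The base R sketch below illustrates this with a hypothetical vector existing_ids standing in for colnames(mytermdm); it assumes all document names are numeric strings.

```r
# Hypothetical document names, as colnames() would return them (character).
existing_ids <- as.character(1:10)

# max() on character values compares strings, so "9" sorts after "10".
string_max <- max(existing_ids)

# Converting to numeric first gives the true largest ID.
next_id <- as.character(max(as.numeric(existing_ids)) + 1)

# Sanity check before combining: the new ID must not repeat an existing one.
new_ids <- c(existing_ids, next_id)
no_clash <- anyDuplicated(new_ids) == 0

print(string_max)
print(next_id)
print(no_clash)
```

With the string comparison, the computed ID would be 10, which already exists, and the combine step would hit the same repeated-index problem again.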

