
R Text Mining - Converting Term Document Matrix

I created a list of bigrams using:

library(tm)
library(RWeka)

# Tokenizer that emits only two-word n-grams (bigrams)
BigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 2, max = 2))
tdm_a.bigram <- TermDocumentMatrix(docs_a, control = list(tokenize = BigramTokenizer))

I am trying to count the number of documents in which each bigram appears. If I understand correctly, a Term Document Matrix gives the number of times each bigram occurs within each document, but I only need 1 (present in the document) or 0 (absent).

How do I convert the Term Document Matrix into a data frame or matrix so that I can get this count?

A TermDocumentMatrix is a simple_triplet_matrix from the slam package, which has functions for common operations such as row and column sums (row_sums, col_sums).

slam::row_sums(tdm_a.bigram>=1)

This should tell you how many documents contained each bigram.
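
To make the idea concrete without needing a corpus or Weka, here is a minimal sketch on a toy matrix. The names counts and tdm_toy and the three example bigrams are made up for illustration; since a TermDocumentMatrix inherits from simple_triplet_matrix, the same calls should apply to tdm_a.bigram. The dense route is included because the question asks for a plain matrix, but as.matrix may use a lot of memory on a large corpus.

```r
library(slam)

# Hypothetical toy counts: 3 bigrams (rows) x 4 documents (columns).
counts <- matrix(
  c(2, 0, 1, 0,
    0, 0, 3, 0,
    1, 1, 1, 5),
  nrow = 3, byrow = TRUE,
  dimnames = list(Terms = c("text mining", "term document", "document matrix"),
                  Docs  = paste0("doc", 1:4))
)
tdm_toy <- as.simple_triplet_matrix(counts)

# Sparse route: the comparison yields TRUE where a bigram occurs at least once,
# and row_sums counts those entries per bigram.
doc_freq <- row_sums(tdm_toy >= 1)

# Dense route: build an explicit 0/1 presence matrix, then sum each row.
presence <- (as.matrix(tdm_toy) >= 1) * 1L
doc_freq_dense <- rowSums(presence)
```

In this toy example both routes give 2, 1 and 4 for the three bigrams: "document matrix" occurs in all four documents even though its raw counts add up to 8.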


 