R Text Mining - Converting Term Document Matrix

Question

I created a list of bigrams using:

library(tm)
library(RWeka)

# Tokenizer that emits only two-word n-grams (bigrams)
BigramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 2, max = 2))
tdm_a.bigram <- TermDocumentMatrix(docs_a,
                                   control = list(tokenize = BigramTokenizer))

I am trying to count the number of documents in which each bigram appears. If I understand correctly, a Term Document Matrix gives the number of times each bigram occurs within each document, but I only need 1 (present in a document) or 0 (absent).

How do I convert the Term Document Matrix into a data frame or matrix so that I can get such a count?

Answer 1

A TDM is a simple_triplet_matrix from the slam package, which has functions for common operations such as row and column sums (row_sums, col_sums).

slam::row_sums(tdm_a.bigram>=1)

This should tell you how many documents contained each bigram.
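To make the idea concrete, here is a minimal sketch on a toy matrix. The bigrams, document names and counts are made up for illustration, and the matrix is built directly with slam rather than with TermDocumentMatrix so that the example does not depend on tm or RWeka. The comparison `>= 1` turns every non-zero count into TRUE, and row_sums then counts the TRUE entries per term.

```r
library(slam)

# Hypothetical toy data: 3 bigrams (rows) x 4 documents (columns),
# cells hold how often the bigram occurs in that document.
counts <- matrix(c(2, 0, 1, 0,
                   0, 0, 3, 0,
                   1, 1, 1, 5),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(Terms = c("text mining",
                                           "term document",
                                           "document matrix"),
                                 Docs  = paste0("doc", 1:4)))

# Same sparse class that tm uses underneath a TermDocumentMatrix
tdm <- as.simple_triplet_matrix(counts)

# Document frequency: number of documents containing each bigram at least once
doc_freq <- row_sums(tdm >= 1)

# Dense equivalent, practical only when the matrix fits in memory:
# a 0/1 presence matrix and its row sums
presence <- (as.matrix(tdm) >= 1) * 1L
doc_freq_dense <- rowSums(presence)
```

If an explicit 0/1 matrix or data frame is needed, `presence` (or `as.data.frame(presence)`) provides it, but for a large corpus the sparse row_sums route avoids materialising the full matrix.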

1 answer

solution1
0 ACCEPTED 2017-07-07 15:31:53