
Converting a matrix into a document-term matrix in R

I have a character vector that looks like this:

charVec[1:10]
[1] "dentistry"  "free"       "cache"      "key"        "containing" "cite"       "templates"  "deprecated" "errors"     "dates"  

I then generate all 3-word combinations of the vector:

combwords <- t(combn(charVec,3))

This gives me the following matrix combwords:

    [,1]     [,2]     [,3]       
[1,] "import" "school" "dentistry"
[2,] "import" "school" "school"   
[3,] "import" "school" "log"      
[4,] "import" "school" "search"   
[5,] "import" "school" "current"  
[6,] "import" "school" "advanced" 

Now I want to create a document-term matrix (DTM) in which each row of the combwords matrix is one document:

word_corpus <- Corpus(VectorSource(combwords))

This doesn't work. How can I get each row of the matrix (combwords) to become one document in the corpus?
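
A likely reason the call above does not give one document per row: an R matrix is a vector with a dim attribute, and VectorSource interprets each element of the vector it receives as a separate document. Every cell, not every row, therefore becomes a candidate document. A minimal base R sketch of the mismatch, assuming a shortened six-word charVec (n_rows and n_cells are illustrative names):

```r
# Assumed input: the first six words of charVec
charVec <- c("dentistry", "free", "cache", "key", "containing", "cite")
combwords <- t(combn(charVec, 3))

# A matrix is a vector with dimensions, so its length counts cells, not rows
n_rows  <- nrow(combwords)    # one entry per 3-word combination
n_cells <- length(combwords)  # rows times columns

n_rows
n_cells
```

Because the number of cells is three times the number of rows, the rows have to be collapsed into single strings before they are handed to VectorSource.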

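library(tm)

# Reproducible input: the six words that appear in the output below
combwords <- t(combn(c("dentistry", "free", "cache", "key", "containing", "cite"), 3))

# Paste each row into one space-separated string, so each row is one document
foo <- apply(combwords, 1, paste, collapse = " ")
foo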

##  [1] "dentistry free cache"       "dentistry free key"        
##  [3] "dentistry free containing"  "dentistry free cite"       
##  [5] "dentistry cache key"        "dentistry cache containing"
##  [7] "dentistry cache cite"       "dentistry key containing"  
##  [9] "dentistry key cite"         "dentistry containing cite" 
## [11] "free cache key"             "free cache containing"     
## [13] "free cache cite"            "free key containing"       
## [15] "free key cite"              "free containing cite"      
## [17] "cache key containing"       "cache key cite"            
## [19] "cache containing cite"      "key containing cite" 

# Each element of foo becomes one document in the corpus
tt <- Corpus(VectorSource(foo))
DocumentTermMatrix(tt)

## A document-term matrix (20 documents, 6 terms)
## 
## Non-/sparse entries: 60/60
## Sparsity           : 50%
## Maximal term length: 10 
## Weighting          : term frequency (tf)

as.matrix(DocumentTermMatrix(tt))

##     Terms
## Docs cache cite containing dentistry free key
##   1      1    0          0         1    1   0
##   2      0    0          0         1    1   1
##   3      0    0          1         1    1   0
##   4      0    1          0         1    1   0
##   5      1    0          0         1    0   1
##   6      1    0          1         1    0   0
##   7      1    1          0         1    0   0
##   8      0    0          1         1    0   1
##   9      0    1          0         1    0   1
##   10     0    1          1         1    0   0
##   11     1    0          0         0    1   1
##   12     1    0          1         0    1   0
##   13     1    1          0         0    1   0
##   14     0    0          1         0    1   1
##   15     0    1          0         0    1   1
##   16     0    1          1         0    1   0
##   17     1    0          1         0    0   1
##   18     1    1          0         0    0   1
##   19     1    1          1         0    0   0
##   20     0    1          1         0    0   1
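
As a cross-check that does not depend on tm, the same counts can be reproduced with base R's table(). This is a swapped-in technique rather than the approach above, offered only as a sketch: charVec is assumed to be the six words seen in the output, and dtm is an illustrative name. Note also that tm applies its own tokenization defaults (as far as I know, lowercasing and dropping terms shorter than three characters), which this version does not.

```r
# Assumed input: the six words that appear in the output above
charVec <- c("dentistry", "free", "cache", "key", "containing", "cite")
combwords <- t(combn(charVec, 3))

# row(combwords) gives the row index of every cell; both vectors are
# flattened column by column, so index and word stay aligned
dtm <- table(Docs  = as.vector(row(combwords)),
             Terms = as.vector(combwords))

dim(dtm)      # documents by terms
rowSums(dtm)  # words per document
colSums(dtm)  # documents containing each term
```

Each row should sum to 3 (three words per combination), and each column to 10, since every word pairs with choose(5, 2) = 10 combinations of the other five.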
