I have a character vector that looks like this:
charVec[1:10]
[1] "dentistry" "free" "cache" "key" "containing" "cite" "templates" "deprecated" "errors" "dates"
I then build all 3-word combinations of the vector's elements:
combwords <- t(combn(charVec,3))
This gives me the following matrix combwords:
[,1] [,2] [,3]
[1,] "import" "school" "dentistry"
[2,] "import" "school" "school"
[3,] "import" "school" "log"
[4,] "import" "school" "search"
[5,] "import" "school" "current"
[6,] "import" "school" "advanced"
Now I want to create a Document Term Matrix (DTM) in which each row of the combwords matrix is one document:
word_corpus <- Corpus(VectorSource(combwords))
This doesn't work. How can I get each row of the matrix (combwords) as one document in the corpus?
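A likely explanation, sketched in base R: a matrix is stored as a vector with a dim attribute, and as far as I can tell VectorSource() treats each element of its input as a separate document, so passing combwords directly cannot give one document per row. The six-word charVec below is an assumption (the first six words shown in the question), and docs is a hypothetical name.

```r
# Assumption: a six-word stand-in for charVec (first six words from the question)
charVec <- c("dentistry", "free", "cache", "key", "containing", "cite")
combwords <- t(combn(charVec, 3))   # one combination per row

dim(combwords)      # 20 rows, 3 columns: choose(6, 3) = 20
length(combwords)   # 60: seen as a plain vector, the matrix has 60 one-word elements

# Collapsing each row gives one string per combination,
# which is the shape VectorSource() expects (one element per document)
docs <- apply(combwords, 1, paste, collapse = " ")
length(docs)        # 20
docs[1]             # "dentistry free cache"
```

The point of the comparison is that length() counts individual cells, not rows, so the rows have to be pasted together before the corpus is built.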
library(tm)
# Collapse each row of combwords into one space-separated string (one document per row)
foo <- apply(combwords, 1, paste, collapse = " ")
foo
## [1] "dentistry free cache" "dentistry free key"
## [3] "dentistry free containing" "dentistry free cite"
## [5] "dentistry cache key" "dentistry cache containing"
## [7] "dentistry cache cite" "dentistry key containing"
## [9] "dentistry key cite" "dentistry containing cite"
## [11] "free cache key" "free cache containing"
## [13] "free cache cite" "free key containing"
## [15] "free key cite" "free containing cite"
## [17] "cache key containing" "cache key cite"
## [19] "cache containing cite" "key containing cite"
# Build a corpus with one document per element of foo
tt <- Corpus(VectorSource(foo))
DocumentTermMatrix(tt)
## A document-term matrix (20 documents, 6 terms)
##
## Non-/sparse entries: 60/60
## Sparsity : 50%
## Maximal term length: 10
## Weighting : term frequency (tf)
# Convert to a dense matrix to inspect the per-document counts
as.matrix(DocumentTermMatrix(tt))
## Terms
## Docs cache cite containing dentistry free key
## 1 1 0 0 1 1 0
## 2 0 0 0 1 1 1
## 3 0 0 1 1 1 0
## 4 0 1 0 1 1 0
## 5 1 0 0 1 0 1
## 6 1 0 1 1 0 0
## 7 1 1 0 1 0 0
## 8 0 0 1 1 0 1
## 9 0 1 0 1 0 1
## 10 0 1 1 1 0 0
## 11 1 0 0 0 1 1
## 12 1 0 1 0 1 0
## 13 1 1 0 0 1 0
## 14 0 0 1 0 1 1
## 15 0 1 0 0 1 1
## 16 0 1 1 0 1 0
## 17 1 0 1 0 0 1
## 18 1 1 0 0 0 1
## 19 1 1 1 0 0 0
## 20 0 1 1 0 0 1
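One caveat worth checking on real data: by default tm only keeps terms of at least three characters (the wordLengths control option defaults to c(3, Inf)), so a vector containing short words can produce rows with fewer than three counts. A minimal sketch, assuming a hypothetical input vector words with two short entries; the names words, docs, corp, m_default and m_all are made up for illustration.

```r
library(tm)

# Hypothetical input containing short words ("of", "a")
words <- c("of", "a", "cache", "key")
docs <- apply(t(combn(words, 3)), 1, paste, collapse = " ")

corp <- Corpus(VectorSource(docs))

# Default settings: terms shorter than 3 characters are dropped
m_default <- as.matrix(DocumentTermMatrix(corp))

# Keep terms of any length by widening wordLengths
m_all <- as.matrix(DocumentTermMatrix(corp, control = list(wordLengths = c(1, Inf))))
rownames(m_all) <- docs   # label each row with the text it was built from

colnames(m_default)   # terms kept under the defaults
colnames(m_all)       # terms kept when short words are allowed
rowSums(m_all)        # each document should account for all three of its words
```

Comparing the row sums against 3 is a quick way to spot documents that lost a term to the default filtering.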