DocumentTermMatrix with dictionary
I want to convert a corpus to a DocumentTermMatrix in which only selected words are tabulated. I know the "dictionary" parameter in the control list does this:
a = list("I am a big big big apple", "Petter Petter Peter Peter")
v = VCorpus(VectorSource(a))
my_terms = c("peter", "petter")
DocumentTermMatrix(v, control = list(dictionary = my_terms)) %>% as.matrix()
It gives me this:
    Terms
Docs peter petter
   1     0      0
   2     1      1
Whereas what I want looks like this:
    Terms
Docs peter petter
   1     0      0
   2     2      2
I was wondering whether there is a function or parameter that does this.
It works fine when I run it:
library(magrittr)
library(tm)
a <- list("I am a big big big apple", "Petter Petter Peter Peter")
v <- VCorpus(VectorSource(a))
my_terms <- c("peter", "petter")
DocumentTermMatrix(v, control = list(dictionary = my_terms)) %>%
  as.matrix()
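If the counts still look off, one way to cross-check them is to tabulate the dictionary terms by hand in base R, without tm. This is only a rough sketch: count_terms is a hypothetical helper (not part of tm), and it assumes plain ASCII text, case-insensitive matching, and tokens separated by anything that is not a letter.

```r
# Hypothetical helper, not a tm function: counts how often each
# dictionary term occurs in each document.
count_terms <- function(docs, terms) {
  # Assumption: lower-case everything and split on runs of non-letters
  tokens <- strsplit(tolower(unlist(docs)), "[^a-z]+")
  counts <- matrix(
    0L,
    nrow = length(tokens),
    ncol = length(terms),
    dimnames = list(Docs = as.character(seq_along(tokens)), Terms = terms)
  )
  for (j in seq_along(terms)) {
    # Number of tokens in each document that equal the j-th term
    counts[, j] <- vapply(tokens, function(tok) sum(tok == terms[j]), integer(1))
  }
  counts
}

a <- list("I am a big big big apple", "Petter Petter Peter Peter")
my_terms <- c("peter", "petter")
counts <- count_terms(a, my_terms)
print(counts)
```

For the two example documents this gives 0 and 0 in the first row and 2 and 2 in the second, which matches the table the question asks for.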