
DocumentTermMatrix with dictionary

I want to convert a corpus to a DocumentTermMatrix in which only selected words are tabulated. I know the "dictionary" entry in the control list does this:

     a = list("I am a big big big apple", "Petter Petter Peter Peter")
     v = VCorpus(VectorSource(a))
     my_terms = c("peter", "petter")
     DocumentTermMatrix(v, control = list(dictionary = my_terms)) %>% as.matrix()

It gives me this:

        Terms
    Docs peter petter
       1     0      0
       2     1      1

Whereas what I want looks like this:

        Terms
    Docs peter petter
       1     0      0
       2     2      2

  1. The first document, though empty, must remain in the matrix, because it has to be matched against metadata.
  2. The frequency of each word must be shown in the output.

Is there a function or parameter that does this?

It works fine:

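library(magrittr)  # provides the %>% pipe
library(tm)        # provides VCorpus, VectorSource, DocumentTermMatrix

a <- list("I am a big big big apple", "Petter Petter Peter Peter")
v <- VCorpus(VectorSource(a))
my_terms <- c("peter", "petter")

# Tabulate only the dictionary terms, then convert to a dense matrix
DocumentTermMatrix(v, control = list(dictionary = my_terms)) %>%
  as.matrix()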
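
To check the counts independently of tm, the same table can be built in base R. This is only a rough sketch: `count_terms` is a hypothetical helper, not a tm function, and it assumes a much simpler tokenization than tm uses (lowercase everything, split on anything that is not a letter a-z).

```r
# Hypothetical helper (not part of tm): count dictionary terms per document.
# Assumption: tokens are lowercased runs of the letters a-z.
count_terms <- function(docs, terms) {
  per_doc <- lapply(docs, function(doc) {
    tokens <- strsplit(tolower(doc), "[^a-z]+")[[1]]
    # factor() with fixed levels keeps a zero count for absent terms
    # and drops every token that is not in the dictionary
    as.integer(table(factor(tokens, levels = terms)))
  })
  matrix(unlist(per_doc), nrow = length(docs), byrow = TRUE,
         dimnames = list(Docs = as.character(seq_along(docs)), Terms = terms))
}

a <- list("I am a big big big apple", "Petter Petter Peter Peter")
my_terms <- c("peter", "petter")
counts <- count_terms(a, my_terms)
print(counts)
```

Because the matrix is sized from `length(docs)`, a document with no dictionary terms still gets its own all-zero row, so the rows stay aligned with any per-document metadata.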


 