
Count the number of tokens in a DocumentTermMatrix

I have a question about a DocumentTermMatrix. I would like to use the "LDAvis" package in R. To visualize the results of my LDA model, I need to calculate the number of tokens in every document. I don't have the text corpus for the DTM in question. Does anyone know how I can calculate the number of tokens for every document? Ideally, the output would be a list with each document's name and its token count.

Kind regards, Tom

You can use slam::row_sums. It calculates the row sums of a document-term matrix without first converting the DTM into a dense matrix. The function comes from the slam package, which is installed when you install the tm package.

count_tokens <- slam::row_sums(dtm_goes_here)

# if you want a list
count_tokens_list <- as.list(slam::row_sums(dtm_goes_here))
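
If it helps, here is a minimal, self-contained sketch of the same idea on toy data. The small matrix below is a hypothetical stand-in for your DTM (a tm DocumentTermMatrix is stored as a slam simple_triplet_matrix, so the same calls should apply), and the names dtm, doc_length, term_frequency and token_table are made up for illustration.

```r
library(slam)

# Hypothetical 3-document, 4-term count matrix standing in for a real DTM.
# Entries are (row i, column j, count v) triplets; missing cells are zero.
dtm <- simple_triplet_matrix(
  i = c(1L, 1L, 2L, 3L, 3L, 3L),
  j = c(1L, 2L, 2L, 1L, 3L, 4L),
  v = c(2, 1, 4, 1, 1, 3),
  nrow = 3L, ncol = 4L,
  dimnames = list(c("doc1", "doc2", "doc3"),
                  c("apple", "banana", "cherry", "date"))
)

# Tokens per document: sum of counts across each row.
doc_length <- row_sums(dtm)

# Total count per term: sum of counts down each column.
term_frequency <- col_sums(dtm)

# The same per-document counts as a list, or as a two-column table.
doc_length_list <- as.list(doc_length)
token_table <- data.frame(document = names(doc_length),
                          tokens = unname(doc_length))
```

In LDAvis, the per-document counts are what createJSON() takes as its doc.length argument, and the per-term totals correspond to term.frequency. Both assume the DTM holds raw term counts; with a tf-idf weighted DTM the row sums would not be token counts.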
