Calculating the number of tokens in a DocumentTermMatrix

Question

I have a question about a DocumentTermMatrix. I would like to use the LDAvis package in R. To visualize the results of my LDA model, I need to calculate the number of tokens in every document. I don't have the text corpus for the DTM in question. Does anyone know how I can calculate the number of tokens for every document? Ideally, the output would be a list with each document's name and its number of tokens.

Kind regards, Tom

Answer 1

You can use slam::row_sums. This calculates the row sums of a document-term matrix without first converting the DTM into a dense matrix. The function comes from the slam package, which is installed when you install the tm package.

count_tokens <- slam::row_sums(dtm_goes_here)

# if you want a list
count_tokens_list <- as.list(slam::row_sums(dtm_goes_here))
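
As a rough illustration of the approach above, here is a minimal sketch on a toy matrix. The data, document names, and variable names (dtm, doc_length, term_frequency, token_table) are made up for the example. It builds a slam simple_triplet_matrix directly as a stand-in for a DocumentTermMatrix, on the assumption that the real DTM is stored in that sparse format, as tm's is. Note that the row sums only equal token counts if the DTM holds raw term frequencies; with tf-idf weighting, or after terms have been dropped, they will differ from the original document lengths.

```r
library(slam)

# Hypothetical toy data: 3 documents x 4 terms with raw term counts.
dtm <- simple_triplet_matrix(
  i = c(1L, 1L, 2L, 3L, 3L, 3L),   # row (document) index of each non-zero cell
  j = c(1L, 2L, 2L, 1L, 3L, 4L),   # column (term) index of each non-zero cell
  v = c(2, 1, 4, 1, 1, 3),         # term counts
  nrow = 3, ncol = 4,
  dimnames = list(c("doc1", "doc2", "doc3"),
                  c("apple", "banana", "cherry", "date"))
)

# Tokens per document: sum over each row, without densifying the matrix.
doc_length <- row_sums(dtm)

# Occurrences per term across all documents: sum over each column.
term_frequency <- col_sums(dtm)

# Document name next to its token count, as a two-column table.
token_table <- data.frame(
  document = rownames(dtm),
  tokens   = unname(doc_length)
)
print(token_table)
```

For the LDAvis use case, doc_length and term_frequency computed this way would be candidates for the doc.length and term.frequency arguments of LDAvis::createJSON, provided the documents and terms are in the same order as in the fitted model.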
