Count the number of tokens in a DocumentTermMatrix
I have a question about a DocumentTermMatrix. I would like to use the "LDAvis" package in R. To visualize my results of the LDA algorithm, I need to calculate the number of tokens in every document. I don't have the text corpus for the DTM in question. Does anyone know how I can calculate the number of tokens for every document? Output as a list with each document name and its token count would be the perfect solution.
Kind regards, Tom
You can use slam::row_sums. This calculates the row sums of a document-term matrix without first transforming the DTM into a dense matrix. The function comes from the slam package, which is installed when you install the tm package.
# token count per document, as a named numeric vector
count_tokens <- slam::row_sums(dtm_goes_here)

# if you want a list instead
count_tokens_list <- as.list(slam::row_sums(dtm_goes_here))
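To see this end to end, here is a minimal, self-contained sketch. The two example documents and their contents are invented for illustration; the DTM construction uses tm's standard DocumentTermMatrix, and the counting step is exactly the slam::row_sums call from above.

```r
library(tm)

# Invented toy documents, purely for illustration
docs <- c("the cat sat on the mat",
          "the dog barked at the cat")
corpus <- VCorpus(VectorSource(docs))
dtm <- DocumentTermMatrix(corpus)

# Sum each row of the sparse DTM: one token count per document,
# with no conversion to a dense matrix
count_tokens <- slam::row_sums(dtm)

# As a list keyed by document id
count_tokens_list <- as.list(count_tokens)
```

Note that the counts reflect the terms that survive DocumentTermMatrix's default preprocessing (for example, its default minimum word length), so they may differ slightly from a raw whitespace token count of the original text.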