
How do you normalize the rows of a document term matrix in place in R?

I have a DocumentTermMatrix named train_dtm and I want to normalize the term frequency counts in all the documents. The problem I am facing is that the resulting matrix must also be of type DocumentTermMatrix, because I want to pass the normalized matrix to the LDA function of the topicmodels package in R.

Below is how I create the matrix:

docs_dtm <- DocumentTermMatrix(docs)

Now I want the rows of the above DocumentTermMatrix to be normalized. I even tried adding a control parameter via

docs_dtm <- DocumentTermMatrix(docs, control=list(weighting = function(x) weightTf(x, normalize=TRUE)))

but the above call throws an error saying

Error in weightTf(x, normalize=TRUE): unused argument (normalize = TRUE)

I have also written a function that normalizes the values of train_dtm using apply(), but it does not return a matrix of type DocumentTermMatrix.

Is there another way to accomplish the above task?

Could you try passing the weighting parameter directly, for example:

docs_dtm <- DocumentTermMatrix(docs, control = list(weighting = weightTf, normalize = TRUE))

Alternatively, normalize after creating the DTM:

docs_dtm_norm <- t(apply(docs_dtm, 1, function(x) x/sqrt(sum(x^2))))
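
The apply() call above returns an ordinary dense matrix, which is the problem the question describes. A minimal sketch of one way to keep the DocumentTermMatrix class is to rescale the stored values directly. This assumes the tm and slam packages are installed and relies on a DocumentTermMatrix being a slam simple_triplet_matrix with components i, j and v. The toy corpus is hypothetical, and the sketch divides by row sums (L1) rather than by the Euclidean norm used above.

```r
library(tm)
library(slam)

# Hypothetical toy corpus, used only for illustration
docs <- VCorpus(VectorSource(c("apple banana apple",
                               "banana cherry",
                               "cherry cherry apple banana")))
docs_dtm <- DocumentTermMatrix(docs)

# Total term count per document (one value per row)
row_totals <- row_sums(docs_dtm)

# Entry k of the triplet matrix sits in row i[k] with value v[k],
# so dividing v by the total of its own row normalizes each row.
# Only v changes; the class and other attributes are left alone.
docs_dtm_norm <- docs_dtm
docs_dtm_norm$v <- docs_dtm_norm$v / row_totals[docs_dtm_norm$i]

class(docs_dtm_norm)     # still includes "DocumentTermMatrix"
row_sums(docs_dtm_norm)  # each non-empty row now sums to 1
```

This only addresses the class of the result. Whether LDA accepts fractional values instead of raw counts is a separate matter worth checking in the topicmodels documentation.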
