Trouble with findAssocs from package tm

I am trying to find words associated with a particular word in a term-document matrix using the tm package.

I am using findAssocs to do this. The arguments for findAssocs are:

  • x: A term-document matrix.
  • term: A character holding a term.
  • corlimit: A numeric for the lower correlation bound limit.

I consistently get numeric(0) as my result.

Example:

findAssocs(test.dtm, "investment", 0.90)
>numeric(0)

Does anyone have familiarity with findAssocs and know what I am doing wrong? Or does anyone know more broadly what the numeric(0) result could mean?

Thank you very much in advance for any help.

This result indicates that no terms have a correlation of at least 0.90 with the term "investment". Try a lower threshold such as 0.05, then work your way up to a threshold that yields fewer terms.
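
To make the effect of the threshold concrete, here is a minimal base-R sketch. It assumes that findAssocs works from the correlation between term counts across documents, which is what the corlimit description suggests. The matrix values and the helper assoc_sketch are made up for illustration and are not part of tm.

```r
# Toy document-term counts: rows are documents, columns are terms.
# The numbers are invented for illustration only.
dtm <- matrix(
  c(2, 1, 0, 3,
    0, 0, 1, 1,
    3, 1, 0, 0,
    1, 1, 2, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4),
                  c("investment", "fund", "weather", "market"))
)

# Hypothetical helper (not a tm function): correlate one term with all
# other terms and keep those at or above the lower bound, highest first.
assoc_sketch <- function(m, term, corlimit) {
  others <- setdiff(colnames(m), term)
  r <- suppressWarnings(cor(m[, term], m[, others, drop = FALSE]))[1, ]
  r <- r[!is.na(r) & r >= corlimit]
  sort(round(r, 2), decreasing = TRUE)
}

strict <- assoc_sketch(dtm, "investment", 0.90)  # nothing clears 0.90
loose  <- assoc_sketch(dtm, "investment", 0.05)  # "fund" clears 0.05

print(strict)
print(loose)
```

In this toy data an empty result at 0.90 only means that no term is correlated that strongly with "investment"; the lower bound of 0.05 still returns "fund".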

I'm getting the same numeric(0). I think it's because there is only one document in my Corpus, so the term-document matrix has only one column. You may want to inspect the output of TermDocumentMatrix() and check whether you have a multi-column matrix. That said, how do I find associations within a single document?
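
A rough base-R illustration of why a single document could produce an empty result, again assuming the associations are Pearson correlations between term counts. The counts are invented, and documents are rows here (in a term-document matrix they would be columns).

```r
# Made-up counts for a corpus that contains a single document (one row).
one_doc <- matrix(c(5, 2, 1), nrow = 1,
                  dimnames = list("doc1", c("investment", "fund", "market")))

n_docs <- nrow(one_doc)  # number of documents in the toy matrix

# With one observation per term there is no variation to correlate,
# so every correlation comes back as NA.
r <- suppressWarnings(
  cor(one_doc[, "investment"], one_doc[, c("fund", "market"), drop = FALSE])
)[1, ]

# Filtering by any lower bound then leaves nothing.
kept <- r[!is.na(r) & r >= 0.05]

print(n_docs)
print(r)
print(kept)
```

Checking the number of documents first (here n_docs) is a cheap way to rule out this case before tuning the threshold.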

It does appear that this functionality only works when analyzing multiple text documents. The only viable solution I have come up with is to create a duplicate of the text document and then run the analysis. However, it is uncertain whether this changes the results in any way. Any additional feedback would be appreciated.
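
One way to probe that uncertainty is a small base-R check. It assumes, as above, that the associations are Pearson correlations between term counts, and all numbers are invented. Under that assumption an exact duplicate adds no variation, whereas splitting the text into chunks (for example paragraphs, each treated as its own document) does. The chunking is a substitute technique suggested here, not something the tm documentation is known to prescribe.

```r
terms <- c("investment", "fund", "market")

# One document copied twice: every term count is identical across rows,
# so each term has zero variance and the correlation is undefined (NA).
dup <- rbind(doc1 = c(5, 2, 1), doc1_copy = c(5, 2, 1))
colnames(dup) <- terms
r_dup <- suppressWarnings(
  cor(dup[, "investment"], dup[, c("fund", "market")])
)[1, ]

# The same text split into three hypothetical chunks, with made-up
# counts that add up to the totals above.
chunks <- rbind(p1 = c(3, 1, 0), p2 = c(0, 0, 1), p3 = c(2, 1, 0))
colnames(chunks) <- terms
r_chunks <- cor(chunks[, "investment"], chunks[, c("fund", "market")])[1, ]

print(r_dup)
print(round(r_chunks, 2))
```

With the chunked version, the correlations depend on how the text is split, so the choice of chunk size is itself an assumption worth stating alongside any results.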

I think it also has to do with your data file. A text file should work, but if it is a .csv with only one column, you will get numeric(0).
