Using a list of words in findAssocs() in the tm R package

Question

I have been working with the findAssocs() function from the tm package in R. Using the function with a single word works without problems, and I can manually enter multiple words I would like to find associations for in the following format:

findAssocs(corpusname, c("cat", "dog", "elephant"), c(.75, .75, .75))

Again, no problem with manually entering multiple terms. I am trying to find associations for lists of terms that sometimes contain 30 or 40 words. I would like to use either a list or a vector with findAssocs() instead of having to type out each word every time. Any ideas how to do this? I tried making a custom function, but I am still so new to R that I did not have any luck. Thanks.

Thanks for the help. R has a pretty steep learning curve for a newbie. I tried the first method that you suggested and got the error "Error: is.character(terms) is not TRUE". The code that I am using is:

# data for associates list
wordAssocList<- read.csv("Word Assocs List.txt")
# change TRUE to FALSE if you have no column headings in the CSV
as.character(wordAssocList)
attributes(wordAssocList)
my_assocs <- findAssocs(tdm, wordAssocList, .01)
my_assocs

For the output I get the following:

as.character(wordAssocList)
[1] "logical(0)"
attributes(wordAssocList)
$names
[1] "ÿþp"

$class
[1] "data.frame"

$row.names
integer(0)

my_assocs <- findAssocs(tdm, wordAssocList, .01)
Error: is.character(terms) is not TRUE

Answer 1

Vectors shouldn't be a problem. See the following example.

library(tm)

data("crude")
tdm <- TermDocumentMatrix(crude)

words <- c("oil", "opec", "xyz")
corr <- c(0.7, 0.75, 0.1)

# returns a list
my_assocs <- findAssocs(tdm, words, corr)

# turns list into a list of named dataframes.
my_list <- lapply(my_assocs, function(x) data.frame(terms = names(x), cor = x, stringsAsFactors = FALSE))
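The error in the follow-up ("is.character(terms) is not TRUE") is consistent with passing a whole data frame as the terms argument: read.csv() returns a data frame, and calling as.character() without assigning the result changes nothing. Below is a minimal base-R sketch of getting a plain character vector out of a file. The file contents and the names word_file, word_df, and words are made up for illustration. The "ÿþ" in the reported column name looks like a UTF-16 byte-order mark, so the real file may additionally need fileEncoding = "UTF-16LE" in read.csv(); that is an assumption about the file, not something shown in the question.

```r
# Write a small hypothetical word list to a temporary file, one term per line.
word_file <- tempfile(fileext = ".txt")
writeLines(c("oil", "opec", "xyz"), word_file)

# read.csv() returns a data frame, even when the file has a single column.
# header = FALSE because this hypothetical file has no heading row.
word_df <- read.csv(word_file, header = FALSE, stringsAsFactors = FALSE)
is.character(word_df)   # a data frame is not a character vector

# Extract the first column to get the character vector findAssocs() expects.
words <- word_df[[1]]
is.character(words)
```

The resulting words vector could then be used in place of the hand-typed vector in the example above, e.g. findAssocs(tdm, words, 0.01).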

Edit: With dplyr 0.4.3 or later you can combine the dataframes in the list into a single dataframe, with an id column showing which dataframe each row came from. Handy for visualizations and other investigations.

my_df <- dplyr::bind_rows(my_list, .id = "source")

Source: local data frame [28 x 3]

   source    terms   cor
    (chr)    (chr) (dbl)
1     oil     15.8  0.87
2     oil  clearly  0.80
3     oil     late  0.80
4     oil   trying  0.80
5     oil      who  0.80
6     oil   winter  0.80
7     oil analysts  0.79
8     oil     said  0.78
9     oil  meeting  0.77
10    oil    above  0.76
..    ...      ...   ...

You could even use a dataframe instead of two vectors: just replace words and corr with the corresponding columns in your dataframe. The advantage is that you can read in a text file (or Excel file) containing your lists of words and correlations.
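A minimal sketch of the dataframe approach, assuming a hypothetical CSV file with one row per term. The column names term and corlimit, and the names assoc_file and assoc_input, are invented here for illustration; the terms and limits are the ones from the example above.

```r
library(tm)

data("crude")
tdm <- TermDocumentMatrix(crude)

# Hypothetical input file: one row per term with its correlation limit.
assoc_file <- tempfile(fileext = ".csv")
writeLines(c("term,corlimit", "oil,0.7", "opec,0.75", "xyz,0.1"), assoc_file)

# stringsAsFactors = FALSE keeps the terms as character, which findAssocs() requires.
assoc_input <- read.csv(assoc_file, stringsAsFactors = FALSE)

# Pass the two columns in place of the words and corr vectors.
my_assocs <- findAssocs(tdm, assoc_input$term, assoc_input$corlimit)
names(my_assocs)
```

Keeping the terms and limits in one file means a list of 30 or 40 words can be edited without touching the R code.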

Answer 1
1 2015-09-03 06:49:55
