I have been working with the findAssocs() function from the tm package in R. With a single word I don't have any problems, and I can also manually input multiple words that I would like to find associations for, in the following format:
findAssocs(corpusname, c("cat", "dog", "elephant"), c(.75, .75, .75))
Again, there is no problem with manually inputting the multiple terms. However, I am trying to find the associations for lists of terms that might sometimes contain 30 or 40 words. I would like to use either a list or a vector with findAssocs() instead of having to type out each word every time. Any ideas how to do this? I tried making a custom function, but I am still so new to R that I did not have any luck. Thanks.
Thanks for the help. R has a pretty steep learning curve for a newbie. I tried the first method that you suggested and got the error "Error: is.character(terms) is not TRUE". The code that I am using is:
# data for the word association list
# change TRUE to FALSE if you have no column headings in the CSV
wordAssocList <- read.csv("Word Assocs List.txt", header = TRUE)
as.character(wordAssocList)
attributes(wordAssocList)
my_assocs <- findAssocs(tdm, wordAssocList, .01)
my_assocs
For the output I get the following:
> as.character(wordAssocList)
[1] "logical(0)"
> attributes(wordAssocList)
$names
[1] "ÿþp"

$class
[1] "data.frame"

$row.names
integer(0)

> my_assocs <- findAssocs(tdm, wordAssocList, .01)
Error: is.character(terms) is not TRUE
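One possible reading of that output, offered as a minimal sketch rather than a definitive diagnosis: read.csv() returns a data frame, and findAssocs() checks is.character(terms), which is FALSE for a data frame even when its columns hold text. Calling as.character() without assigning the result also leaves wordAssocList unchanged. The data frame and the column name `word` below are hypothetical stand-ins, not taken from the original file.

```r
# Hypothetical stand-in for the imported word list; the column name `word`
# is an assumption made for this sketch.
word_df <- data.frame(word = c("oil", "opec"), stringsAsFactors = FALSE)

# A data frame is a list of columns, so it fails the is.character() check
# that findAssocs() applies to its `terms` argument.
df_is_character <- is.character(word_df)

# Extracting a single column gives a plain character vector instead.
terms <- as.character(word_df$word)
terms_is_character <- is.character(terms)

print(c(data_frame = df_is_character, column = terms_is_character))
```

Separately, the column name "ÿþp" and the zero rows suggest the file may have been saved as UTF-16 with a byte-order mark. If that is the case, passing something like fileEncoding = "UTF-16" to read.csv() might help, although that is only a guess from the output shown.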
Vectors shouldn't be a problem. See the following example.
library(tm)
data("crude")
tdm <- TermDocumentMatrix(crude)
words <- c("oil", "opec", "xyz")
corr <- c(0.7, 0.75, 0.1)
# returns a list
my_assocs <- findAssocs(tdm, words, corr)
# turns list into a list of named dataframes.
my_list <- lapply(my_assocs, function(x) data.frame(terms = names(x), cor = x, stringsAsFactors = FALSE))
Edit: With the new version of dplyr (0.4.3), you can combine the data frames in the list into a single data frame, with an extra column showing which list element each row came from. This is handy for visualizations and other investigations.
my_df <- dplyr::bind_rows(my_list, .id = "source")
Source: local data frame [28 x 3]

   source    terms   cor
    (chr)    (chr) (dbl)
1     oil     15.8  0.87
2     oil  clearly  0.80
3     oil     late  0.80
4     oil   trying  0.80
5     oil      who  0.80
6     oil   winter  0.80
7     oil analysts  0.79
8     oil     said  0.78
9     oil  meeting  0.77
10    oil    above  0.76
..    ...      ...   ...
You could even use a data frame instead of two vectors: just replace words and corr with the corresponding columns of your data frame. The advantage is that you can read in a text file (or an Excel file) that holds your list of words and correlations.
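As a rough sketch of the data frame approach, assuming a small CSV file with one row per term: the file contents and the column names `word` and `corr` are made up for illustration, so adjust them to match your own file.

```r
library(tm)
data("crude")
tdm <- TermDocumentMatrix(crude)

# Hypothetical input file with one row per term; the column names `word`
# and `corr` are assumptions made for this sketch.
path <- tempfile(fileext = ".csv")
writeLines(c("word,corr", "oil,0.7", "opec,0.75", "xyz,0.1"), path)

# stringsAsFactors = FALSE keeps the terms as character rather than factor.
word_df <- read.csv(path, header = TRUE, stringsAsFactors = FALSE)

# Pass the individual columns, not the data frame itself.
my_assocs <- findAssocs(tdm, word_df$word, word_df$corr)
str(my_assocs)
```

Passing word_df$word rather than word_df matters here, because findAssocs() expects a character vector of terms and a numeric vector of correlation limits.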