Error creating TermDocumentMatrix in tm package
I am new to the tm package and have run into an obstacle when trying to apply the TermDocumentMatrix function.
I have used the following code up until the point where the function fails:
myCorpus <- Corpus(VectorSource(posts$message))
myCorpus <- tm_map(myCorpus, content_transformer(tolower))
myCorpus <- tm_map(myCorpus, removePunctuation)
myCorpus <- tm_map(myCorpus, removeNumbers)
removeURL <- function(x) gsub("http[[:alnum:]]*", "", x)
myCorpus <- tm_map(myCorpus, removeURL)
myStopwords <- c(stopwords("english"))
myCorpus <- tm_map(myCorpus, removeWords, myStopwords)
myCorpusCopy <- myCorpus
myCorpus <- tm_map(myCorpus, stemDocument)
Upon inspection, the list of documents looks the way it should:
> for(i in 1:5) {
+ cat(paste("[[", i, "]] ", sep =""))
+ writeLines(myCorpus[[i]])
+ }
[[1]] syntel recruitment drive week freshers newregistrationlink passout graduates
qualification graduatebebtechmcamemtech
syntel registration link
limited referrals available
comment emailids reference future job upd
[[2]] dont miss opportunity get placed one best mnc companies world ebay freshers week january
qualification graduate can apply
ebay registration link
comment emailids fast beacuse referrals left
[[3]] recent passouts eligible apply wipro go updated link lastday reference drive jan apply link fresher referral
apply link
go link apply asap
[[4]] robertbosch recruitment drive week freshers newregistrationlink passout graduates
qualification graduatebebtechmcamemtech
robertbosch registration link
limited referrals available
comment emailids reference future job upd
[[5]] mega job openings year
mphasis recruitment freshers january
qualification btech bsc bca graduates mca mba mtech post graduates
mphasis registration link
comment emailids comment box reference future job updates emailbox
However, after creating a copy of the corpus for stem completion, the problem arises:
myCorpus <- tm_map(myCorpus, stemCompletion,
dictionary = myCorpusCopy, lazy = TRUE)
> tdm <- TermDocumentMatrix(myCorpus, control = list(wordLengths = c(1, Inf)))
Error in UseMethod("meta", x) :
no applicable method for 'meta' applied to an object of class "try-error"
In addition: Warning messages:
1: In mclapply(x$content[i], function(d) tm_reduce(d, x$lazy$maps)) :
all scheduled cores encountered errors in user code
2: In mclapply(unname(content(x)), termFreq, control) :
all scheduled cores encountered errors in user code
Any ideas for a workaround?
I think you have to call
myCorpus <- Corpus(VectorSource(myCorpus))
again before using TermDocumentMatrix. Your final piece of code would then be:
myCorpus <- tm_map(myCorpus, stemCompletion, dictionary = myCorpusCopy)
myCorpus <- Corpus(VectorSource(myCorpus))
tdm <- TermDocumentMatrix(myCorpus, control = list(wordLengths = c(1, Inf)))
If no error occurred up to and including the stemming step, the instructions above should solve your problem.
Otherwise, you might first try:
myCorpus <- tm_map(myCorpus, PlainTextDocument)
before you use
myCorpus <- Corpus(VectorSource(myCorpus))
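If rebuilding the corpus is not enough, another thing you could try is completing the stems word by word inside a content_transformer, so that every document stays a proper text document when TermDocumentMatrix is called. The following is only a minimal sketch under some assumptions: the messages vector is made-up toy data standing in for posts$message, completeStems is a hypothetical helper name (not part of tm), and VCorpus is used instead of Corpus so that tm_map processes one document at a time. It also assumes the SnowballC package is installed, since stemDocument relies on it.

```r
library(tm)

# Toy stand-in for posts$message (hypothetical data, not from the question)
messages <- c("Recruitment drive for freshers this week",
              "Freshers can apply using the registration link")

# Assumption: VCorpus, so tm_map applies each function per document
corpus <- VCorpus(VectorSource(messages))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)

corpusCopy <- corpus                     # unstemmed copy, used as dictionary
corpus <- tm_map(corpus, stemDocument)   # needs SnowballC installed

# Hypothetical helper: split a document into stems, complete each stem
# against the dictionary, and glue the result back into one string.
completeStems <- function(x, dictionary) {
  stems <- unlist(strsplit(x, " ", fixed = TRUE))
  stems <- stems[stems != ""]
  if (length(stems) == 0) return("")
  completed <- unname(stemCompletion(stems, dictionary = dictionary))
  # Keep the original stem wherever no completion was found
  missing <- is.na(completed) | completed == ""
  completed[missing] <- stems[missing]
  paste(completed, collapse = " ")
}

corpus <- tm_map(corpus, content_transformer(completeStems),
                 dictionary = corpusCopy)

tdm <- TermDocumentMatrix(corpus, control = list(wordLengths = c(1, Inf)))
```

Because the helper returns a plain character string and content_transformer writes it back into the document, the corpus does not need to be rebuilt afterwards. I have not run this against your data, so treat it as a starting point rather than a guaranteed fix.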