Error creating TermDocumentMatrix in tm package
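I am new to the tm package and have hit an obstacle when trying to apply the TermDocumentMatrix function.
Up to the point where it fails, I have been using the following code: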
myCorpus <- Corpus(VectorSource(posts$message))
myCorpus <- tm_map(myCorpus, content_transformer(tolower))
myCorpus <- tm_map(myCorpus, removePunctuation)
myCorpus <- tm_map(myCorpus, removeNumbers)
removeURL <- function(x) gsub("http[[:alnum:]]*", "", x)
myCorpus <- tm_map(myCorpus, removeURL)
myStopwords <- c(stopwords("english"))
myCorpus <- tm_map(myCorpus, removeWords, myStopwords)
myCorpusCopy <- myCorpus
myCorpus <- tm_map(myCorpus, stemDocument)
On inspection, the documents look as they should:
> for(i in 1:5) {
+ cat(paste("[[", i, "]] ", sep =""))
+ writeLines(myCorpus[[i]])
+ }
[[1]] syntel recruitment drive week freshers newregistrationlink passout graduates
qualification graduatebebtechmcamemtech
syntel registration link
limited referrals available
comment emailids reference future job upd
[[2]] dont miss opportunity get placed one best mnc companies world ebay freshers week january
qualification graduate can apply
ebay registration link
comment emailids fast beacuse referrals left
[[3]] recent passouts eligible apply wipro go updated link lastday reference drive jan apply link fresher referral
apply link
go link apply asap
[[4]] robertbosch recruitment drive week freshers newregistrationlink passout graduates
qualification graduatebebtechmcamemtech
robertbosch registration link
limited referrals available
comment emailids reference future job upd
[[5]] mega job openings year
mphasis recruitment freshers january
qualification btech bsc bca graduates mca mba mtech post graduates
mphasis registration link
comment emailids comment box reference future job updates emailbox
However, the problem appears after I create a copy of the corpus to use for stem completion.
myCorpus <- tm_map(myCorpus, stemCompletion,
dictionary = myCorpusCopy, lazy = TRUE)
> tdm <- TermDocumentMatrix(myCorpus, control = list(wordLengths = c(1, Inf)))
Error in UseMethod("meta", x) :
no applicable method for 'meta' applied to an object of class "try-error"
In addition: Warning messages:
1: In mclapply(x$content[i], function(d) tm_reduce(d, x$lazy$maps)) :
all scheduled cores encountered errors in user code
2: In mclapply(unname(content(x)), termFreq, control) :
all scheduled cores encountered errors in user code
Any ideas on how to fix this?
I think you need to remember to run
myCorpus <- Corpus(VectorSource(myCorpus))
before using TermDocumentMatrix. Your last block of code would then be:
myCorpus <- tm_map(myCorpus, stemCompletion, dictionary = myCorpusCopy)
myCorpus <- Corpus(VectorSource(myCorpus))
tdm <- TermDocumentMatrix(myCorpus, control = list(wordLengths = c(1, Inf)))
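As long as no errors occurred in the earlier steps (before any documents were removed), the step above should solve your problem.
Otherwise, you can first try:
myCorpus <- tm_map(myCorpus, PlainTextDocument)
before running
myCorpus <- Corpus(VectorSource(myCorpus))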
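An alternative to re-wrapping the corpus afterwards is to keep every element a text document throughout. stemCompletion is documented as taking a character vector of stems, so one option is to split each document into words, complete them, and paste them back together inside content_transformer. The following is only a minimal sketch under stated assumptions: the toy documents are stemmed by hand, the dictionary is a plain character vector standing in for myCorpusCopy, and completeStems is a hypothetical helper, not part of tm.

```r
library(tm)

# Toy data (hypothetical): two hand-stemmed documents and a dictionary of
# unstemmed words, standing in for myCorpus and myCorpusCopy.
stemmed <- c("syntel recruit drive fresher", "ebay registr link fresher")
dictionary <- c("syntel", "recruitment", "drive", "freshers",
                "ebay", "registration", "link")

# Hypothetical helper: complete the stems of one document word by word
# and return the result as a single string.
completeStems <- function(text, dictionary) {
  stems <- unlist(strsplit(paste(text, collapse = " "), "[[:space:]]+"))
  stems <- stems[nzchar(stems)]
  if (length(stems) == 0) return("")
  completed <- unname(stemCompletion(stems, dictionary = dictionary))
  # Fall back to the stem itself when no completion is found.
  missing <- is.na(completed) | !nzchar(completed)
  completed[missing] <- stems[missing]
  paste(completed, collapse = " ")
}

corpus <- VCorpus(VectorSource(stemmed))
# content_transformer replaces only the text and keeps the document object,
# so TermDocumentMatrix still receives text documents.
corpus <- tm_map(corpus, content_transformer(completeStems),
                 dictionary = dictionary)
tdm <- TermDocumentMatrix(corpus, control = list(wordLengths = c(1, Inf)))
inspect(tdm)
```

With a real corpus, the dictionary argument could be myCorpusCopy itself, since stemCompletion also accepts a corpus as its dictionary.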