
Different results of LDA using R (topicmodels)

I am using R topicmodels to train an LDA model on a small corpus, but every time I rerun the same code I get different results (different topics and different topic terms). Why do the same settings and the same corpus give a different result every time, and what can I do to make the result stable? Here is my code:

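library(tm)           # text cleaning and DocumentTermMatrix()
library(topicmodels)  # LDA() and get_terms()
# Read every file in ./corpus/train as one document
cname<-file.path(".","corpus","train")
docs<-Corpus(DirSource(cname))
# Replace each match of `pattern` with a space (not ""), so "a/b" becomes "a b" instead of "ab"
toSpace<-content_transformer(function(x,pattern) gsub(pattern," ",x))
docs<-tm_map(docs,toSpace,"/")
docs<-tm_map(docs,toSpace,"@")
docs<-tm_map(docs,toSpace,"#")
docs<-tm_map(docs,toSpace,"\\|")
docs<-tm_map(docs,toSpace,"&")
# Normalize case, then drop numbers, punctuation, stopwords and the HTML leftover "amp"
docs<-tm_map(docs,content_transformer(tolower))
docs<-tm_map(docs,removeNumbers)
docs<-tm_map(docs,removePunctuation)
docs<-tm_map(docs,removeWords,stopwords("english"))
docs<-tm_map(docs,removeWords,c("amp"))
docs<-tm_map(docs,stripWhitespace)
# Build the document-term matrix and fit a 5-topic LDA model
dtm<-DocumentTermMatrix(docs)
dtm_LDA<-LDA(dtm,5)
get_terms(dtm_LDA,10)  # top 10 terms per topic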

I have tried set.seed(), but it doesn't seem to work. I also found a similar question, "LDA model generates different topics every time I train on the same corpus", but that one is about Python.

For those who come across the same issue: you can fix the random seed by passing it through the control argument of the LDA() function, as below. Calling set.seed() beforehand does not help with the default VEM method, because LDA() takes the seed for its estimation code from control rather than from R's global random number generator. Find more information here.

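data("AssociatedPress", package = "topicmodels")  # example DTM bundled with topicmodels
lda <- LDA(AssociatedPress[1:20, ], control = list(seed = 0), k = 2)  # fixed seed gives a repeatable fit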
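To check that the fix actually works, one can fit the same model twice with the same control seed and compare the outputs. The sketch below assumes the topicmodels package is installed and reuses its bundled AssociatedPress data from the call above; the object names (dtm_small, fit_a, fit_b, gibbs_ctrl, fit_g1, fit_g2) and the Gibbs burnin/iter/thin values are illustrative choices, not part of the original answer. The second half assumes, as the topicmodels documentation describes, that method = "Gibbs" also reads its seed from control.

```r
# Sketch: verify that a fixed `control` seed makes LDA() reproducible.
# Assumes topicmodels is installed; AssociatedPress is its bundled example DTM.
library(topicmodels)
data("AssociatedPress", package = "topicmodels")

dtm_small <- AssociatedPress[1:20, ]  # hypothetical name: first 20 documents

# Default VEM method: the seed goes in the control list, not set.seed().
fit_a <- LDA(dtm_small, k = 2, control = list(seed = 0))
fit_b <- LDA(dtm_small, k = 2, control = list(seed = 0))

# Same data, same k, same seed: top terms and document-topic posteriors should match.
same_terms <- identical(get_terms(fit_a, 10), get_terms(fit_b, 10))
same_gamma <- isTRUE(all.equal(posterior(fit_a)$topics, posterior(fit_b)$topics))
print(c(same_terms = same_terms, same_gamma = same_gamma))

# Gibbs sampling also takes its seed from control (burnin/iter/thin are illustrative).
gibbs_ctrl <- list(seed = 0, burnin = 100, iter = 500, thin = 100)
fit_g1 <- LDA(dtm_small, k = 2, method = "Gibbs", control = gibbs_ctrl)
fit_g2 <- LDA(dtm_small, k = 2, method = "Gibbs", control = gibbs_ctrl)
print(identical(get_terms(fit_g1, 10), get_terms(fit_g2, 10)))
```

Note that a fixed seed only makes a run repeatable; a different seed can still produce different topics, so it does not make the topics themselves more stable.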

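Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP Registration No. 18138465 © 2020-2024 STACKOOM.COM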