How to do LDA in R

My task is to apply LDA to a dataset of Amazon reviews and extract 50 topics.

I have extracted the review text into a vector, and now I am trying to apply LDA.

I have created the document-term matrix (DTM):

matrix <- create_matrix(dat, language="english", removeStopwords=TRUE,  stemWords=FALSE, stripWhitespace=TRUE, toLower=TRUE)

<<DocumentTermMatrix (documents: 100000, terms: 174632)>>
Non-/sparse entries: 4096244/17459103756
Sparsity           : 100%
Maximal term length: 218
Weighting          : term frequency (tf)

But when I try to fit the model, I get the following error:

lda <- LDA(matrix, 30)

Error in LDA(matrix, 30) : 
  Each row of the input matrix needs to contain at least one non-zero entry

I searched for solutions and tried using slam:

    matrix1 <- rollup(matrix, 2, na.rm=TRUE, FUN = sum)

I am still getting the same error.

I am very new to this. Can someone help me or suggest a reference to study? It would be very helpful.

There are no empty rows in my original matrix, and it contains only one column, which holds the reviews.
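
The error refers to rows of the document-term matrix, not to the raw input. A review that contains only stopwords, numbers, or punctuation has no terms left after preprocessing, so its row is all zeros even though the original text was not empty. Below is a minimal sketch of one common workaround, assuming the matrix is a tm `DocumentTermMatrix` (which the printed output above suggests). The toy `reviews` vector and the variable names are hypothetical and only serve as illustration.

```r
library(tm)    # VCorpus, DocumentTermMatrix
library(slam)  # row_sums() works on the sparse matrix without densifying it

# Hypothetical toy input: the second document consists only of stopwords,
# so no terms are left in it after preprocessing.
reviews <- c("Great phone, the battery lasts long.",
             "It is what it is.",
             "Terrible battery, the phone died quickly.")

corpus <- VCorpus(VectorSource(reviews))
dtm <- DocumentTermMatrix(corpus,
                          control = list(tolower = TRUE,
                                         removePunctuation = TRUE,
                                         stopwords = TRUE))

# Count the terms left in each document and keep only non-empty documents.
doc_lengths <- row_sums(dtm)
keep <- doc_lengths > 0
dtm_nonempty <- dtm[keep, ]

print(dim(dtm))
print(dim(dtm_nonempty))
```

The filtered matrix can then be passed to `LDA()`, for example `LDA(dtm_nonempty, 30)`. Keeping the `keep` vector around makes it possible to map the fitted topics back to the original reviews.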

I was assigned a similar task. I am still learning, but I have made some progress, so I am sharing my code snippet. I hope it helps.

library("topicmodels")
library("tm")

# Example documents; replace with your own vector of review texts
x <- c("I like to eat broccoli and bananas.",
       "I ate a banana and spinach smoothie for breakfast.",
       "Chinchillas and kittens are cute.",
       "My sister adopted a kitten yesterday.",
       "Look at this cute hamster munching on a piece of broccoli.")



#whole file is lowercased
#text<-tolower(x)

#deleting all common words from the text
#text2<-setdiff(text,stopwords("english"))

#splitting the text into vectors where each vector is a word..
#text3<-strsplit(text2," ")

# Generating a structured text i.e. Corpus
docs<-Corpus(VectorSource(x))

# Create content transformers, i.e. functions used to modify the documents in the corpus

toSpace <- content_transformer(function (x , pattern ) gsub(pattern, " ", x))

# Remove special characters

docs <- tm_map(docs, toSpace, "/")
docs <- tm_map(docs, toSpace, "@")
docs <- tm_map(docs, toSpace, "\\|")
docs <- tm_map(docs, removeNumbers)

# Remove common English stopwords
docs <- tm_map(docs, removeWords, stopwords("english"))

# Remove punctuation
docs <- tm_map(docs, removePunctuation)

# Eliminate extra white spaces
docs <- tm_map(docs, stripWhitespace)

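# Build the document-term matrix (documents in rows, terms in columns), as LDA() expects
dtm <- DocumentTermMatrix(docs, control = list(removePunctuation = TRUE, stopwords = TRUE))

# print(dtm)

# Total frequency of each term across all documents
freq <- colSums(as.matrix(dtm))
print(names(freq))

# Write the term frequencies to disk, most frequent first
ord <- order(freq, decreasing = TRUE)
write.csv(freq[ord], "word_freq.csv")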

# Set the parameters for Gibbs sampling
burnin <- 4000
iter <- 2000
thin <- 500
seed <- list(2003, 5, 63, 100001, 765)
nstart <- 5
best <- TRUE

# Number of topics
k <- 3

# Fit the model and write the most likely topic for each document
ldaOut <- LDA(dtm, k, method = "Gibbs", control = list(nstart = nstart, seed = seed, best = best, burnin = burnin, iter = iter, thin = thin))

ldaOut.topics <- as.matrix(topics(ldaOut))
write.csv(ldaOut.topics, file = paste("LDAGibbs", k, "DocsToTopics.csv"))
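
Besides the most likely topic per document, it is often useful to look at the top terms of each topic and at the full document-topic probabilities. The following is a minimal sketch using `terms()` and `posterior()` from topicmodels on the same five example sentences. The short `burnin` and `iter` values are an assumption chosen only to keep the example fast; they are not a recommendation for real data.

```r
library(tm)
library(topicmodels)

# Same example sentences as in the snippet above.
docs <- c("I like to eat broccoli and bananas.",
          "I ate a banana and spinach smoothie for breakfast.",
          "Chinchillas and kittens are cute.",
          "My sister adopted a kitten yesterday.",
          "Look at this cute hamster munching on a piece of broccoli.")

dtm <- DocumentTermMatrix(VCorpus(VectorSource(docs)),
                          control = list(tolower = TRUE,
                                         removePunctuation = TRUE,
                                         stopwords = TRUE))

# Small Gibbs run (illustrative settings only).
k <- 3
fit <- LDA(dtm, k, method = "Gibbs",
           control = list(seed = 2003, burnin = 100, iter = 200))

# Top 5 terms per topic: a character matrix with one column per topic.
top_terms <- terms(fit, 5)
print(top_terms)

# Per-document topic probabilities: one row per document, rows sum to 1.
doc_topic_probs <- posterior(fit)$topics
print(round(doc_topic_probs, 3))
```

With only five short documents the topics are not meaningful; the point is the shape of the outputs, which stays the same on a full review corpus.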
