Getting the Top Terms for Each Topic in LDA in R

I am implementing LDA on some simple data sets. The topic modelling itself works, but when I try to list the top 6 terms for each topic, I get numerical values (possibly their indexes) instead of the terms.

# docs is the dataset, formatted and cleaned properly
dtm <- TermDocumentMatrix(docs, control = list(removePunctuation = TRUE, stopwords = TRUE))
ldaOut <- LDA(dtm, k, method = "Gibbs", control = list(nstart = nstart, seed = seed, best = best, burnin = burnin, iter = iter, thin = thin))

# top 6 terms in each topic
ldaOut.terms <- as.matrix(terms(ldaOut, 6))

write.csv(ldaOut.terms, file = paste("LDAGibbs", k, "TopicsToTerms.csv"))

The generated TopicsToTerms file looks like this:

    Topic 1 Topic 2 Topic 3 
1   1        5       3  
2   2        1       4  
3   3        2       1  
4   4        3       2  
5   5        4       5  

What I want instead is the terms themselves (the top words for each topic) in the table, like the following:

    Topic 1   Topic 2     Topic 3   
1     Hat       Cat        Food 

You just need one line of code to fix your problem:

library(tm)
text = read.csv("~/Desktop/your_data.csv") # your initial dataset
docs = Corpus(VectorSource(text)) # convert to a corpus
docs = tm_map(docs, content_transformer(tolower)) # cleaning
# ... (further cleaning steps)
dtm = DocumentTermMatrix(docs) # create a document-term matrix
rownames(dtm) = text

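After adding that last line, you can proceed with the rest of your code, and you will get the terms rather than their indexes. Note that this snippet also builds a DocumentTermMatrix instead of the TermDocumentMatrix used in the question: LDA() treats rows as documents and columns as terms, so passing a term-document matrix most likely makes it report document IDs as the "terms". Hope that helps.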
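
For reference, here is a minimal end-to-end sketch of the pipeline, assuming the tm and topicmodels packages are installed. The toy `texts` vector, `k = 2`, the Gibbs settings, and the `top_terms` and `out_file` names are made up for illustration and are not from the original post; the point is only that `terms()` returns words when LDA() is given a document-term matrix.

```r
library(tm)           # corpus handling and document-term matrices
library(topicmodels)  # LDA() and terms()

# Hypothetical toy documents (assumption: stand-in for the asker's data)
texts <- c("the cat sat on the hat",
           "cat food and dog food",
           "a hat for the cat",
           "dog food is food")

docs <- VCorpus(VectorSource(texts))
docs <- tm_map(docs, content_transformer(tolower))

# Rows are documents, columns are terms
dtm <- DocumentTermMatrix(docs,
                          control = list(removePunctuation = TRUE,
                                         stopwords = TRUE))

k <- 2  # number of topics (illustrative value)
ldaOut <- LDA(dtm, k, method = "Gibbs",
              control = list(seed = 2003, burnin = 100, iter = 500, thin = 50))

# Character matrix: one column per topic, one row per rank
top_terms <- terms(ldaOut, 3)
print(top_terms)

# Written to a temporary directory to avoid cluttering the working directory
out_file <- file.path(tempdir(), paste0("LDAGibbs", k, "TopicsToTerms.csv"))
write.csv(top_terms, file = out_file)
```

Every entry of `top_terms` should be one of `colnames(dtm)`, that is, an actual word from the corpus. Which word lands in which topic depends on the seed and on the data, so no particular output is shown here.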
