Getting The Top Terms for each Topic in LDA in R

Question

I am implementing LDA on some simple data sets. The topic modelling itself works, but when I try to list the top 6 terms for each topic, I get numerical values (maybe their indexes) instead of the terms.

# docs is the dataset, already formatted and cleaned
dtm <- TermDocumentMatrix(docs, control = list(removePunctuation = TRUE, stopwords = TRUE))
ldaOut <- LDA(dtm, k, method = "Gibbs",
              control = list(nstart = nstart, seed = seed, best = best,
                             burnin = burnin, iter = iter, thin = thin))

# top 6 terms in each topic
ldaOut.terms <- as.matrix(terms(ldaOut, 6))

write.csv(ldaOut.terms, file = paste("LDAGibbs", k, "TopicsToTerms.csv"))

The TopicsToTerms file is generated like this:

    Topic 1 Topic 2 Topic 3 
1   1        5       3  
2   2        1       4  
3   3        2       1  
4   4        3       2  
5   5        4       5

What I want instead is the terms (the top words for each topic) in the table, like the following:

    Topic 1   Topic 2     Topic 3   
1     Hat       Cat        Food

Answer 1

You just need one line of code to fix your problem:

> text = read.csv("~/Desktop/your_data.csv") #your initial dataset
> docs = Corpus(VectorSource(text)) #converting to corpus
> docs = tm_map(docs, content_transformer(tolower)) #cleaning
> ... #cleaning
> dtm = DocumentTermMatrix(docs) #creating a document term matrix
> rownames(dtm) = text

After adding that last line, you can proceed with the remaining code, and you will get the terms rather than their indexes. Hope that helps.
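One possible explanation for the numbers, offered here as an assumption rather than a confirmed diagnosis: `LDA()` from topicmodels treats rows as documents and columns as terms, and `terms()` reports column names. If a `TermDocumentMatrix` is passed in, the columns are documents, so the "terms" come back as document IDs. The sketch below uses a small made-up corpus (the `texts` vector and the Gibbs settings are illustrative, not taken from the question) and builds a `DocumentTermMatrix` instead.

```r
library(tm)
library(topicmodels)

# Toy documents, invented purely for illustration
texts <- c("the cat wore a hat",
           "the cat ate the food",
           "a hat is not food",
           "cat food for the cat",
           "hat and cat and food")
docs <- VCorpus(VectorSource(texts))

# Documents as rows, terms as columns: the layout LDA() works on
dtm <- DocumentTermMatrix(docs,
                          control = list(removePunctuation = TRUE,
                                         stopwords = TRUE))

# Small, arbitrary Gibbs settings so the example runs quickly
k <- 2
ldaOut <- LDA(dtm, k, method = "Gibbs",
              control = list(seed = 1, burnin = 100, iter = 200))

# Character matrix with one column per topic, holding words not indexes
top_terms <- terms(ldaOut, 3)
print(top_terms)
```

Comparing `colnames(dtm)` with the values in the CSV is a quick way to check which orientation the matrix has: if the column names are document numbers, the matrix is term-by-document.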

solution1
1 2016-07-05 05:49:23