Topic label of each document in LDA model using textmineR

Question

I'm using textmineR to fit an LDA model to a set of documents, following the vignette at https://cran.r-project.org/web/packages/textmineR/vignettes/c_topic_modeling.html . Is it possible to get the topic label for each document in the data set?

library(textmineR)
data(nih_sample)

# create a document term matrix
dtm <- CreateDtm(doc_vec = nih_sample$ABSTRACT_TEXT,
                 doc_names = nih_sample$APPLICATION_ID,
                 ngram_window = c(1, 2),
                 stopword_vec = c(stopwords::stopwords("en"),
                                  stopwords::stopwords(source = "smart")),
                 lower = TRUE, remove_punctuation = TRUE,
                 remove_numbers = TRUE, verbose = FALSE, cpus = 2)

# keep only terms that appear more than twice
dtm <- dtm[, colSums(dtm) > 2]

set.seed(123)
model <- FitLdaModel(dtm = dtm, k = 20, iterations = 200, burnin = 180,
                     alpha = 0.1, beta = 0.05, optimize_alpha = TRUE,
                     calc_likelihood = TRUE, calc_coherence = TRUE,
                     calc_r2 = TRUE, cpus = 2)

Then I add the labels to the model:

model$labels <- LabelTopics(assignments = model$theta > 0.05, dtm = dtm, M = 1)

Now I want the topic label for each of the 100 documents in nih_sample$ABSTRACT_TEXT.

Answer 1

Are you looking to label each document by the label of its most prevalent topic? If so, this is how you could do it:

# convert labels to a data frame so we can merge 
label_df <- data.frame(topic = rownames(model$labels), label = model$labels, stringsAsFactors = FALSE)

# get the top topic for each document
top_topics <- apply(model$theta, 1, function(x) names(x)[which.max(x)][1])

# convert the top topics for each document to a data frame so we can merge
top_topics <- data.frame(document = names(top_topics), top_topic = top_topics, stringsAsFactors = FALSE)

# merge together. Now each document has a label from its top topic
top_topics <- merge(top_topics, label_df, by.x = "top_topic", by.y = "topic", all.x = TRUE)
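
One thing to keep in mind: merge() sorts its result by the key column by default, so the rows above no longer follow the original document order. If that matters, a possible alternative is to look the labels up by topic name directly. The sketch below is self-contained and uses small made-up matrices as stand-ins for model$theta and model$labels (the topic names, document names and labels are invented); it assumes the label matrix has one row per topic, with row names matching the column names of theta.

```r
# Toy stand-ins (hypothetical values) for model$theta and model$labels
theta <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.3, 0.6,
    0.2, 0.5, 0.3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc_a", "doc_b", "doc_c"), c("t_1", "t_2", "t_3"))
)
labels <- matrix(
  c("cancer_cell", "health_care", "brain_imaging"),
  ncol = 1,
  dimnames = list(c("t_1", "t_2", "t_3"), "label_1")
)

# name of the most prevalent topic in each document (first one wins on ties)
top_topic <- colnames(theta)[max.col(theta, ties.method = "first")]

# look up each label by topic name; rows stay in document order
doc_labels <- data.frame(
  document  = rownames(theta),
  top_topic = top_topic,
  label     = unname(labels[top_topic, 1]),
  stringsAsFactors = FALSE
)
print(doc_labels)
```

With the real model you would pass model$theta and model$labels instead of the toy matrices.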

This throws away some of the information that LDA gives you, though. One advantage of LDA is that each document can contain more than one topic. Another is that you can see how much of each topic is in a document. You can view that with a bar plot:

# set the plot margins to see the labels on the bottom
par(mar = c(8.1,4.1,4.1,2.1))

# barplot the first document's topic distribution with labels
barplot(model$theta[1,], names.arg = model$labels, las = 2)
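
If a table is more convenient than a plot, one option is to list every topic whose share of a document exceeds some cutoff, together with its label. This is only a rough sketch on made-up stand-ins for model$theta and model$labels (all names and values are invented), and the 0.25 cutoff is arbitrary and chosen to suit the toy data; the question used 0.05 when assigning labels.

```r
# Toy stand-ins (hypothetical values) for model$theta and model$labels
theta <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.3, 0.6,
    0.2, 0.5, 0.3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc_a", "doc_b", "doc_c"), c("t_1", "t_2", "t_3"))
)
labels <- matrix(
  c("cancer_cell", "health_care", "brain_imaging"),
  ncol = 1,
  dimnames = list(c("t_1", "t_2", "t_3"), "label_1")
)

cutoff <- 0.25  # arbitrary threshold for this illustration

# row/column positions of every document-topic pair above the cutoff
hits <- which(theta > cutoff, arr.ind = TRUE)
topic_names <- colnames(theta)[hits[, "col"]]

doc_topics <- data.frame(
  document   = rownames(theta)[hits[, "row"]],
  topic      = topic_names,
  label      = unname(labels[topic_names, 1]),
  prevalence = theta[hits],
  stringsAsFactors = FALSE
)

# order by document, then by decreasing prevalence
doc_topics <- doc_topics[order(doc_topics$document, -doc_topics$prevalence), ]
rownames(doc_topics) <- NULL
print(doc_topics)
```

Each document then appears once per topic that clears the cutoff, so a document dominated by a single topic gets one row and a mixed document gets several.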


solution1
1 ACCEPTED 2019-12-28 23:10:44
