
How to convert LDA output to word topic matrix in R?

library(tm)
library(topicmodels)
lda_topicmodel <- LDA(dtm, k = 20, control = list(seed = 1234))

I fitted a Latent Dirichlet Allocation model using the LDA function in R, and the result is an S4 object.

How do I convert it to a word-topic matrix and a document-topic matrix in R?

Unfortunately, subsetting it fails with "object of type 'S4' is not subsettable", so I had to resort to copying a subset of the top terms by hand:

     Topic 1     Topic 2   Topic 3   Topic 4    Topic 5     Topic 6    Topic 7         Topic 8    Topic 9      Topic 10    
[1,] "flooding"  "beach"   "sets"    "flooding" "storm"     "fwy"      "storms"        "flooding" "socal"      "rain"      
[2,] "erosion"   "long"    "alltime" "just"     "flooding"  "due"      "thunderstorms" "via"      "major"      "california"
[3,] "cause"     "abc7"    "rain"    "almost"   "years"     "closures" "flash"         "public"   "throughout" "nearly"    
[4,] "emergency" "day"     "slides"  "hardcore" "mudslides" "avoid"    "continue"      "asks"     "abc7"       "southern"  
[5,] "highway"   "history" "last"    "spun"     "snow"      "latest"   "possible"      "call"     "streets"    "storms"    

     Topic 11 Topic 12   Topic 13  Topic 14      Topic 15      Topic 16 Topic 17   Topic 18   Topic 19     Topic 20     
[1,] "abc7"   "abc7"     "like"    "widespread"  "widespread"  "across" "rainfall" "flooding" "flooding"   "vehicles"   
[2,] "beach"  "flooding" "closed"  "batters"     "biggest"     "can"    "record"   "region"   "storm"      "several"    
[3,] "long"   "stranded" "live"    "california"  "evacuations" "stay"   "breaks"   "reported" "california" "getting"    
[4,] "fwy"    "county"   "raining" "evacuations" "mudslides"   "home"   "long"     "corona"   "causes"     "floodwaters"
[5,] "710"    "san"      "blog"    "mudslides"   "years"       "wires"  "beach"    "across"   "related"    "stranded"   

The output above contains a subset of the words in each topic. I wish to write the contents of the S4 object to a CSV file as a word-topic matrix.

I'm using a dataset that ships with the topicmodels package, since your data is not available to reproduce the problem.

# load the libraries
library(topicmodels)
library(tm)

# load the data we'll be using
data("AssociatedPress")

# estimate an LDA model using the VEM algorithm (default)
# k (the number of topics) is set to 2
# just as an example
ap_lda <- LDA(AssociatedPress, 
              k = 2, 
              control = list(seed = 1234))

# rank every term within each topic (most probable first), one column per topic;
# length(ap_lda@terms) is the vocabulary size
as.data.frame(terms(ap_lda, length(ap_lda@terms)))

The first few rows of the output would be:

  Topic 1    Topic 2
1 percent          i
2 million  president
3     new government
4    year     people
5 billion     soviet
6    last        new
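
The table above ranks terms per topic, which is not quite the numeric word-topic matrix the question asks for. A minimal sketch of one way to get both matrices is the posterior() function from topicmodels, which returns a list with a terms element (topics x terms probabilities) and a topics element (documents x topics probabilities). The 100-document subset, the object names (dtm_small, lda_fit, word_topic, doc_topic) and the output file names are assumptions made for illustration; substitute your own document-term matrix and paths.

```r
library(topicmodels)
library(tm)

data("AssociatedPress")

# Small subset so the sketch runs quickly (an assumption for illustration);
# replace dtm_small with your own DocumentTermMatrix
dtm_small <- AssociatedPress[1:100, ]
lda_fit <- LDA(dtm_small, k = 2, control = list(seed = 1234))

# posterior() returns the estimated distributions as ordinary matrices
post <- posterior(lda_fit)

# post$terms is topics x terms; transpose so that words are the rows
word_topic <- t(post$terms)
colnames(word_topic) <- paste("Topic", seq_len(ncol(word_topic)))

# post$topics is already documents x topics
doc_topic <- post$topics
colnames(doc_topic) <- paste("Topic", seq_len(ncol(doc_topic)))

# Write both matrices to CSV (written to a temporary directory here)
word_topic_file <- file.path(tempdir(), "word_topic_matrix.csv")
doc_topic_file <- file.path(tempdir(), "doc_topic_matrix.csv")
write.csv(word_topic, word_topic_file)
write.csv(doc_topic, doc_topic_file)
```

Each column of word_topic is a probability distribution over the vocabulary, and each row of doc_topic is a distribution over topics. The same information is stored in the S4 slots, which are accessed with @ rather than $ or [ (for example lda_fit@beta holds the log word probabilities and lda_fit@gamma the document-topic proportions), which is why ordinary subsetting of the model object fails.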
