How do I measure the perplexity score of an LDA model built with the textmineR package in R?

Question

I built an LDA topic model in R with the textmineR package, as shown below.

library(textmineR)

## get textmineR dtm
dtm2 <- CreateDtm(doc_vec = dat2$fulltext, # character vector of documents
                 ngram_window = c(1, 2), # unigrams and bigrams
                 doc_names = dat2$names,
                 stopword_vec = c(stopwords::stopwords("da"), custom_stopwords),
                 lower = TRUE, # lowercase - this is the default value
                 remove_punctuation = TRUE, # punctuation - this is the default
                 remove_numbers = TRUE, # numbers - this is the default
                 verbose = TRUE,
                 cpus = 4)

# keep terms that occur more than twice and are longer than two characters
dtm2 <- dtm2[, colSums(dtm2) > 2]
dtm2 <- dtm2[, stringr::str_length(colnames(dtm2)) > 2]


############################################################
## RUN & EXAMINE TOPIC MODEL
############################################################

# Draw quasi-random sample from the pc
set.seed(34838)

model2 <- FitLdaModel(dtm = dtm2, 
                     k = 8,
                     iterations = 500,
                     burnin = 200,
                     alpha = 0.1,
                     beta = 0.05,
                     optimize_alpha = TRUE,
                     calc_likelihood = TRUE,
                     calc_coherence = TRUE,
                     calc_r2 = TRUE,
                     cpus = 4)

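So my questions are:
1. Which function should I use to get the perplexity score in the textmineR package? I can't seem to find one.
2. How do I measure the perplexity score for different numbers of topics (k)?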

Answer 1

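As asked: there is no way to calculate perplexity with textmineR unless you explicitly program it yourself. TBH, I have never seen any value in perplexity that you could not get from likelihood and coherence, so I did not implement it.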
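For readers who do want to program it themselves, perplexity is usually defined as the exponentiated negative average log-likelihood per token. The following is a minimal base-R sketch under that definition, not part of textmineR: `calc_perplexity` is a hypothetical helper name, and it assumes `theta` (documents by topics) and `phi` (topics by terms) each have rows that sum to 1, as in a fitted textmineR model.

```r
# Hypothetical helper (not a textmineR function): perplexity from LDA matrices.
# dtm:   documents x terms matrix of counts
# theta: documents x topics matrix, rows sum to 1 (assumption)
# phi:   topics x terms matrix, rows sum to 1 (assumption)
calc_perplexity <- function(dtm, theta, phi) {
  dtm <- as.matrix(dtm)  # dense copy; only sensible for small dtms
  stopifnot(nrow(dtm) == nrow(theta),
            ncol(dtm) == ncol(phi),
            ncol(theta) == nrow(phi))
  p_word <- theta %*% phi        # P(term | document) under the model
  nz <- dtm > 0                  # only observed counts contribute
  log_lik <- sum(dtm[nz] * log(p_word[nz]))
  exp(-log_lik / sum(dtm))       # exp of negative mean log-likelihood per token
}

# Toy check: 2 documents, 2 terms, 2 topics, uniform topic-term probabilities
toy_dtm   <- matrix(c(2, 0, 0, 2), nrow = 2, byrow = TRUE)
toy_theta <- matrix(c(1, 0, 0, 1), nrow = 2, byrow = TRUE)
toy_phi   <- matrix(c(0.5, 0.5, 0.5, 0.5), nrow = 2, byrow = TRUE)
toy_perplexity <- calc_perplexity(toy_dtm, toy_theta, toy_phi)
```

In the toy case every term has probability 0.5, so the result is approximately 2. The sketch does no smoothing: if the model assigns probability 0 to an observed term, the result is infinite.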

However, the text2vec package does have an implementation. See the following example:

library(textmineR)

# model ships with textmineR as example
m <- nih_sample_topic_model

# dtm ships with textmineR as example
d <- nih_sample_dtm

# get perplexity
p <- text2vec::perplexity(X = d, 
                          topic_word_distribution = m$phi, 
                          doc_topic_distribution = m$theta)
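For the second part of the question (perplexity for different numbers of topics), one possible approach is to fit a model per candidate k and pass each model's `phi` and `theta` to `text2vec::perplexity`. The following is a rough sketch on the sample dtm that ships with textmineR. The values in `k_values` and the reduced iteration counts are illustrative assumptions chosen so the example runs quickly, not recommendations.

```r
library(textmineR)

# dtm that ships with textmineR; substitute your own (e.g. dtm2)
d <- nih_sample_dtm

k_values <- c(4, 8, 12)  # illustrative candidates (assumption)

set.seed(34838)
perplexity_by_k <- sapply(k_values, function(k) {
  m <- FitLdaModel(dtm = d,
                   k = k,
                   iterations = 200,  # kept small so the sketch runs quickly
                   burnin = 150,
                   alpha = 0.1,
                   beta = 0.05,
                   calc_coherence = FALSE)
  text2vec::perplexity(X = d,
                       topic_word_distribution = m$phi,
                       doc_topic_distribution = m$theta)
})

# one row per candidate k
data.frame(k = k_values, perplexity = perplexity_by_k)
```

Note that this measures perplexity on the same documents the models were fitted to. In-sample perplexity tends to keep improving as k grows, so a held-out set of documents is likely a better basis for comparing values of k.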

Solution 1
2 Accepted 2019-12-29 00:12:43
