
How do I measure perplexity scores on an LDA model made with the textmineR package in R?

I've made an LDA topic model in R using the textmineR package; it looks as follows.

## load packages and build the textmineR dtm
library(textmineR)
library(stringr)  # for str_length() below

dtm2 <- CreateDtm(doc_vec = dat2$fulltext, # character vector of documents
                  ngram_window = c(1, 2),  # unigrams and bigrams
                  doc_names = dat2$names,
                  stopword_vec = c(stopwords::stopwords("da"), custom_stopwords), # custom_stopwords is a user-defined character vector
                  lower = TRUE,              # lowercase - this is the default value
                  remove_punctuation = TRUE, # punctuation - this is the default
                  remove_numbers = TRUE,     # numbers - this is the default
                  verbose = TRUE,
                  cpus = 4)



dtm2 <- dtm2[, colSums(dtm2) > 2]              # keep terms occurring more than twice overall
dtm2 <- dtm2[, str_length(colnames(dtm2)) > 2] # keep terms longer than two characters


############################################################
## RUN & EXAMINE TOPIC MODEL
############################################################

# set a seed so the Gibbs sampling run is reproducible
set.seed(34838)

model2 <- FitLdaModel(dtm = dtm2,
                      k = 8,
                      iterations = 500,
                      burnin = 200,
                      alpha = 0.1,
                      beta = 0.05,
                      optimize_alpha = TRUE,
                      calc_likelihood = TRUE,
                      calc_coherence = TRUE,
                      calc_r2 = TRUE,
                      cpus = 4)

The questions are then:

1. Which function should I apply to get perplexity scores in the textmineR package? I can't seem to find one.
2. How do I measure perplexity scores for different numbers of topics (k)?

As asked: there's no way to calculate perplexity with textmineR unless you explicitly program it yourself. TBH, I've never seen a value of perplexity that you couldn't get with likelihood and coherence, so I didn't implement it.
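
If you do want to program it yourself, perplexity can be computed directly from a fitted model's theta and phi matrices plus the dtm. Below is a minimal hand-rolled sketch (not a textmineR function), assuming the model2 and dtm2 objects from the question; note that theta %*% phi creates a dense documents-by-vocabulary matrix, so this only scales to modestly sized corpora.

library(Matrix)  # dtm2 from CreateDtm() is a sparse dgCMatrix

# align the dtm columns with the topic-word matrix phi
dtm_aligned <- dtm2[, colnames(model2$phi)]

# predicted word probability per document: theta (docs x topics) %*% phi (topics x words)
pred <- model2$theta %*% model2$phi

# log-likelihood of the observed counts under the model (zero counts contribute nothing)
ll <- sum(dtm_aligned * log(pred))

# perplexity = exp(-log-likelihood / total token count); lower is better
exp(-ll / sum(dtm_aligned))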

However, the text2vec package does have an implementation. See below for an example:

library(textmineR)

# model ships with textmineR as example
m <- nih_sample_topic_model

# dtm ships with textmineR as example
d <- nih_sample_dtm

# get perplexity
p <- text2vec::perplexity(X = d, 
                          topic_word_distribution = m$phi, 
                          doc_topic_distribution = m$theta)
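
For question 2, one straightforward (if slow) approach is to fit a separate model per candidate k and compare perplexity and/or coherence across them. The following is a rough sketch along those lines, reusing dtm2 from the question; the k values and shortened iteration counts are arbitrary choices for illustration.

library(textmineR)

k_list <- c(4, 6, 8, 10, 12)

models <- lapply(k_list, function(k) {
  FitLdaModel(dtm = dtm2,
              k = k,
              iterations = 200,  # shortened for illustration
              burnin = 150,
              alpha = 0.1,
              beta = 0.05,
              calc_coherence = TRUE,
              cpus = 4)
})

# lower perplexity / higher mean coherence is generally preferred
data.frame(
  k = k_list,
  perplexity = sapply(models, function(m)
    text2vec::perplexity(X = dtm2,
                         topic_word_distribution = m$phi,
                         doc_topic_distribution = m$theta)),
  coherence = sapply(models, function(m) mean(m$coherence))
)

Bear in mind that, like the example above, this scores perplexity on the same dtm the models were fit on; for a true out-of-sample comparison you would hold out documents and score those instead.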

