LDAvis provides an excellent way of visualising and exploring topic models. LDAvis requires 5 parameters:
The short version of my question is: after fitting an LDA model with Vowpal Wabbit, how does one derive phi and theta?
theta represents the mixture of topics per document, and must thus sum to 1 per document. phi represents the probability of a term given the topic, and must thus sum to 1 per topic.
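Written out, the two constraints above look as follows. The indices d (document), k (topic), w (term) and the sizes K (number of topics) and V (vocabulary size) are notation introduced here for illustration, not taken from LDAvis or vw:

```latex
\sum_{k=1}^{K} \theta_{dk} = 1 \quad \text{for every document } d,
\qquad
\sum_{w=1}^{V} \phi_{kw} = 1 \quad \text{for every topic } k.
```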
After running LDA with Vowpal Wabbit (vw), some kind of weights are stored in a model. A human-readable version of that model can be acquired by feeding it a special file, with one document per term in the vocabulary, while deactivating learning (with the -t parameter), e.g.
vw -t -i weights -d dictionary.vw --readable_model readable.model.txt
According to the Vowpal Wabbit documentation, all columns except the first one of readable.model.txt now "represent the per-word topic distributions."
You can also generate predictions with vw, for example for a collection of documents:
vw -t -i weights -d some-documents.txt -p predictions.txt
Both predictions.txt and readable.model.txt have dimensions that reflect the number of inputs (rows) and the number of topics (columns), and neither of them is a probability distribution, because they do not sum to 1 (neither per row nor per column).
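One possible way forward, offered as a sketch rather than a confirmed recipe, is to treat both outputs as unnormalized weights and rescale them so they satisfy the sum-to-1 constraints. Whether plain normalization is the statistically correct reading of vw's output is an assumption here. The function names and the toy numbers below are hypothetical, and the arrays are assumed to be already loaded (topic columns only, without the first index column of readable.model.txt):

```python
import numpy as np


def normalize_rows(matrix):
    """Scale each row so that it sums to 1 (all-zero rows are left as zeros)."""
    matrix = np.asarray(matrix, dtype=float)
    totals = matrix.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero
    return matrix / totals


def phi_theta_from_weights(term_topic_weights, doc_topic_weights):
    """Assumption: both inputs are unnormalized, non-negative weights.

    term_topic_weights: shape (num_terms, num_topics), e.g. the topic
        columns of readable.model.txt
    doc_topic_weights:  shape (num_docs, num_topics), e.g. predictions.txt
    """
    # phi must sum to 1 per topic, so transpose to (num_topics, num_terms).
    phi = normalize_rows(np.asarray(term_topic_weights, dtype=float).T)
    # theta must sum to 1 per document, so normalize each document row.
    theta = normalize_rows(doc_topic_weights)
    return phi, theta


# Toy data: 3 terms, 2 topics, 2 documents (made-up numbers).
toy_term_topic = [[2.0, 1.0], [1.0, 1.0], [1.0, 2.0]]
toy_doc_topic = [[3.0, 1.0], [1.0, 1.0]]
phi, theta = phi_theta_from_weights(toy_term_topic, toy_doc_topic)
```

With real data, the rows of the term matrix would also need to be restricted to the terms actually present in the vocabulary before normalizing, so that phi lines up with the vocab passed to LDAvis.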
I understand that vw is not for the faint-hearted and that some programming/scripting will be required on my part, but I'm sure there must be some way to derive theta and phi from some of the output of vw. I've been stuck on this problem for days now; please give me some hints.
I don't know how to use pyLDAvis directly with Vowpal Wabbit. However, as you are already using a Python tool, you could use the Gensim wrapper and pyLDAvis together.
A Python wrapper for Vowpal Wabbit was offered in gensim (< 4.0.0). After converting the model with vwmodel2ldamodel, you can simply use Gensim as if you had trained the model with Gensim itself.
This workaround might be the easiest way if you are not familiar with the internals of Vowpal Wabbit (and LDA in general).