Strange output when doing topic modeling with PAM / LDA in Python

Question

I am trying to do topic modeling on a dataframe that contains only English words; you can substitute any text for it:

dfi['clean_text']
Out[154]: 
0        thank you for calling my name is gabrielle and...
1        your available my first name is was there you ...
2                                                    good 
3                                           go head sorry 
4        no go head i mean how do you want to pull my r...
                       
14676                              just the email is fine 
14677    okay great so then everything is process here ...
14678                         no thats it i appreciate it 
14679    yes and thank you very much we appreciated hav...
14680                                   thank you bye bye

My model:

#Pachinko Allocation Model
import tomotopy as tp
from pprint import pprint

model = tp.LDAModel(k=2, seed=1)  #k is the number of topics

for texts in dfi['clean_text']:
    model.add_doc(texts)

model.train(iter=100)

#Extracting the word distribution of a topic
for k in range(model.k):
    print(f"Topic {k}")
    pprint(model.get_topic_words(k, top_n=5))
Topic 0
[(' ', 0.2129271924495697),
 ('e', 0.08137548714876175),
 ('o', 0.0749373733997345),
 ('a', 0.07390690594911575),
 ('t', 0.06929121911525726)]
Topic 1
[(' ', 0.19975200295448303),
 ('e', 0.09751541167497635),
 ('t', 0.06939278542995453),
 ('i', 0.06373799592256546),
 ('o', 0.06239694356918335)]

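But as you can see here, the output does not show strings or words for each topic; for some strange reason it only shows letters. I am new to Python and may be missing something here.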

Answer 1
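
The likely cause is that tomotopy's `add_doc` expects an iterable of word tokens, not a raw string. A Python string is itself an iterable of characters, so each row of `dfi['clean_text']` is added as a document whose "words" are single letters and spaces. That would explain why `' '`, `'e'` and `'o'` dominate both topics. Splitting each row into words before adding it should fix this. Below is a minimal sketch: the helper names `tokenize` and `train_lda` and the sample sentences are made up for illustration, and plain whitespace splitting is an assumption (a proper tokenizer or stop-word filter may suit your data better).

```python
def tokenize(text):
    """Split a cleaned sentence into word tokens on whitespace."""
    return text.split()


def train_lda(texts, k=2, seed=1, iterations=100):
    """Train a tomotopy LDA model on word tokens (hypothetical helper)."""
    import tomotopy as tp  # imported here so tokenize() works without tomotopy

    model = tp.LDAModel(k=k, seed=seed)  # k is the number of topics
    for text in texts:
        words = tokenize(text)
        if words:  # skip rows that are empty after cleaning
            model.add_doc(words)  # a list of words, not a raw string
    model.train(iter=iterations)
    return model


# Illustrative rows only, standing in for dfi['clean_text']
sample = ["thank you bye bye", "just the email is fine "]

# What the original loop effectively passed to add_doc: one token per character
as_characters = list(sample[0])
# What add_doc should receive: one token per word
as_words = tokenize(sample[0])

print(as_characters[:6])  # ['t', 'h', 'a', 'n', 'k', ' ']
print(as_words)           # ['thank', 'you', 'bye', 'bye']
```

With the real data, `model = train_lda(dfi['clean_text'])` followed by the same `get_topic_words` loop should then print words rather than letters. Note also that the code uses `tp.LDAModel`, which is plain LDA despite the comment. tomotopy has a separate `tp.PAModel` class (with `k1` and `k2` for the numbers of super- and sub-topics) if Pachinko Allocation is what you actually want.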


Solution 1
0 2021-04-25 19:30:22
