Strange output for topic modeling in Python using PAM LDA
I am trying to do topic modeling on a dataframe that contains only English words; you can substitute any text for it:
dfi['clean_text']
Out[154]:
0 thank you for calling my name is gabrielle and...
1 your available my first name is was there you ...
2 good
3 go head sorry
4 no go head i mean how do you want to pull my r...
14676 just the email is fine
14677 okay great so then everything is process here ...
14678 no thats it i appreciate it
14679 yes and thank you very much we appreciated hav...
14680 thank you bye bye
My model:
#Pachinko Allocation Model
import tomotopy as tp
from pprint import pprint
model = tp.LDAModel(k=2, seed=1) #k is the number of topics
for texts in dfi['clean_text']:
    model.add_doc(texts)
model.train(iter=100)
#Extracting the word distribution of a topic
for k in range(model.k):
    print(f"Topic {k}")
    pprint(model.get_topic_words(k, top_n=5))
Topic 0
[(' ', 0.2129271924495697),
('e', 0.08137548714876175),
('o', 0.0749373733997345),
('a', 0.07390690594911575),
('t', 0.06929121911525726)]
Topic 1
[(' ', 0.19975200295448303),
('e', 0.09751541167497635),
('t', 0.06939278542995453),
('i', 0.06373799592256546),
('o', 0.06239694356918335)]
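But as you can see here, the output does not show strings or words per topic; for some strange reason it only shows single letters. I am new to Python and may be missing something here.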
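A likely cause, sketched under assumptions: tomotopy's `add_doc` expects an iterable of word tokens, and a Python string is itself an iterable of single characters, so handing it the raw text would make every letter (and every space) a separate "word". Splitting each text into tokens first should give word-level topics. The `tokenize` helper and the five-row `docs` list below are hypothetical stand-ins for `dfi['clean_text']`, and whitespace splitting is only the simplest possible tokenizer.

```python
def tokenize(text):
    """Hypothetical helper: split a cleaned sentence on whitespace into word tokens."""
    return text.split()

# Hypothetical stand-in for dfi['clean_text'] (rows copied from the question).
docs = [
    "good",
    "go head sorry",
    "just the email is fine",
    "no thats it i appreciate it",
    "thank you bye bye",
]

# A str is an iterable of characters, which is what add_doc receives
# when it is handed the raw text instead of a list of words.
chars = list(docs[1])
words = tokenize(docs[1])
print(chars[:4])  # ['g', 'o', ' ', 'h']
print(words)      # ['go', 'head', 'sorry']

try:
    import tomotopy as tp
except ImportError:
    tp = None  # the tokenization demo above still runs without tomotopy

if tp is not None:
    model = tp.LDAModel(k=2, seed=1)  # k is the number of topics
    for text in docs:
        tokens = tokenize(text)
        if tokens:  # skip rows that are empty after cleaning
            model.add_doc(tokens)
    model.train(iter=100)
    for k in range(model.k):
        print(f"Topic {k}")
        print(model.get_topic_words(k, top_n=5))
```

With the real data, the loop would iterate over `dfi['clean_text']` instead of `docs`. Note also that the snippet in the question builds `tp.LDAModel`, which is plain LDA despite the comment; tomotopy provides a separate `tp.PAModel` class for Pachinko Allocation.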