簡體   English   中英

奇怪的 output 使用 PAM LDA 在 python 中進行主題建模

[英]Strange output for topic modeling in python using PAM LDA

我正在嘗試對僅包含英文單詞的 dataframe 進行主題建模,您可以將其替換為任何文本-

dfi['clean_text']
Out[154]: 
0        thank you for calling my name is gabrielle and...
1        your available my first name is was there you ...
2                                                    good 
3                                           go head sorry 
4        no go head i mean how do you want to pull my r...
                       
14676                              just the email is fine 
14677    okay great so then everything is process here ...
14678                         no thats it i appreciate it 
14679    yes and thank you very much we appreciated hav...
14680                                   thank you bye bye 

我的 model -

#Pachinko Allocation Model
import tomotopy as tp
from pprint import pprint

model = tp.LDAModel(k=2, seed=1)  #k is the number of topics

for texts in dfi['clean_text']:
    model.add_doc(texts)

model.train(iter=100)

#Extracting the word distribution of a topic
for k in range(model.k):
    print(f"Topic {k}")
    pprint(model.get_topic_words(k, top_n=5))
Topic 0
[(' ', 0.2129271924495697),
 ('e', 0.08137548714876175),
 ('o', 0.0749373733997345),
 ('a', 0.07390690594911575),
 ('t', 0.06929121911525726)]
Topic 1
[(' ', 0.19975200295448303),
 ('e', 0.09751541167497635),
 ('t', 0.06939278542995453),
 ('i', 0.06373799592256546),
 ('o', 0.06239694356918335)]

但正如您在此處看到的,output 沒有按主題顯示字符串或單詞,它只是出於某種奇怪的原因顯示字母。 我是 python 的新手,可能在這里遺漏了一些東西。

我正在嘗試對僅包含英文單詞的 dataframe 進行主題建模,您可以將其替換為任何文本-

dfi['clean_text']
Out[154]: 
0        thank you for calling my name is gabrielle and...
1        your available my first name is was there you ...
2                                                    good 
3                                           go head sorry 
4        no go head i mean how do you want to pull my r...
                       
14676                              just the email is fine 
14677    okay great so then everything is process here ...
14678                         no thats it i appreciate it 
14679    yes and thank you very much we appreciated hav...
14680                                   thank you bye bye 

我的 model -

#Pachinko Allocation Model
import tomotopy as tp
from pprint import pprint

model = tp.LDAModel(k=2, seed=1)  #k is the number of topics

for texts in dfi['clean_text']:
    model.add_doc(texts)

model.train(iter=100)

#Extracting the word distribution of a topic
for k in range(model.k):
    print(f"Topic {k}")
    pprint(model.get_topic_words(k, top_n=5))
Topic 0
[(' ', 0.2129271924495697),
 ('e', 0.08137548714876175),
 ('o', 0.0749373733997345),
 ('a', 0.07390690594911575),
 ('t', 0.06929121911525726)]
Topic 1
[(' ', 0.19975200295448303),
 ('e', 0.09751541167497635),
 ('t', 0.06939278542995453),
 ('i', 0.06373799592256546),
 ('o', 0.06239694356918335)]

但正如您在此處看到的,output 沒有按主題顯示字符串或單詞,它只是出於某種奇怪的原因顯示字母。 我是 python 的新手,可能在這里遺漏了一些東西。

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM