Strange output for topic modeling in Python using PAM LDA

I am trying to do topic modeling on a column of my dataframe that contains only English text. You can substitute any text for it:

dfi['clean_text']
Out[154]: 
0        thank you for calling my name is gabrielle and...
1        your available my first name is was there you ...
2                                                    good 
3                                           go head sorry 
4        no go head i mean how do you want to pull my r...
                       
14676                              just the email is fine 
14677    okay great so then everything is process here ...
14678                         no thats it i appreciate it 
14679    yes and thank you very much we appreciated hav...
14680                                   thank you bye bye 

My model:

#Pachinko Allocation Model
import tomotopy as tp
from pprint import pprint

model = tp.LDAModel(k=2, seed=1)  #k is the number of topics

for texts in dfi['clean_text']:
    model.add_doc(texts)

model.train(iter=100)

#Extracting the word distribution of a topic
for k in range(model.k):
    print(f"Topic {k}")
    pprint(model.get_topic_words(k, top_n=5))

Output:

Topic 0
[(' ', 0.2129271924495697),
 ('e', 0.08137548714876175),
 ('o', 0.0749373733997345),
 ('a', 0.07390690594911575),
 ('t', 0.06929121911525726)]
Topic 1
[(' ', 0.19975200295448303),
 ('e', 0.09751541167497635),
 ('t', 0.06939278542995453),
 ('i', 0.06373799592256546),
 ('o', 0.06239694356918335)]

As you can see, the output does not show words for each topic; it only shows single characters (including the space). I'm new to Python and may be missing something here.
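
A likely cause, offered as a sketch rather than a confirmed diagnosis: tomotopy's add_doc() takes an iterable of word tokens, and each row of dfi['clean_text'] is a single string. Iterating over a Python string yields its characters, which would explain why the "words" in each topic are ' ', 'e', 'o', and so on. Splitting each string into words before adding it should give word-level topics. In the example below, `docs` is a small stand-in list for dfi['clean_text'], and `tokenize` is a hypothetical helper that assumes the text is already cleaned and whitespace-separated. Note also that the snippet in the question builds tp.LDAModel, so despite the comment it trains plain LDA, not a Pachinko Allocation model.

```python
# Stand-in for dfi['clean_text'] (assumption: already lowercased and cleaned).
docs = [
    "thank you for calling my name is gabrielle",
    "go head sorry",
    "just the email is fine",
    "thank you bye bye",
]

# A string is an iterable of characters, not of words.
print(list("thank you bye bye")[:6])  # ['t', 'h', 'a', 'n', 'k', ' ']


def tokenize(text):
    """Hypothetical helper: split a cleaned string on whitespace."""
    return text.split()


print(tokenize("thank you bye bye"))  # ['thank', 'you', 'bye', 'bye']

try:
    import tomotopy as tp
except ImportError:
    tp = None  # tomotopy is not installed; only the tokenization demo runs

if tp is not None:
    model = tp.LDAModel(k=2, seed=1)  # k is the number of topics

    for text in docs:
        words = tokenize(text)
        if words:  # skip rows that are empty after tokenization
            model.add_doc(words)  # pass a list of words, not a raw string

    model.train(iter=100)

    # Print the top words of each topic.
    for k in range(model.k):
        print(f"Topic {k}")
        print(model.get_topic_words(k, top_n=5))
```

With the real dataframe, the loop would iterate over dfi['clean_text'] instead of `docs`. Very frequent function words such as "you" or "is" will probably dominate the topics, so removing stop words during tokenization may be worth trying.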
