I am trying to do topic modeling on a dataframe column that consists only of English words (you can substitute any text for it):
dfi['clean_text']
Out[154]:
0 thank you for calling my name is gabrielle and...
1 your available my first name is was there you ...
2 good
3 go head sorry
4 no go head i mean how do you want to pull my r...
14676 just the email is fine
14677 okay great so then everything is process here ...
14678 no thats it i appreciate it
14679 yes and thank you very much we appreciated hav...
14680 thank you bye bye
My model -
# LDA topic model with tomotopy
import tomotopy as tp
from pprint import pprint

model = tp.LDAModel(k=2, seed=1)  # k is the number of topics
for texts in dfi['clean_text']:
    model.add_doc(texts)
model.train(iter=100)

# Extracting the word distribution of each topic
for k in range(model.k):
    print(f"Topic {k}")
    pprint(model.get_topic_words(k, top_n=5))
Topic 0
[(' ', 0.2129271924495697),
('e', 0.08137548714876175),
('o', 0.0749373733997345),
('a', 0.07390690594911575),
('t', 0.06929121911525726)]
Topic 1
[(' ', 0.19975200295448303),
('e', 0.09751541167497635),
('t', 0.06939278542995453),
('i', 0.06373799592256546),
('o', 0.06239694356918335)]
But as you can see, the output shows no words per topic, only single characters (including the space character). I'm new to Python and may be missing something here.
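A likely cause, sketched under assumptions: `add_doc` takes an iterable of tokens, and iterating over a Python string yields one character at a time, so each document would have been added as a sequence of letters and spaces. Splitting each string into words first should give word-level topics. In the sketch below, `texts` is a small stand-in for `dfi['clean_text']`, and `train_lda` is a hypothetical helper name, not part of tomotopy; plain whitespace splitting is assumed to be good enough for already-cleaned text.

```python
# Stand-in for dfi['clean_text']; any iterable of strings works here.
texts = [
    "thank you for calling my name is gabrielle",
    "just the email is fine",
    "thank you bye bye",
]

# Iterating a str yields characters, which is what the model saw before.
chars = list(texts[2])[:5]

# Splitting on whitespace yields word tokens; blank rows are skipped
# as a precaution, since they would produce empty documents.
tokenized = [t.split() for t in texts if t.strip()]


def train_lda(token_lists, k=2, seed=1, iterations=100):
    """Hypothetical helper: train a tomotopy LDA model on token lists."""
    import tomotopy as tp  # lazy import so the tokenizing part runs without it

    model = tp.LDAModel(k=k, seed=seed)
    for tokens in token_lists:
        model.add_doc(tokens)  # a list of words, not a raw string
    model.train(iter=iterations)
    return model
```

With the real data, something like `model = train_lda(str(t).split() for t in dfi['clean_text'])` followed by the same `get_topic_words` loop should then list words rather than characters. Very frequent words such as "you" or "thank" may still dominate the topics, so removing stop words before training is probably worth trying.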