简体   繁体   中英

Can I use LDA topic modeling if I do not know the number of topics

I have more than 100k .txt files with newspaper issues and I need to define the lexical field of protectionism. Nonetheless, the newspaper issues treat very diverse subjects and I cannot know the total number of topics. Can I still use LDA topic modeling to find the lexical field or is there another method (maybe supervised learning)?

You probably can, but have a look at this CorEx idea. This works very nice and gives you the opportunity to guide the groups by supplying a set of anchor words (so you can call it semi-supervised leraning).

You could give it ["protectionism", "tarifs", "trade wars", ...] as ancors for one topic and even try to push articles that are non related to the topic you are interested in into a second topic by defining ancor words that have nothing to do with your topic eg ["police protection", "custom features", ...]

The notebooks supplied are really excellent and will have you up and running in no time

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM