For a huge set of articles, I want topic models that assign a weight to each topic and, within each topic, a weight to each sub-topic. For example, if I feed in an article that falls in both the Business and Technology domains, the program's output should be something like this:
What are the best open-source language processing programs that can do this?
You can use the open-source NLTK toolkit for classification.
I would give NLTK a try, but scikit-learn, even though it has a steeper learning curve than NLTK, is probably a better bet. It's much more configurable.
There are several programs that handle part of this task; as a starting point I recommend MALLET. Note that any topic modeling program gives you the topics in the form you want, i.e.,
( 0.438 - Marketing , 0.375 - Companies, 0.062 - Office Work)
but you need to assign the labels (in this example, Business) yourself. MALLET also gives you a decomposition of each text into topics (identified by numbers, not by labels).
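To get from numbered topics to the "domain, then sub-topic" weights asked for in the question, one option is to keep a hand-made mapping from topic IDs to labels and sum the sub-topic weights per domain. This is a plain-Python sketch, not a MALLET feature: the topic IDs, the "Technology"/"Hardware" entry and its 0.125 weight, and the `label_weights` helper are all hypothetical; only the three Business weights come from the example above.

```python
# Hypothetical topic-model output for one article: topic id -> weight.
doc_topic_weights = {3: 0.438, 7: 0.375, 12: 0.062, 5: 0.125}

# Labels you assign yourself after inspecting each topic's top words.
# These ids, labels, and the domain grouping are illustrative assumptions.
topic_labels = {
    3: ("Business", "Marketing"),
    7: ("Business", "Companies"),
    12: ("Business", "Office Work"),
    5: ("Technology", "Hardware"),
}


def label_weights(weights, labels):
    """Group labelled topic weights into (domain weight, domain, sub-topics)."""
    domains = {}
    for topic_id, weight in weights.items():
        # Topics you have not labelled yet are kept under a placeholder name.
        domain, sub_topic = labels.get(topic_id, ("Unlabelled", f"topic {topic_id}"))
        domains.setdefault(domain, {})[sub_topic] = weight
    # A domain's weight is the sum of its sub-topic weights; largest first.
    return sorted(
        ((round(sum(subs.values()), 3), domain, subs) for domain, subs in domains.items()),
        key=lambda item: item[0],
        reverse=True,
    )


for total, domain, subs in label_weights(doc_topic_weights, topic_labels):
    print(total, domain, subs)
```

Summing is only one way to roll sub-topics up into a domain; it keeps the domain weights on the same scale as the topic weights.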