
Document Classification

Please suggest a classifier that can classify documents according to the requirements below.

I have a set of documents to be classified. For each class label, I have a set of terms that are specific to that label.

Since you have labels attached to the documents, this falls under supervised learning. You can use any of the following classifiers for document classification:
1. Naive Bayes classifier
2. Nearest neighbour classifier
3. Decision trees
4. Subspace method

Most ML libraries have implementations of the above techniques. If you want to choose an ML library based on the programming language you are comfortable with, you can refer to this link: http://daoudclarke.github.io/machine%20learning%20in%20practice/2013/10/08/machine-learning-libraries/
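As a rough illustration of the nearest neighbour option, here is a minimal sketch in plain Python. It assumes each label's term list can be treated as a single reference "document" and assigns a new document to the label whose reference has the highest cosine similarity on word counts. The `class_terms` data and the function names are hypothetical, and a real setup would likely use a library implementation with proper tokenisation and weighting.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def nearest_label(text, class_terms):
    """Return the label whose term list is most similar to the text."""
    query = Counter(text.lower().split())
    return max(
        class_terms,
        key=lambda label: cosine(query, Counter(class_terms[label])),
    )

# Hypothetical class-specific terms (not from the original question).
class_terms = {
    "sports": ["football", "match", "goal", "team"],
    "finance": ["stock", "market", "shares", "bank"],
}

print(nearest_label("The bank sold its shares", class_terms))
```

With only one reference per label this is closer to a nearest-prototype rule; with many labelled documents you would compare against each of them and take the label of the closest one (or a vote among the k closest).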

Well, if you already have the terms for your classes, you can use several different kinds of classifiers, e.g. an SVM, a Naive Bayes classifier, or even a neural network.

There are libraries that include these classifiers, such as Weka or Mahout.

Recently I wrote an example of how to do this with a Naive Bayes classifier: Naive Bayes Example. However, it is an explanation of the concept rather than a tool usable in the real world.
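To give a feel for the concept, here is a small sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing in plain Python. It is not the linked example; the training data, function names, and whitespace tokenisation are all assumptions made for illustration, and like the linked post it is meant to explain the idea rather than serve as a production tool.

```python
import math
from collections import Counter

def train_naive_bayes(labelled_docs):
    """labelled_docs: list of (label, list_of_tokens) pairs."""
    doc_counts = Counter()   # number of training documents per label
    term_counts = {}         # label -> Counter of token frequencies
    vocab = set()
    for label, tokens in labelled_docs:
        doc_counts[label] += 1
        term_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return doc_counts, term_counts, vocab

def classify(tokens, model):
    """Return the label with the highest log posterior score."""
    doc_counts, term_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # log prior: fraction of training documents with this label
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(term_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                # add-one smoothing avoids zero probabilities
                score += math.log((term_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training set.
training = [
    ("sports", "football match goal team".split()),
    ("sports", "team wins the match".split()),
    ("finance", "stock market shares fall".split()),
    ("finance", "bank raises interest rate".split()),
]
model = train_naive_bayes(training)
print(classify("the team scored a goal".split(), model))  # prints "sports"
```

If you only have term lists per class rather than labelled documents, one option is to pass each term list as a single training "document" for its label, though the resulting probabilities will then reflect only those lists.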
