
SVM implementation for document classification in C++

I would like to implement a small project in C++ that uses an SVM to classify a set of documents (.txt files) into a number of categories, and then classifies new documents with the trained model.

I have searched widely, but I still do not fully understand what I need to do. I have heard about the LIBLINEAR library, but I do not know how to use it. If I use TF-IDF, do I need one vector for each class, or one vector for all classes? And how do I test a new document using TF-IDF? I am really confused.

Is it a requirement that it be written in C++? Python offers many helpful resources and ready-to-use modules for machine learning tasks, including SVM implementations.

scikit-learn, for instance, has helpful material on this topic, such as this tutorial: https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html

As for your question: with TF-IDF you need one vector per document, not one per class. Each document's vector holds a TF-IDF value for every word that occurs in that document, and the document's class is kept separately as the label for that vector.
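To make the per-document vector idea concrete, here is a minimal sketch in plain Python, using only the standard library. The function names (`tokenize`, `fit_idf`, `tfidf_vector`), the sample sentences and labels, and the smoothed IDF formula are all assumptions made for illustration, not something taken from LIBLINEAR or from the question. It also shows one way a new document might be handled: it is turned into a vector using the vocabulary and IDF values learned from the training documents.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, then split on whitespace.
    return text.lower().split()


def fit_idf(train_docs):
    # Document frequency: how many training documents contain each word.
    n_docs = len(train_docs)
    df = Counter()
    for doc in train_docs:
        df.update(set(tokenize(doc)))
    vocab = sorted(df)
    # Smoothed IDF (one common variant, assumed here) so that no weight is zero.
    idf = {w: math.log((1 + n_docs) / (1 + df[w])) + 1.0 for w in vocab}
    return vocab, idf


def tfidf_vector(doc, vocab, idf):
    # One vector per document, with one slot per vocabulary word.
    counts = Counter(tokenize(doc))
    total = sum(counts.values()) or 1
    return [counts[w] / total * idf[w] for w in vocab]


# Made-up training data: one label per document, not one vector per class.
train_docs = [
    "the match ended with a late goal",
    "the team won the match",
    "the bank raised interest rates",
]
train_labels = ["sport", "sport", "finance"]

vocab, idf = fit_idf(train_docs)
train_vectors = [tfidf_vector(d, vocab, idf) for d in train_docs]

# A new document reuses the training vocabulary and IDF values;
# words never seen in training (here "fans" and "watched") are ignored.
new_vector = tfidf_vector("fans watched the match", vocab, idf)
```

The pairs of `train_vectors` and `train_labels` are what an SVM trainer would consume, and `new_vector` is what the trained model would be asked to classify. The same steps carry over to C++ with `std::map` or `std::unordered_map` in place of `Counter`.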
