
SVM implementation for document classification in C++

I would like to implement a small project that classifies a set of documents (.txt files) into a number of categories, and then classifies new documents accordingly, using an SVM in C++.

I have searched widely, but I still don't fully understand what I need to do. I have heard of the LIBLINEAR library, but I don't know how to use it. If I use TF-IDF, do I need one vector for each class, or one vector for all classes? And how do I test a new document using TF-IDF? I am really confused!

Is it a requirement that it be written in C++? Python offers many helpful resources and ready-to-use modules for machine learning tasks such as implementing and using an SVM.

The scikit-learn documentation, for instance, has helpful material on this topic, such as this tutorial: https://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html

As for your question: with TF-IDF you need one vector per document, not one per class. Each document's vector lists the words that occur in it and assigns each word a value (its TF-IDF value); the class is only the label attached to that vector.
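If you do stay in C++, the one-vector-per-document idea could look roughly like the sketch below. It is only an illustration under some assumptions: the names `TfidfModel`, `fit`, `transform` and `to_liblinear_line` are made up for this example, tokens are split on whitespace only, and the weight is the plain variant tf * log(N / df) (other TF-IDF variants add smoothing). The vocabulary and IDF values are learned from the training documents once, and a new document is then vectorized with that same model, which is how you would "test" it. The last function writes a vector in the sparse `label index:value` text format that LIBLINEAR reads, with 1-based indices in ascending order.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Sparse vector: 1-based feature index -> TF-IDF weight (ordered by index).
using SparseVec = std::map<int, double>;

// Hypothetical container for what is learned from the training set.
struct TfidfModel {
    std::map<std::string, int> vocab;  // term -> 1-based feature index
    std::vector<double> idf;           // idf[index - 1]
};

// Whitespace tokenizer; a real one would also lowercase and strip punctuation.
std::vector<std::string> tokenize(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> tokens;
    for (std::string t; in >> t;) tokens.push_back(t);
    return tokens;
}

// Learn the vocabulary and IDF values from the training documents only.
TfidfModel fit(const std::vector<std::string>& docs) {
    assert(!docs.empty());
    TfidfModel m;
    std::vector<int> df;  // number of documents containing each term
    for (const auto& d : docs) {
        std::set<std::string> seen;
        for (const auto& t : tokenize(d)) {
            if (!seen.insert(t).second) continue;  // count a term once per document
            auto res = m.vocab.emplace(t, static_cast<int>(m.vocab.size()) + 1);
            if (res.second) df.push_back(0);       // first time this term is seen
            ++df[res.first->second - 1];
        }
    }
    const double n = static_cast<double>(docs.size());
    for (int c : df) m.idf.push_back(std::log(n / c));
    return m;
}

// Build the vector of ONE document (training or new) with an already fitted model.
SparseVec transform(const TfidfModel& m, const std::string& doc) {
    const std::vector<std::string> tokens = tokenize(doc);
    std::map<int, int> counts;
    for (const auto& t : tokens) {
        auto it = m.vocab.find(t);
        if (it != m.vocab.end()) ++counts[it->second];  // unseen terms are ignored
    }
    SparseVec v;
    for (const auto& kv : counts) {
        const double tf = static_cast<double>(kv.second) / tokens.size();
        v[kv.first] = tf * m.idf[kv.first - 1];
    }
    return v;
}

// Format one labelled vector as a line of LIBLINEAR's sparse text format.
std::string to_liblinear_line(int label, const SparseVec& v) {
    std::ostringstream out;
    out << label;
    for (const auto& kv : v)
        if (kv.second != 0.0) out << ' ' << kv.first << ':' << kv.second;
    return out.str();
}
```

In this sketch you would call `fit` once on all training documents, write one `to_liblinear_line(label, transform(model, doc))` line per training document to a file, and train on that file. For a new document you call `transform` with the same model and pass the resulting line to the trained classifier; words that never appeared in training are simply dropped.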

