
Scikit-learn - How to use SVM and Random Forest for text classification?

I have a set of trainFeatures and a set of testFeatures with positive, neutral and negative labels:

trainFeats = negFeats + posFeats + neutralFeats
testFeats  = negFeats + posFeats + neutralFeats

For example, one entry in trainFeats is:

(['blue', 'yellow', 'green'], 'POSITIVE') 

The same holds for the list of test features, so I specify the labels for each set. My question is: how can I use the scikit-learn implementations of the Random Forest classifier and SVM to get the accuracy of each classifier, together with precision and recall scores for each class? The problem is that I am currently using words as features, while from what I have read these classifiers require numbers. Is there a way I can achieve this without changing the functionality? Many thanks!

You can look into the scikit-learn tutorial, especially the section on learning and predicting, for how to create and use a classifier. The example there uses an SVM, but it is simple to use RandomForestClassifier instead, since all classifiers implement the fit and predict methods.
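To illustrate the shared fit/predict interface, here is a minimal sketch. The toy numeric data is made up for illustration, and LinearSVC is just one of several SVM classes in scikit-learn (SVC would work the same way); treat it as an example under those assumptions rather than a recommended setup.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC

# Hypothetical toy data: two numeric features, two well-separated classes.
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = ["NEGATIVE", "NEGATIVE", "NEGATIVE",
           "POSITIVE", "POSITIVE", "POSITIVE"]
X_test = [[0.5, 0.5], [5.5, 5.5]]

predictions = {}
# Both estimators are trained and queried through the same two methods.
for clf in (LinearSVC(), RandomForestClassifier(n_estimators=50, random_state=0)):
    clf.fit(X_train, y_train)
    predictions[type(clf).__name__] = clf.predict(X_test).tolist()

print(predictions)
```

Because the interface is identical, swapping one classifier for the other only changes the line that constructs it.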

When working with text features you can use CountVectorizer or DictVectorizer. Take a look at the feature extraction documentation, especially section 4.1.3.
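As a rough sketch of how the (tokens, label) pairs from the question could be turned into numbers: CountVectorizer accepts a callable analyzer, so already-tokenized word lists can be passed through unchanged. The sample entries and the names train_feats, test_feats and identity below are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical entries in the same (tokens, label) format as the question.
train_feats = [
    (["blue", "yellow", "green"], "POSITIVE"),
    (["red", "black"], "NEGATIVE"),
    (["grey", "blue"], "NEUTRAL"),
]
test_feats = [(["yellow", "purple"], "POSITIVE")]

# Split each list of pairs into a tuple of token lists and a tuple of labels.
train_tokens, train_labels = zip(*train_feats)
test_tokens, test_labels = zip(*test_feats)


def identity(tokens):
    """Return the tokens as-is, since the documents are already tokenized."""
    return tokens


vectorizer = CountVectorizer(analyzer=identity)
X_train = vectorizer.fit_transform(train_tokens)  # learns the vocabulary
X_test = vectorizer.transform(test_tokens)        # reuses the same vocabulary

print(sorted(vectorizer.vocabulary_))
print(X_train.toarray())
print(X_test.toarray())
```

Words that appear only in the test set (here "purple") are not in the learned vocabulary and are ignored by transform, so the train and test matrices always have the same columns.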

The scikit-learn documentation also includes an example of classifying text documents.

Then you can get the per-class precision and recall of the classifier with classification_report from sklearn.metrics.
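Putting the pieces together, the following is one possible end-to-end sketch, not a definitive recipe: the toy data, the identity helper and the choice of LinearSVC are assumptions made for illustration. It vectorizes the token lists, trains each classifier, and reports accuracy plus per-class precision and recall.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def identity(tokens):
    """Return the tokens as-is, since the documents are already tokenized."""
    return tokens


# Hypothetical data in the same (tokens, label) format as the question.
train_feats = [
    (["blue", "yellow", "green"], "POSITIVE"),
    (["green", "yellow"], "POSITIVE"),
    (["red", "black"], "NEGATIVE"),
    (["black", "red", "brown"], "NEGATIVE"),
    (["grey", "white"], "NEUTRAL"),
    (["white", "grey", "beige"], "NEUTRAL"),
]
test_feats = [
    (["yellow", "blue"], "POSITIVE"),
    (["brown", "red"], "NEGATIVE"),
    (["beige", "white"], "NEUTRAL"),
]

train_tokens, y_train = zip(*train_feats)
test_tokens, y_test = zip(*test_feats)

classifiers = [
    ("SVM", LinearSVC()),
    ("Random Forest", RandomForestClassifier(n_estimators=100, random_state=0)),
]

results = {}
for name, clf in classifiers:
    # The pipeline applies the vectorizer before the classifier on fit and predict.
    model = make_pipeline(CountVectorizer(analyzer=identity), clf)
    model.fit(train_tokens, y_train)
    y_pred = model.predict(test_tokens)
    accuracy = accuracy_score(y_test, y_pred)
    report = classification_report(y_test, y_pred)
    results[name] = (accuracy, report)
    print(name, "accuracy:", accuracy)
    print(report)
```

With real data, trainFeats and testFeats from the question would take the place of the toy lists; the rest of the code would stay the same.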

