
Random Forest for multi-label classification

I am building an application for multilabel text classification. I've tried several machine learning algorithms.

Without a doubt, the SVM with a linear kernel gets the best results.

I have also tried the Random Forest algorithm, and the results I obtained were very poor: both recall and precision are very low.

The fact that the linear kernel gives the best results suggests that the different categories are linearly separable.

Is there any reason the Random Forest results are so low?
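The setup described above might look something like the following sketch. This assumes scikit-learn; the documents, labels, and the one-vs-rest wrapping are hypothetical stand-ins for the asker's actual pipeline.

```python
# Hypothetical sketch of multilabel text classification with a linear SVM
# (scikit-learn assumed; data and label names are made up for illustration).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

docs = [
    "the team won the football match",
    "new python release improves performance",
    "stock markets fell sharply today",
    "the striker scored in the final minute",
    "faster compilers and new language features",
    "investors worry about rising interest rates",
]
labels = [
    ["sports"], ["tech"], ["finance"],
    ["sports"], ["tech"], ["finance"],
]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)            # binary indicator matrix, one column per label
X = TfidfVectorizer().fit_transform(docs)

# One binary linear SVM per label (one-vs-rest handles the multilabel case).
clf = OneVsRestClassifier(LinearSVC())
clf.fit(X, Y)
pred = mlb.inverse_transform(clf.predict(X))
```

A `RandomForestClassifier` could be dropped in place of the `OneVsRestClassifier(LinearSVC())` estimator for comparison, since it supports multilabel indicator targets directly.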

Random forest ensembles perform well across many domains and types of data. They are excellent at reducing error from variance and don't overfit if the trees are kept simple enough.

I would expect a forest to perform comparably to an SVM with a linear kernel.

The SVM will tend to overfit more, because it does not benefit from being an ensemble.

If you are not using cross-validation of some kind, or at minimum measuring performance on unseen data with a train/test split, then I could see you obtaining this type of result.

Go back and make sure performance is measured on unseen data; you will likely see the RF performing more comparably.
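A minimal sketch of what "measure on unseen data" means in practice, assuming scikit-learn and a synthetic multilabel dataset in place of the asker's text features:

```python
# Hold out a test split, or use k-fold cross-validation, so the score
# reflects generalization rather than memorization (scikit-learn assumed).
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic multilabel data standing in for the real TF-IDF features.
X, Y = make_multilabel_classification(n_samples=300, n_features=20,
                                      n_labels=2, random_state=0)

# Option 1: a single held-out test set.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, Y_train)
test_f1 = f1_score(Y_test, rf.predict(X_test), average="micro")

# Option 2: k-fold cross-validation over the whole set.
cv_scores = cross_val_score(RandomForestClassifier(random_state=0),
                            X, Y, cv=5, scoring="f1_micro")
print(test_f1, cv_scores.mean())
```

Micro-averaged F1 is one reasonable summary of recall and precision for multilabel output; the same evaluation applied to both the SVM and the forest makes the comparison fair.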

Good luck.

It is very hard to answer this question without looking at the data in question.

SVMs do have a history of working better on text classification, but machine learning is by definition context dependent.

Consider the parameters with which you are running the random forest algorithm. How many trees are you using, and how deep are they? Are you pruning branches? Are you searching a larger parameter space for the SVM, and therefore more likely to find a better optimum?
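Those parameters can be searched the same way one would tune an SVM. A hedged sketch, assuming scikit-learn and synthetic data; the grid values are illustrative, not recommendations:

```python
# Hypothetical sketch: grid search over the random forest's main knobs
# (scikit-learn assumed; parameter values chosen only for illustration).
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, Y = make_multilabel_classification(n_samples=200, n_features=20,
                                      random_state=0)

param_grid = {
    "n_estimators": [50, 200],      # more trees: lower variance
    "max_depth": [5, None],         # depth limits act like pruning
    "min_samples_leaf": [1, 5],     # larger leaves: simpler trees
}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=3, scoring="f1_micro")
search.fit(X, Y)
print(search.best_params_, round(search.best_score_, 3))
```

If the SVM was tuned over a wide grid while the forest ran on defaults, an equally thorough search for the forest is the fairer comparison.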
