
Python: In which cases can random forest and SVM classifiers produce high accuracy?

I am using Random Forest and SVM classifiers for a classification task. I have 18322 samples, unbalanced across 9 classes (3667, 1060, 1267, 2103, 2174, 1495, 884, 1462, 4210). I use 10-fold CV, and my training data has 100 feature dimensions. The samples do not differ much across these 100 dimensions. With the SVM the accuracy is approximately 40%, whereas with RF it reaches 92%. When I make the data even less distinct in these 100 dimensions, RF still gives 92% accuracy, but the SVM's accuracy drops to 25%.

My classifier configurations are:

SVM: LinearSVC(penalty="l1",dual=False)

RF: RandomForestClassifier(n_estimators = 50)

All other parameters are default values. I think there must be something wrong with my RF classifier but I don't know how to check it.

Can anyone familiar with these two classifiers give me some hints?

LinearSVC tries to separate your classes by finding appropriate hyperplanes in Euclidean space. Your samples might simply not be linearly separable, which would cause poor performance. Random Forest, on the other hand, combines several (in this case 50) simpler classifiers (decision trees), each of which has a piecewise-linear decision boundary. When you combine them, you end up with a much more complicated decision function.
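To make the separability point concrete, here is a minimal sketch that runs both configurations from the question on data that is not linearly separable. The dataset is a synthetic stand-in (two concentric rings from make_circles), not your data, and random_state is added only for reproducibility.

```python
from sklearn.datasets import make_circles
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Synthetic stand-in data (an assumption, not the asker's dataset):
# two concentric rings that no single hyperplane can separate.
X, y = make_circles(n_samples=600, noise=0.1, factor=0.4, random_state=0)

# Same configurations as in the question; random_state is added only
# so that the forest is reproducible.
svm = LinearSVC(penalty="l1", dual=False)
rf = RandomForestClassifier(n_estimators=50, random_state=0)

# Mean accuracy over 10 stratified folds, mirroring the 10-fold CV
# described in the question.
svm_acc = cross_val_score(svm, X, y, cv=10).mean()
rf_acc = cross_val_score(rf, X, y, cv=10).mean()

print(f"LinearSVC accuracy:    {svm_acc:.2f}")
print(f"RandomForest accuracy: {rf_acc:.2f}")
```

On data like this the linear model should stay close to chance level while the forest scores much higher. A gap of that kind on your own data would point to a non-linear class structure rather than to a bug in the RF setup.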

In my experience, RF tends to perform quite well with default parameters, and even an extensive parameter search improves accuracy only a little. SVM behaves almost exactly the opposite way.

Have you tried different configurations? How about doing grid search for better parameters for the SVM?

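Since you're already using sklearn, you can use GridSearchCV for this. It originally lived in sklearn.grid_search; since scikit-learn 0.18 it is available as sklearn.model_selection.GridSearchCV, and the old sklearn.grid_search module was removed in 0.20.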
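A grid search over the SVM parameters could look roughly like the sketch below. The data is a synthetic placeholder from make_classification, and the StandardScaler step, the step names "scale" and "svc", and the parameter grid are illustrative assumptions, not values tuned for your problem.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Placeholder data (an assumption); substitute your own X and y.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=10,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Linear SVMs are sensitive to feature scale, so scaling is done inside
# the pipeline and is refit on the training part of every fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("svc", LinearSVC(dual=False, max_iter=5000)),
])

# Illustrative grid: regularization strength and penalty type.
param_grid = {
    "svc__C": [0.01, 0.1, 1, 10],
    "svc__penalty": ["l1", "l2"],
}

# Every parameter combination is scored with 5-fold cross-validation.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("best parameters: ", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
```

If no setting in the grid improves much on the default, that suggests the limitation is the linear model itself, and a kernel SVM (for example sklearn.svm.SVC with an RBF kernel) might be worth adding to the search.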
