
SVM: scaled dataset gives worse results?

I have a multiclass classification problem. My dataset (let's call the data X and the labels y) represents sets of points on 640x480 images, so all elements in X are integers within the range of valid pixel coordinates. I'm trying to use an SVM for this problem. If I run the SVM against the dataset as is, it gives an accuracy of 74%. However, if I scale the data to the range [0..1], the results are much poorer: only 69% correct.

I double checked the histograms of the elements in X and of its scaled version Xs, and they are identical apart from the scale. So the data is not corrupted, just normalized. Knowing the ideas behind SVMs, I assumed scaling should not affect the results, but it does. Why does this happen?


Here's my code, in case I made a mistake in it:

>>> import numpy as np
>>> from sklearn.cross_validation import cross_val_score
>>> from sklearn.svm import SVC
>>> 
>>> X, y = ...
>>> # rescale the integer pixel values by dividing by the overall value range of X
>>> Xs = X.astype(np.float32) / (X.max() - X.min())
>>> cross_val_score(SVC(kernel='linear'), X, y, cv=10).mean()
0.74531073446327667
>>> cross_val_score(SVC(kernel='linear'), Xs, y, cv=10).mean()
0.69485875706214695
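
For reference, scikit-learn's MinMaxScaler does this kind of [0..1] rescaling directly; the snippet below is only an illustrative sketch (the name Xs2 is made up for illustration), and note that MinMaxScaler rescales each feature column independently, rather than dividing the whole array by its global range as above:

>>> from sklearn.preprocessing import MinMaxScaler
>>> 
>>> # rescale each feature column to [0, 1]: (x - col_min) / (col_max - col_min)
>>> Xs2 = MinMaxScaler().fit_transform(X.astype(np.float32))
>>> cross_val_score(SVC(kernel='linear'), Xs2, y, cv=10).mean()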

Scaling should certainly affect results, but it should improve them. However, the performance of an SVM depends critically on its C setting, which trades off the cost of misclassification on the training set against model simplicity, and which should be determined using e.g. a grid search with nested cross-validation. The default settings are very rarely optimal for any given problem.
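
As a sketch of that tuning, assuming the current sklearn.model_selection API and an illustrative grid of C values (neither of which comes from the question), a grid search over C can be nested inside an outer cross-validation loop like this:

>>> from sklearn.model_selection import GridSearchCV, cross_val_score
>>> from sklearn.svm import SVC
>>> 
>>> # inner loop: pick C by 5-fold grid search; outer loop: estimate accuracy with 10-fold CV
>>> param_grid = {'C': [0.01, 0.1, 1, 10, 100, 1000]}
>>> search = GridSearchCV(SVC(kernel='linear'), param_grid, cv=5)
>>> cross_val_score(search, Xs, y, cv=10).mean()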
