
SVM: Choosing Support Vector Machine regression termination criterion tolerance in sklearn

I am using sklearn.svm.SVR with the RBF kernel on a dataset of about 80k samples with 20+ features. I was wondering how to choose the termination parameter tol. I ask because the regression does not seem to converge for certain combinations of C and gamma (2+ days before I give up), while for other combinations it converges in less than 10 minutes, with an average run time of roughly an hour.

Is there some sort of rule of thumb for setting this parameter? Perhaps a relationship to the standard deviation or expected value of the forecast?
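For reference, a minimal sketch of the kind of model being discussed; the data here is synthetic and the C, gamma, epsilon and tol values are placeholders, not recommendations:

# Minimal sketch of the setup in question; synthetic data, placeholder hyperparameters.
from sklearn.datasets import make_regression
from sklearn.svm import SVR

X, y = make_regression(n_samples=5000, n_features=20, noise=0.1, random_state=0)

# tol is the termination criterion asked about; 1e-3 is the library default.
model = SVR(kernel="rbf", C=1.0, gamma=0.1, epsilon=0.1, tol=1e-3)
model.fit(X, y)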

Mike's answer is correct: subsampling the dataset while grid-searching the parameters is probably the best strategy for training SVR on medium-sized datasets. SVR is not scalable, so don't waste your time doing a grid search on the full dataset. Try 1000 random samples, then 2000, and then 4000. Each time, find the optimal values for C and gamma and try to guess how they evolve each time you double the size of the dataset, as in the sketch below.
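A sketch of that subsampling strategy; X and y stand in for your full 80k-sample arrays, and the grid values and subset sizes are only illustrative:

# Sketch: grid-search C and gamma on growing random subsamples and watch how the optima evolve.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

param_grid = {"svr__C": [1, 10, 100], "svr__gamma": [1e-3, 1e-2, 1e-1]}
rng = np.random.RandomState(0)

for n in (1000, 2000, 4000):
    idx = rng.choice(len(X), size=n, replace=False)  # X, y: your full dataset as numpy arrays
    search = GridSearchCV(
        make_pipeline(StandardScaler(), SVR(kernel="rbf")),
        param_grid, cv=3, n_jobs=-1,
    )
    search.fit(X[idx], y[idx])
    print(n, search.best_params_, search.best_score_)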

You can also approximate the true SVR solution with the Nystroem kernel approximation followed by a linear regression model such as SGDRegressor, LinearRegression, LassoCV or ElasticNetCV. RidgeCV is unlikely to improve upon LinearRegression in the n_samples >> n_features regime.
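A sketch of that kind of approximation, assuming X and y are your training arrays; the gamma value and number of Nystroem components are illustrative:

# Sketch: approximate an RBF SVR with a Nystroem feature map followed by a linear model.
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X, y: your training data (assumed numpy arrays).
model = make_pipeline(
    StandardScaler(),
    Nystroem(kernel="rbf", gamma=0.1, n_components=300, random_state=0),
    SGDRegressor(max_iter=1000, tol=1e-3),
)
model.fit(X, y)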

Finally, do not forget to scale your input data by putting a MinMaxScaler or a StandardScaler before the SVR model in a Pipeline.
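For instance, a minimal sketch with an explicit Pipeline (X and y again stand in for your data):

# Sketch: scaling step placed before the SVR inside a Pipeline.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler  # or StandardScaler
from sklearn.svm import SVR

pipe = Pipeline([("scale", MinMaxScaler()), ("svr", SVR(kernel="rbf"))])
pipe.fit(X, y)  # X, y: your training data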

I would also try GradientBoostingRegressor models (although completely unrelated to SVR).
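For completeness, a minimal sketch of such a baseline, using default hyperparameters and assuming X and y are your training arrays:

# Sketch: a gradient-boosted trees baseline for comparison.
from sklearn.ensemble import GradientBoostingRegressor

gbr = GradientBoostingRegressor(random_state=0)
gbr.fit(X, y)  # X, y: your training data; tree models do not need feature scaling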

You may have seen the scikit-learn documentation for the RBF kernel. Considering what C and gamma actually do, and the fact that SVR training time is at least quadratic in the number of samples, I would first try training on a small subset of the data. By first getting a result for all parameter settings and then scaling up the amount of training data used, you may find that you only need a small sample of the data to get results very close to those on the full set.

This is the advice I was given by my MSc project supervisor recently, as I had the exact same problem. I found that out of a set of 120k examples with 250 features I only needed around 3000 samples to get within 2% of the error of the full set models.
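A rough sketch of that kind of check, assuming X and y are your arrays and using placeholder C/gamma values and subset sizes:

# Sketch: check how validation error changes as the training subset grows,
# to find the smallest subset that gets close to full-data performance.
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# X, y: your dataset (assumed numpy arrays).
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

for n in (1000, 3000, 10000):
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10, gamma=0.01))
    model.fit(X_train[:n], y_train[:n])
    print(n, mean_absolute_error(y_val, model.predict(X_val)))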

Sorry this isn't answering your question directly, but I thought it might help.

You really shouldn't use SVR on large datasets: its training algorithm takes between quadratic and cubic time in the number of samples. sklearn.linear_model.SGDRegressor can fit a linear regression on such datasets without trouble, so try that instead. If a linear regression won't hack it, transform your data with a kernel approximation before feeding it to SGDRegressor, to get a linear-time approximation of an RBF SVM.
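A minimal sketch of that linear baseline, assuming X and y are your training arrays:

# Sketch: linear baseline with SGDRegressor; scaling matters for SGD convergence.
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

linear = make_pipeline(StandardScaler(), SGDRegressor(max_iter=1000, tol=1e-3))
linear.fit(X, y)  # X, y: your training data; add a Nystroem step if a linear fit underperforms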
