Conditional tuning of hyperparameters with RandomizedSearchCV in scikit-learn

I want to use RandomizedSearchCV in sklearn to search for the optimal hyperparameter values for a support vector classifier on my dataset. The hyperparameters I am optimising are "kernel", "C" and "gamma". However, in the case of a "poly" kernel, I would also like to optimise a fourth hyperparameter, "degree" (the degree of the polynomial kernel function).

I realise that since the degree hyperparameter is ignored when the kernel is not "poly", I could just include degree in the params dictionary I provide to RandomizedSearchCV (as I've done in the code below). However, ideally I would like to search uniformly across the non-poly kernels plus each degree of poly kernel, i.e. I want to sample uniformly across, e.g., [(kernel="linear"), (kernel="rbf"), (kernel="poly", degree=2), (kernel="poly", degree=3)]. Therefore, I was wondering if it is possible to conditionally introduce a hyperparameter for tuning, i.e. if kernel="poly" then degree=np.linspace(2, 5, 4), else degree=0.

I haven't been able to find an example of this in the RandomizedSearchCV documentation, and so was wondering if anybody here had come across the same issue and would be able to help. Thanks!

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import StratifiedKFold

clf = SVC()
params = {'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
          'degree': np.linspace(2, 5, 4),
          'C': np.logspace(-3, 5, 17),
          'gamma': np.logspace(-3, 5, 17)}

random_search = RandomizedSearchCV(
    estimator=clf, param_distributions=params, n_iter=200, n_jobs=-1,
    cv=StratifiedKFold(n_splits=5)  # note: the iid parameter was removed in scikit-learn 0.24
)

I am not sure you can make conditional arguments for or within the grid search (it would feel like a useful feature). However, one way to work around this is to simply set all the hyperparameters for RandomizedSearchCV and make use of the error_score parameter, which scores parameter combinations that would normally fail as NaN instead of stopping your whole search. Like this:

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import StratifiedKFold

clf = SVC()
params = {'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
          'degree': np.linspace(2, 5, 4),
          'C': np.logspace(-3, 5, 17),
          'gamma': np.logspace(-3, 5, 17)}

random_search = RandomizedSearchCV(
    estimator=clf, param_distributions=params, n_iter=200, n_jobs=-1,
    cv=StratifiedKFold(n_splits=5), error_score=np.nan
)

However, from sklearn's SVC documentation you shouldn't have any problems passing degree:

degree: int, optional (default=3) Degree of the polynomial kernel function ('poly'). Ignored by all other kernels.

Unfortunately, GridSearchCV and RandomizedSearchCV don't support conditional tuning of hyperparameters.

Hyperopt supports conditional tuning of hyperparameters; check its wiki for more details.

Example:

from hyperopt import hp

space4svm = {
    'C': hp.uniform('C', 0, 20),
    'kernel': hp.choice('kernel', [
        {'ktype': 'linear'},
        {'ktype': 'poly', 'degree': hp.lognormal('degree', 0, 1)},
    ]),
    'gamma': hp.uniform('gamma', 0, 20),
    'scale': hp.choice('scale', [0, 1]),
    'normalize': hp.choice('normalize', [0, 1])
}
