Can GridSearchCV include a randomization for train_test_split?
With Sklearn there is GridSearchCV to test multiple parameter values for a classifier, for example:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

parameters = {
    'learning_rate': [0.001, 0.005, 0.003],
    'n_estimators': [300, 800, 1200],
    'criterion': ['friedman_mse', 'mse', 'mae'],
    'verbose': [1],
    'loss': ['deviance', 'exponential'],
    'random_state': [0]
}
GBC = GradientBoostingClassifier()
grid = GridSearchCV(GBC, parameters)
grid.fit(X, y)  # X = data, y = result
best_est = grid.best_estimator_
print(best_est)
predictions = best_est.predict(T)  # T contains data to apply it on.
But what if one would like to do cross-validation? E.g. in a similar manner to train_test_split:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=41)
Here we have a random_state (which might have a big impact). Is it possible to have GridSearchCV try an array of a few random seeds, to make sure the model works well across 'most' random states of the data's train/test split?
For the record, I know this isn't built into GridSearchCV (as far as I know); I'm asking what such a method might look like. Perhaps there is some clever way to do this?
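To illustrate what the question is getting at, here is a minimal sketch (the seed list and the synthetic dataset are illustrative assumptions, not part of the original post) that loops over several train_test_split seeds and records the resulting test accuracy for each:

```python
# Hypothetical sketch: measure how sensitive the test score is to the
# random_state used by train_test_split, by looping over several seeds.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Small synthetic dataset, used here only so the sketch is runnable.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

scores = {}
for seed in [0, 13, 41, 99]:  # arbitrary example seeds
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=seed)
    clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
    clf.fit(X_train, y_train)
    scores[seed] = clf.score(X_test, y_test)  # accuracy on this split

print(scores)  # one accuracy per split seed
```

The spread of the printed accuracies shows how much the split's random_state matters for this data, which is essentially what repeated shuffling inside cross-validation averages over.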
You can specify ShuffleSplit as the cross-validation generator. For example:
from sklearn.model_selection import GridSearchCV, ShuffleSplit

GBC = GradientBoostingClassifier()
grid = GridSearchCV(GBC,
                    param_grid=parameters,
                    cv=ShuffleSplit(test_size=.3,
                                    n_splits=5,
                                    random_state=41))
grid.fit(X, y)
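Put together as a runnable whole (the synthetic dataset and the reduced parameter grid below are illustrative assumptions kept small for speed), this looks like:

```python
# Minimal end-to-end sketch: GridSearchCV scored over 5 randomized
# train/test splits produced by ShuffleSplit instead of a single split.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, ShuffleSplit

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# A deliberately small grid so the example runs quickly.
parameters = {'learning_rate': [0.1, 0.05],
              'n_estimators': [50, 100]}

# Each of the 5 splits shuffles the data and holds out 30% for testing.
cv = ShuffleSplit(n_splits=5, test_size=0.3, random_state=41)

grid = GridSearchCV(GradientBoostingClassifier(random_state=0),
                    param_grid=parameters, cv=cv)
grid.fit(X, y)
print(grid.best_params_)
```

Each candidate's score is then the mean over the 5 shuffled splits, so the chosen parameters are less tied to any single train_test_split seed.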