Can GridSearchCV include a randomization for train_test_split?
With Sklearn there is GridSearchCV to test multiple parameter values for a classifier, for example:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

parameters = {
    'learning_rate': [0.001, 0.005, 0.003],
    'n_estimators': [300, 800, 1200],
    'criterion': ['friedman_mse', 'mse', 'mae'],
    'verbose': [1],
    'loss': ['deviance', 'exponential'],
    'random_state': [0]
}
GBC = GradientBoostingClassifier()
grid = GridSearchCV(GBC, parameters)
grid.fit(X, y)  # X = data, y = result
best_est = grid.best_estimator_
print(best_est)
predictions = best_est.predict(T)  # T contains data to apply it on.
But what if one would like to do cross-validation? E.g. in a similar manner to train_test_split:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=41)
Here we have a random_state (which might have a big impact). Is it possible to have GridSearchCV try an array of a few random seeds, to make sure the model works well across 'most' random states of the data's train/test split?
For the record, I know this isn't built into GridSearchCV (as far as I know); I'm asking what such a method might look like. Perhaps there is some clever way to do this?
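To illustrate what the question is getting at, here is a minimal sketch (the seed list and the synthetic dataset are illustrative assumptions, not part of the original post) that loops over several train_test_split seeds and records the resulting test accuracy for each:

```python
# Hypothetical sketch: measure how sensitive the test score is to the
# random_state used by train_test_split, by looping over several seeds.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Small synthetic dataset, used here only so the sketch is runnable.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

scores = {}
for seed in [0, 13, 41, 99]:  # arbitrary example seeds
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, random_state=seed)
    clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
    clf.fit(X_train, y_train)
    scores[seed] = clf.score(X_test, y_test)  # accuracy on this split

print(scores)  # one accuracy per split seed
```

The spread of the printed accuracies shows how much the split's random_state matters for this data, which is essentially what repeated shuffling inside cross-validation averages over.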
You can specify ShuffleSplit as the cross-validation generator. For example:
from sklearn.model_selection import GridSearchCV, ShuffleSplit

GBC = GradientBoostingClassifier()
grid = GridSearchCV(GBC,
                    param_grid=parameters,
                    cv=ShuffleSplit(test_size=.3,
                                    n_splits=5,
                                    random_state=41))
grid.fit(X, y)
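Put together as a runnable whole (the synthetic dataset and the reduced parameter grid below are illustrative assumptions kept small for speed), this looks like:

```python
# Minimal end-to-end sketch: GridSearchCV scored over 5 randomized
# train/test splits produced by ShuffleSplit instead of a single split.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, ShuffleSplit

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# A deliberately small grid so the example runs quickly.
parameters = {'learning_rate': [0.1, 0.05],
              'n_estimators': [50, 100]}

# Each of the 5 splits shuffles the data and holds out 30% for testing.
cv = ShuffleSplit(n_splits=5, test_size=0.3, random_state=41)

grid = GridSearchCV(GradientBoostingClassifier(random_state=0),
                    param_grid=parameters, cv=cv)
grid.fit(X, y)
print(grid.best_params_)
```

Each candidate's score is then the mean over the 5 shuffled splits, so the chosen parameters are less tied to any single train_test_split seed.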