Getting probabilities from the best model of a RandomizedSearchCV

Question

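I am using RandomizedSearchCV with 10-fold cross-validation and 100 iterations to find the best parameters. This works well. But now I would also like to get the probabilities (e.g. predict_proba) for each predicted test data point from the best-performing model.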

How can this be done?

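I see two options. First, it may be possible to get these probabilities directly from RandomizedSearchCV. Second, I could get the best parameters from RandomizedSearchCV and then run 10-fold cross-validation again with those best parameters (using the same seed, so that I get the same splits).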

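Edit: Is the following code correct for getting the probabilities of the best-performing model? X is the training data, y holds the labels, and model is my RandomizedSearchCV, which wraps a Pipeline that imputes missing values, standardizes, and fits an SVM.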

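import numpy as np
from sklearn.model_selection import StratifiedKFold

cv_outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
y_prob = np.full((y.size, nrClasses), np.nan)  # filled in fold by fold below
best_model = model.fit(X, y).best_estimator_

for train, test in cv_outer.split(X, y):
    # refit the best pipeline on the training fold, then predict the held-out fold
    probas_ = best_model.fit(X[train], y[train]).predict_proba(X[test])
    y_prob[test] = probas_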

Answer 1

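If I understand correctly, you want the individual score of every sample in the test split for the case with the highest CV score. If so, you have to use one of the CV generators that give you control over the split indices, such as those described here: http://scikit-learn.org/stable/tutorial/statistical_inference/model_selection.html#cross-validation-generators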

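If you want to use the best-performing model to compute scores for new test samples, the predict_proba() function of RandomizedSearchCV is sufficient, provided the underlying model supports predict_proba().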
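To make the second case concrete, here is a minimal sketch of calling predict_proba() on the fitted search object. The synthetic data, the step names (impute, scale, svm), the small grid for C, and the reduced n_iter and cv values are assumptions chosen for illustration, not the asker's actual setup. It relies on the default refit=True, which refits the best candidate on all of the training data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption; replace with your own X and y).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_new, y_train, y_new = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Pipeline similar to the one described in the question.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
    ("svm", SVC(probability=True, random_state=0)),  # probability=True enables predict_proba
])

search = RandomizedSearchCV(
    pipe,
    param_distributions={"svm__C": [0.1, 1, 10, 100]},  # illustrative values
    n_iter=3,
    cv=3,
    random_state=0,
)
search.fit(X_train, y_train)

# Delegates to best_estimator_, which was refit on all of X_train.
proba = search.predict_proba(X_new)
print(proba.shape)
```

Each row of proba holds one probability per class, in the order given by search.classes_.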

Example:

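import numpy
from sklearn.model_selection import StratifiedKFold, cross_val_score

# svc is the classifier to evaluate; it should have been created before
skf = StratifiedKFold(n_splits=10, random_state=0, shuffle=True)
scores = cross_val_score(svc, X, y, cv=skf, n_jobs=-1)
max_score_split = numpy.argmax(scores)  # index of the fold with the highest score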

Now that you know the best score occurred at max_score_split, you can take that split and fit your model on it.

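# split() returns a generator, so materialize it before indexing;
# StratifiedKFold also needs y to build the stratified folds
train_indices, test_indices = list(skf.split(X, y))[max_score_split]
X_train = X[train_indices]
y_train = y[train_indices]
X_test = X[test_indices]
y_test = y[test_indices]
model.fit(X_train, y_train) # this is your model object that should have been created before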

Finally, get your predictions with:

model.predict_proba(X_test)

I have not tested this code myself, but it should work with minor modifications.

Answer 2

You need to look at cv_results_. It gives you the scores for every fold along with their mean, the fit times, and so on.
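As a rough illustration of what cv_results_ contains, the sketch below prints the per-fold scores, mean score, and mean fit time for each sampled candidate. The data, the values for C, and the small n_iter and cv are assumptions made to keep the example quick.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=120, n_features=6, random_state=0)

n_folds = 3
search = RandomizedSearchCV(
    SVC(),
    param_distributions={"C": [0.1, 1, 10, 100]},  # illustrative values
    n_iter=3,
    cv=n_folds,
    random_state=0,
)
search.fit(X, y)

results = search.cv_results_
for i, params in enumerate(results["params"]):
    # One "split<k>_test_score" array per fold, indexed by candidate.
    fold_scores = [results[f"split{k}_test_score"][i] for k in range(n_folds)]
    print(params, fold_scores,
          results["mean_test_score"][i], results["mean_fit_time"][i])

# best_index_ points at the row of cv_results_ that ranked first.
print(results["params"][search.best_index_] == search.best_params_)
```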

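If you want to call predict_proba() for each iteration, the way to do it is to loop over the parameters given in cv_results_, refit the model for each of them, and then predict probabilities, because as far as I know the individual models are not cached anywhere.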
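The loop described above could look roughly like this. It is a sketch under assumed data and parameter values: each candidate is rebuilt from the unfitted search.estimator with clone(), given its sampled parameters, refit on the training data, and then asked for probabilities on a held-out set.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=160, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

search = RandomizedSearchCV(
    SVC(probability=True, random_state=0),
    param_distributions={"C": [0.1, 1, 10, 100]},  # illustrative values
    n_iter=3,
    cv=3,
    random_state=0,
)
search.fit(X_train, y_train)

probas_per_candidate = []
for params in search.cv_results_["params"]:
    # The search does not keep the fitted candidates, so rebuild and refit each one.
    candidate = clone(search.estimator).set_params(**params)
    candidate.fit(X_train, y_train)
    probas_per_candidate.append(candidate.predict_proba(X_test))

print(len(probas_per_candidate), probas_per_candidate[0].shape)
```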

best_params_ gives you the best-fitting parameters, in case you want to train the model with only the best parameters next time.
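One possible way to combine best_params_ with the second option from the question is cross_val_predict with method="predict_proba", which returns out-of-fold probabilities for every sample. This is a swapped-in convenience function rather than something the answers above used, and the data, step names, and parameter values are assumptions. Reusing the same seeded StratifiedKFold object gives the same splits in both steps.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import (
    RandomizedSearchCV, StratifiedKFold, cross_val_predict
)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=150, n_features=6, random_state=0)

# A fixed seed makes the folds reproducible across both steps.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("svm", SVC(probability=True, random_state=0)),
])
search = RandomizedSearchCV(
    pipe,
    param_distributions={"svm__C": [0.1, 1, 10, 100]},  # illustrative values
    n_iter=3,
    cv=cv,
    random_state=0,
)
search.fit(X, y)

# Fresh, unfitted copy of the pipeline configured with the best parameters.
best = clone(search.estimator).set_params(**search.best_params_)

# Each row is predicted by a model that did not see that sample during fitting.
y_prob = cross_val_predict(best, X, y, cv=cv, method="predict_proba")
print(y_prob.shape)
```

Because the same data was used to pick the parameters, these out-of-fold probabilities are likely to look somewhat optimistic compared with a truly held-out set.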

See cv_results_ on the documentation page: http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html

Answer 1
1 Accepted 2018-05-07 13:51:10

Answer 2
1 2018-05-07 14:26:20
