How to Retain the Evaluation Score of K-Fold Using cross_val_score()

I want to understand k-fold cross-validation more clearly, and how to choose the best model after using it as a cross-validation method.

According to this source, https://machinelearningmastery.com/k-fold-cross-validation/, the steps to carry out k-fold cross-validation are:

  1. Shuffle the dataset randomly.
  2. Split the dataset into k groups.
  3. For each unique group:

    • Take the group as a hold-out or test data set.

    • Take the remaining groups as a training data set.

    • Fit a model on the training set and evaluate it on the test set.

    • Retain the evaluation score and discard the model.

  4. Summarize the skill of the model using the sample of model evaluation scores.
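The four steps above can be sketched in Python as below. This is a minimal illustration, not the article's code: it assumes NumPy and scikit-learn are installed, uses the Iris dataset and LogisticRegression as stand-ins for your own data and model, and kfold_scores is a hypothetical helper name. The shuffling and splitting are done by hand so that each step is visible.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression


def kfold_scores(X, y, k=5, seed=0):
    """Hypothetical helper: run k-fold CV by hand and return the k scores."""
    rng = np.random.default_rng(seed)
    # Step 1: shuffle the dataset randomly (we shuffle the row indices).
    indices = rng.permutation(len(X))
    # Step 2: split the shuffled indices into k groups (folds).
    folds = np.array_split(indices, k)

    scores = []
    # Step 3: each fold takes one turn as the hold-out (test) set.
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = LogisticRegression(max_iter=1000)  # fresh, untrained model
        model.fit(X[train_idx], y[train_idx])
        # Retain the evaluation score...
        scores.append(model.score(X[test_idx], y[test_idx]))
        # ...and discard the model: only the score leaves this loop.
    return np.array(scores)


X, y = load_iris(return_X_y=True)
scores = kfold_scores(X, y, k=5)
# Step 4: summarize the skill of the model from the k scores.
print(scores)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
```

Iris is stored sorted by class, which is why step 1 matters here: without shuffling, each fold would be dominated by one or two classes.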

However, I have a question about this process.

What is "retain the evaluation score and discard the model" supposed to mean? How do you do it?

After some research, I believe it may have to do with the sklearn method cross_val_score(), but when I try to use it by passing my model to it, it throws the following error:

Traceback (most recent call last):
  File "D:\ProgramData\Miniconda3\envs\Env_DLexp1\lib\site-packages\joblib\parallel.py", line 797, in dispatch_one_batch
    tasks = self._ready_batches.get(block=False)
_queue.Empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\temporary.py", line 187, in <module>
    scores = cross_val_score(model, X_test, y_test, cv=kf,scoring="accuracy")
  File "D:\ProgramData\Miniconda3\envs\Env_DLexp1\lib\site-packages\sklearn\model_selection\_validation.py", line 390, in cross_val_score
    error_score=error_score)
  File "D:\ProgramData\Miniconda3\envs\Env_DLexp1\lib\site-packages\sklearn\model_selection\_validation.py", line 236, in cross_validate
    for train, test in cv.split(X, y, groups))
  File "D:\ProgramData\Miniconda3\envs\Env_DLexp1\lib\site-packages\joblib\parallel.py", line 1004, in __call__
    if self.dispatch_one_batch(iterator):
  File "D:\ProgramData\Miniconda3\envs\Env_DLexp1\lib\site-packages\joblib\parallel.py", line 808, in dispatch_one_batch
    islice = list(itertools.islice(iterator, big_batch_size))
  File "D:\ProgramData\Miniconda3\envs\Env_DLexp1\lib\site-packages\sklearn\model_selection\_validation.py", line 236, in <genexpr>
    for train, test in cv.split(X, y, groups))
  File "D:\ProgramData\Miniconda3\envs\Env_DLexp1\lib\site-packages\sklearn\base.py", line 67, in clone
    % (repr(estimator), type(estimator)))
TypeError: Cannot clone object '<keras.engine.sequential.Sequential object at 0x00000267F9C851C8>' (type <class 'keras.engine.sequential.Sequential'>): it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods.

According to the documentation, https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html, the first argument of cross_val_score() must be an estimator, which it defines as "estimator object implementing 'fit'. The object to use to fit the data."

My model does implement fit(), so I don't understand the exception.

This is the relevant part of my code:

import keras
from keras.models import Sequential
from keras.layers import (Embedding, Conv1D, BatchNormalization,
                          MaxPooling1D, Flatten, Dense)
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# max_words, embedding_dim, maxlen, data, labels and callbacks_list
# are defined elsewhere in the script (not shown here).

model = Sequential()
model.add(Embedding(max_words, embedding_dim, input_length=maxlen))
model.add(Conv1D(filters=32, kernel_size=8, activation='relu'))
model.add(BatchNormalization(weights=None, epsilon=1e-06, momentum=0.9))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(10, activation='relu'))
model.add(Dense(4, activation='softmax'))
model.summary()  # summary() prints itself and returns None

kf = KFold(n_splits=4, random_state=None, shuffle=True)
print(kf)

for train_index, test_index in kf.split(data):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = data[train_index], data[test_index]
    y_train, y_test = labels[train_index], labels[test_index]
# The loop only reassigns the split on each pass, so after it ends
# X_train, X_test, y_train and y_test hold the last fold only.

adam = keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, amsgrad=False)
model.compile(optimizer=adam,
              loss='sparse_categorical_crossentropy',
              metrics=['sparse_categorical_accuracy'])
history = model.fit(X_train, y_train,
                    epochs=15,
                    batch_size=32,
                    verbose=1,
                    callbacks=callbacks_list,
                    validation_data=(X_test, y_test))

# This call raises the TypeError shown above.
scores = cross_val_score(model, X_test, y_test, cv=kf, scoring="accuracy")
print(scores)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

I would appreciate any help you can give me. Please keep in mind that I am not a data scientist or a developer.

What is "retain the evaluation score and discard the model" supposed to mean?

Retaining the evaluation score means saving the score of the model tested in the current CV iteration, for example by keeping it in memory (in a list) so you can compare it with the scores from the other iterations. Discarding the model means you don't keep the model fitted in that iteration; only its score is kept.
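For scikit-learn estimators, cross_val_score() does this bookkeeping for you: it fits a fresh clone of the estimator on each training split, keeps only the score on the held-out fold, and returns the scores as an array. A minimal sketch, assuming scikit-learn is installed and again using Iris and LogisticRegression as stand-ins for your own data and model:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, cross_validate

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=4, shuffle=True, random_state=0)
model = LogisticRegression(max_iter=1000)

# One accuracy score per fold is retained; the per-fold models are discarded.
scores = cross_val_score(model, X, y, cv=kf, scoring="accuracy")
print(scores)  # an array with 4 entries, one per fold
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

# If you do want to keep the fitted per-fold models, cross_validate can return them.
result = cross_validate(model, X, y, cv=kf, scoring="accuracy", return_estimator=True)
print(len(result["estimator"]))  # 4
```

Note that X and y are the full dataset, not X_test and y_test: cross_val_score() does the splitting itself using the cv object. The original model object is never fitted, because every fold works on a clone of it.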

How do you do it?

You could use sklearn's cross_val_score() with scikit-learn algorithms, but you are working with Keras, so you will need to use the KFold class directly. Have a look at this Kaggle kernel; it shows the implementation you need. There are many examples like it on the internet, so just pick the one you understand best.
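The manual KFold pattern described above might look like the sketch below. Two assumptions to flag: build_model() is a hypothetical factory function, and, so the sketch runs without TensorFlow, it returns scikit-learn's MLPClassifier (a small neural network) as a stand-in for your Keras network. With Keras you would create and compile a new Sequential model inside build_model(), train it with model.fit(...), and take the accuracy from model.evaluate(X_test, y_test, verbose=0) instead of calling score().

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_model():
    # Hypothetical factory (stand-in for Keras): with Keras, build and
    # compile a *new* Sequential model here so no weights carry over.
    return make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0),
    )


X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=4, shuffle=True, random_state=0)

fold_scores = []
for fold, (train_index, test_index) in enumerate(kf.split(X), start=1):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    model = build_model()  # fresh, untrained model for every fold
    model.fit(X_train, y_train)  # Keras: model.fit(X_train, y_train, epochs=..., verbose=0)
    acc = model.score(X_test, y_test)  # Keras (one metric): model.evaluate(X_test, y_test, verbose=0)[1]
    fold_scores.append(acc)  # retain the score; the model is replaced next pass
    print("Fold %d accuracy: %0.3f" % (fold, acc))

fold_scores = np.array(fold_scores)
print("Accuracy: %0.2f (+/- %0.2f)" % (fold_scores.mean(), fold_scores.std() * 2))
```

The key difference from the code in the question is that a new model is built and trained inside the loop, once per fold. Reusing a single compiled Keras model across folds would carry the weights trained on one fold into the next.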

Therefore, I can't understand the exception.

cross_val_score() accepts an estimator as its first parameter. What is an estimator? According to the scikit-learn documentation, an estimator is a class that follows a defined structure, described in this documentation: besides fit(), it must implement methods such as get_params() and set_params().

Your Keras model does not implement part of that structure, which is why you get the error: it does not seem to be a scikit-learn estimator as it does not implement a 'get_params' methods.
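The error comes from sklearn.base.clone(), which cross_val_score() calls to get a fresh, unfitted copy of the estimator for each fold (clone appears in the traceback above). clone() rebuilds the object from its get_params() output, so an object without that method is rejected even if it has fit(). A small sketch, assuming scikit-learn is installed; FitOnly and WithParams are hypothetical classes made up for this illustration:

```python
from sklearn.base import BaseEstimator, clone


class FitOnly:
    """Hypothetical: has fit(), like a Keras model, but no get_params()."""

    def fit(self, X, y):
        return self


class WithParams(BaseEstimator):
    """Hypothetical: BaseEstimator derives get_params()/set_params() from __init__."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        return self


print(WithParams(alpha=0.5).get_params())  # {'alpha': 0.5}
cloned = clone(WithParams(alpha=0.5))  # works: a new, unfitted WithParams
print(type(cloned).__name__, cloned.alpha)  # WithParams 0.5

message = ""
try:
    clone(FitOnly())
except TypeError as err:
    message = str(err)
    print("TypeError:", message)  # the message mentions 'get_params'
```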

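Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.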

 
© 2020-2024 STACKOOM.COM