Implementing K-fold cross-validation in MLPClassification Python

Question

I am learning how to develop a backpropagation neural network using scikit-learn. I am still confused about how to implement k-fold cross-validation for my neural network. I hope you can help me out. My code is as follows:

import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPClassifier

f = open("seeds_dataset.txt")
data = np.loadtxt(f)

X=data[:,0:]
y=data[:,-1]
kf = KFold(n_splits=10)
X_train, X_test, y_train, y_test = X[train], X[test], y[train], y[test]
clf = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1)
clf.fit(X, y)
MLPClassifier(activation='relu', alpha=1e-05, batch_size='auto',
       beta_1=0.9, beta_2=0.999, early_stopping=False,
       epsilon=1e-08, hidden_layer_sizes=(5, 2), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=1, shuffle=True,
       solver='lbfgs', tol=0.0001, validation_fraction=0.1, verbose=False,
       warm_start=False)

Answer 1

Do not split your data into train and test yourself. KFold cross-validation handles this automatically.

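from sklearn.model_selection import KFold
from sklearn.neural_network import MLPClassifier

# X and y are the arrays from the question. Note that X = data[:, 0:] there
# still contains the label column; X = data[:, :-1] would exclude it.
kf = KFold(n_splits=10)
clf = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1)

# Fit on the training folds and score on the held-out fold, once per split.
for train_indices, test_indices in kf.split(X):
    clf.fit(X[train_indices], y[train_indices])
    print(clf.score(X[test_indices], y[test_indices]))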

KFold partitions your dataset into n roughly equal portions (folds). Each fold is used once as the test set while the remaining n-1 folds form the training set. With this, you get a fairly reliable estimate of your model's accuracy, since every sample is used for testing exactly once.
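To make the train/test rotation concrete, here is a minimal sketch on toy data. The array `X_demo` and the choice of 5 splits are assumptions made for illustration only; they are not part of the original answer.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data (assumption): 10 samples with a single feature each.
X_demo = np.arange(10).reshape(-1, 1)

kf_demo = KFold(n_splits=5)
folds = list(kf_demo.split(X_demo))

# Each split yields index arrays: one for training, one for testing.
for i, (train_idx, test_idx) in enumerate(folds):
    print(f"fold {i}: train={train_idx.tolist()} test={test_idx.tolist()}")

# Collect all test indices to check that every sample is tested exactly once.
all_test = np.concatenate([test_idx for _, test_idx in folds])
print(sorted(all_test.tolist()) == list(range(10)))
```

KFold does not shuffle by default, so if the rows of a dataset happen to be sorted by class, passing `shuffle=True` (or using `StratifiedKFold`) is worth considering.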

Answer 2

Kudos to @COLDSPEED's answer.

If you'd like to get the predictions from n-fold cross-validation, cross_val_predict() is the way to go.

from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPClassifier

# Shuffle the data frame, then keep the first 80% for train + validation;
# the rest is held out as the test set. df is assumed to exist already.
df = df.sample(frac=1).reset_index(drop=True)
train_index = 0.8
df_train = df[: int(len(df) * train_index)]  # slice bounds must be integers

# Convert the data frame to ndarrays: all columns but the last are features,
# the last column is the target.
feature = df_train.iloc[:, 0:-1].values
target = df_train.iloc[:, -1].values

solver = MLPClassifier(activation='relu', solver='adam', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1, verbose=True)
y_pred = cross_val_predict(solver, feature, target, cv=10)

Basically, the cv option sets the number of folds used for cross-validation. y_pred has the same size as target, because each sample is predicted once, by the model that did not see it during training.
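As a self-contained illustration of the same call, the sketch below uses the iris dataset bundled with scikit-learn as a stand-in for the poster's data frame. The dataset, the `max_iter` value, and the use of `accuracy_score` are assumptions, not part of the original answer.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPClassifier

# Stand-in data (assumption): iris instead of the poster's own data frame.
feature, target = load_iris(return_X_y=True)

clf = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2),
                    max_iter=500, random_state=1)

# With an integer cv and a classifier, scikit-learn uses stratified folds.
y_pred = cross_val_predict(clf, feature, target, cv=10)

# One out-of-fold prediction per sample, so the shapes match.
print(y_pred.shape, target.shape)

# The out-of-fold predictions can be scored like any other predictions.
acc = accuracy_score(target, y_pred)
print("cross-validated accuracy:", acc)
```

Because `y_pred` is aligned with `target`, it can also be passed to other metrics such as a confusion matrix.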

Answer 3

In case you are looking for a built-in method to do this, you can take a look at cross_validate.

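from sklearn.model_selection import cross_validate
from sklearn.neural_network import MLPClassifier

# X and Y are the feature matrix and the label vector.
model = MLPClassifier()
# scoring takes a metric name (or a scorer callable), not the bound model.score.
cv_results = cross_validate(model, X, Y, cv=10,
                            return_train_score=False,
                            scoring='accuracy')
print("Test scores: {}".format(cv_results['test_score']))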

The thing I like about this approach is that it gives you access to fit_time, score_time, and test_score. It also lets you supply your own choice of scoring metrics and cross-validation generator/iterable (e.g. KFold). Another good resource is Cross Validation.
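A minimal sketch of supplying your own metrics and cross-validation generator might look like the following. The iris dataset, the two metric names, and the network settings are assumptions chosen for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_validate
from sklearn.neural_network import MLPClassifier

# Stand-in data (assumption): iris, bundled with scikit-learn.
X_demo, y_demo = load_iris(return_X_y=True)

model = MLPClassifier(solver='lbfgs', hidden_layer_sizes=(10,),
                      max_iter=500, random_state=1)

# An explicit generator: 5 shuffled folds with a fixed seed.
cv = KFold(n_splits=5, shuffle=True, random_state=1)

# With several metrics, the result keys become test_<metric name>.
results = cross_validate(model, X_demo, y_demo, cv=cv,
                         scoring=['accuracy', 'f1_macro'])

for key in sorted(results):
    print(key, results[key].round(3))
```

Each entry in `results` is an array with one value per fold, so the mean and standard deviation across folds are easy to compute.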

3 answers

Answer 1: 14 (accepted), 2017-06-21 18:22:54

Answer 2: 3, 2018-05-15 20:04:12

Answer 3: 3, 2019-04-23 06:52:22
