
Hyper-parameter tuning and Over-fitting with Feed-Forward Neural Network - Mini-Batch Epoch and Cross Validation

I am looking at implementing a hyper-parameter tuning method for a feed-forward neural network (FNN) implemented using PyTorch. My original FNN, a model named net, has been implemented using a mini-batch learning approach with epochs:

# Imports used by this snippet
import torch
import torch.nn as nn
from sklearn.utils import shuffle

# Parameters
batch_size = 50        # larger batch size leads to over-fitting
num_epochs = 1000
learning_rate = 0.01   # AKA step size - the amount the weights are updated during training
batch_no = len(x_train) // batch_size

criterion = nn.CrossEntropyLoss()  # classification loss computed on the raw class scores (logits)
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)

for epoch in range(num_epochs):
    if epoch % 20 == 0:
        print('Epoch {}'.format(epoch + 1))
    x_train, y_train = shuffle(x_train, y_train)
    # Mini-batch learning - mini-batch since batch size < n (batch gradient descent), but > 1 (stochastic gradient descent)
    for i in range(batch_no):
        start = i * batch_size
        end = start + batch_size
        x_var = torch.FloatTensor(x_train[start:end])
        y_var = torch.LongTensor(y_train[start:end])
        # Forward + Backward + Optimize
        optimizer.zero_grad()
        ypred_var = net(x_var)
        loss = criterion(ypred_var, y_var)
        loss.backward()
        optimizer.step()

Lastly, I test my model on a separate test set.
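For reference, a minimal sketch of that evaluation step, assuming x_test and y_test are NumPy arrays like the training data:

# Evaluation sketch (assumes x_test / y_test are NumPy arrays)
net.eval()                       # disable dropout / batch-norm updates, if any
with torch.no_grad():            # no gradients needed at test time
    logits = net(torch.FloatTensor(x_test))
    y_pred = torch.argmax(logits, dim=1)
    accuracy = (y_pred == torch.LongTensor(y_test)).float().mean().item()
print('Test accuracy: {:.3f}'.format(accuracy))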

I came across an approach using randomised search to tune the hyper-parameters as well as implementing K-fold cross-validation (RandomizedSearchCV).
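A rough sketch of how that could be wired up, assuming the network is wrapped with the third-party skorch library so that sklearn's RandomizedSearchCV can drive it (the class name Net and the search ranges are placeholders, not my actual settings):

# Sketch: wrapping a PyTorch module with skorch so RandomizedSearchCV can tune it.
# `Net` (the nn.Module class) and the ranges below are illustrative placeholders.
import numpy as np
import torch.nn as nn
from skorch import NeuralNetClassifier
from sklearn.model_selection import RandomizedSearchCV

wrapped_net = NeuralNetClassifier(
    Net,                          # the module class, not an instance
    criterion=nn.CrossEntropyLoss,
    max_epochs=100,
    lr=0.01,
    batch_size=50,
    verbose=0,
)

param_dist = {
    'lr': [0.001, 0.005, 0.01, 0.05],
    'batch_size': [25, 50, 100],
    'max_epochs': [100, 500, 1000],
}

search = RandomizedSearchCV(wrapped_net, param_dist, n_iter=10, cv=5, scoring='accuracy')
search.fit(x_train.astype(np.float32), y_train.astype(np.int64))
print(search.best_params_, search.best_score_)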

My question is two-fold (no pun intended!), and the first part is theoretical: is k-fold cross-validation necessary, or could it add any benefit, for a mini-batch feed-forward neural network? From what I can see, the mini-batch approach should do roughly the same job of preventing over-fitting.
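For concreteness, k-fold cross-validation around the mini-batch loop above would look roughly like this sketch (train_one_model and evaluate are hypothetical helpers wrapping the training loop and the test-set accuracy computation shown earlier):

# Sketch: k-fold cross-validation around the existing mini-batch training loop.
# `train_one_model` and `evaluate` are hypothetical helpers wrapping the code above.
import numpy as np
from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = []
for train_idx, val_idx in kf.split(x_train):
    x_tr, y_tr = x_train[train_idx], y_train[train_idx]
    x_val, y_val = x_train[val_idx], y_train[val_idx]
    model = train_one_model(x_tr, y_tr)          # runs the mini-batch epoch loop on this fold
    fold_scores.append(evaluate(model, x_val, y_val))
print('Mean CV accuracy:', np.mean(fold_scores))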

I also found a good answer here, but I'm not sure it addresses a mini-batch approach specifically.

Secondly, if k-fold is not necessary, is there another hyper-parameter tuning function for PyTorch to avoid manually creating one?

  • k-fold cross validation is generally useful when you have a very small dataset. Thus, if you are training on a dataset like CIFAR10 (which is large, 60,000 images), then you don't require k-fold cross validation.
  • The idea of k-fold cross validation is to see how model performance (generalization) varies as different subsets of the data are used for training and testing. This becomes important when you have very little data. However, for large datasets, the metric results on the test dataset are enough to assess the generalization of the model.
  • Thus, whether you require k-fold cross validation depends on the size of your dataset. It does not depend on what model you use.
  • If you look at this chapter of the Deep Learning book (this was first referenced in this link):

Small batches can offer a regularizing effect (Wilson and Martinez, 2003), perhaps due to the noise they add to the learning process. Generalization error is often best for a batch size of 1. Training with such a small batch size might require a small learning rate to maintain stability because of the high variance in the estimate of the gradient. The total runtime can be very high as a result of the need to make more steps, both because of the reduced learning rate and because it takes more steps to observe the entire training set.

  • So, yes, mini-batch training will have a regularizing effect (reduce overfitting) to some extent.
  • There is no inbuilt hyperparameter tuning (at least at the time of writing this answer), but many developers have developed tools for this purpose (for example). You can find more such tools by searching for them. This question has answers which list a lot of such tools.
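To make the last point concrete, here is a bare-bones sketch of what such a tuning tool does under the hood: a hand-rolled random search (train_and_validate is a hypothetical helper that trains the net with the given settings and returns a validation score):

# Sketch: a hand-rolled random search over a few hyper-parameters.
# `train_and_validate` is a hypothetical helper wrapping the training loop in the question.
import random

search_space = {
    'learning_rate': [0.001, 0.005, 0.01, 0.05],
    'batch_size': [25, 50, 100],
    'num_epochs': [200, 500, 1000],
}

best_score, best_config = float('-inf'), None
for _ in range(20):                                  # 20 random trials
    config = {k: random.choice(v) for k, v in search_space.items()}
    score = train_and_validate(**config)             # e.g. validation accuracy for this config
    if score > best_score:
        best_score, best_config = score, config
print(best_config, best_score)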
