
Hyper-parameter tuning and Over-fitting with Feed-Forward Neural Network - Mini-Batch Epoch and Cross Validation

I am looking at implementing a hyper-parameter tuning method for a feed-forward neural network (FNN) implemented using PyTorch. My original FNN, a model named net, has been implemented using a mini-batch learning approach with epochs:

# Imports used by this snippet
import torch
import torch.nn as nn
from sklearn.utils import shuffle

# Parameters
batch_size = 50        # larger batch size leads to over-fitting
num_epochs = 1000
learning_rate = 0.01   # AKA step size - the amount the weights are updated during training
batch_no = len(x_train) // batch_size

criterion = nn.CrossEntropyLoss()  # classification loss computed on the raw class scores (logits)
optimizer = torch.optim.Adam(net.parameters(), lr=learning_rate)

for epoch in range(num_epochs):
    if epoch % 20 == 0:
        print('Epoch {}'.format(epoch + 1))
    x_train, y_train = shuffle(x_train, y_train)
    # Mini-batch learning - mini-batch since batch size < n (batch gradient descent), but > 1 (stochastic gradient descent)
    for i in range(batch_no):
        start = i * batch_size
        end = start + batch_size
        x_var = torch.FloatTensor(x_train[start:end])
        y_var = torch.LongTensor(y_train[start:end])
        # Forward + Backward + Optimize
        optimizer.zero_grad()
        ypred_var = net(x_var)
        loss = criterion(ypred_var, y_var)
        loss.backward()
        optimizer.step()

Lastly, I test my model on a separate test set.
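For reference, a minimal sketch of that evaluation step, assuming x_test and y_test are NumPy arrays like the training data:

# Evaluation sketch (assumes x_test / y_test are NumPy arrays)
net.eval()                       # disable dropout / batch-norm updates, if any
with torch.no_grad():            # no gradients needed at test time
    logits = net(torch.FloatTensor(x_test))
    y_pred = torch.argmax(logits, dim=1)
    accuracy = (y_pred == torch.LongTensor(y_test)).float().mean().item()
print('Test accuracy: {:.3f}'.format(accuracy))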

I came across an approach using randomised search to tune the hyper-parameters as well as implementing K-fold cross-validation (RandomizedSearchCV).
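A rough sketch of how that could be wired up, assuming the network is wrapped with the third-party skorch library so that sklearn's RandomizedSearchCV can drive it (the class name Net and the search ranges are placeholders, not my actual settings):

# Sketch: wrapping a PyTorch module with skorch so RandomizedSearchCV can tune it.
# `Net` (the nn.Module class) and the ranges below are illustrative placeholders.
import numpy as np
import torch.nn as nn
from skorch import NeuralNetClassifier
from sklearn.model_selection import RandomizedSearchCV

wrapped_net = NeuralNetClassifier(
    Net,                          # the module class, not an instance
    criterion=nn.CrossEntropyLoss,
    max_epochs=100,
    lr=0.01,
    batch_size=50,
    verbose=0,
)

param_dist = {
    'lr': [0.001, 0.005, 0.01, 0.05],
    'batch_size': [25, 50, 100],
    'max_epochs': [100, 500, 1000],
}

search = RandomizedSearchCV(wrapped_net, param_dist, n_iter=10, cv=5, scoring='accuracy')
search.fit(x_train.astype(np.float32), y_train.astype(np.int64))
print(search.best_params_, search.best_score_)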

My question is two-fold (no pun intended!), and the first part is theoretical: is k-fold cross-validation necessary, or could it add any benefit, for a mini-batch feed-forward neural network? From what I can see, the mini-batch approach should do roughly the same job of preventing over-fitting.
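For concreteness, k-fold cross-validation around the mini-batch loop above would look roughly like this sketch (train_one_model and evaluate are hypothetical helpers wrapping the training loop and the test-set accuracy computation shown earlier):

# Sketch: k-fold cross-validation around the existing mini-batch training loop.
# `train_one_model` and `evaluate` are hypothetical helpers wrapping the code above.
import numpy as np
from sklearn.model_selection import KFold

kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = []
for train_idx, val_idx in kf.split(x_train):
    x_tr, y_tr = x_train[train_idx], y_train[train_idx]
    x_val, y_val = x_train[val_idx], y_train[val_idx]
    model = train_one_model(x_tr, y_tr)          # runs the mini-batch epoch loop on this fold
    fold_scores.append(evaluate(model, x_val, y_val))
print('Mean CV accuracy:', np.mean(fold_scores))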

I also found a good answer here, but I'm not sure it addresses a mini-batch approach specifically.

Secondly, if k-fold is not necessary, is there another hyper-parameter tuning function for PyTorch to avoid manually creating one?

  • k-fold cross validation is generally useful when you have a very small dataset. Thus, if you are training on a dataset like CIFAR10 (which is large, 60,000 images), then you don't require k-fold cross validation.
  • The idea of k-fold cross validation is to see how model performance (generalization) varies as different subsets of the data are used for training and testing. This becomes important when you have very little data. However, for large datasets, the metric results on the test dataset are enough to assess the generalization of the model.
  • Thus, whether you require k-fold cross validation depends on the size of your dataset. It does not depend on what model you use.
  • If you look at this chapter of the Deep Learning book (this was first referenced in this link):

Small batches can offer a regularizing effect (Wilson and Martinez, 2003), perhaps due to the noise they add to the learning process. Generalization error is often best for a batch size of 1. Training with such a small batch size might require a small learning rate to maintain stability because of the high variance in the estimate of the gradient. The total runtime can be very high as a result of the need to make more steps, both because of the reduced learning rate and because it takes more steps to observe the entire training set.

  • So, yes, mini-batch training will have a regularizing effect (reduce overfitting) to some extent.
  • There is no inbuilt hyperparameter tuning (at least at the time of writing this answer), but many developers have developed tools for this purpose (for example). You can find more such tools by searching for them. This question has answers which list a lot of such tools.
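To make the last point concrete, here is a bare-bones sketch of what such a tuning tool does under the hood: a hand-rolled random search (train_and_validate is a hypothetical helper that trains the net with the given settings and returns a validation score):

# Sketch: a hand-rolled random search over a few hyper-parameters.
# `train_and_validate` is a hypothetical helper wrapping the training loop in the question.
import random

search_space = {
    'learning_rate': [0.001, 0.005, 0.01, 0.05],
    'batch_size': [25, 50, 100],
    'num_epochs': [200, 500, 1000],
}

best_score, best_config = float('-inf'), None
for _ in range(20):                                  # 20 random trials
    config = {k: random.choice(v) for k, v in search_space.items()}
    score = train_and_validate(**config)             # e.g. validation accuracy for this config
    if score > best_score:
        best_score, best_config = score, config
print(best_config, best_score)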
