Best Way to Overcome Early Convergence for Machine Learning Model
I have a machine learning model that tries to predict weather data; in this case I am predicting whether or not it will rain tomorrow (a binary Yes/No prediction).
The dataset has about 50 input variables and 65,000 entries.
I am currently running an RNN with a single hidden layer of 35 nodes. I am using PyTorch's NLLLoss as my loss function, and Adaboost for the optimization function. I've tried many different learning rates, and 0.01 seems to work fairly well.
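For context, a minimal sketch of a setup like this is below. All names here are illustrative, not the actual code, and the hidden layer is shown as a plain feed-forward layer since the recurrent details aren't given; the layer sizes come from the numbers above. Note that `NLLLoss` expects log-probabilities, so the forward pass must end with `log_softmax`.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: 50 input features, one hidden layer of 35 units,
# 2 output classes (rain / no rain).
class RainClassifier(nn.Module):
    def __init__(self, n_features=50, n_hidden=35, n_classes=2):
        super().__init__()
        self.hidden = nn.Linear(n_features, n_hidden)
        self.out = nn.Linear(n_hidden, n_classes)

    def forward(self, x):
        h = torch.relu(self.hidden(x))
        # log-probabilities, as required by nn.NLLLoss
        return torch.log_softmax(self.out(h), dim=1)

model = RainClassifier()
criterion = nn.NLLLoss()
```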
After running for 150 epochs, I notice that my test accuracy converges around 0.80. However, I would like it to be even higher; the model seems to be stuck oscillating around some sort of saddle point or local minimum. (A graph of this is below.)
What are the most effective ways to get out of this "valley" that the model seems to be stuck in?
I'm not sure why exactly you are using only one hidden layer, or what the shape of your history data is, but there are a few things you can try.
Your question is a little ambiguous, since you mention an RNN with a single hidden layer. Also, without knowing the entire neural network architecture, it is hard to say how you can improve it. So I would like to add a few points.
You mentioned that you are using "Adaboost" as the optimization function, but PyTorch doesn't have any such optimizer. Did you try using the SGD or Adam optimizers, which are very useful?
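AdaBoost is an ensemble method, not a gradient-descent optimizer, which is why PyTorch has no such class. The built-in optimizers live in `torch.optim`; a sketch of the two suggested ones (a single linear layer stands in for your actual network):

```python
import torch

# Stand-in for the real model: 50 features in, 2 classes out.
model = torch.nn.Linear(50, 2)

# Classic stochastic gradient descent with momentum.
opt_sgd = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Adam: adaptive per-parameter learning rates, often a good default.
opt_adam = torch.optim.Adam(model.parameters(), lr=1e-3)
```

In the training loop you would call `optimizer.zero_grad()`, `loss.backward()`, then `optimizer.step()` each batch, exactly as with any `torch.optim` optimizer.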
Do you have any regularization term in the loss function? Are you familiar with dropout? Did you check the training performance? Does your model overfit?
Do you have a baseline model/algorithm, so that you can judge whether 80% accuracy is actually good?
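The simplest baseline is the majority-class rule: always predict the most frequent label. A small hypothetical helper (the label ratio below is made up, not from your dataset — substitute the real rain/no-rain column):

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

# If, say, 3 out of 4 days have no rain, always predicting "no"
# already scores 75% accuracy.
print(majority_baseline_accuracy(["no", "no", "no", "yes"]))  # 0.75
```

If rain is rare in your region, 80% from the network may be a much smaller gain over this baseline than it first appears.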
150 epochs for a binary classification task looks like too much. Why don't you start from an off-the-shelf classifier model? You can find several examples of regression and classification in this tutorial.
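For instance, an off-the-shelf scikit-learn classifier takes only a few lines; here `make_classification` generates synthetic data as a stand-in for the real 65,000 × 50 feature matrix and rain labels:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 2,000 rows, 50 features, binary labels.
X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```

A strong classical model like this makes a useful sanity check: if it matches or beats the RNN, the bottleneck is more likely the network's architecture or training setup than the data.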