
When should the back-propagation algorithm be called during neural network training?

I have a working back-propagation algorithm that correctly minimizes the error when iterated 100,000 times over the same single input, for example [ 1, 0 ] -> 1.

But I am not sure how to extend this to train the neural network when there are multiple inputs.

Suppose we wish to train the XOR function, with four possible input and output states:

[ 0, 0 ] -> 0

[ 0, 1 ] -> 1

[ 1, 0 ] -> 1

[ 1, 1 ] -> 0

I have tried calling the back-propagation algorithm after each individual input-output training example. The network does not learn at all in this fashion, even over a large number of iterations.

Should I instead compute the accumulated error over the entire training set (the 4 cases above) before calling back-propagation?

How should the accumulated errors be stored and used across the entire training set in this example?

Thank you.

Both versions are correct: updating after every example and updating from the accumulated error. They simply implement two slightly different algorithms; updating after every example makes it SGD (stochastic gradient descent), while accumulating over the whole set is (batch) gradient descent. You can also do something in between, where you update after every batch of data. The issue you are describing (lack of learning) has nothing to do with when the update takes place.
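
To make the difference concrete, here is a minimal sketch of the three update schedules in Python/NumPy. The compute_gradient function is a hypothetical stand-in for one back-propagation pass on a single example; the toy linear model inside it is an assumption for illustration only, not your actual network:

    import numpy as np

    # Hypothetical stand-in for one back-propagation pass on one example:
    # gradient of a squared error for a toy linear model, for illustration only.
    def compute_gradient(params, x, y):
        pred = params @ x
        return 2.0 * (pred - y) * x

    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    Y = np.array([0., 1., 1., 0.])
    lr = 0.1

    # 1) SGD: update the parameters after every single example.
    params = np.zeros(2)
    for x, y in zip(X, Y):
        params -= lr * compute_gradient(params, x, y)

    # 2) Full-batch gradient descent: accumulate the gradient over the
    #    whole training set, then apply one averaged update.
    params = np.zeros(2)
    grad = np.zeros_like(params)
    for x, y in zip(X, Y):
        grad += compute_gradient(params, x, y)
    params -= lr * grad / len(X)

    # 3) Mini-batch: update once per batch of examples.
    params = np.zeros(2)
    batch_size = 2
    for start in range(0, len(X), batch_size):
        xb, yb = X[start:start + batch_size], Y[start:start + batch_size]
        grad = sum(compute_gradient(params, x, y) for x, y in zip(xb, yb))
        params -= lr * grad / len(xb)

All three loops call back-propagation; they differ only in how many examples contribute to the gradient before the parameters are changed.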

Note that "correctly learning" one sample does not mean your algorithm is bug-free! A network in which you only adjust the bias of the final layer can fit a single sample, but will fail on multiple samples. This is just one example of code that is broken yet still passes your "single sample test".

If your model is a single-layer network, it will not be able to learn the XOR function, because XOR is not linearly separable. If it has more than one layer, you should accumulate all errors and normalize them by the total number of samples (in your case 4). Finally, the main reason for your problem might be a learning rate that is too high, which makes the parameters change too much at each update. Try reducing the learning rate and increasing the number of iterations. See https://medium.com/analytics-vidhya/understanding-basics-of-deep-learning-by-solving-xor-problem-cb3ff6a18a06 for reference.
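
As a concrete illustration of the accumulated (full-batch) variant with a hidden layer, here is a minimal NumPy sketch. The architecture (4 hidden sigmoid units; 2 is the theoretical minimum for XOR), the learning rate, the iteration count and the random seed are assumptions chosen for illustration rather than your original setup, and depending on the initialization the network may need more iterations to converge:

    import numpy as np

    rng = np.random.default_rng(42)

    # XOR training set
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    Y = np.array([[0], [1], [1], [0]], dtype=float)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # One hidden layer with 4 sigmoid units (an illustrative choice).
    W1 = rng.normal(size=(2, 4)); b1 = np.zeros((1, 4))
    W2 = rng.normal(size=(4, 1)); b2 = np.zeros((1, 1))

    lr = 1.0
    n = len(X)

    for epoch in range(30000):
        # Forward pass over the whole training set at once.
        H = sigmoid(X @ W1 + b1)          # hidden activations, shape (4, 4)
        out = sigmoid(H @ W2 + b2)        # outputs, shape (4, 1)

        # Back-propagation with squared error; the per-sample errors are
        # accumulated by the matrix products and normalized by n.
        d_out = (out - Y) * out * (1 - out)      # delta at the output layer
        d_hid = (d_out @ W2.T) * H * (1 - H)     # delta at the hidden layer

        W2 -= lr * (H.T @ d_out) / n
        b2 -= lr * d_out.sum(axis=0, keepdims=True) / n
        W1 -= lr * (X.T @ d_hid) / n
        b1 -= lr * d_hid.sum(axis=0, keepdims=True) / n

    print(np.round(out.ravel(), 3))   # should approach [0, 1, 1, 0]

The key point is that every weight update uses gradients summed over all four training cases and divided by the training-set size, rather than the gradient of a single example.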
