
Tensorflow for XOR is not predicting correctly after 500 epochs

I'm trying to implement a neural network to solve the XOR problem using TensorFlow. I chose sigmoid as the activation function, a (2, 2, 1) shape, and optimizer=SGD(). I chose batch_size=1 because the universe of the problem is 4, so it's really small. The problem is that the predictions are not even close to the right answers. What am I doing wrong?

I'm doing this on Google Colab, and the TensorFlow version is 2.3.0.

import tensorflow as tf
import numpy as np

# XOR inputs and targets
x = np.array([[0, 0],
              [1, 1],
              [1, 0],
              [0, 1]],  dtype=np.float32)

y = np.array([[0],
              [0],
              [1],
              [1]],     dtype=np.float32)

# (2, 2, 1) network, sigmoid activations everywhere
model = tf.keras.models.Sequential()
model.add(tf.keras.Input(shape=(2,)))
model.add(tf.keras.layers.Dense(2, activation=tf.keras.activations.sigmoid))
model.add(tf.keras.layers.Dense(2, activation=tf.keras.activations.sigmoid))
model.add(tf.keras.layers.Dense(1, activation=tf.keras.activations.sigmoid))

model.compile(optimizer=tf.keras.optimizers.SGD(),
              loss=tf.keras.losses.MeanSquaredError(),
              metrics=['binary_accuracy'])

history = model.fit(x, y, batch_size=1, epochs=500, verbose=False)

print("Tensorflow version: ", tf.__version__)
predictions = model.predict_on_batch(x)
print(predictions)

The output:

Tensorflow version:  2.3.0
WARNING:tensorflow:10 out of the last 10 calls to <function Model.make_predict_function.<locals>.predict_function at 0x7f69f7a83a60> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
[[0.5090364 ]
[0.4890102 ]
[0.50011414]
[0.49678832]]

The problem is your learning rate and the way you are optimizing your weights.

Another factor to keep in mind when training is the step size we take in the direction of the gradient. If this step is too large, we can end up in the wrong position, jumping out of the local minimum. If it is too small, we may never reach the minimum.

By default, the Stochastic Gradient Descent (SGD) optimizer in Keras has a learning rate of 0.01, and this learning rate is fixed during training. If you check your training history, the loss moves too slowly toward the global minimum, or sometimes jumps to higher values. For your specific problem it's quite difficult to reach the minimum with a fixed learning rate, because you are not taking the loss function landscape into consideration.

For example, using Adam as the optimizer with learning_rate=0.02, I was able to reach an accuracy of 1:

import tensorflow as tf
import numpy as np

x = np.array([[0, 0],
              [1, 1],
              [1, 0],
              [0, 1]],  dtype=np.float32)

y = np.array([[0], 
              [0], 
              [1], 
              [1]],     dtype=np.float32)

model =  tf.keras.models.Sequential()
model.add(tf.keras.Input(shape=(2,)))
model.add(tf.keras.layers.Dense(2, activation=tf.keras.activations.sigmoid))
model.add(tf.keras.layers.Dense(2, activation=tf.keras.activations.sigmoid))
model.add(tf.keras.layers.Dense(1, activation=tf.keras.activations.sigmoid))

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.02), # learning rate was 0.001 prior to this change
              loss=tf.keras.losses.MeanSquaredError(), 
              metrics=['mse', 'binary_accuracy'])
model.summary()
print("Tensorflow version: ", tf.__version__)
predictions = model.predict_on_batch(x)
print(predictions)history = model.fit(x, y, batch_size=1, epochs=500)

[[0.05162644]
[0.06670767]
[0.9240402 ]
[0.923379  ]]

I used Adam because it has an adaptive learning rate that is tuned during training, depending on how the training is going.

If you use a larger learning rate (0.1) but keep SGD, you can see in the training history that at one point the accuracy reaches 1, but right after that it jumps back to lower values. That's because you have a fixed learning rate. Another strategy would be to stop the training once you reach that value with SGD, perhaps with a Keras callback, as sketched below.
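A minimal sketch of such a callback (the class name and the exact stopping condition are my own assumptions, not from the original answer):

import tensorflow as tf

class StopAtPerfectAccuracy(tf.keras.callbacks.Callback):
    # Stop training as soon as the training binary accuracy reaches 1.0.
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        if logs.get('binary_accuracy', 0.0) >= 1.0:
            self.model.stop_training = True

# Usage, assuming the same model, x and y as above:
# history = model.fit(x, y, batch_size=1, epochs=10000,
#                     callbacks=[StopAtPerfectAccuracy()], verbose=False)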

Don't forget to tune your learning rate and to choose the right optimizer. It's fundamental to getting fast training and a good minimum.

Also consider changing the network architecture (adding nodes, and using other activation functions for the hidden layers, like ReLU), as in the sketch below.
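For example, a possible variant with a wider ReLU hidden layer (the width of 8 is just an illustrative choice, not from the original answer):

model = tf.keras.models.Sequential()
model.add(tf.keras.Input(shape=(2,)))
model.add(tf.keras.layers.Dense(8, activation='relu'))    # wider hidden layer with ReLU
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))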

Here are some useful details on how to handle the learning rate.
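One common way to handle it in Keras is a learning rate schedule. A minimal sketch, where the ExponentialDecay values are illustrative assumptions rather than part of the original answer:

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.1,   # assumed starting value
    decay_steps=1000,
    decay_rate=0.9)

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=lr_schedule),
              loss=tf.keras.losses.MeanSquaredError(),
              metrics=['binary_accuracy'])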

You are right. I changed the learning rate a bit, to 0.5, but I'm still using SGD and epochs=10000 (the change is sketched below). I got the following output:
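A minimal sketch of the change, assuming everything else stays as in my original code:

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.5),
              loss=tf.keras.losses.MeanSquaredError(),
              metrics=['binary_accuracy'])

history = model.fit(x, y, batch_size=1, epochs=10000, verbose=False)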

Tensorflow version:  2.3.0
[[0.00407344]
[0.00893608]
[0.9912169 ]
[0.99120843]]

Another training run:

[[0.0097596 ]
[0.00862199]
[0.99391216]
[0.98597133]]

Of course, epochs=10000 is not what I would call fast training, but I'm learning this stuff and I got what I wanted.

I took the liberty of making a Google Colab notebook using some of your code and found that your model suffers from the so-called vanishing gradient problem. I fixed the issue by initializing the hidden layer's kernel values with a constant of 0.5, like so:

model = tf.keras.models.Sequential()
model.add(tf.keras.Input(shape=(2,)))
model.add(tf.keras.layers.Dense(2, activation=tf.keras.activations.sigmoid, kernel_initializer=tf.initializers.Constant(0.5)))
model.add(tf.keras.layers.Dense(1, activation=tf.keras.activations.sigmoid))

This minimizes the chance of the vanishing gradient problem occurring, since the value 0.5 is equidistant from 0.0 and 1.0, the sigmoid activation values at which the gradient vanishes.

I also found that the Adam optimizer performs better than SGD, just as pointed out in the accepted answer.

You can visit my Colab notebook here:

https://colab.research.google.com/drive/1gv-z-C9TpKAsnAyBYLmwRvk6NaAtlr_C?usp=sharing
