MxNet with R: Simple XOR Neural Network is not learning

I wanted to experiment with the MxNet library, so I built a simple neural network that learns the XOR function. The problem I am facing is that the model is not learning.

Here is the complete script:

library(mxnet)

train = matrix(c(0,0,0,
                 0,1,1,
                 1,0,1,
                 1,1,0),
               nrow=4,
               ncol=3,
               byrow=TRUE)

train.x = train[,-3]
train.y = train[,3]

data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=2)
act1 <- mx.symbol.Activation(fc1, name="relu1", act_type="relu")
fc2 <- mx.symbol.FullyConnected(act1, name="fc2", num_hidden=3)
act2 <- mx.symbol.Activation(fc2, name="relu2", act_type="relu")
fc3 <- mx.symbol.FullyConnected(act2, name="fc3", num_hidden=1)
softmax <- mx.symbol.SoftmaxOutput(fc3, name="sm")

mx.set.seed(0)
model <- mx.model.FeedForward.create(
  softmax,
  X = t(train.x),
  y = train.y,
  num.round = 10,
  array.layout = "columnmajor",
  learning.rate = 0.01,
  momentum = 0.4,
  eval.metric = mx.metric.accuracy,
  epoch.end.callback = mx.callback.log.train.metric(100))

predict(model,train.x,array.layout="rowmajor")

And this output is produced:

Start training with 1 devices
[1] Train-accuracy=NaN
[2] Train-accuracy=0.5
[3] Train-accuracy=0.5
[4] Train-accuracy=0.5
[5] Train-accuracy=0.5
[6] Train-accuracy=0.5
[7] Train-accuracy=0.5
[8] Train-accuracy=0.5
[9] Train-accuracy=0.5
[10] Train-accuracy=0.5

> predict(model,train.x,array.layout="rowmajor")
[,1] [,2] [,3] [,4]
[1,]    1    1    1    1

How should I use mxnet to get this example working?

Regards, vaka

Usually an activation layer doesn't go directly after the input, since activation should be applied once the first layer's computation is done. You can still imitate the XOR function with your old code, but it needs a few tweaks:

  1. You are right that you need to initialize the weights. Which initial weights are best is a big discussion in the Deep Learning community, but in my experience Xavier weights work well.

  2. If you want to use softmax, you need to change the number of units in the last layer (fc3) to 2, because you have 2 classes: 0 and 1.

After doing these 2 things, plus a few minor optimizations such as removing the transposition of the matrix, we get the following code:

library(mxnet)

train = matrix(c(0,0,0,
                 0,1,1,
                 1,0,1,
                 1,1,0),
               nrow=4,
               ncol=3,
               byrow=TRUE)

train.x = train[,-3]
train.y = train[,3]

data <- mx.symbol.Variable("data")
fc1 <- mx.symbol.FullyConnected(data, name="fc1", num_hidden=2)
act1 <- mx.symbol.Activation(fc1, name="relu1", act_type="relu")
fc2 <- mx.symbol.FullyConnected(act1, name="fc2", num_hidden=3)
act2 <- mx.symbol.Activation(fc2, name="relu2", act_type="relu")
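# two output units below - one per class (0 and 1) - as required by softmax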
fc3 <- mx.symbol.FullyConnected(act2, name="fc3", num_hidden=2)
softmax <- mx.symbol.SoftmaxOutput(fc3, name="sm")

mx.set.seed(0)
model <- mx.model.FeedForward.create(
  softmax,
  X = train.x,
  y = train.y,
  num.round = 50,
  array.layout = "rowmajor",
  learning.rate = 0.1,
  momentum = 0.99,
  eval.metric = mx.metric.accuracy,
  initializer = mx.init.Xavier(rnd_type = "uniform", factor_type = "avg", magnitude = 3),
  epoch.end.callback = mx.callback.log.train.metric(100))

predict(model,train.x,array.layout="rowmajor")

We get the following results:

Start training with 1 devices
[1] Train-accuracy=NaN
[2] Train-accuracy=0.75
[3] Train-accuracy=0.5
[4] Train-accuracy=0.5
[5] Train-accuracy=0.5
[6] Train-accuracy=0.5
[7] Train-accuracy=0.5
[8] Train-accuracy=0.5
[9] Train-accuracy=0.5
[10] Train-accuracy=0.75
[11] Train-accuracy=0.75
[12] Train-accuracy=0.75
[13] Train-accuracy=0.75
[14] Train-accuracy=0.75
[15] Train-accuracy=0.75
[16] Train-accuracy=0.75
[17] Train-accuracy=0.75
[18] Train-accuracy=0.75
[19] Train-accuracy=0.75
[20] Train-accuracy=0.75
[21] Train-accuracy=0.75
[22] Train-accuracy=0.5
[23] Train-accuracy=0.5
[24] Train-accuracy=0.5
[25] Train-accuracy=0.75
[26] Train-accuracy=0.75
[27] Train-accuracy=0.75
[28] Train-accuracy=0.75
[29] Train-accuracy=0.75
[30] Train-accuracy=0.75
[31] Train-accuracy=0.75
[32] Train-accuracy=0.75
[33] Train-accuracy=0.75
[34] Train-accuracy=0.75
[35] Train-accuracy=0.75
[36] Train-accuracy=0.75
[37] Train-accuracy=0.75
[38] Train-accuracy=0.75
[39] Train-accuracy=1
[40] Train-accuracy=1
[41] Train-accuracy=1
[42] Train-accuracy=1
[43] Train-accuracy=1
[44] Train-accuracy=1
[45] Train-accuracy=1
[46] Train-accuracy=1
[47] Train-accuracy=1
[48] Train-accuracy=1
[49] Train-accuracy=1
[50] Train-accuracy=1
> 
> predict(model,train.x,array.layout="rowmajor")
          [,1]         [,2]         [,3]         [,4]
[1,] 0.9107883 2.618128e-06 6.384078e-07 0.9998743534
[2,] 0.0892117 9.999974e-01 9.999994e-01 0.0001256234

The output of softmax is interpreted as "a probability of belonging to a class" - it is not a "0" or "1" value like one gets when doing the regular math. The output means the following:

  • In the case "0 and 0": the probability of class "0" = 0.9107883 and of class "1" = 0.0892117, meaning the result is 0
  • In the case "0 and 1": the probability of class "0" = 2.618128e-06 and of class "1" = 9.999974e-01, meaning the result is 1 (the probability of class "1" is much higher)
  • In the case "1 and 0": the probability of class "0" = 6.384078e-07 and of class "1" = 9.999994e-01, meaning the result is 1 (the probability of class "1" is much higher)
  • In the case "1 and 1": the probability of class "0" = 0.9998743534 and of class "1" = 0.0001256234, meaning the result is 0.
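To turn this probability matrix into hard 0/1 predictions, one can pick the row with the highest probability for each column. A minimal sketch, assuming model and train.x from the code above:

pred <- predict(model, train.x, array.layout = "rowmajor")
# rows are classes (0 and 1), columns are samples, so the predicted
# label is the index of the largest entry per column, minus 1
labels <- max.col(t(pred)) - 1
labels  # expected: 0 1 1 0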

Ok, I experimented a little more and now I have a working example of XOR with mxnet in R. The complicated part is not the mxnet API, but rather the use of the neural network itself.

So here is the working R code:

library(mxnet)

train = matrix(c(0,0,0,
                 0,1,1,
                 1,0,1,
                 1,1,0),
               nrow=4,
               ncol=3,
               byrow=TRUE)

train.x = t(train[,-3])
train.y = t(train[,3])

data <- mx.symbol.Variable("data")
act0 <- mx.symbol.Activation(data, name="relu1", act_type="relu")
fc1 <- mx.symbol.FullyConnected(act0, name="fc1", num_hidden=2)
act1 <- mx.symbol.Activation(fc1, name="relu2", act_type="tanh")
fc2 <- mx.symbol.FullyConnected(act1, name="fc2", num_hidden=3)
act2 <- mx.symbol.Activation(fc2, name="relu3", act_type="relu")
fc3 <- mx.symbol.FullyConnected(act2, name="fc3", num_hidden=1)
act3 <- mx.symbol.Activation(fc3, name="relu4", act_type="relu")
softmax <- mx.symbol.LinearRegressionOutput(act3, name="sm")

mx.set.seed(0)
model <- mx.model.FeedForward.create(
  softmax,
  X = train.x,
  y = train.y,
  num.round = 10000,
  array.layout = "columnmajor",
  learning.rate = 10^-2,
  momentum = 0.95,
  eval.metric = mx.metric.rmse,
  epoch.end.callback = mx.callback.log.train.metric(10),
  lr_scheduler=mx.lr_scheduler.FactorScheduler(1000,factor=0.9),
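  # randomized initial weights instead of the default (see the last bullet below)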
  initializer=mx.init.uniform(0.5)
  )

predict(model,train.x,array.layout="columnmajor")

There are some differences from the initial code:

  • I changed the layout of the neural net by putting another activation layer between the data and the first layer. I interpreted this as putting weights between the data and the input layer (is that correct?)

  • I changed the activation function of the hidden layer (the one with 3 neurons) to tanh, because I guess that negative weights are needed for XOR.

  • I changed SoftmaxOutput to LinearRegressionOutput so as to optimize for squared loss instead.

  • I fine-tuned the learning rate and momentum.

  • Most important: I added the uniform initializer for the weights. I guess the default mode sets the weights to zero. Learning really speeds up when using randomized initial weights.

The output:

Start training with 1 devices
[1] Train-rmse=NaN
[2] Train-rmse=0.706823888574888
[3] Train-rmse=0.705537411582449
[4] Train-rmse=0.701298592443344
[5] Train-rmse=0.691897326795625
...
[9999] Train-rmse=1.07453801496744e-07
[10000] Train-rmse=1.07453801496744e-07
> predict(model,train.x,array.layout="columnmajor")
     [,1]      [,2] [,3] [,4]
[1,]    0 0.9999998    1    0
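Since LinearRegressionOutput produces raw real-valued predictions rather than class probabilities, a simple way to read off the XOR labels here is to round them. A minimal sketch, assuming model and train.x from the working example above:

pred <- predict(model, train.x, array.layout = "columnmajor")
# the regression output already approximates 0 and 1,
# so rounding yields the hard labels
round(pred)  # expected: 0 1 1 0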
