
Why can't I learn the XOR function with this network and these constraints?

Let's say I have the following network and constraints:

  1. The architecture is fixed (see this image); note that there are no biases.
  2. The activation function for the hidden layer is ReLU.
  3. There is no activation function for the output layer (it should just return the sum of the inputs it receives).

I tried to implement this in PyTorch with various initialization schemes and different data sets, but I failed (the code is at the bottom).

My questions are:

  1. Is there anything wrong with my NN training process?
  2. Is this a feasible problem? If yes, how?
  3. If this is doable, can we still achieve it while constraining the weights to be in the set {-1, 0, 1}?
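
Question 3 can at least be checked exhaustively on the four binary inputs, since there are only 3^6 = 729 weight assignments for a bias-free 2-2-1 network. A minimal sketch in plain Python; the helper names (`relu`, `forward`, `xor_solutions`) are made up for illustration, and the check covers only the points (0,0), (0,1), (1,0), (1,1), not a real-valued data set:

```python
from itertools import product

def relu(z):
    return max(0.0, z)

def forward(w, x):
    """Bias-free 2-2-1 network; w = (a, b, c, d, v1, v2) is a hypothetical weight layout."""
    a, b, c, d, v1, v2 = w
    h1 = relu(a * x[0] + b * x[1])   # first hidden unit
    h2 = relu(c * x[0] + d * x[1])   # second hidden unit
    return v1 * h1 + v2 * h2         # linear output, no activation

XOR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def xor_solutions():
    """Every weight tuple in {-1, 0, 1}^6 that matches XOR on the four binary inputs."""
    return [w for w in product((-1, 0, 1), repeat=6)
            if all(forward(w, x) == t for x, t in XOR_TABLE.items())]

solutions = xor_solutions()
print(len(solutions), "of", 3 ** 6, "assignments reproduce XOR on the binary inputs")
print(solutions)
```

One assignment that passes is (1, -1, -1, 1, 1, 1), which makes the network compute relu(x1 - x2) + relu(x2 - x1) = |x1 - x2|.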

Code:

import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as data_utils
import numpy as np

class Network(nn.Module):

    def __init__(self):
        super(Network, self).__init__()
        self.fc1 = nn.Linear(2, 2, bias=False)  # input -> hidden, no bias
        self.fc2 = nn.Linear(2, 1, bias=False)  # hidden -> output, no bias
        self.rl = nn.ReLU()

    def forward(self, x):
        x = self.fc1(x)
        x = self.rl(x)   # ReLU on the hidden layer only
        x = self.fc2(x)  # linear output, no activation
        return x

#create an XOR data set to train    
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = np.logical_xor(X[:, 0] > 0, X[:, 1] > 0).astype('int32')

# test data set
X_test = np.array([[0,0],[0,1], [1,0], [1,1]])

# reshape the targets to (N, 1) so they match the shape of the network output
train = data_utils.TensorDataset(torch.from_numpy(X).float(),
                                 torch.from_numpy(y).float().unsqueeze(1))
train_loader = data_utils.DataLoader(train, batch_size=50, shuffle=True)

test = torch.from_numpy(X_test).float()

# training the network
num_epoch = 10000

net = Network()
net.fc1.weight.data.clamp_(min=-1, max=1)
net.fc2.weight.data.clamp_(min=-1, max=1)

# define loss and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(net.parameters())

for epoch in range(num_epoch):
    running_loss = 0.0  # loss summed over the batches of this epoch
    for inputs, targets in train_loader:
        # make the grads zero
        optimizer.zero_grad()
        # forward propagate
        out = net(inputs)
        # calculate loss and update
        loss = criterion(out, targets)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    if epoch % 500 == 0:
        print("Epoch: {0} Loss: {1}".format(epoch, running_loss))
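
In a loop like this it is worth confirming that the targets handed to the loss have the same shape as the network output, (batch, 1). A 1-D target of shape (batch,) broadcasts against (batch, 1) into a (batch, batch) matrix of differences, so the mean is taken over every prediction/target pair rather than over matching pairs. The sketch below shows the effect with plain NumPy broadcasting, which follows the same rules; the arrays and the `mse` helper are made up for illustration:

```python
import numpy as np

def mse(pred, target):
    """Mean squared error using ordinary NumPy broadcasting."""
    return float(np.mean((pred - target) ** 2))

pred = np.array([[0.0], [1.0]])        # shape (2, 1), like the network output
target_1d = np.array([0.0, 1.0])       # shape (2,), like an unreshaped label vector
target_2d = target_1d.reshape(-1, 1)   # shape (2, 1), matches pred

diff_shape = (pred - target_1d).shape  # every prediction paired with every target
loss_mismatched = mse(pred, target_1d)
loss_matched = mse(pred, target_2d)
print(diff_shape, loss_mismatched, loss_matched)
```

Here the predictions are exactly right, yet the mismatched shapes still report a nonzero loss.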

The loss doesn't improve. It gets stuck at some value after a few epochs (I'm not sure how to make this reproducible, as I'm getting different values every time).

net(test) returns a set of predictions that are nowhere near the XOR output.

You need to use a nonlinear activation function, such as sigmoid, in your hidden and output layers, because XOR is not linearly separable. Biases are also required.
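
A small NumPy check of how the bias-free architecture from the question behaves, offered as a sketch rather than a definitive diagnosis. The weights `W1` and `W2` are hand-picked for illustration and are not from the original post. Two things can be observed: on the four binary inputs these weights give |x1 - x2|, which matches XOR; but without biases the network is positively homogeneous, meaning that scaling an input by c > 0 scales the output by c. Labels built from the signs of real-valued inputs stay constant along each ray from the origin, so a bias-free network cannot match them everywhere.

```python
import numpy as np

W1 = np.array([[1.0, -1.0],
               [-1.0, 1.0]])   # hidden layer weights, no bias (illustrative values)
W2 = np.array([[1.0, 1.0]])    # output layer weights, no bias (illustrative values)

def net(x):
    """Bias-free 2-2-1 ReLU network with a linear output."""
    hidden = np.maximum(0.0, x @ W1.T)
    return (hidden @ W2.T).ravel()

binary_inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
binary_out = net(binary_inputs)        # |x1 - x2| for each row

# The signs of this point differ, so a sign-based XOR label would be 1 at any scale,
# but the network output grows in proportion to the scale factor.
x = np.array([[0.5, -1.5]])
scaled_ratio = net(5.0 * x) / net(x)
print(binary_out, scaled_ratio)
```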

