Why can't I learn the XOR function with this network and these constraints?
Let's say I have the following constraints and the network:
I tried to implement this in PyTorch with various initialization schemes and different data sets, but I failed (the code is at the bottom).
My questions are:
Code:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as data_utils
import numpy as np

class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.fc1 = nn.Linear(2, 2, bias=False)
        self.fc2 = nn.Linear(2, 1, bias=False)
        self.rl = nn.ReLU()

    def forward(self, x):
        x = self.fc1(x)
        x = self.rl(x)
        x = self.fc2(x)
        return x

# create an XOR data set to train
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = np.logical_xor(X[:, 0] > 0, X[:, 1] > 0).astype('int32')

# test data set
X_test = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

train = data_utils.TensorDataset(torch.from_numpy(X).float(),
                                 torch.from_numpy(y).float())
train_loader = data_utils.DataLoader(train, batch_size=50, shuffle=True)
test = torch.from_numpy(X_test).float()

# training the network
num_epoch = 10000
net = Network()
net.fc1.weight.data.clamp_(min=-1, max=1)
net.fc2.weight.data.clamp_(min=-1, max=1)

# define loss and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(net.parameters())

for epoch in range(num_epoch):
    running_loss = 0.0  # loss summed over the batches of this epoch
    for batch_X, batch_y in train_loader:
        # make the grads zero
        optimizer.zero_grad()
        # forward propagate
        out = net(batch_X)
        # calculate loss and update; squeeze (batch, 1) -> (batch,) so the
        # output matches the target shape and MSELoss does not broadcast
        loss = criterion(out.squeeze(1), batch_y)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    if epoch % 500 == 0:
        print("Epoch: {0} Loss: {1}".format(epoch, running_loss))

The loss doesn't improve. It gets stuck at some value after a few epochs (I'm not sure how to make this reproducible, as I get different values every time).
net(test)
returns a set of predictions that are nowhere near the XOR output.
You need to use a nonlinear activation function such as sigmoid in your hidden and output layers, because XOR is not linearly separable. Biases are also required.
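For what it's worth, here is a minimal sketch of what that suggestion could look like in PyTorch. It is only an illustration, not the asker's exact setup: the class name `XorNet`, the use of `nn.BCELoss`, the learning rate of 0.05, the 2000 training steps, and training directly on the four XOR points are all assumptions made for this example, and the weight constraints from the question are not applied. The call to `torch.manual_seed` is there so that repeated runs start from the same initial weights.

```python
import torch
import torch.nn as nn

# Fix the seed so that repeated runs start from the same initial weights.
torch.manual_seed(0)


class XorNet(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, hidden=2):
        super().__init__()
        # nn.Linear uses bias=True by default, so both layers have biases.
        self.fc1 = nn.Linear(2, hidden)
        self.fc2 = nn.Linear(hidden, 1)
        self.act = nn.Sigmoid()

    def forward(self, x):
        # Sigmoid on the hidden layer and on the output layer.
        x = self.act(self.fc1(x))
        return self.act(self.fc2(x))


# The four XOR points and their labels, shaped (4, 1) to match the output.
X = torch.tensor([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = torch.tensor([[0.0], [1.0], [1.0], [0.0]])

net = XorNet()
criterion = nn.BCELoss()  # assumed loss; pairs naturally with a sigmoid output
optimizer = torch.optim.Adam(net.parameters(), lr=0.05)

losses = []
for step in range(2000):
    optimizer.zero_grad()
    loss = criterion(net(X), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# Inspect the predictions for the four inputs.
with torch.no_grad():
    print(net(X))
```

With only two hidden units, training can still get stuck depending on the initial weights, so this sketch does not guarantee convergence; trying another seed or a few more hidden units is a reasonable next step if the predictions stay near 0.5.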