Pytorch Siamese Network not converging

Good morning everyone,

Below is my implementation of a PyTorch Siamese network. I am using a batch size of 32, MSE loss, and SGD with 0.9 momentum as the optimizer.

class SiameseCNN(nn.Module):
    def __init__(self):
        super(SiameseCNN, self).__init__()                                      # 1, 40, 50
        self.convnet = nn.Sequential(nn.Conv2d(1, 8, 7), nn.ReLU(),             # 8, 34, 44
                                    nn.Conv2d(8, 16, 5), nn.ReLU(),             # 16, 30, 40
                                    nn.MaxPool2d(2, 2),                         # 16, 15, 20
                                    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), # 32, 15, 20
                                    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU()) # 64, 15, 20
        self.linear1 = nn.Sequential(nn.Linear(64 * 15 * 20, 100), nn.ReLU())
        self.linear2 = nn.Sequential(nn.Linear(100, 2), nn.ReLU())
        
    def forward(self, data):
        res = []
        for j in range(2):
            x = self.convnet(data[:, j, :, :])
            x = x.view(-1, 64 * 15 * 20)
            res.append(self.linear1(x))
        fres = abs(res[1] - res[0])
        return self.linear2(fres)

Each batch contains alternating pairs, i.e. [pos, pos], [pos, neg], [pos, pos], etc. However, the network doesn't converge, and the problem seems to be that fres is the same for every pair (regardless of whether it is a positive or negative pair), so the output of self.linear2(fres) is always approximately equal to [0.0531, 0.0770]. This is in contrast to what I expect, which is that as the network learns, the first value of [0.0531, 0.0770] would get closer to 1 for a positive pair and the second value would get closer to 1 for a negative pair. These two values also need to sum to 1.
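For reference, here is a minimal sketch of how such an alternating batch could be assembled; pos_imgs and neg_imgs here are assumed pools of 40x50 grayscale crops, not my actual data pipeline:

import numpy as np

pairs, targets = [], []
for i in range(0, len(pos_imgs) - 1, 2):
    pairs.append(np.stack([pos_imgs[i], pos_imgs[i + 1]]))  # [pos, pos] pair
    targets.append([1.0, 0.0])                              # one-hot "same"
    pairs.append(np.stack([pos_imgs[i], neg_imgs[i]]))      # [pos, neg] pair
    targets.append([0.0, 1.0])                              # one-hot "different"
x = np.asarray(pairs, dtype=np.float32)    # (B, 2, 40, 50)
y = np.asarray(targets, dtype=np.float32)  # (B, 2), each row sums to 1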

I have tested exactly the same setup and the same input images with a 2-channel network architecture where, instead of feeding in [pos, pos], you stack those 2 images depth-wise, for example numpy.stack([pos, pos], -1). In this setup the first layer also changes from nn.Conv2d(1, 8, 7) to nn.Conv2d(2, 8, 7). This works perfectly fine.
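For comparison, a minimal sketch of that 2-channel variant, assuming the same layer sizes as above (only the input channels and the forward pass change):

import torch.nn as nn

class TwoChannelCNN(nn.Module):
    def __init__(self):
        super(TwoChannelCNN, self).__init__()                                   # 2, 40, 50
        self.convnet = nn.Sequential(nn.Conv2d(2, 8, 7), nn.ReLU(),            # 8, 34, 44
                                    nn.Conv2d(8, 16, 5), nn.ReLU(),            # 16, 30, 40
                                    nn.MaxPool2d(2, 2),                        # 16, 15, 20
                                    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),# 32, 15, 20
                                    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())# 64, 15, 20
        self.linear1 = nn.Sequential(nn.Linear(64 * 15 * 20, 100), nn.ReLU())
        self.linear2 = nn.Sequential(nn.Linear(100, 2), nn.ReLU())

    def forward(self, data):                  # data: (B, 2, 40, 50), depth-stacked pair
        x = self.convnet(data)                # both images processed jointly
        x = x.view(-1, 64 * 15 * 20)
        return self.linear2(self.linear1(x))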

I have also tested exactly the same setup and input images with a traditional CNN approach, where I just pass single positive and negative grayscale images into the network, instead of stacking them (as in the 2-channel approach) or passing them in as image pairs (as in the Siamese approach). This also works perfectly, but the results are not as good as with the 2-channel approach.

EDIT (Solutions I've tried):

def forward(self, data):
    res = []
    for j in range(2):
        x = self.convnet(data[:, j, :, :])
        x = x.view(-1, 64 * 15 * 20)
        res.append(x)
    # Difference taken on the raw conv features, with linear1 applied afterwards
    fres = self.linear2(self.linear1(abs(res[1] - res[0])))
    return fres

def forward(self, data):
    res = []
    for j in range(2):
        x = self.convnet(data[:, j, :, :])
        res.append(x)
    # Note: PairwiseDistance reduces the last dimension, giving (B, 64, 15) features,
    # so this view and linear1's expected input size no longer line up
    pdist = nn.PairwiseDistance(p=2)
    diff = pdist(res[1], res[0])
    diff = diff.view(-1, 64 * 15 * 10)
    fres = self.linear2(self.linear1(diff))
    return fres

Another thing to note, perhaps, is that within the context of my research, a Siamese network is trained for each object. So the first class is associated with images containing the object in question, and the second class is associated with images containing other objects. I don't know if this might be the cause of the problem. It is, however, not a problem in the context of the traditional CNN and 2-channel CNN approaches.

As per request, here is my training code:

import numpy as np
import torch
from torch.autograd import Variable

model = SiameseCNN().cuda()
ls_fn = torch.nn.BCELoss()
optim = torch.optim.SGD(model.parameters(), lr=1e-6, momentum=0.9)
epochs = np.arange(100)
eloss = []
for epoch in epochs:
    model.train()
    train_loss = []
    for x_batch, y_batch in dp.train_set:
        x_var, y_var = Variable(x_batch.cuda()), Variable(y_batch.cuda())
        y_pred = model(x_var)
        loss = ls_fn(y_pred, y_var)
        train_loss.append(abs(loss.item()))
        optim.zero_grad()
        loss.backward()
        optim.step()
    eloss.append(np.mean(train_loss))
    print(epoch, np.mean(train_loss))

Note that dp in dp.train_set is a class with attributes train_set, valid_set, test_set, where each set is created as follows:

DataLoader(TensorDataset(torch.Tensor(x), torch.Tensor(y)), batch_size=bs)
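For completeness, a minimal sketch of how one of these loaders could be built end to end; the array shapes and the bs value here are placeholders, not my actual partitioning code:

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

bs = 32                                           # batch size used above
x = np.zeros((128, 2, 40, 50), dtype=np.float32)  # placeholder image pairs
y = np.zeros((128, 1), dtype=np.float32)          # placeholder labels
train_set = DataLoader(TensorDataset(torch.Tensor(x), torch.Tensor(y)), batch_size=bs)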

As per request, here is an example of the predicted probabilities vs. the true labels, where you can see that the model doesn't seem to be learning:

Predicted:  0.5030623078346252 Label:  1.0
Predicted:  0.5030624270439148 Label:  0.0
Predicted:  0.5030624270439148 Label:  1.0
Predicted:  0.5030625462532043 Label:  0.0
Predicted:  0.5030625462532043 Label:  1.0
Predicted:  0.5030626654624939 Label:  0.0
Predicted:  0.5030626058578491 Label:  1.0
Predicted:  0.5030627250671387 Label:  0.0
Predicted:  0.5030626654624939 Label:  1.0
Predicted:  0.5030627846717834 Label:  0.0
Predicted:  0.5030627250671387 Label:  1.0
Predicted:  0.5030627846717834 Label:  0.0
Predicted:  0.5030627250671387 Label:  1.0
Predicted:  0.5030628442764282 Label:  0.0
Predicted:  0.5030627846717834 Label:  1.0
Predicted:  0.5030628442764282 Label:  0.0

I think that your approach is correct and you are doing things fine. What looks a bit weird to me is the last layer, which has a ReLU activation. Usually with Siamese networks you want to output a high probability when the two input images belong to the same class and a low probability otherwise. So you can implement this with a single output neuron and a sigmoid activation function.

Therefore I would reimplement your network as follows:

class SiameseCNN(nn.Module):
    def __init__(self):
        super(SiameseCNN, self).__init__()                                      # 1, 40, 50
        self.convnet = nn.Sequential(nn.Conv2d(1, 8, 7), nn.ReLU(),             # 8, 34, 44
                                    nn.Conv2d(8, 16, 5), nn.ReLU(),             # 16, 30, 40
                                    nn.MaxPool2d(2, 2),                         # 16, 15, 20
                                    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), # 32, 15, 20
                                    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU()) # 64, 15, 20
        self.linear1 = nn.Sequential(nn.Linear(64 * 15 * 20, 100), nn.ReLU())
        self.linear2 = nn.Sequential(nn.Linear(100, 1), nn.Sigmoid())
        
    def forward(self, data):
        res = []
        for j in range(2):
            x = self.convnet(data[:, j, :, :])
            x = x.view(-1, 64 * 15 * 20)
            res.append(self.linear1(x))
        fres = res[0].sub(res[1]).pow(2)  # squared difference of the two embeddings
        return self.linear2(fres)

Then, to be consistent with this at training time, you should use binary cross-entropy:

criterion_fn = torch.nn.BCELoss()

And remember to set the label to 1 when both input images belong to the same class.
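In other words, something like this hedged sketch, where is_same_pair is an assumed boolean from your pair-building code:

import torch

# 1.0 = both images show the same class, 0.0 = different classes (is_same_pair is assumed)
y = torch.tensor([[1.0 if is_same_pair else 0.0]])  # shape (1, 1) to match the (B, 1) sigmoid output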

Also, I recommend using a little bit of dropout, with around a 30% probability of dropping a neuron, after the linear1 layer.
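A minimal sketch of that change, using the roughly 30% rate suggested above and leaving the rest of the network as-is:

self.linear1 = nn.Sequential(nn.Linear(64 * 15 * 20, 100), nn.ReLU(),
                             nn.Dropout(p=0.3))  # drop ~30% of activations during training
self.linear2 = nn.Sequential(nn.Linear(100, 1), nn.Sigmoid())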

Problem solved. It turns out the network will predict the same output every time if you feed it the same images every time; there was a small indexing mistake on my part during data partitioning. Thanks for everyone's help. Here is an example of the convergence (epoch, mean training loss) as it is now:

0 0.20198837077617646
1 0.17636818194389342
2 0.15786472541093827
3 0.1412761415243149
4 0.126698794901371
5 0.11397973036766053
6 0.10332610329985618
7 0.09474560652673245
8 0.08779258838295936
9 0.08199785630404949
10 0.07704121413826942
11 0.07276330365240574
12 0.06907484836131335
13 0.06584368328005076
14 0.06295975042134523
15 0.06039590438082814
16 0.058096024941653016
