
How does PyTorch compute the backward pass when optimizing triplet loss?

I am implementing a triplet network in PyTorch where the 3 instances (sub-networks) share the same weights. Since the weights are shared, I implemented it as a single-instance network that is called three times to produce the anchor, positive, and negative embeddings. The embeddings are learned by optimizing the triplet loss. Here is a small snippet for illustration:

from dependencies import *
model = SingleSubNet()  # represents each instance in the triplet net

for epoch in epochs:
    for anch, pos, neg in train_loader:  # each batch yields an (anchor, positive, negative) triple
        optimizer.zero_grad()
        fa, fp, fn = model(anch), model(pos), model(neg)  # one shared network, three forward passes
        loss = triplet_loss(fa, fp, fn)
        loss.backward()
        optimizer.step()
        # Do more stuff ...

My complete code works as expected. However, I do not understand what gradient(s) loss.backward() computes in this case. I am confused because there are 3 gradients of the loss in each learning step (the gradient formulas are here). I assume the gradients are summed before performing optimizer.step(). But then it looks from the equations that if the gradients are summed, they will cancel each other out and yield a zero update term. Of course, this is not true, as the network learns meaningful embeddings in the end.
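(For reference, assuming the linked formulas are the standard gradients of a squared-Euclidean triplet loss with margin $\alpha$, for a triple where the loss is active they come out to something like

$$L = \|f_a - f_p\|^2 - \|f_a - f_n\|^2 + \alpha,$$

$$\frac{\partial L}{\partial f_a} = 2(f_n - f_p), \qquad \frac{\partial L}{\partial f_p} = 2(f_p - f_a), \qquad \frac{\partial L}{\partial f_n} = 2(f_a - f_n),$$

which indeed sum to zero, hence my confusion.)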

Thanks in advance

Late answer, but hope this helps someone. The gradients that you linked are the gradients of the loss with respect to the embeddings (the anchor, positive, and negative embeddings). To update the model parameters, you use the gradient of the loss with respect to the model parameters. This does not sum to zero.
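A sketch of why, in terms of the chain rule (with $\theta$ denoting the shared parameters of the sub-network):

$$\frac{\partial L}{\partial \theta} = \frac{\partial L}{\partial f_a}\,\frac{\partial f_a}{\partial \theta} + \frac{\partial L}{\partial f_p}\,\frac{\partial f_p}{\partial \theta} + \frac{\partial L}{\partial f_n}\,\frac{\partial f_n}{\partial \theta}$$

The three Jacobians $\partial f / \partial \theta$ are evaluated at three different inputs, so even though the embedding-level gradients sum to zero, the weighted sum above generally does not.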

The reason for this is that when calculating the gradient of the loss with respect to the model parameters, the formula makes use of the activations from the forward pass, and the 3 different inputs (anchor image, positive example, and negative example) have different activations in the forward pass.
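To see this concretely, here is a minimal, hypothetical sketch (a toy linear embedding and torch.nn.TripletMarginLoss, not the asker's SingleSubNet) showing that the parameter gradient accumulated over the three forward passes does not vanish:

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy shared sub-network: a single linear layer mapping 8-d inputs to 4-d embeddings
embed = nn.Linear(8, 4, bias=False)
criterion = nn.TripletMarginLoss(margin=1.0)

anch = torch.randn(2, 8)
pos = torch.randn(2, 8)
neg = torch.randn(2, 8)

# Three forward passes through the SAME parameters build one autograd graph;
# backward() accumulates the parameter gradient contributed by all three branches.
fa, fp, fn_ = embed(anch), embed(pos), embed(neg)
loss = criterion(fa, fp, fn_)
loss.backward()

# The gradients w.r.t. the embeddings sum to zero, but each branch's contribution
# to the weight gradient is multiplied by that branch's own input activations,
# so the accumulated parameter gradient is generally nonzero
# (assuming the margin term is active for at least one triple in the batch).
print(embed.weight.grad.abs().sum())  # > 0 in general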

