
Use Hamming Distance Loss Function with Tensorflow GradientTape: no gradients. Is it not differentiable?

I'm using Tensorflow 2.1 and Python 3, creating my custom training model following the tutorial "Tensorflow - Custom training: walkthrough".

I'm trying to use Hamming Distance in my loss function:

import tensorflow as tf
import tensorflow_addons as tfa

def my_loss_hamming(model, x, y):
  global output
  output = model(x)

  return tfa.metrics.hamming.hamming_loss_fn(y, output, threshold=0.5, mode='multilabel')


def grad(model, inputs, targets):
  with tf.GradientTape() as tape:
      tape.watch(model.trainable_variables)
      loss_value = my_loss_hamming(model, inputs, targets)

  return loss_value, tape.gradient(loss_value, model.trainable_variables)

When I call it:

loss_value, grads = grad(model, feature, label)
optimizer.apply_gradients(zip(grads, model.trainable_variables))

The grads variable is a list of 38 None values.

And I get the error:

No gradients provided for any variable: ['conv1_1/kernel:0', ...]

Is there any way to use Hamming Distance without "interrupting the gradient chain registered by the gradient tape"?

Apologies if I'm saying something obvious, but the way backpropagation works as a fitting algorithm for neural networks is through gradients - e.g. for each batch of training data you compute how much the loss function will improve/degrade if you move a particular trainable weight by a very small amount delta.
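As a tiny illustration of that idea (a sketch, not part of the original question): the gradient reported by the tape is just the sensitivity of the loss to a small change of the weight, which you can sanity-check with a finite difference on a simple differentiable loss.

import tensorflow as tf

w = tf.Variable(2.0)
delta = 1e-4

def loss_fn(weight):
    return (weight - 1.0) ** 2  # a simple differentiable loss

with tf.GradientTape() as tape:
    loss = loss_fn(w)

# the gradient tells us how the loss reacts to a tiny change of the weight
analytic = tape.gradient(loss, w)
finite_diff = (loss_fn(w + delta) - loss_fn(w - delta)) / (2 * delta)
print(float(analytic), float(finite_diff))  # both close to 2.0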

Hamming loss is by definition not differentiable, so for small movements of trainable weights you will never see any change in the loss. I imagine it was only added to be used for final measurement of a trained model's performance rather than for training.
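A minimal sketch of why the tape returns nothing (assuming tfa.metrics.hamming.hamming_loss_fn behaves as in the question): the thresholding step inside the Hamming loss cuts the gradient chain, so there is nothing to propagate back to the weights.

import tensorflow as tf
import tensorflow_addons as tfa

y_true = tf.constant([[1.0, 0.0, 1.0, 0.0]])
y_pred = tf.Variable([[0.8, 0.3, 0.6, 0.1]])

with tf.GradientTape() as tape:
    # hamming_loss_fn thresholds y_pred (y_pred > 0.5) before comparing with y_true,
    # and that comparison/cast has no defined gradient
    loss = tfa.metrics.hamming.hamming_loss_fn(y_true, y_pred, threshold=0.5, mode='multilabel')

print(tape.gradient(loss, y_pred))  # expected: None - no gradient flows through the threshold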

If you want to train a neural net through backpropagation you need to use some differentiable loss - one that can help the model move its weights in the right direction. Sometimes people use different techniques to smooth such losses as the Hamming loss and create approximations - e.g. here it could be something which penalizes less those predictions which are closer to the target answer, rather than just giving out 1 for everything above the threshold and 0 for everything else.
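For illustration, here is one possible smooth surrogate (the soft_hamming_loss helper below is hypothetical, not part of tensorflow_addons): instead of thresholding the predictions, it measures how far each predicted probability is from the 0/1 target, which is differentiable and approaches the true Hamming loss as the predictions saturate toward 0 or 1. Plain binary cross-entropy (tf.keras.losses.BinaryCrossentropy) would work just as well for multilabel targets.

import tensorflow as tf

def soft_hamming_loss(y_true, y_pred):
    # Mean absolute difference between predicted probabilities and 0/1 targets.
    # No thresholding, so gradients flow back to the model's weights.
    return tf.reduce_mean(tf.abs(tf.cast(y_true, y_pred.dtype) - y_pred))

def my_loss_smooth(model, x, y):
    output = model(x)  # assumes the model outputs probabilities in [0, 1], e.g. via sigmoid
    return soft_hamming_loss(y, output)

def grad(model, inputs, targets):
    with tf.GradientTape() as tape:
        loss_value = my_loss_smooth(model, inputs, targets)
    # grads are now real tensors instead of a list of None
    return loss_value, tape.gradient(loss_value, model.trainable_variables)

The exact Hamming loss can still be tracked as a metric during training; only the loss used for the gradient step needs to be differentiable.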
