How to replace tensorflow softmax with max for generating one hot vector at the output layer of Neural Network?

For a classification problem, the softmax function is used in the last layer of a neural network.
I want to replace the softmax layer with a max layer that generates a one-hot vector: a one at the index where the maximum value occurs, and zeros for all other entries.

I can do this with tf.argmax, as suggested in TensorFlow - dense vector to one-hot and Tensorflow: Convert output tensor to one-hot, but these are not differentiable operations and gradients cannot be calculated.
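A minimal sketch of that problem (my illustration, not from the question; the tensor values are made up): the integer output of tf.argmax breaks the gradient path, so the tape has nothing to return.

```python
import tensorflow as tf

logits = tf.Variable([[2.0, 1.0, 0.5]])  # hypothetical network output

with tf.GradientTape() as tape:
    # Hard one-hot via argmax: argmax produces integer indices,
    # so no gradient can flow back through this operation.
    hard = tf.one_hot(tf.argmax(logits, axis=-1), depth=3)

# No differentiable path from `hard` back to `logits`.
print(tape.gradient(hard, logits))  # -> None
```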

If exact 0's and 1's cannot be obtained, values close enough to them would suffice.

I was thinking of applying softmax multiple times, but this is not recommended and I do not understand the reason behind it.

Please suggest a differentiable solution.

If I have understood correctly, I don't think what you are describing is possible. In order for an operation to be differentiable, we need to be able to find a gradient.

Intuitively, this doesn't make sense if you are just clipping all values to 0 or 1.

UPDATE IN RESPONSE TO COMMENTS:

You could always use such an operation in a metric calculation. This would give you your "more accurate view" of performance during training (but would not be used to train, just for reporting results back to you).
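For instance (a sketch with made-up layer sizes, not from the answer): train with a softmax output and a cross-entropy loss, and report an argmax-based accuracy purely as a metric:

```python
import tensorflow as tf

# Hypothetical model: the softmax output drives learning via cross-entropy,
# while accuracy (which takes an argmax internally) is reporting only.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",   # differentiable, used for training
    metrics=["categorical_accuracy"],  # argmax-based, reported but not trained on
)
```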

It's just not possible to use it for the loss/objective function, as that's not how neural network learning works. I'll try to explain a little.

There are proper mathematical justifications and definitions that explain why the loss function needs to be differentiable but, intuitively, we can imagine that our optimiser needs a "smooth", "continuous" surface to work on.

Imagine walking blindfolded over a smooth, continuous plane and being tasked with finding the lowest point. One strategy is to tap your foot in a circle around you until you find the step you could take that gets you farthest down. Now take that step and repeat it all over again. Keep repeating until you are at the bottom with no downward steps left. One could think of gradient descent optimisation in this way. We take small steps in the direction that gets us lowest, each time getting closer and closer to the bottom.

Now, instead of a smooth plane, imagine a surface that is exactly flat except for a single cliff edge. No matter where you stand on that plane, you cannot possibly know which direction to step in. If you are away from the cliff edge, everything is exactly flat. Even if you are on the cliff edge, you still don't know which direction exactly (you probably have 180 degrees to choose from) to get to the lowest point.

Does that make sense? Without a smooth continuous surface, we cannot use the strategy of taking small steps downwards.

No, there is no differentiable solution, and that is exactly why we use the softmax activation: it is a differentiable approximation of the max function.
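As an illustration of that last point (my sketch, not part of the answer; the temperature parameter is my addition): dividing the logits by a small temperature before the softmax pushes the output arbitrarily close to a one-hot vector while keeping the whole operation differentiable.

```python
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.5]])

# Plain softmax: a soft distribution over classes.
print(tf.nn.softmax(logits))        # ~[[0.63, 0.23, 0.14]]

# A lower temperature sharpens the distribution towards one-hot,
# yet gradients still flow through every step.
print(tf.nn.softmax(logits / 0.1))  # ~[[1.00, 0.00, 0.00]]
```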
