
tensorflow 2 keras shuffle each row gradient problem

I need a NN that gives the same output for any permutation of the same input. I tried searching for a solution ('permutation invariance') and found some layers, but failed to make them work.

I chose a different approach: I want to create a layer, added first in the model, which will randomly shuffle the input (each row independently). Please let's follow this approach; I know it can be done outside the model, but I want it as a part of the model. I tried:

import tensorflow as tf

class ShuffleLayer(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super(ShuffleLayer, self).__init__(**kwargs)

    def call(self, inputs):
        batchSize = tf.shape(inputs)[0]
        cols = tf.shape(inputs)[-1]
        # row index of every element: [[0,0,...],[1,1,...],...]
        order0 = tf.tile(tf.expand_dims(tf.range(0, batchSize), -1), [1, cols])
        # an independent random column permutation for each row
        order1 = tf.argsort(tf.random.uniform(shape=(batchSize, cols)))
        # (row, permuted column) index pairs for gather_nd
        indices = tf.stack([tf.reshape(order0, [-1]), tf.reshape(order1, [-1])], axis=-1)
        outputs = tf.reshape(tf.gather_nd(inputs, indices), [batchSize, cols])
        return outputs
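For reference, the shuffle itself runs in eager mode (the 4x3 input below is made up for illustration):

x = tf.constant([[ 1.,  2.,  3.],
                 [ 4.,  5.,  6.],
                 [ 7.,  8.,  9.],
                 [10., 11., 12.]])
layer = ShuffleLayer()
print(layer(x))  # each row keeps its own values, in a random order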

I am getting the following error:

ValueError: Variable has None for gradient. Please make sure that all of your ops have a gradient defined (ie are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.

How can I avoid it? I tried to use tf.stop_gradient, but unsuccessfully.

Use Lambda layers:

First of all, if your layer doesn't have trainable weights, you should use a Lambda layer, not a custom layer. It's way simpler and easier.

def shuffleColumns(inputs):
    # same per-row shuffle as the custom layer above, as a plain function
    batchSize = tf.shape(inputs)[0]
    cols = tf.shape(inputs)[-1]
    order0 = tf.tile(tf.expand_dims(tf.range(0, batchSize), -1), [1, cols])
    order1 = tf.argsort(tf.random.uniform(shape=(batchSize, cols)))
    indices = tf.stack([tf.reshape(order0, [-1]), tf.reshape(order1, [-1])], axis=-1)
    outputs = tf.reshape(tf.gather_nd(inputs, indices), [batchSize, cols])
    return outputs

In the model, use a Lambda(shuffleColumns) layer.
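For concreteness, a minimal sketch of wiring it into a model (the input width of 10 and the Dense sizes below are assumptions for illustration, not from the question):

inputs = tf.keras.Input(shape=(10,))                 # assumed input width
x = tf.keras.layers.Lambda(shuffleColumns)(inputs)   # shuffle each row independently
x = tf.keras.layers.Dense(32, activation='relu')(x)
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse')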

About the error

If this is the first layer, this error is probably not caused by this layer. (Unless newer versions of TensorFlow demand that custom layers have weights and define def build(self, input_shape), which doesn't seem very logical.)

It seems you are doing something else in another place. The error means you are using some operation that blocks backpropagation, because it's impossible to take the derivative of that operation.

Since the derivatives are taken with respect to the model's "weights", the operation must necessarily come after the first weight tensor in the model (i.e., after the first layer that contains trainable weights).

You need to search your model for anything that doesn't have derivatives, as the error suggests: round, argmax, conditionals that return constants, losses that return sorted y_true but don't return operations on y_pred, etc.
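As a minimal reproduction of this kind of blockage (a sketch, not your actual code; tf.round stands in for whatever non-differentiable op is in your model):

dense = tf.keras.layers.Dense(4)
x = tf.random.normal((2, 3))
with tf.GradientTape() as tape:
    y = tf.round(dense(x))   # round has no defined gradient
    loss = tf.reduce_mean(y)
# gradients w.r.t. the Dense weights come back as None
print(tape.gradient(loss, dense.trainable_variables))  # [None, None]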

Of course, K.stop_gradient is also an operation that blocks backpropagation and will certainly cause this error if you just use it like that. (It may even be the "cause" of your problem, not the solution.)

Below are easier suggestions for your operation, but none of them will fix this error, because the error is somewhere else.

Suggested operation 1

Now, it would be way easier to use tf.random.shuffle for this:

def shuffleColumns(x):
    # shuffling the rows of the transposed tensor shuffles the columns of x
    # (the same permutation is applied to every row of the batch)
    x = tf.transpose(x)
    x = tf.random.shuffle(x)
    return tf.transpose(x)

Use a Lambda(shuffleColumns) layer in your model. It's true that this will shuffle all columns equally, but every batch will have a different permutation. And since you're going to have many epochs, and you will (I presume) be shuffling the samples between epochs (this is automatic in fit), you will hardly ever have repeated batches. So:

  • each batch will have a different permutation
  • it will be almost impossible to have the same batch twice

This approach will probably be way faster than yours.
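A quick eager check of this version (made-up values): every row receives the same column permutation, which changes from call to call:

x = tf.constant([[ 1.,  2.,  3.],
                 [11., 12., 13.]])
print(shuffleColumns(x))  # e.g. [[ 2.  3.  1.] [12. 13. 11.]], same order in both rows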

Suggested operation 2

If you want them permutation invariant, why not use tf.sort instead of permutations? Sort the columns and, instead of having infinite permutations to train on, you simply eliminate any possibility of permutation. The model should learn faster, and the order of the columns in your input will not be taken into account.

Use the layer Lambda(lambda x: tf.sort(x, axis=-1)).

This suggestion must be used both in training and inference.
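A quick check of the invariance (made-up values): any two permutations of the same row map to the same sorted tensor, so downstream layers see identical inputs:

sortLayer = tf.keras.layers.Lambda(lambda x: tf.sort(x, axis=-1))
a = tf.constant([[3., 1., 2.]])
b = tf.constant([[2., 3., 1.]])  # a permutation of the same values
print(sortLayer(a))  # [[1. 2. 3.]]
print(sortLayer(b))  # [[1. 2. 3.]], identical to the output for a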
