
Why does this tf.keras model behave differently than expected on sliced inputs?

I'm coding a Keras model which, given (mini-)batches of tensors, applies the same layer to each of their elements. To give a little bit of context, I'm feeding in groups of strings (of fixed size), which must be encoded one by one by an encoding layer. Thus, the input shape, including the (mini-)batch dimension, is (None, n_sentences_per_sample, ), where n_sentences_per_sample is a fixed value known a priori.
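
Just to make that layout concrete, a made-up minibatch with 2 samples and n_sentences_per_sample = 3 would look like this (the sentences are placeholders):

import tensorflow as tf

# hypothetical minibatch: 2 samples, 3 sentences each -> shape (2, 3), dtype=tf.string
batch = tf.constant([
    ["first sentence", "second sentence", "third sentence"],
    ["another one", "yet another one", "and a last one"],
])
print(batch.shape)  # (2, 3)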

To do so, I use this custom function when creating the model with the Functional API:

from typing import Callable, Union

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import backend as K


def _branch_execute(self, layer_in: keras.layers.Layer, sublayer: Union[keras.layers.Layer, Callable], **args) -> keras.layers.Layer:
    instance_cnt = layer_in.shape[1]
    # slice the input along axis 1: one (None,) tensor of strings per sentence
    sliced_inputs = [tf.keras.layers.Lambda(lambda x: x[:, i])(layer_in) for i in range(instance_cnt)]
    # apply the same sublayer to every slice
    branch_layers = [sublayer(**{**{'layer_in': sliced_inputs[i]}, **args}) for i in range(instance_cnt)]
    # re-insert the sentence axis and concatenate back to (None, instance_cnt, ...)
    expand_layer = tf.keras.layers.Lambda(lambda x: K.expand_dims(x, axis=1))
    expanded_layers = [expand_layer(branch_layers[i]) for i in range(instance_cnt)]
    concated_layer = tf.keras.layers.Concatenate(axis=1)(expanded_layers)
    return concated_layer

which I use in this way:

model_input = keras.layers.Input(shape=(self.max_sents, ),
                                 dtype=tf.string,
                                 )
sentences_embedded = self._branch_execute(model_input, self._get_nnlm_128)
model = keras.models.Model(model_input, sentences_embedded)
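
If the wiring works as intended, this model should map (None, max_sents) string inputs to (None, max_sents, 128) embeddings (128 being the nnlm embedding size), which can be sanity-checked with:

print(model.output_shape)  # expected: (None, max_sents, 128)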

where self._get_nnlm_128() is just a function that returns the result of applying a cached pretrained embedding layer to the input, i.e.

import tensorflow_hub as hub


def _get_nnlm_128(self, layer_in: keras.layers.Layer, trainable: bool = False):
    # build the hub encoder once and cache it, so every branch shares the same layer (and weights)
    if 'nnlm_128_layer_shared' not in self.shared_layers_cache:
        self.shared_layers_cache['nnlm_128_layer_shared'] = {
            'encoder': hub.KerasLayer("https://tfhub.dev/google/nnlm-en-dim128-with-normalization/2", trainable=trainable)
        }
    shared_layers = self.shared_layers_cache['nnlm_128_layer_shared']
    encoder = shared_layers['encoder'](layer_in)
    return encoder
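
For context, the nnlm-en-dim128 module maps a 1-D batch of strings to a (batch_size, 128) float tensor, which is why each slice fed to it has to be rank-1. Roughly:

import tensorflow as tf
import tensorflow_hub as hub

encoder = hub.KerasLayer("https://tfhub.dev/google/nnlm-en-dim128-with-normalization/2")
embeddings = encoder(tf.constant(["first sentence", "second sentence"]))
print(embeddings.shape)  # (2, 128)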

The problem I have is as follows:

  1. If I call self._branch_execute(input_tensor, self._get_nnlm_128), where input_tensor is just a well-shaped tensor, it works perfectly;
  2. If I call model (whether directly or through .predict(), whether compiled or not) on the same input_tensor, I get a repeated result for every sentence in the sample (weirdly, it is the output corresponding to the LAST sentence, repeated - see below).

Just as an example (though I have the same issue with every possible input), let us consider an input_tensor composed of 7 sentences (7 strings), reshaped as (1, 7, ) to include the minibatch axis. The result of 1) is

[[[ 0.216900051 0.037066862 0.163929373 ... 0.050420273 0.082906663 0.059960182],
  [ 0.531883411 -0.000807280 0.107559107 ... -0.079948671 -0.020143294 0.007032406],
  ...
  [ 0.15044811  0.00890037  0.10413752 ... -0.05391502 -0.12199926 -0.13466084]]]

where I get 7 vectors/embeddings of size 128, all different from each other, as expected. The result of 2) is, oddly enough,

[[[ 0.15044811  0.00890037  0.10413752 ... -0.05391502 -0.12199926 -0.13466084],  
  [ 0.15044811  0.00890037  0.10413752 ... -0.05391502 -0.12199926 -0.13466084], 
  [ 0.15044811  0.00890037  0.10413752 ... -0.05391502 -0.12199926 -0.13466084],
  [ 0.15044811  0.00890037  0.10413752 ... -0.05391502 -0.12199926 -0.13466084],
  [ 0.15044811  0.00890037  0.10413752 ... -0.05391502 -0.12199926 -0.13466084],
  [ 0.15044811  0.00890037  0.10413752 ... -0.05391502 -0.12199926 -0.13466084],
  [ 0.15044811  0.00890037  0.10413752 ... -0.05391502 -0.12199926 -0.13466084]]]

where I get the same vector 7 times (as I said, it always corresponds to the last sentence, repeated for all of them). I took these results from an actual run.
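
For reference, this is roughly how I produce the two outputs above (the sentence strings below are placeholders, not the ones from the actual run):

import tensorflow as tf

# placeholder sentences, shaped (1, 7) to include the minibatch axis
input_tensor = tf.constant([[f"sentence number {i}" for i in range(7)]])

# 1) calling the branching function directly: 7 distinct embeddings, as expected
out_direct = self._branch_execute(input_tensor, self._get_nnlm_128)

# 2) calling the model on the same tensor: the LAST embedding repeated 7 times
out_model = model.predict(input_tensor)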

Among the many trials I made, I tried outputting model_input from the model, which works great, i.e. it corresponds to the input strings. The embedding model is taken directly from TensorFlow Hub, so it should not be the problem. Additionally, the same behavior is observed with any other embedding layer, whether custom or pretrained. I therefore think the problem may be in the _branch_execute() function, but I have no idea what the issue could be, given that it works correctly when used alone. Maybe it has to do with some peculiar broadcasting behavior inside Keras models, but I don't know how to test that, let alone how to solve it.

I would appreciate any suggestions you may have about why this issue occurs and how to fix it. I'm not an expert in TensorFlow, so maybe I'm just misjudging something (if so, forgive me). I'll be glad to share more info as needed to help solve the problem. Thanks a lot :)

I finally came to the conclusion that the problem was in the line

sliced_inputs = [tf.keras.layers.Lambda(lambda x: x[:, i])(layer_in) for i in range(instance_cnt)]

which apparently does not work as expected (I'm running TensorFlow 2.4.0, but I got the same issue with TensorFlow 2.5.0-nightly). I just substituted the Lambda layer with a custom layer that does exactly the same thing, i.e.

class Slicer(keras.layers.Layer):
    """Selects the i-th element along axis 1 of its input."""

    def __init__(self, i, **kwargs):
        self.i = i
        super(Slicer, self).__init__(**kwargs)

    def call(self, inputs, **kwargs):
        return inputs[:, self.i]

which I then used in the _branch_execute() function just like this:

def _branch_execute(self, layer_in: keras.layers.Layer, sublayer: Union[keras.layers.Layer, Callable], **args) -> keras.layers.Layer:
    instance_cnt = layer_in.shape[1]
    # the only change: slice with the custom Slicer layer instead of a Lambda
    sliced_inputs = [Slicer(i)(layer_in) for i in range(instance_cnt)]
    branch_layers = [sublayer(**{**{'layer_in': sliced_inputs[i]}, **args}) for i in range(instance_cnt)]
    expand_layer = tf.keras.layers.Lambda(lambda x: K.expand_dims(x, axis=1))
    expanded_layers = [expand_layer(branch_layers[i]) for i in range(instance_cnt)]
    concated_layer = tf.keras.layers.Concatenate(axis=1)(expanded_layers)
    return concated_layer

I'm not sure if this is the best option to solve the problem, but it seems pretty neat and it works well.
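
As a note on what is probably going on: the original list comprehension builds one lambda per index, but all of them close over the same loop variable i, and Python only looks i up when the lambda is actually executed. If Keras re-invokes the Lambda functions while building or tracing the model, i has already reached its final value, which would explain why every slice returns the last sentence. Under that assumption, binding the index at definition time through a default argument should also fix the problem without a custom layer, e.g.:

# bind the loop index at lambda-definition time via a default argument,
# so each Lambda keeps its own index even if the function is re-traced later
sliced_inputs = [
    tf.keras.layers.Lambda(lambda x, idx=i: x[:, idx])(layer_in)
    for i in range(instance_cnt)
]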

Since this is probably an unexpected behavior of the Lambda layer, I'll be opening an issue on the TensorFlow GitHub and will post the reply here for reference.
