Problem with multiple embedding layers in an inner Keras model

Question

I am trying to construct a Keras model model_B that outputs the output of another Keras model model_A. The output of model_A is computed from the concatenation of several tensors coming from multiple Keras embedding layers with different vocabulary sizes. Models model_A and model_B are essentially the same.

Problem: When I train model_A, everything works fine. However, when I train model_B on the same dataset, I get the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[1] = 3 is not in [0, 2) [[{{node model_1/embedding_1/embedding_lookup}}]]

Essentially, the error says that the index of a word is outside the expected vocabulary, but this is not the case. Could someone clarify why this happens?
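For context, an embedding layer behaves like a lookup table with input_dim rows, so any index outside [0, input_dim) is rejected. The plain-Python sketch below is only a stand-in for that range check (embedding_lookup is a hypothetical helper, not a Keras or TensorFlow API):

```python
def embedding_lookup(table, indices):
    """Return the rows of `table` selected by `indices`.

    `table` has one vector per vocabulary entry, so len(table) plays the role
    of input_dim. Any index outside [0, input_dim) raises an error.
    """
    input_dim = len(table)
    for pos, idx in enumerate(indices):
        if not 0 <= idx < input_dim:
            raise IndexError(f"indices[{pos}] = {idx} is not in [0, {input_dim})")
    return [table[idx] for idx in indices]


# Vocabulary size 2 with output_dim=1, like the first embedding layer.
table_a0 = [[0.1], [0.2]]
print(embedding_lookup(table_a0, [1]))  # [[0.2]]
try:
    embedding_lookup(table_a0, [3])     # 3 is only valid for a vocabulary of size 4
except IndexError as err:
    print(err)                          # indices[0] = 3 is not in [0, 2)
```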

Here is a reproducible example of the problem:

from keras.layers import Input, Dense, Lambda, Concatenate, Embedding
from keras.models import Model
import numpy as np


# Constants
A = 2
vocab_sizes = [2, 4]

# Architecture
X = Input(shape=(A,))
embeddings = []
for a in range(A):
    X_a = Lambda(lambda x: x[:, a])(X)
    embedding = Embedding(input_dim=vocab_sizes[a],
                          output_dim=1)(X_a)
    embeddings.append(embedding)
h = Concatenate()(embeddings)
h = Dense(1)(h)

# Model A
model_A = Model(inputs=X, outputs=h)
model_A.compile('sgd', 'mse')

# Model B
Y = Input(shape=(A,))
model_B = Model(inputs=Y, outputs=model_A(Y))
model_B.compile('sgd', 'mse')

# Dummy dataset
x = np.array([[vocab_sizes[0] - 1, vocab_sizes[1] - 1]])
y = np.array([1])

# Train models
model_A.fit(x, y, epochs=10)  # Works well
model_B.fit(x, y, epochs=10)  # Fails

From the error above, it seems that the input x[:, 1] is somehow being fed to the first embedding layer (vocabulary size 2) instead of the second. Interestingly, when I swap the vocabulary sizes (e.g. set vocab_sizes = [4, 2]) it works, which supports this hypothesis.
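The hypothesis can be checked with a plain-Python analogy, without Keras. The helpers build_slicers and route below are hypothetical, and the sketch assumes each slice is a closure over the loop variable a that gets evaluated only after the loop has finished. Under that assumption it reproduces both observations: with vocab_sizes = [2, 4] the value 3 reaches the size-2 vocabulary, while with [4, 2] both embeddings silently receive column 1 and nothing goes out of range.

```python
def build_slicers(num_columns):
    """Mimic the question's loop: one slicing lambda per column."""
    slicers = []
    for a in range(num_columns):
        # The lambda captures the variable `a`, not its current value,
        # so after the loop every slicer reads the final value of `a`.
        slicers.append(lambda row: row[a])
    return slicers


def route(row, vocab_sizes):
    """Feed each slice to the 'embedding' with the same position.

    Returns (value, vocab_size, in_range) for every embedding slot.
    """
    slicers = build_slicers(len(vocab_sizes))
    result = []
    for slicer, vocab in zip(slicers, vocab_sizes):
        value = slicer(row)  # the slicers are only called after their loop ended
        result.append((value, vocab, 0 <= value < vocab))
    return result


# Original order: x = [[1, 3]] with vocab_sizes = [2, 4].
print(route([1, 3], [2, 4]))  # [(3, 2, False), (3, 4, True)]: 3 hits the size-2 table
# Swapped order: x = [[3, 1]] with vocab_sizes = [4, 2].
print(route([3, 1], [4, 2]))  # [(1, 4, True), (1, 2, True)]: no error, but both read column 1
```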

Answer 1

For some weird reason, slicing the tensor inside a loop is causing this error. You can replace your slicing with tf.split, make the necessary adjustments, and it will work well:

Extra imports:

import tensorflow as tf
from keras.layers import Flatten

# Architecture
X = Input(shape=(A,))
X_as = Lambda(lambda x: tf.split(x, A, axis=1))(X)

embeddings = []
for a, x in enumerate(X_as):
    embedding = Embedding(input_dim=vocab_sizes[a],
                          output_dim=1)(x)
    embeddings.append(embedding)
h = Concatenate(axis=1)(embeddings)
h = Flatten()(h)
h = Dense(1)(h)

Why does this happen?

Well, it's very hard to guess. My assumption is that the system applies the Lambda layer using the current value of the variable a instead of the value it had when the layer was created. (This should not be happening, I guess, but I once had exactly this problem when loading a model: one of the variables kept its last value instead of the value from its loop iteration.)
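If this explanation is right, the usual Python remedy for late-binding closures should also apply: freeze the current value of a as a default argument. The plain-Python sketch below shows the difference. Applied to the question it would read Lambda(lambda x, a=a: x[:, a]), which is an untested suggestion here; Keras's Lambda layer also accepts an arguments dict that is passed to the function as keyword arguments, another way to pin the value.

```python
row = [1, 3]  # one sample, like x[0] in the question

late = []
for a in range(2):
    late.append(lambda r: r[a])        # captures the variable `a`, not its value

bound = []
for a in range(2):
    bound.append(lambda r, a=a: r[a])  # default argument stores the value at creation time

print([f(row) for f in late])   # [3, 3]: both read the final a == 1
print([f(row) for f in bound])  # [1, 3]: each reads its own column
```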

One thing that supports this explanation is using constants instead of a:

#Architecture
X = Input(shape=(A,))
embeddings = []

X_a1 = Lambda(lambda x: x[:, 0], name = 'lamb_'+str(0))(X)
X_a2 = Lambda(lambda x: x[:, 1], name = 'lamb_'+str(1))(X)
xs = [X_a1, X_a2]

for a, X_a in enumerate(xs):
    embedding = Embedding(input_dim=vocab_sizes[a],
                          output_dim=1)(X_a)
    embeddings.append(embedding)
h = Concatenate()(embeddings)
h = Dense(1)(h)

Solution if you want to avoid tf.split

Another thing that works (and supports the explanation that the Lambda might be using the last value of a in your code for model_B) is to put the entire loop inside the Lambda layer; this way, a doesn't get any unexpected values:

#Architecture
X = Input(shape=(A,))
X_as = Lambda(lambda x: [x[:, a] for a in range(A)])(X)

embeddings = []
for a, X_a in enumerate(X_as):
    embedding = Embedding(input_dim=vocab_sizes[a],
                          output_dim=1)(X_a)
    embeddings.append(embedding)
h = Concatenate()(embeddings)
h = Dense(1)(h)
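Why moving the loop inside the Lambda helps can also be seen in plain Python: the comprehension gets its own a on every call, so a stale outer a is never consulted. A minimal sketch (split_all is an illustrative name, not part of the original code):

```python
A = 2  # number of columns, as in the question
a = 1  # a stale outer value, like the one left over from the question's loop

# Mirrors Lambda(lambda x: [x[:, a] for a in range(A)]): the comprehension has
# its own local `a`, which runs through 0..A-1 on every call.
split_all = lambda row: [row[a] for a in range(A)]

print(split_all([1, 3]))  # [1, 3]
a = 0
print(split_all([1, 3]))  # still [1, 3]; the outer `a` is not used
```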

Answer 2

I believe the following is happening:

(1) When you run the initial "for loop" over the Lambda function, you initialize the constant tensors that feed into the "strided_slice" operator, which correctly extracts either [:,0] or [:,1]. Using the global variable "a" in the Lambda function is probably "risky", but it works in this instance. Furthermore, I believe the function is stored in bytecode as "lambda x: x[:, a]", so it will look up whatever the value of "a" is at the time of evaluation. "a" could be anything, so this might be problematic in some cases.

(2) When you build the first model (model_A), the constant tensors are not reinitialized, so the lambda functions (strided_slice operators) have the correct values (0 and 1) that were initialized in the "for loop."

(3) When you build the second model (model_B), the constant tensors are reinitialized. However, at this point the value of "a" is 1 (as some of the other comments state), because that is its final value after the original "for loop." In fact, you can set a=0 just before defining model_B, and you will get behavior corresponding to both Lambdas extracting [:,0] and feeding it to the embedding layers. My speculation is that this difference in behavior is related to calling model_A(Y) in this case (whereas in the first model you only specified the output layer "h" and didn't call model_A to produce the output; I believe other comments also suggested this difference).
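Steps (1) and (3) can be mimicked in plain Python, assuming (as this answer speculates) that calling model_A on a new input runs the lambda functions again. Each function is called once inside the loop, while a still holds that iteration's value, and again later after the loop has left a at 1; the last step repeats the a=0 experiment. This is an analogy, not a trace of Keras internals.

```python
import dis

row = [1, 3]  # one sample, like x[0] in the question

# Step (1): one slicing function per column, each evaluated inside the loop.
slicers = []
first_pass = []
for a in range(2):
    f = lambda r: r[a]  # `a` is referenced by name, not stored by value
    slicers.append(f)
    first_pass.append(f(row))

dis.dis(slicers[0])  # the bytecode loads `a` by name at call time (listing varies by Python version)

# Step (3): the same functions are called again later; `a` is now 1.
second_pass = [f(row) for f in slicers]

a = 0  # the answer's experiment: reset `a` before re-applying
third_pass = [f(row) for f in slicers]

print(first_pass, second_pass, third_pass)  # [1, 3] [3, 3] [1, 1]
```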

I verified this by putting some print statements in the file "frameworks/constant_op.py" during the operator initialization step; the debug output showed values and sequences consistent with what I stated above.

I hope this helps.

Answer 3

You shouldn't call model_A(X) directly, because it returns a new tensor. Instead, use model_B = Model(inputs=X, outputs=model_A.outputs). It works for me.

The difference between model_A(X) and model_A.outputs is:

model_A(X) # a tensor
# <tf.Tensor 'model_13/dense_8/BiasAdd:0' shape=(?, 1) dtype=float32>

model_A.outputs # a list of tensors
# [<tf.Tensor 'dense_8/BiasAdd:0' shape=(?, 1) dtype=float32>]


Answer 1: score 1, accepted, 2019-11-07 19:10:33

Answer 2: score 1, 2019-11-09 22:38:53

Answer 3: score 0, 2019-11-05 19:16:40
