Problem with multiple embedding layers in inner Keras model
I am trying to construct a Keras model model_B that outputs the output of another Keras model model_A. The output of model_A is computed from the concatenation of several tensors coming from multiple Keras embedding layers with different vocabulary sizes. Models model_A and model_B are essentially the same.
Problem: When I train model_A, everything works fine. However, when I train model_B on the same dataset, I get the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[1] = 3 is not in [0, 2) [[{{node model_1/embedding_1/embedding_lookup}}]]
Essentially, the error says that the index of a word is outside the expected vocabulary, but this is not the case. Could someone clarify why this happens?
Here is a reproducible example of the problem:
from keras.layers import Input, Dense, Lambda, Concatenate, Embedding
from keras.models import Model
import numpy as np
# Constants
A = 2
vocab_sizes = [2, 4]
# Architecture
X = Input(shape=(A,))
embeddings = []
for a in range(A):
    X_a = Lambda(lambda x: x[:, a])(X)
    embedding = Embedding(input_dim=vocab_sizes[a],
                          output_dim=1)(X_a)
    embeddings.append(embedding)
h = Concatenate()(embeddings)
h = Dense(1)(h)
# Model A
model_A = Model(inputs=X, outputs=h)
model_A.compile('sgd', 'mse')
# Model B
Y = Input(shape=(A,))
model_B = Model(inputs=Y, outputs=model_A(Y))
model_B.compile('sgd', 'mse')
# Dummy dataset
x = np.array([[vocab_sizes[0] - 1, vocab_sizes[1] - 1]])
y = np.array([1])
# Train models
model_A.fit(x, y, epochs=10) # Works well
model_B.fit(x, y, epochs=10) # Fails
From the error above, it seems that the input x[:, 1] is wrongly being fed to the first embedding layer with vocabulary size 2, as opposed to the second. Interestingly, when I swap the vocabulary sizes (e.g. set vocab_sizes = [4, 2]) it works, supporting the previous hypothesis.
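To make the hypothesis above concrete, here is a minimal plain-Python sketch (no Keras involved). It assumes, purely for illustration, that each embedding layer only checks 0 <= index < input_dim, and that in model_B both slices end up reading column 1. The helper name out_of_range_lookups is made up and is not part of any library.

```python
def out_of_range_lookups(row, vocab_sizes, columns):
    """Return (layer, index) pairs whose index falls outside [0, vocab_size).

    columns[i] is the input column that layer i actually reads
    (an assumption used to model the suspected miswiring).
    """
    bad = []
    for layer, (size, col) in enumerate(zip(vocab_sizes, columns)):
        index = row[col]
        if not 0 <= index < size:
            bad.append((layer, index))
    return bad


# Intended wiring: layer 0 reads column 0, layer 1 reads column 1.
intended = out_of_range_lookups([1, 3], [2, 4], columns=[0, 1])

# Hypothesised wiring in model_B: both layers read column 1.
miswired = out_of_range_lookups([1, 3], [2, 4], columns=[1, 1])

# Swapped vocabulary sizes with the same miswiring (x is now [[3, 1]]).
swapped = out_of_range_lookups([3, 1], [4, 2], columns=[1, 1])
```

Under these assumptions only the miswired case reports a bad lookup (index 3 at the first layer). The swapped case passes only because the value in column 1 happens to fit both vocabularies, so the first layer would still be reading the wrong column without any error.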
For some weird reason, slicing the tensor inside the loop causes this error. You can replace your slicing with tf.split, make the necessary adjustments, and it will work well:
Extra imports:
import tensorflow as tf
from keras.layers import Flatten
# Architecture
X = Input(shape=(A,))
X_as = Lambda(lambda x: tf.split(x, A, axis=1))(X)
embeddings = []
for a, x in enumerate(X_as):
    embedding = Embedding(input_dim=vocab_sizes[a],
                          output_dim=1)(x)
    embeddings.append(embedding)
h = Concatenate(axis=1)(embeddings)
h = Flatten()(h)
h = Dense(1)(h)
Why does this happen?
Well, it's very hard to guess. My assumption is that the system is trying to apply the Lambda layer using the actual variable a instead of the value you gave before (this should not be happening, I guess, but I had exactly this problem once when loading a model: one of the variables kept its last value when loading the model instead of having a looped value).
One thing that supports this explanation is trying constants instead of a:
#Architecture
X = Input(shape=(A,))
embeddings = []
X_a1 = Lambda(lambda x: x[:, 0], name = 'lamb_'+str(0))(X)
X_a2 = Lambda(lambda x: x[:, 1], name = 'lamb_'+str(1))(X)
xs = [X_a1, X_a2]
for a, X_a in enumerate(xs):
    embedding = Embedding(input_dim=vocab_sizes[a],
                          output_dim=1)(X_a)
    embeddings.append(embedding)
h = Concatenate()(embeddings)
h = Dense(1)(h)
Solution if you want to avoid tf.split
Another thing that works (and supports the explanation that the Lambda might be using the last value of a in your code for model_B) is putting the entire loop inside the Lambda layer. This way, a doesn't get any unexpected values:
#Architecture
X = Input(shape=(A,))
X_as = Lambda(lambda x: [x[:, a] for a in range(A)])(X)
embeddings = []
for a, X_a in enumerate(X_as):
    embedding = Embedding(input_dim=vocab_sizes[a],
                          output_dim=1)(X_a)
    embeddings.append(embedding)
h = Concatenate()(embeddings)
h = Dense(1)(h)
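If the explanation above is right, another option (not from the original answer, so treat it as an assumption) is to bind the current value of a when each lambda is created, either with a default argument or with functools.partial. The sketch below shows only the plain-Python mechanism, with a list standing in for the tensor; take_column and bound_slicers are hypothetical names.

```python
from functools import partial


def take_column(row, a):
    """Stand-in for the slice x[:, a], using a plain list as the input."""
    return row[a]


def bound_slicers(n):
    """Build n slicers that each remember their own column index."""
    # The default argument a=a is evaluated once, when the lambda is created.
    by_default = [lambda row, a=a: row[a] for a in range(n)]
    # partial stores the keyword argument a at creation time as well.
    by_partial = [partial(take_column, a=a) for a in range(n)]
    return by_default, by_partial


by_default, by_partial = bound_slicers(2)
row = [1, 3]
default_result = [slicer(row) for slicer in by_default]
partial_result = [slicer(row) for slicer in by_partial]
```

In the Keras code the corresponding change would presumably be Lambda(lambda x, a=a: x[:, a])(X) inside the original loop. I have not run that against the example here, so take it as a sketch rather than a verified fix.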
I believe the following is happening:
(1) When you do the initial "for loop" over the Lambda function, you are initializing the constant tensors which feed into the "strided_slice" operator that extracts either [:,0] or [:,1] correctly. Using the global variable "a" in the Lambda function is probably "risky" but works okay in this instance. Furthermore, I believe that the function is stored in bytecode as "lambda x: x[:, a]", so it will look up whatever the value of "a" is at the time of evaluation. "a" could be anything, so this might be problematic in some cases.
(2) When you build the first model (model_A), the constant tensors are not reinitialized, so the lambda functions (strided_slice operators) have the correct values (0 and 1), which were initialized in the "for loop."
(3) When you build the second model (model_B), the constant tensors are reinitialized. However, at this time the value of "a" is 1 (as noted in some of the other commentary), because that is its final value after the original "for loop." In fact, you can set a=0 just before defining model_B, and you'll get behavior that corresponds to both Lambdas extracting [:,0] and feeding it to the embedding layers. My speculation is that this difference in behavior is related to calling model_A(Y) in this case (whereas in the first model you only specified the output tensor "h" and didn't call model_A as the output; I believe other commentary also suggested this difference).
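The three steps above can be mimicked in plain Python, without Keras. This is only a sketch of the name-lookup behavior: a list stands in for the input tensor, calling a lambda inside the loop stands in for building model_A, and calling it again afterwards stands in for model_B re-applying the layers. The function name late_binding_demo is made up for illustration.

```python
def late_binding_demo(row):
    """Call loop-defined lambdas at build time, after the loop, and after a reset."""
    slicers = []
    built = []
    for a in range(len(row)):
        # The lambda stores the name `a`, not its current value.
        slicers.append(lambda r: r[a])
        # Called inside the loop, so `a` still has the intended value.
        built.append(slicers[-1](row))

    # Called after the loop: every lambda now sees the final value of `a`.
    recalled = [slicer(row) for slicer in slicers]

    # Rebinding `a` changes what every lambda extracts.
    a = 0
    recalled_after_reset = [slicer(row) for slicer in slicers]
    return built, recalled, recalled_after_reset


built, recalled, recalled_after_reset = late_binding_demo([1, 3])
```

Here built holds one value per column, while recalled repeats the last column and recalled_after_reset repeats the first, which matches the a=0 experiment described in (3).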
I'll say that I verified this by adding some print statements to the file "frameworks/constant_op.py" during the operator initialization step; the debug output showed values and sequences consistent with what I stated above.
I hope this helps.
You shouldn't call model_A(X) directly. It will return a new tensor; instead you can use model_B = Model(inputs=X, outputs=model_A.outputs). It works for me.
The difference between model_A(X) and model_A.outputs is:
model_A(X) # a tensor
# <tf.Tensor 'model_13/dense_8/BiasAdd:0' shape=(?, 1) dtype=float32>
model_A.outputs # a list of tensors
# [<tf.Tensor 'dense_8/BiasAdd:0' shape=(?, 1) dtype=float32>]