
Problem with multiple embedding layers in inner Keras model

I am trying to construct a Keras model model_B that outputs the output of another Keras model model_A. Now, the output of model_A is computed from the concatenation of several tensors coming from multiple Keras embedding layers with different vocabulary sizes. Models model_A and model_B are essentially the same.

Problem: When I train model_A, everything works fine. However, when I train model_B on the same dataset, I get the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[1] = 3 is not in [0, 2) [[{{node model_1/embedding_1/embedding_lookup}}]]

Essentially, the error is saying that the index of a word is outside of the expected vocabulary, but this is not the case. Could someone clarify why this happens?


Here is a reproducible example of the problem:

from keras.layers import Input, Dense, Lambda, Concatenate, Embedding
from keras.models import Model
import numpy as np


# Constants
A = 2
vocab_sizes = [2, 4]

# Architecture
X = Input(shape=(A,))
embeddings = []
for a in range(A):
    X_a = Lambda(lambda x: x[:, a])(X)
    embedding = Embedding(input_dim=vocab_sizes[a],
                          output_dim=1)(X_a)
    embeddings.append(embedding)
h = Concatenate()(embeddings)
h = Dense(1)(h)

# Model A
model_A = Model(inputs=X, outputs=h)
model_A.compile('sgd', 'mse')

# Model B
Y = Input(shape=(A,))
model_B = Model(inputs=Y, outputs=model_A(Y))
model_B.compile('sgd', 'mse')

# Dummy dataset
x = np.array([[vocab_sizes[0] - 1, vocab_sizes[1] - 1]])
y = np.array([1])

# Train models
model_A.fit(x, y, epochs=10)  # Works well
model_B.fit(x, y, epochs=10)  # Fails

From the error above, it somehow seems that the input x[:, 1] is wrongly being fed to the first embedding layer with vocabulary size 2, as opposed to the second. Interestingly, when I swap the vocabulary sizes (e.g. set vocab_sizes = [4, 2]) it works, supporting the previous hypothesis.

For some weird reason, looping over the tensor is causing this error. You can replace your slicing with tf.split, make the necessary adjustments, and it will work well:

Extra imports:

import tensorflow as tf
from keras.layers import Flatten
# Architecture
X = Input(shape=(A,))
X_as = Lambda(lambda x: tf.split(x, A, axis=1))(X)

embeddings = []
for a, x in enumerate(X_as):
    embedding = Embedding(input_dim=vocab_sizes[a],
                          output_dim=1)(x)
    embeddings.append(embedding)
h = Concatenate(axis=1)(embeddings)
h = Flatten()(h)
h = Dense(1)(h)
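
For reference, the rest of the question's script should then plug in unchanged; a sketch reusing A, vocab_sizes, x and y from above (the Flatten layer keeps the final output shape at (batch, 1)):

# Rebuild both models on top of the tf.split-based architecture
model_A = Model(inputs=X, outputs=h)
model_A.compile('sgd', 'mse')

Y = Input(shape=(A,))
model_B = Model(inputs=Y, outputs=model_A(Y))
model_B.compile('sgd', 'mse')

model_A.fit(x, y, epochs=10)
model_B.fit(x, y, epochs=10)  # per the answer, this now trains without the lookup error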

Why does this happen?

Well, it's very hard to guess. My assumption is that the system is trying to apply the lambda layer using the actual variable a instead of the value you gave before (this should not be happening, I guess, but I had exactly this problem once when loading a model: one of the variables kept its last value when loading the model instead of having a looped value).
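
If that assumption is right, it matches Python's usual late-binding behaviour for closures: a lambda defined in a loop looks up the loop variable when it is called, not when it is defined. A minimal sketch, independent of Keras, illustrating the pitfall and the standard default-argument fix:

fns = []
for a in range(2):
    fns.append(lambda x: x[a])           # a is looked up at call time
print(fns[0]([10, 20]), fns[1]([10, 20]))  # 20 20 -> both lambdas see the final a == 1

fns = []
for a in range(2):
    fns.append(lambda x, a=a: x[a])      # default argument freezes the current a
print(fns[0]([10, 20]), fns[1]([10, 20]))  # 10 20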

One thing that supports this explanation is trying constants instead of a:

# Architecture
X = Input(shape=(A,))
embeddings = []

X_a1 = Lambda(lambda x: x[:, 0], name = 'lamb_'+str(0))(X)
X_a2 = Lambda(lambda x: x[:, 1], name = 'lamb_'+str(1))(X)
xs = [X_a1, X_a2]

for a, X_a in enumerate(xs):
    embedding = Embedding(input_dim=vocab_sizes[a],
                          output_dim=1)(X_a)
    embeddings.append(embedding)
h = Concatenate()(embeddings)
h = Dense(1)(h)

Solution if you want to avoid tf.split

Another thing that works (and supports the explanation that the Lambda might be using the last value of a in your code for model_B) is putting the entire loop inside the Lambda layer; this way, a doesn't get any unexpected values:

# Architecture
X = Input(shape=(A,))
X_as = Lambda(lambda x: [x[:, a] for a in range(A)])(X)

embeddings = []
for a, X_a in enumerate(X_as):
    embedding = Embedding(input_dim=vocab_sizes[a],
                          output_dim=1)(X_a)
    embeddings.append(embedding)
h = Concatenate()(embeddings)
h = Dense(1)(h)
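
A related variant, not from the original answer, so treat it as an untested sketch that is merely consistent with the late-binding explanation: bind the current value of a as a default argument of the lambda, so each slice keeps its own index even if the function is re-evaluated later:

# Architecture (hypothetical variant: freeze a per iteration via a default argument)
X = Input(shape=(A,))
embeddings = []
for a in range(A):
    X_a = Lambda(lambda x, a=a: x[:, a])(X)
    embedding = Embedding(input_dim=vocab_sizes[a],
                          output_dim=1)(X_a)
    embeddings.append(embedding)
h = Concatenate()(embeddings)
h = Dense(1)(h)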

I believe the following is happening:

(1) When you do the initial "for loop" over the Lambda function, you are initializing the constant tensors that feed into the "strided_slice" operator, which extracts either [:,0] or [:,1] correctly. Using the global variable "a" in the Lambda function is probably "risky" but works okay in this instance. Furthermore, I believe the function is stored in bytecode as "lambda x: x[:, a]", so it will look up whatever the value of "a" is at the time of evaluation. "a" could be anything, so this might be problematic in some cases.

(2) When you build the first model (model_A), the constant tensors are not reinitialized, so the lambda functions (strided_slice operators) have the correct values (0 and 1), which were initialized in the "for loop."

(3) When you build the second model (model_B), the constant tensors are reinitialized. However, at this time the value of "a" is 1 (as stated in some of the other commentary), because that is its final value after the original "for loop." In fact, you can set a=0 just before defining model_B, and you'll get behavior in which both Lambdas extract [:,0] and feed it to the embedding layers. My speculation is that this difference in behavior is perhaps related to calling Model_A(X) in this case (whereas for the first model, you only specified the output layer "h" and didn't call the Model_A() class to produce the output - a difference I believe was also suggested by other commentary).

I'll say that I verified this state of affairs by putting some print statements in the file "frameworks/constant_op.py" during the operator initialization step, and I obtained debug output with values and sequences consistent with what I stated above.

I hope this helps.

You shouldn't call Model_A(X) directly. It returns a tensor; instead, you can call model_B = Model(inputs=X, outputs=model_A.outputs). It works for me.

The difference between Model_A(X) and Model_A.outputs is:

Model_A(X) # a tensor
# <tf.Tensor 'model_13/dense_8/BiasAdd:0' shape=(?, 1) dtype=float32>

Model_A.outputs # a list of tensor
# [<tf.Tensor 'dense_8/BiasAdd:0' shape=(?, 1) dtype=float32>]
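
A minimal sketch of that suggestion, reusing X, x and y from the question's script (note that this model_B shares model_A's existing graph rather than wrapping it as a nested model):

# Build model_B from model_A's existing output tensors instead of calling Model_A(X)
model_B = Model(inputs=X, outputs=model_A.outputs)
model_B.compile('sgd', 'mse')
model_B.fit(x, y, epochs=10)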
