WARNING:tensorflow:Model was constructed with shape (None, 150), but it was called on an input with incompatible shape (None, 1)

Question

I'm trying to build a word embedding model, but I keep getting this warning. During training, the accuracy does not change and val_loss stays at "nan".

The raw shape of the data is:

x.shape, y.shape
((94556,), (94556, 2557))

Then I reshape it like so:

xr= np.asarray(x).astype('float32').reshape((-1,1))
yr= np.asarray(y).astype('float32').reshape((-1,1))
((94556, 1), (241779692, 1))

Then I run it through my model:

model = Sequential()
model.add(Embedding(2557, 64, input_length=150, embeddings_initializer='glorot_uniform'))
model.add(Flatten())
model.add(Reshape((64,), input_shape=(94556, 1)))
model.add(Dense(512, activation='sigmoid'))
model.add(Dense(128, activation='sigmoid'))
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='sigmoid'))
model.add(Dense(1, activation='relu'))
# compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# summarize the model
print(model.summary())
plot_model(model, show_shapes = True, show_layer_names=False)

When training, I get a constant accuracy and a val_loss of nan for every epoch:

history=model.fit(xr, yr, epochs=20, batch_size=32, validation_split=3/9)

Epoch 1/20
WARNING:tensorflow:Model was constructed with shape (None, 150) for input Tensor("embedding_6_input:0", shape=(None, 150), dtype=float32), but it was called on an input with incompatible shape (None, 1).
WARNING:tensorflow:Model was constructed with shape (None, 150) for input Tensor("embedding_6_input:0", shape=(None, 150), dtype=float32), but it was called on an input with incompatible shape (None, 1).
1960/1970 [============================>.] - ETA: 0s - loss: nan - accuracy: 0.9996WARNING:tensorflow:Model was constructed with shape (None, 150) for input Tensor("embedding_6_input:0", shape=(None, 150), dtype=float32), but it was called on an input with incompatible shape (None, 1).
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 2/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 3/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 4/20
1970/1970 [==============================] - 8s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 5/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 6/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 7/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 8/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 9/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 10/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 11/20
1970/1970 [==============================] - 8s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 12/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 13/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 14/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 15/20
1970/1970 [==============================] - 8s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 16/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 17/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 18/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 19/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 20/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996

I think it has to do with the input/output shape, but I'm not certain. I tried modifying the model in various ways (adding layers, removing layers, different optimizers, different batch sizes) and nothing has worked so far.

Answer 1

OK, here is what I understood; correct me if I'm wrong:

x contains 94556 integers, each being the index of one out of 2557 words.
y contains 94556 vectors of 2557 integers, each also encoding the index of one word, but this time as a one-hot vector instead of an integer index.
Finally, a corresponding pair of words from x and y represents two words that are close together in the original text.
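To make the difference between the two encodings concrete, here is a minimal NumPy sketch. The index values are made up for illustration; only the vocabulary size of 2557 comes from the question.

```python
import numpy as np

vocab_size = 2557
idx = np.array([0, 17, 2556])        # integer word indices (hypothetical values)

one_hot = np.eye(vocab_size)[idx]    # shape (3, 2557), exactly one 1 per row
recovered = one_hot.argmax(axis=1)   # back to the integer indices

print(one_hot.shape, recovered)
```

Both forms carry the same information; the one-hot form just spends 2557 numbers per word. If memory becomes a problem, Keras also has a sparse_categorical_crossentropy loss that takes the integer indices directly.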

If I am correct so far, then the following runs correctly:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras.models import *

x = np.random.randint(0,2557,94556)
y = np.eye((2557))[np.random.randint(0,2557,94556)]
xr = x.reshape((-1,1))


print("x.shape: {}\nxr.shape:{}\ny.shape: {}".format(x.shape, xr.shape, y.shape))


model = Sequential()
model.add(Embedding(2557, 64, input_length=1, embeddings_initializer='glorot_uniform'))
model.add(Reshape((64,)))
model.add(Dense(512, activation='sigmoid'))
model.add(Dense(2557, activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()

history=model.fit(xr, y, epochs=20, batch_size=32, validation_split=3/9)

The most important modifications:

The y reshaping was losing the relationship between elements of x and y.
The input_length in the Embedding layer should correspond to the second dimension of xr.
The output of the last layer of the network should have the same dimension as the second dimension of y.
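The first point is easy to see on a toy example. This is only a sketch with made-up sizes (4 samples, 5 words) standing in for the real data:

```python
import numpy as np

# Toy stand-ins (hypothetical sizes): 4 samples, vocabulary of 5 words
vocab_size = 5
x = np.array([3, 0, 4, 1])
y = np.eye(vocab_size)[[2, 2, 0, 4]]   # one one-hot row per sample

xr = x.reshape((-1, 1))                # (4, 1): still one row per sample
y_flat = y.reshape((-1, 1))            # (20, 1): rows no longer line up with xr

print(xr.shape, y.shape, y_flat.shape)

# With the sizes from the question, the flattened length is 94556 * 2557
flat_len = 94556 * 2557
print(flat_len)
```

After flattening, row i of y_flat is a single 0 or 1 from somewhere inside a one-hot vector, not the label for row i of xr. The product 94556 * 2557 matches the 241779692 rows shown in the question.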

I am actually surprised the code ran without crashing.
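As for the constant 0.9996, here is one plausible reading, not something I have verified against the Keras source. It assumes that with a single output unit the 'accuracy' metric behaves like a binary accuracy, and that a nan prediction ends up counted as class 0. The flattened yr is zero everywhere except for one 1 per 2557 entries, so "always 0" is right almost all the time:

```python
vocab_size = 2557

# Fraction of zeros in a one-hot vector of length vocab_size
zero_fraction = (vocab_size - 1) / vocab_size

print(round(zero_fraction, 4))
```

If that reading is right, the accuracy number says nothing about the model; it only reflects how sparse the flattened targets are.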

Finally, from my research, it seems that in practice people do not train skip-gram models like this. Instead, they train the model to predict whether a training example is correct (the two words are close together) or not. Maybe this is the reason you came up with an output of dimension one.

Here is a model inspired by https://github.com/PacktPublishing/Deep-Learning-with-Keras/blob/master/Chapter05/keras_skipgram.py :

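# Two embedding towers: one for the target word, one for the context word
word_model = Sequential()
word_model.add(Embedding(2557, 64, embeddings_initializer="glorot_uniform", input_length=1))
word_model.add(Reshape((64,)))

context_model = Sequential()
context_model.add(Embedding(2557, 64, embeddings_initializer="glorot_uniform", input_length=1))
context_model.add(Reshape((64,)))

# The Merge layer used in the linked script is not available in tf.keras,
# so the dot product is written with the functional API instead
dot = Dot(axes=1)([word_model.output, context_model.output])
out = Dense(1, kernel_initializer="glorot_uniform", activation="sigmoid")(dot)
model = Model(inputs=[word_model.input, context_model.input], outputs=out)

# Binary target (pair is real or not), hence binary cross-entropy
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])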

In that case, you would have 3 vectors, all of the same size (94556, 1) (or probably even larger than 94556, since you might have to generate additional negative samples):

x containing integers from 0 to 2556
y containing integers from 0 to 2556
output containing 0s and 1s, indicating whether each pair from x and y is a negative or a positive example

and the training would look like:

history = model.fit([x, y], output, epochs=20, batch_size=32, validation_split=3/9)
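For completeness, here is a rough sketch of how the three arrays could be put together. The positive pairs are random placeholders (in real use they would come from the text), and drawing negatives uniformly from the vocabulary is a simplifying assumption of mine, not something taken from the linked script.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 2557

# Hypothetical positive pairs: (word, context) indices that occur close together
pos_word = rng.integers(0, vocab_size, size=1000)
pos_context = rng.integers(0, vocab_size, size=1000)

# Negative pairs: keep the word, draw a random context from the vocabulary.
# A random draw can occasionally hit a true context; this sketch ignores that.
neg_word = pos_word.copy()
neg_context = rng.integers(0, vocab_size, size=pos_word.shape[0])

x = np.concatenate([pos_word, neg_word]).reshape((-1, 1))
y = np.concatenate([pos_context, neg_context]).reshape((-1, 1))
output = np.concatenate(
    [np.ones(len(pos_word)), np.zeros(len(neg_word))]
).reshape((-1, 1))

# Shuffle all three arrays with the same permutation so the pairs stay aligned
perm = rng.permutation(len(x))
x, y, output = x[perm], y[perm], output[perm]

print(x.shape, y.shape, output.shape)
```

The shuffle matters here because validation_split takes the last fraction of the data as it was passed in; without it, the validation set would contain only negative examples.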

Solution 1
8 Accepted 2020-05-07 15:26:21