
Confusion about input shape for Keras Embedding layer

I'm trying to use the Keras embedding layer to create my own CBoW implementation to see how it works.

I've generated outputs, each represented by a one-hot vector of the context word I'm searching for, with size equal to my vocabulary. I've also generated inputs so that each context word has X nearby words, each represented by its one-hot encoded vector.

So for example if my sentence is:

"I ran over the fence to find my dog" “我跑过篱笆找我的狗”

Using a window size of 2, I could generate the following input/output pair:

[[over, the, to, find], fence], where 'fence' is my context word and 'over', 'the', 'to', 'find' are its nearby words within window 2 (2 in front, 2 behind).
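As a concrete illustration, here is a minimal sketch of how such pairs could be generated (the variable names and the pairs structure are my own illustrative choices, not from the question):

sentence = "I ran over the fence to find my dog".lower().split()

window_size = 2
pairs = []
for i, target in enumerate(sentence):
    # gather up to window_size words on each side of the target word
    context = [sentence[j]
               for j in range(i - window_size, i + window_size + 1)
               if 0 <= j < len(sentence) and j != i]
    if len(context) == 2 * window_size:  # keep only full windows
        pairs.append((context, target))

print(pairs[2])  # (['over', 'the', 'to', 'find'], 'fence')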

Using a sample vocabulary size of 500 and 100 training samples, after one-hot encoding my input and output they have the following dimensions:

y.shape -> (100,500)
X.shape -> (100,4,500)

That is, I have 100 outputs, each represented by a 500-sized vector, and 100 inputs, each represented by a series of four 500-sized vectors.
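For reference, a quick sketch of how one-hot encoding could produce those shapes (the random index arrays are placeholder data, not the question's actual corpus):

import numpy as np
from keras.utils import to_categorical

vocab_size = 500
context_idx = np.random.randint(vocab_size, size=(100, 4))  # placeholder word indices
target_idx = np.random.randint(vocab_size, size=(100,))

X = to_categorical(context_idx, num_classes=vocab_size)  # shape (100, 4, 500)
y = to_categorical(target_idx, num_classes=vocab_size)   # shape (100, 500)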

I have a simple model defined as:

from keras.models import Sequential
from keras.layers import Embedding, Lambda, Dense
from keras import backend as K

model = Sequential()
model.add(Embedding(input_dim=vocabulary_size, output_dim=embedding_size, input_length=2*window_size))
# take the average of the context word embeddings at the hidden layer
model.add(Lambda(lambda x: K.mean(x, axis=1), output_shape=(embedding_size,)))
model.add(Dense(vocabulary_size, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

However, when I try to fit my model, I get a dimension error:

model.fit(X, y, batch_size=10, epochs=2, verbose=1)
ValueError: Error when checking input: expected embedding_6_input to have 2 dimensions, but got array with shape (100, 4, 500)

Now, I can only assume I'm using the embedding layer wrongly. I've read both this CrossValidated question and the Keras documentation.

I'm still not sure exactly how the input to this embedding layer works. I'm fairly certain my input_dim and output_dim are correct, which leaves input_length. According to the CrossValidated question, my input_length is the length of my sequence. According to the Keras documentation, my input should have dimension (batch_size, input_length).

If my inputs are 4 words, each represented by a word vector of size vocab_size, how do I input them to the model?

The problem is that you are thinking about the embedding layer in the wrong way. An Embedding layer is just a trainable look-up table: you give it an integer, which is the index of a word in the vocabulary, and it returns the word vector (i.e. the word embedding) for that index. Therefore, its input must be the indices of the words in a sentence.
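A minimal sketch of that lookup behaviour (the vocabulary size of 500 and embedding size of 8 are arbitrary choices for illustration):

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

lookup = Sequential([Embedding(input_dim=500, output_dim=8)])
vec = lookup.predict(np.array([[42]]))  # look up the embedding for word index 42
print(vec.shape)                        # (1, 1, 8): one 8-dimensional vector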

As an example, if the indices of the words "over", "the", "to" and "find" are 43, 6, 9 and 33 respectively, then the input of the Embedding layer would be an array of those indices, i.e. [43, 6, 9, 33]. Therefore, the training data must have a shape of (num_samples, num_words_in_a_sentence); in your case, that's (100, 4). In other words, you don't need to one-hot encode the words for the input data. You can also use word indices as the labels if you use sparse_categorical_crossentropy as the loss function instead.
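Putting it together, a sketch of the corrected setup under these assumptions (random integer arrays stand in for real training data):

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, Lambda, Dense
from keras import backend as K

vocabulary_size, embedding_size, window_size = 500, 100, 2

X_idx = np.random.randint(vocabulary_size, size=(100, 2 * window_size))  # word indices, shape (100, 4)
y_idx = np.random.randint(vocabulary_size, size=(100,))                  # target word indices

model = Sequential()
model.add(Embedding(input_dim=vocabulary_size, output_dim=embedding_size, input_length=2 * window_size))
model.add(Lambda(lambda x: K.mean(x, axis=1), output_shape=(embedding_size,)))
model.add(Dense(vocabulary_size, activation='softmax'))
# integer labels work with sparse_categorical_crossentropy, so no one-hot y is needed
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')

model.fit(X_idx, y_idx, batch_size=10, epochs=2, verbose=1)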
