
Confusion about input shape for Keras Embedding layer

I'm trying to use the Keras embedding layer to create my own CBoW implementation to see how it works.

I've generated outputs, each represented by a one-hot vector of the context word I'm searching for, with size equal to my vocabulary. I've also generated inputs so that each context word has X nearby words, each represented by its one-hot encoded vector.

So for example if my sentence is:

"I ran over the fence to find my dog" “我跑过篱笆找我的狗”

Using a window size of 2, I could generate the following input/output pair:

[[over, the, to, find], fence], where 'fence' is my context word and 'over', 'the', 'to', 'find' are its nearby words within window 2 (2 in front, 2 behind).
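As a concrete illustration, here is a minimal sketch of how such pairs could be generated (the variable names and the pairs structure are my own illustrative choices, not from the question):

sentence = "I ran over the fence to find my dog".lower().split()

window_size = 2
pairs = []
for i, target in enumerate(sentence):
    # gather up to window_size words on each side of the target word
    context = [sentence[j]
               for j in range(i - window_size, i + window_size + 1)
               if 0 <= j < len(sentence) and j != i]
    if len(context) == 2 * window_size:  # keep only full windows
        pairs.append((context, target))

print(pairs[2])  # (['over', 'the', 'to', 'find'], 'fence')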

Using a sample vocabulary size of 500 and 100 training samples, after one-hot encoding my input and output they have the following dimensions:

y.shape -> (100,500)
X.shape -> (100,4,500)

That is, I have 100 outputs, each represented by a 500-sized vector, and 100 inputs, each represented by a series of four 500-sized vectors.
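For reference, a quick sketch of how one-hot encoding could produce those shapes (the random index arrays are placeholder data, not the question's actual corpus):

import numpy as np
from keras.utils import to_categorical

vocab_size = 500
context_idx = np.random.randint(vocab_size, size=(100, 4))  # placeholder word indices
target_idx = np.random.randint(vocab_size, size=(100,))

X = to_categorical(context_idx, num_classes=vocab_size)  # shape (100, 4, 500)
y = to_categorical(target_idx, num_classes=vocab_size)   # shape (100, 500)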

I have a simple model defined as:

from keras.models import Sequential
from keras.layers import Embedding, Lambda, Dense
from keras import backend as K

model = Sequential()
model.add(Embedding(input_dim=vocabulary_size, output_dim=embedding_size, input_length=2*window_size))
# take the average of the context word embeddings at the hidden layer
model.add(Lambda(lambda x: K.mean(x, axis=1), output_shape=(embedding_size,)))
model.add(Dense(vocabulary_size, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

However, when I try to fit my model, I get a dimension error:

model.fit(X, y, batch_size=10, epochs=2, verbose=1)
ValueError: Error when checking input: expected embedding_6_input to have 2 dimensions, but got array with shape (100, 4, 500)

Now, I can only assume I'm using the embedding layer wrongly. I've read both this CrossValidated question and the Keras documentation.

I'm still not sure exactly how the input to this embedding layer works. I'm fairly certain my input_dim and output_dim are correct, which leaves input_length. According to the CrossValidated question, my input_length is the length of my sequence. According to the Keras documentation, my input should have dimension (batch_size, input_length).

If my inputs are 4 words, each represented by a word vector of size vocab_size, how do I input them to the model?

The problem is that you are thinking about the embedding layer in the wrong way. An Embedding layer is just a trainable look-up table: you give it an integer, which is the index of a word in the vocabulary, and it returns the word vector (i.e. the word embedding) for that index. Therefore, its input must be the indices of the words in a sentence.
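A minimal sketch of that lookup behaviour (the vocabulary size of 500 and embedding size of 8 are arbitrary choices for illustration):

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

lookup = Sequential([Embedding(input_dim=500, output_dim=8)])
vec = lookup.predict(np.array([[42]]))  # look up the embedding for word index 42
print(vec.shape)                        # (1, 1, 8): one 8-dimensional vector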

As an example, if the indices of the words "over", "the", "to" and "find" are 43, 6, 9 and 33 respectively, then the input of the Embedding layer would be an array of those indices, i.e. [43, 6, 9, 33]. Therefore, the training data must have a shape of (num_samples, num_words_in_a_sentence); in your case, that's (100, 4). In other words, you don't need to one-hot encode the words for the input data. You can also use word indices as the labels if you use sparse_categorical_crossentropy as the loss function instead.
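Putting it together, a sketch of the corrected setup under these assumptions (random integer arrays stand in for real training data):

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, Lambda, Dense
from keras import backend as K

vocabulary_size, embedding_size, window_size = 500, 100, 2

X_idx = np.random.randint(vocabulary_size, size=(100, 2 * window_size))  # word indices, shape (100, 4)
y_idx = np.random.randint(vocabulary_size, size=(100,))                  # target word indices

model = Sequential()
model.add(Embedding(input_dim=vocabulary_size, output_dim=embedding_size, input_length=2 * window_size))
model.add(Lambda(lambda x: K.mean(x, axis=1), output_shape=(embedding_size,)))
model.add(Dense(vocabulary_size, activation='softmax'))
# integer labels work with sparse_categorical_crossentropy, so no one-hot y is needed
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')

model.fit(X_idx, y_idx, batch_size=10, epochs=2, verbose=1)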
