
Keras LSTM Input layer shape differs from actual input

Given that I'm not very experienced with this, the following may well be a silly question (and the title equally beside the point; any suggestions for modification are welcome). I'm trying to get a Keras model to work with multiple inputs, but I keep running into problems with the input dimensions. Quite possibly the setup of my network makes little sense, but I would first like to produce something that works (i.e. executes) and then experiment with different setups. Here's what I have now:

import numpy as np
from keras.layers import Input, LSTM, Dense, concatenate
from keras.models import Model

sent = Input(shape=(None, inputdim))
pos = Input(shape=(None, 1))

l1 = LSTM(40)(sent)
l2 = LSTM(40)(pos)
out = concatenate([l1, l2])
output = Dense(1, activation='sigmoid')(out)

model = Model(inputs=[sent, pos], outputs=output)
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
print('X1 shape:', np.shape(X1_train))
print('X1 Input shape:', np.shape(sent))
print('X2 shape:', np.shape(X2_train))
print('X2 Input shape:', np.shape(pos))

model.fit([X1_train, X2_train], Y_train, batch_size=1, epochs=nrEpochs)

This gets me the following output/error:

Using TensorFlow backend.
INFO: Starting iteration 1 of 1...
INFO: Starting with training of LSTM network.
X1 shape: (3065,)
X1 Input shape: (?, ?, 21900)
X2 shape: (3065, 1)
X2 Input shape: (?, ?, 1)
Traceback (most recent call last):
  ...
ValueError: Error when checking input: expected input_1 to have 3 dimensions, 
but got array with shape (3065, 1)

If I understand things correctly (which I'm not at all sure about :), Input basically converts the input to a tensor, adding a third dimension (in my case), but the input I feed the model in model.fit() is still two-dimensional. Any ideas on how to go about this are very welcome.

You should understand better how LSTMs work. An LSTM (like all recurrent neural network units, such as the GRU and the vanilla RNN) expects an input shaped as (batch, time_steps, token_dimensions).

  • The first dimension is the batch_size, i.e. the number of examples you feed to the network together (this speeds up training because they can be processed in parallel).
  • The second dimension (time_steps) is the length of your sequence, and it has to be fixed. So, for example, if the longest sequence in your training data is 70, you might want to set time_steps = 70. If that is too long, you can choose an arbitrary length and truncate your sentences.
  • The third dimension is the size of each word (token) in the embedding space, or the size of your vocabulary if you feed one-hot representations of the words directly to the LSTM (which I discourage you from doing!).
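To make that layout concrete, here is a small numpy-only sketch (toy sizes, not the asker's real data) that pads variable-length sequences into the (batch, time_steps, token_dimensions) array an LSTM expects:

```python
import numpy as np

# Two toy sequences with 4 features per token but different lengths --
# analogous to the asker's variable-length sentences.
seqs = [np.random.rand(5, 4), np.random.rand(3, 4)]

time_steps = max(s.shape[0] for s in seqs)    # fix the sequence length
batch = np.zeros((len(seqs), time_steps, 4))  # (batch, time_steps, token_dims)
for i, s in enumerate(seqs):
    batch[i, :s.shape[0], :] = s              # shorter sequences stay zero-padded

print(batch.shape)  # (2, 5, 4)
```

For lists of integer token ids, keras.preprocessing.sequence.pad_sequences does this padding for you.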

In case you don't know about embeddings and how to use them in Keras, have a look here: https://keras.io/layers/embeddings/

Just to give you an idea of what the code should look like, here is how I modified your code to make it work:

from keras.layers import Input, LSTM, Dense, Embedding, concatenate
from keras.models import Model

sent = Input(shape=(time_steps,))
pos = Input(shape=(time_steps2,))
lstm_in = Embedding(vocab_size, 300)(sent)   # now you have a batch x time_steps x 300 tensor
lstm_in2 = Embedding(vocab_size2, 100)(pos)
l1 = LSTM(40)(lstm_in)
l2 = LSTM(40)(lstm_in2)
out = concatenate([l1, l2])
output = Dense(1, activation='sigmoid')(out)

model = Model(inputs=[sent, pos], outputs=output)

Note that the two inputs can have different numbers of timesteps. If the second one has only one, pass it through a Dense layer rather than an LSTM.
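A hedged sketch of that variant, assuming the second input really is a single scalar per example (the vocabulary size, sequence length, and layer widths are illustrative, not taken from the question):

```python
from keras.layers import Input, LSTM, Dense, Embedding, concatenate
from keras.models import Model

time_steps, vocab_size = 70, 5000  # illustrative values

sent = Input(shape=(time_steps,))
pos = Input(shape=(1,))  # one value per example, so no time axis

lstm_out = LSTM(40)(Embedding(vocab_size, 300)(sent))
pos_out = Dense(8, activation='relu')(pos)  # Dense instead of a second LSTM

output = Dense(1, activation='sigmoid')(concatenate([lstm_out, pos_out]))
model = Model(inputs=[sent, pos], outputs=output)
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
```

The Dense branch just projects the scalar feature into a small vector so it can be concatenated with the 40-dimensional LSTM output before the final sigmoid.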
