
Keras LSTM Input layer shape differs from actual input

Given that I'm not very experienced with this, the following may well be a silly question (and the title equally beside the point; any suggestions for modification are welcome). I'm trying to get a Keras model to work with multiple inputs, but I keep running into problems with the input dimension(s). Quite possibly the setup of my network makes little sense, but I first want to produce something that works (i.e. executes) and then experiment with different setups. Here's what I have now:

sent = Input(shape=(None,inputdim))
pos = Input(shape=(None,1))

l1 = LSTM(40)(sent)
l2 = LSTM(40)(pos)
out = concatenate([l1, l2])
output = Dense(1, activation='sigmoid')(out)

model = Model(inputs=[sent, pos], outputs=output)
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
print('X1 shape:', np.shape(X1_train))
print('X1 Input shape:', np.shape(sent))
print('X2 shape:', np.shape(X2_train))
print('X2 Input shape:', np.shape(pos))

model.fit([X1_train, X2_train], Y_train, batch_size=1, epochs=nrEpochs)

This gets me the following output/error:

Using TensorFlow backend.
INFO: Starting iteration 1 of 1...
INFO: Starting with training of LSTM network.
X1 shape: (3065,)
X1 Input shape: (?, ?, 21900)
X2 shape: (3065, 1)
X2 Input shape: (?, ?, 1)
Traceback (most recent call last):
  ...
ValueError: Error when checking input: expected input_1 to have 3 dimensions, 
but got array with shape (3065, 1)

If I understand things correctly (which I'm not at all sure about :), Input basically converts the input to a tensor, adding a third dimension (in my case), but the input I feed the model when doing model.fit() is still two-dimensional. Any ideas on how to go about this are very welcome.

You should first understand how LSTMs work. An LSTM (like all recurrent units, such as GRU and vanilla RNN) expects an input shaped as follows: (batch_size, time_steps, token_dimensions).

  • The first dimension is the batch_size, i.e. the number of examples you feed to the network together (this speeds up training because they can be processed in parallel).
  • The second dimension (time_steps) is the length of your sequences, and it has to be fixed. For example, if the longest sequence in your training data has 70 tokens, you might set time_steps = 70. If that is too long, you can choose an arbitrary length and truncate your sentences.
  • The third dimension is the size of each word (token) in the embedding space, or the size of your vocabulary if you are feeding one-hot representations of the words directly to the LSTM (I discourage you from doing that!).
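To make the second point concrete, here is a minimal sketch (with made-up toy sequences, not your data) of padding variable-length sequences to a fixed time_steps so they form a rectangular batch:

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Three sequences of different lengths (integer token ids)
seqs = [[5, 2, 9], [7, 1], [3, 8, 6, 4]]

# Pad (or truncate) every sequence to 4 time steps; shorter
# sequences get trailing zeros with padding='post'
X = pad_sequences(seqs, maxlen=4, padding='post')
print(X.shape)  # (3, 4): batch_size x time_steps
```

The resulting 2-D array of token ids is exactly what an Embedding layer expects as input.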

In case you don't know about embeddings and how to use them in Keras, you can take a look here: https://keras.io/layers/embeddings/
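As a quick illustration of that page (toy sizes, not from your setup), an Embedding layer maps a batch of integer token ids to the 3-D tensor shape the LSTM expects:

```python
import numpy as np
from tensorflow.keras.layers import Embedding

# Map token ids from a 50-word vocabulary to 8-dimensional vectors
emb = Embedding(input_dim=50, output_dim=8)

# One sequence of 3 token ids -> batch x time_steps x embedding_dim
out = emb(np.array([[3, 7, 1]]))
print(out.shape)  # (1, 3, 8)
```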

Just to give you an idea of what the code should look like, here is how I modified your code to make it work:

sent = Input(shape=(time_steps,))
pos = Input(shape=(time_steps2,))
lstm_in = Embedding(vocab_size, 300)(sent)  # now a batch x time_steps x 300 tensor
lstm_in2 = Embedding(vocab_size2, 100)(pos)
l1 = LSTM(40)(lstm_in)
l2 = LSTM(40)(lstm_in2)
out = concatenate([l1, l2])
output = Dense(1, activation='sigmoid')(out)

model = Model(inputs=[sent, pos], outputs=output)

Note that the two inputs can have a different number of time steps. If the second one has only one time step, pass it through a Dense layer rather than an LSTM.
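That last suggestion can be sketched like this (vocab_size and time_steps are placeholder values, and I'm assuming the second input is a single scalar feature per example):

```python
from tensorflow.keras.layers import Input, LSTM, Dense, Embedding, concatenate
from tensorflow.keras.models import Model

vocab_size, time_steps = 1000, 70  # placeholder sizes

sent = Input(shape=(time_steps,))
pos = Input(shape=(1,))  # one value per example, no time dimension needed

lstm_in = Embedding(vocab_size, 300)(sent)
l1 = LSTM(40)(lstm_in)
l2 = Dense(16, activation='relu')(pos)  # no recurrence for a single value

out = concatenate([l1, l2])
output = Dense(1, activation='sigmoid')(out)
model = Model(inputs=[sent, pos], outputs=output)
print(model.output_shape)  # (None, 1)
```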
