
keras lstm nmt model compiles but does not run — dimension size?

I'm working on what I hope will be a simple NMT translator in Keras. Below is a link to an inspiring example of seq2seq in Keras.

https://machinelearningmastery.com/develop-encoder-decoder-model-sequence-sequence-prediction-keras/

I want a model that takes a 300-dimensional vector as a word input and takes 25 of them at a time, 25 being the length of a sentence. In the code, units is the 300 and tokens_per_sentence is the 25. The code below only has the training model; I have omitted the inference model. My model compiles, but when I run it with training data I get a dimension error. I have tried reshaping the output of dense_layer_b, but I'm repeatedly told that the output of the operation needs to have the same size as the input. I'm using Python 3 with TensorFlow as the backend. My Keras is v2.1.4 and my OS is Ubuntu.
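
For reference, this is roughly how my training data is laid out; the array names and the single-sample batch are just illustrative, but the (25, 300) layout is what I described above:

import numpy as np

# one sample: a sentence of 25 words, each a 300-dim word vector
encoder_input  = np.zeros((1, 25, 300))  # source sentence, fed to valid_word_a
decoder_input  = np.zeros((1, 25, 300))  # target sentence, fed to valid_word_b
decoder_target = np.zeros((1, 25, 300))  # what dense_layer_b should predict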

an error message:

ValueError: Error when checking target: expected dense_layer_b to have 2 dimensions, but got array with shape (1, 25, 300)

some terminal output:

Tensor("lstm_1/while/Exit_2:0", shape=(?, 25), dtype=float32)
(?, 25) (?, 25) h c

some code:

from keras.layers import Input, LSTM, Dense
from keras.models import Model

units = 300               # size of each word vector
tokens_per_sentence = 25  # words per sentence

def model_lstm():
    # each input: a variable number of timesteps, each a 300-dim word vector
    x_shape = (None, units)
    valid_word_a = Input(shape=x_shape)
    valid_word_b = Input(shape=x_shape)
    ### encoder for training ###
    lstm_a = LSTM(units=tokens_per_sentence, return_state=True)
    recurrent_a, lstm_a_h, lstm_a_c = lstm_a(valid_word_a)
    lstm_a_states = [lstm_a_h, lstm_a_c]
    print(lstm_a_h)
    ### decoder for training ###
    # the decoder is seeded with the encoder's final h and c states
    lstm_b = LSTM(units=tokens_per_sentence, return_state=True)
    recurrent_b, inner_lstmb_h, inner_lstmb_c = lstm_b(valid_word_b, initial_state=lstm_a_states)
    print(inner_lstmb_h.shape, inner_lstmb_c.shape, 'h c')
    # dense layer applied to the decoder's last output, which is 2-D: (batch, 25)
    dense_b = Dense(tokens_per_sentence, activation='softmax', name='dense_layer_b')
    decoder_b = dense_b(recurrent_b)
    model = Model([valid_word_a, valid_word_b], decoder_b)
    return model
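
And this is roughly how I call it, using the arrays from the sketch above (the optimizer and loss here are just placeholders, not necessarily what I use; the error is raised when fit checks the target):

model = model_lstm()
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.fit([encoder_input, decoder_input], decoder_target)
# ValueError: Error when checking target: expected dense_layer_b
# to have 2 dimensions, but got array with shape (1, 25, 300)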

I was hoping that the question marks in the terminal output could be replaced with my vector size when the code was actually used with data.

edit: I've been trying to work this out by switching around the dimensions in the code. I have updated the code and the error message. I still have basically the same problem. The Dense layer doesn't seem to work.

So I think I need to set return_sequences=True on both lstm_a and lstm_b.
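
A minimal sketch of what I mean. Note that I have also changed the Dense layer's size from tokens_per_sentence to units here; that part is my own assumption, so that the per-timestep output can match a (1, 25, 300) target. Not tested:

from keras.layers import Input, LSTM, Dense
from keras.models import Model

units = 300
tokens_per_sentence = 25

def model_lstm_sequences():
    x_shape = (None, units)
    valid_word_a = Input(shape=x_shape)
    valid_word_b = Input(shape=x_shape)
    ### encoder: now emits its output at every timestep ###
    lstm_a = LSTM(units=tokens_per_sentence, return_sequences=True, return_state=True)
    recurrent_a, lstm_a_h, lstm_a_c = lstm_a(valid_word_a)
    lstm_a_states = [lstm_a_h, lstm_a_c]
    ### decoder: also returns the full sequence, seeded with the encoder states ###
    lstm_b = LSTM(units=tokens_per_sentence, return_sequences=True, return_state=True)
    recurrent_b, lstm_b_h, lstm_b_c = lstm_b(valid_word_b, initial_state=lstm_a_states)
    # Dense on a 3-D tensor is applied to every timestep, so the output
    # becomes (batch, timesteps, units) = (?, 25, 300)
    dense_b = Dense(units, activation='softmax', name='dense_layer_b')
    decoder_b = dense_b(recurrent_b)
    return Model([valid_word_a, valid_word_b], decoder_b)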
