
ValueError: Input 0 is incompatible with layer batch_normalization_1: expected ndim=3, found ndim=2

I am trying to use the implementation of DeepTriage, a deep learning approach for bug triaging. The DeepTriage website provides the dataset, source code and paper. I know this is a very specific area, but I'll try to keep it simple.

In the source code, they define their approach, "DBRNN-A: Deep Bidirectional Recurrent Neural Network with Attention mechanism and with Long Short-Term Memory units (LSTM)", with the following code:

input = Input(shape=(max_sentence_len,), dtype='int32')
sequence_embed = Embedding(vocab_size, embed_size_word2vec, input_length=max_sentence_len)(input)

forwards_1 = LSTM(1024, return_sequences=True, dropout_U=0.2)(sequence_embed)
attention_1 = SoftAttentionConcat()(forwards_1)
after_dp_forward_5 = BatchNormalization()(attention_1)

backwards_1 = LSTM(1024, return_sequences=True, dropout_U=0.2, go_backwards=True)(sequence_embed)
attention_2 = SoftAttentionConcat()(backwards_1)
after_dp_backward_5 = BatchNormalization()(attention_2)

merged = merge([after_dp_forward_5, after_dp_backward_5], mode='concat', concat_axis=-1)
after_merge = Dense(1000, activation='relu')(merged)
after_dp = Dropout(0.4)(after_merge)
output = Dense(len(train_label), activation='softmax')(after_dp)                
model = Model(input=input, output=output)
model.compile(loss='categorical_crossentropy', optimizer=Adam(lr=1e-4), metrics=['accuracy']) 
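To make the tensor shapes after the two attention branches concrete, here is a minimal NumPy sketch of the classifier head (merge, Dense(1000, relu), Dropout, softmax output). It assumes each branch produces a (batch, 2048) feature array, as reported for SoftAttentionConcat further down; the weights are random, and `num_classes` is a hypothetical stand-in for `len(train_label)`. It only illustrates shapes and is not the DeepTriage code.

```python
import numpy as np

rng = np.random.default_rng(0)

batch = 4
attn_dim = 2048      # per-branch SoftAttentionConcat width, as reported in the question
hidden = 1000        # Dense(1000, activation='relu')
num_classes = 10     # hypothetical stand-in for len(train_label)

# Stand-ins for the two normalized attention outputs (forward and backward branches).
after_dp_forward_5 = rng.standard_normal((batch, attn_dim))
after_dp_backward_5 = rng.standard_normal((batch, attn_dim))

# merge(..., mode='concat', concat_axis=-1) joins features along the last axis.
merged = np.concatenate([after_dp_forward_5, after_dp_backward_5], axis=-1)

# Dense(1000, activation='relu') with random (hypothetical) weights.
W1 = rng.standard_normal((2 * attn_dim, hidden)) * 0.01
b1 = np.zeros(hidden)
after_merge = np.maximum(merged @ W1 + b1, 0.0)

# Dropout is the identity at inference time, so it is skipped here.

# Dense(num_classes, activation='softmax').
W2 = rng.standard_normal((hidden, num_classes)) * 0.01
b2 = np.zeros(num_classes)
logits = after_merge @ W2 + b2
logits -= logits.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
probs = np.exp(logits)
probs /= probs.sum(axis=-1, keepdims=True)

print(merged.shape, after_merge.shape, probs.shape)  # (4, 4096) (4, 1000) (4, 10)
```

The head only ever sees 2D (batch, features) arrays, so a time axis is not needed after the attention step.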

The SoftAttentionConcat implementation is taken from here; the rest of the functions are from Keras. The paper also shows the architecture:

[Figure: DBRNN-A architecture]

The first BatchNormalization line throws this error:

ValueError: Input 0 is incompatible with layer batch_normalization_1: expected ndim=3, found ndim=2
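The batch-normalization arithmetic itself does not need a time axis: for each feature it subtracts the batch mean, divides by the batch standard deviation, then scales and shifts. The NumPy sketch below applies it to a 2D (batch, features) array, using training-mode statistics only, gamma=1, beta=0 and epsilon=1e-3. One plausible reading of the error, offered as an assumption and not checked against the linked SoftAttentionConcat code: in Keras 2.x, BatchNormalization records the number of dimensions of the shape it was built with, and that shape comes from the previous layer's `compute_output_shape`. If a custom layer does not override that method, the default returns the input shape (3D here) even though the layer actually returns a 2D tensor, and the later ndim check fails with exactly this kind of message.

```python
import numpy as np

def batch_norm_2d(x, gamma, beta, eps=1e-3):
    """Training-mode batch normalization over axis 0 of a (batch, features) array."""
    mean = x.mean(axis=0)            # per-feature mean over the batch
    var = x.var(axis=0)              # per-feature (biased) variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
# Stand-in for a (None, 2048) attention output with a batch of 64.
features = rng.normal(loc=3.0, scale=2.0, size=(64, 2048))
out = batch_norm_2d(features, gamma=np.ones(2048), beta=np.zeros(2048))
print(out.shape)  # (64, 2048)
```

So the ndim=3 expectation comes from how the layer was set up, not from the normalization math.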

With max_sentence_len=50 and embed_size_word2vec=200, these are the shapes up to the point of the error:

Input               -> (None, 50)
Embedding           -> (None, 50, 200)
LSTM                -> (None, None, 1024)
SoftAttentionConcat -> (None, 2048) 
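The shape progression above can be reproduced with a small NumPy sketch of the forward branch. Assumptions: the LSTM is replaced by random values because only shapes matter; SoftAttentionConcat is assumed to concatenate an attention-weighted sum over time with the last time step (consistent with 2048 = 2 × 1024); the scoring function and `vocab_size` are hypothetical. The point is that the time axis disappears at the attention step, leaving a 2D tensor.

```python
import numpy as np

rng = np.random.default_rng(0)

batch, max_sentence_len, embed_size_word2vec, units = 2, 50, 200, 1024
vocab_size = 1000  # hypothetical vocabulary size, for illustration only

# Input: integer token ids, shape (batch, max_sentence_len).
tokens = rng.integers(0, vocab_size, size=(batch, max_sentence_len))

# Embedding: a lookup into a (vocab_size, embed_size) table adds a feature axis.
embedding_table = rng.standard_normal((vocab_size, embed_size_word2vec))
embedded = embedding_table[tokens]                          # (batch, 50, 200)

# LSTM(1024, return_sequences=True) keeps the time axis; random values stand in
# for the real recurrent outputs because only the shapes matter here.
seq_out = rng.standard_normal((batch, max_sentence_len, units))  # (batch, 50, 1024)

# Assumed soft attention + concat: softmax weights over time, weighted sum of the
# sequence, concatenated with the last time step. The time axis is gone afterwards.
scores = seq_out.mean(axis=-1)                              # hypothetical scoring, (batch, 50)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
context = np.einsum('bt,btu->bu', weights, seq_out)          # (batch, 1024)
attended = np.concatenate([context, seq_out[:, -1, :]], axis=-1)  # (batch, 2048)

for name, arr in [('Input', tokens), ('Embedding', embedded),
                  ('LSTM', seq_out), ('SoftAttentionConcat', attended)]:
    print(f'{name:<20}-> {arr.shape} (ndim={arr.ndim})')
```

Any layer that was configured for 3D input will therefore reject the attention output.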

Can anyone see what the problem is?

My guess is that the problem comes from using TensorFlow code inside a Keras model, or from a version mismatch.

Using the question and answers here, I implemented the attention mechanism in Keras as follows:

attention_1 = Dense(1, activation="tanh")(forwards_1)
attention_1 = Flatten()(attention_1)  # squeeze (None,50,1)->(None,50)
attention_1 = Activation("softmax")(attention_1)
attention_1 = RepeatVector(num_rnn_unit)(attention_1)
attention_1 = Permute([2, 1])(attention_1)
attention_1 = multiply([forwards_1, attention_1])
attention_1 = Lambda(lambda xin: K.sum(xin, axis=1), output_shape=(num_rnn_unit,))(attention_1)

last_out_1 = Lambda(lambda xin: xin[:, -1, :])(forwards_1)
sent_representation_1 = concatenate([last_out_1, attention_1])
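The Keras layers above amount to a softmax-weighted sum over time followed by a concatenation with the last time step. The NumPy sketch below mirrors each layer with small sizes and random (hypothetical) Dense weights, and checks that the RepeatVector/Permute/multiply/sum sequence equals a plain weighted sum. The output width is 2 × num_rnn_unit, which for 1024 units gives the same 2048 features the question reports for SoftAttentionConcat.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, timesteps, num_rnn_unit = 3, 5, 8

forwards_1 = rng.standard_normal((batch, timesteps, num_rnn_unit))  # stand-in LSTM outputs

# Dense(1, activation="tanh"): one score per time step (random hypothetical weights).
w = rng.standard_normal((num_rnn_unit, 1))
b = np.zeros(1)
scores = np.tanh(forwards_1 @ w + b)               # (batch, timesteps, 1)

# Flatten + Activation("softmax"): normalise the scores over the time axis.
scores = scores.reshape(batch, timesteps)          # (batch, timesteps)
alpha = np.exp(scores - scores.max(axis=1, keepdims=True))
alpha /= alpha.sum(axis=1, keepdims=True)

# RepeatVector(num_rnn_unit) then Permute([2, 1]): copy the weights to every unit.
repeated = np.repeat(alpha[:, np.newaxis, :], num_rnn_unit, axis=1)  # (batch, units, timesteps)
permuted = np.transpose(repeated, (0, 2, 1))                          # (batch, timesteps, units)

# multiply + Lambda(K.sum(..., axis=1)): weighted sum over time.
attention_1 = (forwards_1 * permuted).sum(axis=1)  # (batch, units)

# Lambda(xin[:, -1, :]) and concatenate: last time step plus the attention context.
last_out_1 = forwards_1[:, -1, :]
sent_representation_1 = np.concatenate([last_out_1, attention_1], axis=-1)

print(np.allclose(attention_1, np.einsum('bt,btu->bu', alpha, forwards_1)))  # True
print(sent_representation_1.shape)  # (3, 16)
```

Because every step is built from stock Keras layers, each one reports its own output shape, so a following BatchNormalization sees the 2D shape it actually receives.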

This works quite well. All the source code I used for the implementation is available on GitHub.
