I am using keras.backend.argmax() in a Lambda layer. The model compiles fine but throws an error during fit():
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
My model:
from keras.layers import Input, Dense, Embedding, LSTM, Dropout, Lambda
from keras.models import Model
from keras import backend as K

latent_dim = 512
encoder_inputs = Input(shape=(train_data.shape[1],))
encoder_dense = Dense(vocabulary, activation='softmax')
encoder_outputs = Embedding(vocabulary, latent_dim)(encoder_inputs)
encoder_outputs = LSTM(latent_dim, return_sequences=True)(encoder_outputs)
encoder_outputs = Dropout(0.5)(encoder_outputs)
encoder_outputs = encoder_dense(encoder_outputs)
encoder_outputs = Lambda(K.argmax, arguments={'axis':-1})(encoder_outputs)
encoder_outputs = Lambda(K.cast, arguments={'dtype':'float32'})(encoder_outputs)
encoder_dense1 = Dense(train_label.shape[1], activation='softmax')
decoder_embedding = Embedding(vocabulary, latent_dim)
decoder_lstm1 = LSTM(latent_dim, return_sequences=True)
decoder_lstm2 = LSTM(latent_dim, return_sequences=True)
decoder_dense2 = Dense(vocabulary, activation='softmax')
decoder_outputs = encoder_dense1(encoder_outputs)
decoder_outputs = decoder_embedding(decoder_outputs)
decoder_outputs = decoder_lstm1(decoder_outputs)
decoder_outputs = decoder_lstm2(decoder_outputs)
decoder_outputs = Dropout(0.5)(decoder_outputs)
decoder_outputs = decoder_dense2(decoder_outputs)
model = Model(encoder_inputs, decoder_outputs)
model.summary()
Model summary for easier visualization:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_7 (InputLayer) (None, 32) 0
_________________________________________________________________
embedding_13 (Embedding) (None, 32, 512) 2018816
_________________________________________________________________
lstm_19 (LSTM) (None, 32, 512) 2099200
_________________________________________________________________
dropout_10 (Dropout) (None, 32, 512) 0
_________________________________________________________________
dense_19 (Dense) (None, 32, 3943) 2022759
_________________________________________________________________
lambda_5 (Lambda) (None, 32) 0
_________________________________________________________________
lambda_6 (Lambda) (None, 32) 0
_________________________________________________________________
dense_20 (Dense) (None, 501) 16533
_________________________________________________________________
embedding_14 (Embedding) (None, 501, 512) 2018816
_________________________________________________________________
lstm_20 (LSTM) (None, 501, 512) 2099200
_________________________________________________________________
lstm_21 (LSTM) (None, 501, 512) 2099200
_________________________________________________________________
dropout_11 (Dropout) (None, 501, 512) 0
_________________________________________________________________
dense_21 (Dense) (None, 501, 3943) 2022759
=================================================================
Total params: 14,397,283
Trainable params: 14,397,283
Non-trainable params: 0
_________________________________________________________________
I googled for a solution, but almost all results were about a faulty model. Some recommended simply not using the functions that cause this error. However, as you can see, I cannot build this model without K.argmax (if you know another way, do tell me). How do I solve this issue so I can train my model?
For obvious reasons there is no gradient for the argmax function; how would that even be defined? For your model to train, you need to make the layers feeding into that op non-trainable. As per this question (or the documentation), you pass trainable=False to those layers. As for the layer weights (if applicable), you probably want to set them to an identity matrix.
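The suggested fix might look like the following minimal sketch (assuming the tf.keras API; the toy dimensions and the simplified layer stack are illustrative, not the exact model above):

```python
# A minimal sketch of freezing everything upstream of K.argmax,
# so no gradient ever needs to flow through the non-differentiable op
# to reach a trainable weight. Sizes are toy values, not the real model's.
import numpy as np
from tensorflow.keras import layers, Model
from tensorflow.keras import backend as K

vocabulary, latent_dim, seq_len = 50, 8, 4

inputs = layers.Input(shape=(seq_len,))
x = layers.Embedding(vocabulary, latent_dim, trainable=False)(inputs)
x = layers.LSTM(latent_dim, return_sequences=True, trainable=False)(x)
x = layers.Dense(vocabulary, activation='softmax', trainable=False)(x)
# Non-differentiable hard decision; safe only because nothing behind it trains.
x = layers.Lambda(lambda t: K.cast(K.argmax(t, axis=-1), 'float32'))(x)
outputs = layers.Dense(vocabulary, activation='softmax')(x)  # trainable head

model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# Only the final Dense layer's kernel and bias remain trainable.
print(len(model.trainable_weights))  # 2
```

The trade-off is explicit: the frozen encoder will never learn, since the optimizer only computes gradients for the trainable head after the argmax. If you need the encoder to train end-to-end, you would have to replace the hard argmax with a differentiable surrogate instead.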