LSTM with self attention for multi class text classification
I am following the approach to self-attention in Keras from this link: How to add attention layer to a Bi-LSTM
I want to apply a Bi-LSTM to multi-class text classification with 3 classes.
I tried to apply attention in my code, but I get the following error. How can I fix this? Can someone help me?
Incompatible shapes: [100,3] vs. [64,3]
[[Node: training_1/Adam/gradients/loss_11/dense_14_loss/mul_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _class=["loc:@training_1/Adam/gradients/loss_11/dense_14_loss/mul_grad/Reshape_1"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](training_1/Adam/gradients/loss_11/dense_14_loss/mul_grad/Shape, training_1/Adam/gradients/loss_11/dense_14_loss/mul_grad/Shape_1)]]
from keras.models import Sequential
from keras.layers import Layer, Embedding, Bidirectional, LSTM, Dropout, Dense
from keras import backend as K

class attention(Layer):
    def __init__(self, return_sequences=False):
        self.return_sequences = return_sequences
        super(attention, self).__init__()

    def build(self, input_shape):
        # One score weight per feature and one bias per timestep
        self.W = self.add_weight(name="att_weight", shape=(input_shape[-1], 1),
                                 initializer="normal")
        self.b = self.add_weight(name="att_bias", shape=(input_shape[1], 1),
                                 initializer="zeros")
        super(attention, self).build(input_shape)

    def call(self, x):
        # Attention weights over the timestep axis
        e = K.tanh(K.dot(x, self.W) + self.b)
        a = K.softmax(e, axis=1)
        output = x * a
        if self.return_sequences:
            return output
        # Collapse the timestep axis into a single context vector
        return K.sum(output, axis=1)
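To make the layer's intent concrete, here is a small shape check; the timestep and feature sizes below are made up purely for illustration:

import numpy as np
from keras.layers import Input
from keras.models import Model

# (batch, timesteps, features) in, (batch, features) out when return_sequences=False
inp = Input(shape=(10, 8))
out = attention(return_sequences=False)(inp)
m = Model(inp, out)
print(m.output_shape)                          # (None, 8)
print(m.predict(np.zeros((4, 10, 8))).shape)   # (4, 8)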
model = Sequential()
model.add(Embedding(17666, 100, input_length=409))
model.add(Bidirectional(LSTM(32, return_sequences=False)))
model.add(attention(return_sequences=True)) # receive 3D and output 2D
model.add(Dropout(0.3))
model.add(Dense(3, activation='softmax'))
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
from keras.callbacks import EarlyStopping
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=3)
history777 = model.fit(x_train, y_train,
                       batch_size=100,
                       epochs=30,
                       validation_data=(x_val, y_val),
                       callbacks=[es])
The model summary:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
embedding_14 (Embedding)     (None, 409, 100)          1766600
_________________________________________________________________
bidirectional_14 (Bidirectio (None, 64)                34048
_________________________________________________________________
attention_14 (attention)     (None, 64)                128
_________________________________________________________________
dropout_6 (Dropout)          (None, 64)                0
_________________________________________________________________
dense_14 (Dense)             (None, 3)                 195
=================================================================
Total params: 1,800,971
Trainable params: 1,800,971
Non-trainable params: 0
_________________________________________________________________
Note how you set the return_sequences parameter in the LSTM and in the attention layer.
Your output is 2-D, so the last return_sequences must be set to False and the others must be set to True: with return_sequences=False on the Bi-LSTM, the attention layer no longer receives the 3-D (batch, timesteps, features) sequence it expects, which is what produces the shape mismatch.
Your model must be:
model = Sequential()
model.add(Embedding(max_words, emb_dim, input_length=max_len))
model.add(Bidirectional(LSTM(32, return_sequences=True))) # return_sequences=True
model.add(attention(return_sequences=False)) # return_sequences=False
model.add(Dropout(0.3))
model.add(Dense(3, activation='softmax'))
Here is the full example: https://colab.research.google.com/drive/13l5eAHS5uTUsdqyQNm1Dr4JEXg7Fl2Bo?usp=sharing
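As a quick sanity check, here is a minimal runnable sketch of the corrected model on random data; all sizes (max_words, emb_dim, max_len, the 200 samples) are placeholders, not values from the question:

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, Dropout, Dense

max_words, emb_dim, max_len = 1000, 50, 40
x_train = np.random.randint(0, max_words, size=(200, max_len))
y_train = np.eye(3)[np.random.randint(0, 3, size=200)]   # one-hot labels, 3 classes

model = Sequential()
model.add(Embedding(max_words, emb_dim, input_length=max_len))
model.add(Bidirectional(LSTM(32, return_sequences=True)))   # keep the full sequence
model.add(attention(return_sequences=False))                # collapse it to one vector
model.add(Dropout(0.3))
model.add(Dense(3, activation='softmax'))

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Trains without the [100,3] vs [64,3] mismatch because attention now receives 3-D input
model.fit(x_train, y_train, batch_size=100, epochs=1, verbose=0)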