Implementing Attention in Keras

I am trying to implement attention in Keras on top of a simple LSTM:

from keras.layers import Input, Dense, LSTM, Activation, dot, concatenate
from keras.models import Model
from keras.optimizers import Adam

model_2_input = Input(shape=(500,))
#model_2 = Conv1D(100, 10, activation='relu')(model_2_input)
model_2 = Dense(64, activation='sigmoid')(model_2_input)
model_2 = Dense(64, activation='sigmoid')(model_2)

model_1_input = Input(shape=(None, 2048))
model_1 = LSTM(64, dropout_U = 0.2, dropout_W = 0.2, return_sequences=True)(model_1_input)
model_1, state_h, state_c = LSTM(16, dropout_U = 0.2, dropout_W = 0.2, return_sequences=True, return_state=True)(model_1) # dropout_U = 0.2, dropout_W = 0.2,


#print(state_c.shape)
match = dot([model_1, state_h], axes=(0, 0))
match = Activation('softmax')(match)
match = dot([match, state_h], axes=(0, 0))
print(match.shape)

merged = concatenate([model_2, match], axis=1)
print(merged.shape)
merged = Dense(4, activation='softmax')(merged)
print(merged.shape)
model = Model(inputs=[model_2_input , model_1_input], outputs=merged)
adam = Adam()
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['accuracy'])

I am getting an error on this line:

merged = concatenate([model_2, match], axis=1)

ValueError: A `Concatenate` layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 64), (16, 1)]

The intended implementation is very simple: take the dot product of the LSTM output with the hidden state and use it as a weighting function to compute the hidden state itself.

How can I resolve this error? And, more specifically, how do I get the attention concept working?

You can add a Reshape layer before concatenating to ensure compatibility; see the Keras documentation here. It is probably best to reshape the model_2 output, (None, 64).

EDIT:

Essentially, you need to add a Reshape layer with the target shape before concatenating:

model_2 = Reshape(new_shape)(model_2)

This will return (batch_size, (new_shape)). You can of course reshape either branch of your network; the model_2 output is used here only because it is the simpler example.
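For example, with a hypothetical target shape, just to illustrate that the batch axis is left out of the shape you pass to Reshape:

from keras.layers import Reshape

# (batch, 64) -> (batch, 64, 1); the batch dimension stays implicit
model_2 = Reshape((64, 1))(model_2)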

Having said that, it may be worth rethinking your network structure. In particular, the problem stems from the second dot layer (which gives you only 16 scalars), so it is hard to reshape the two branches so that they match.
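To see where the (16, 1) comes from, here is a quick shape trace of the two dot layers in the question (shapes as Keras reports them):

# model_1: (batch, timesteps, 16),  state_h: (batch, 16)
# axes=(0, 0) contracts over axis 0, i.e. the batch axis:
match = dot([model_1, state_h], axes=(0, 0))   # -> (None, 16): batch axis gone, None is now the timestep axis
match = Activation('softmax')(match)           # shape unchanged
match = dot([match, state_h], axes=(0, 0))     # -> (16, 1): no batch dimension at all
# concatenating (None, 64) with (16, 1) then fails, since only one branch still has a batch axis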

Without knowing what the model is trying to predict or what the training data looks like, it is hard to say whether the two dot layers are necessary, but restructuring the network will potentially solve this issue.
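If you do restructure, a minimal sketch of the dot-product attention described in the question, written so that every dot layer keeps the batch dimension, could look like the following (layer sizes follow the question's code; the Keras 2 argument names dropout/recurrent_dropout are used in place of dropout_W/dropout_U, and the intermediate variable names are purely illustrative):

from keras.layers import Input, Dense, LSTM, Activation, Reshape, dot, concatenate
from keras.models import Model
from keras.optimizers import Adam

model_2_input = Input(shape=(500,))
model_2 = Dense(64, activation='sigmoid')(model_2_input)
model_2 = Dense(64, activation='sigmoid')(model_2)            # (batch, 64)

model_1_input = Input(shape=(None, 2048))
seq = LSTM(64, dropout=0.2, recurrent_dropout=0.2, return_sequences=True)(model_1_input)
seq, state_h, state_c = LSTM(16, dropout=0.2, recurrent_dropout=0.2,
                             return_sequences=True, return_state=True)(seq)
# seq: (batch, timesteps, 16),  state_h: (batch, 16)

query = Reshape((1, 16))(state_h)                  # (batch, 1, 16)
scores = dot([query, seq], axes=[2, 2])            # (batch, 1, timesteps): one score per timestep
weights = Activation('softmax')(scores)            # softmax over the timestep axis
context = dot([weights, seq], axes=[2, 1])         # (batch, 1, 16): weighted sum of timestep outputs
context = Reshape((16,))(context)                  # (batch, 16)

merged = concatenate([model_2, context], axis=1)   # (batch, 80): both branches keep a batch axis
output = Dense(4, activation='softmax')(merged)

model = Model(inputs=[model_2_input, model_1_input], outputs=output)
model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=['accuracy'])

Both branches now end with a batch dimension, (batch, 64) and (batch, 16), so the Concatenate layer no longer complains about mismatched shapes.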
