
Attention in Keras: How to add different attention mechanisms in a Keras Dense layer?

I am new to Keras and I am trying to build a simple autoencoder in Keras with attention layers:

Here is what I tried:

import numpy as np
from keras.layers import Input, Dense, Dropout, Lambda
from keras.models import Model

# w (the input width), kwargs (shared Dense arguments), mvn (a normalization
# function used in the Lambda layers) and DenseTied (a custom Dense layer with
# tied/transposed weights) are defined elsewhere in my code.

data = Input(shape=(w,), dtype=np.float32, name='input_da')
noisy_data = Dropout(rate=0.2, name='drop1')(data)

encoded = Dense(256, activation='relu',
            name='encoded1', **kwargs)(noisy_data)
encoded = Lambda(mvn, name='mvn1')(encoded)

encoded = Dense(128, activation='relu',
            name='encoded2', **kwargs)(encoded)

encoded = Lambda(mvn, name='mvn2')(encoded)
encoded = Dropout(rate=0.5, name='drop2')(encoded)

encoder = Model([data], encoded)
encoded1 = encoder.get_layer('encoded1')
encoded2 = encoder.get_layer('encoded2')

decoded = DenseTied(256, tie_to=encoded2, transpose=True,
            activation='relu', name='decoded2')(encoded)
decoded = Lambda(mvn, name='new_mv')(decoded)

decoded = DenseTied(w, tie_to=encoded1, transpose=True,
            activation='linear', name='decoded1')(decoded)

And the model summary looks like this:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
data (InputLayer)            (None, 2693)              0         
_________________________________________________________________
drop1 (Dropout)              (None, 2693)              0         
_________________________________________________________________
encoded1 (Dense)             (None, 256)               689664    
_________________________________________________________________
mvn1 (Lambda)                (None, 256)               0         
_________________________________________________________________
encoded2 (Dense)             (None, 128)               32896     
_________________________________________________________________
mvn2 (Lambda)                (None, 128)               0         
_________________________________________________________________
drop2 (Dropout)              (None, 128)               0         
_________________________________________________________________
decoded2 (DenseTied)         (None, 256)               256       
_________________________________________________________________
mvn3 (Lambda)                (None, 256)               0         
_________________________________________________________________
decoded1 (DenseTied)         (None, 2693)              2693      
=================================================================

Where can I add an attention layer in this model? Should I add it after the first encoded output and before the second encoded input?

encoded = Lambda(mvn, name='mvn1')(encoded)

    Here?

encoded = Dense(128, activation='relu',
            name='encoded2', **kwargs)(encoded)

I was also going through this nice library:

https://github.com/CyberZHG/keras-self-attention

They have implemented various types of attention mechanisms, but they are for sequential models. How can I add those attention mechanisms to my model?
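For reference, here is a minimal sketch of one way those sequence-oriented layers could be reused on a dense model like this one (assuming keras-self-attention is installed; the Reshape/Flatten adaptation and the layer names are illustrative, not taken from the library docs or the original post). The idea is to treat the 256-unit encoder output as a length-256 "sequence" of one-dimensional steps so that SeqSelfAttention has a time axis to attend over:

from keras.layers import Reshape, Flatten
from keras_self_attention import SeqSelfAttention

# Reinterpret the (None, 256) encoding as a pseudo-sequence of 256 steps
# with one feature each, so the sequential attention layer can be applied.
encoded_seq = Reshape((256, 1), name='to_seq')(encoded)            # (None, 256, 1)
attended = SeqSelfAttention(attention_activation='sigmoid',
                            name='self_attention')(encoded_seq)    # (None, 256, 1)
encoded = Flatten(name='from_seq')(attended)                       # (None, 256)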

I tried a very simple attention mechanism:

encoded = Dense(256, activation='relu',
        name='encoded1', **kwargs)(noisy_data)


encoded = Lambda(mvn, name='mvn1')(encoded)

attention_probs = Dense(256, activation='softmax', name='attention_vec')(encoded)
attention_mul = multiply([encoded, attention_probs], name='attention_mul')
attention_mul = Dense(256)(attention_mul)

print(attention_mul.shape)

encoded = Dense(128, activation='relu',
        name='encoded2', **kwargs)(attention_mul)

Is it in the right place, and can I add any other attention mechanism to this model?

I guess what you're doing is a correct way of adding attention, because attention in itself is essentially nothing more than a set of weights, which can be visualized as the weights of a dense layer. Also, I guess applying attention just after the encoder is the right thing to do, as it will focus attention on the most "informative" part of the data distribution needed for your task.
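As a minimal sketch of that idea (the layer names below are illustrative and not from the original post): a Dense layer with a softmax produces one weight per encoded feature, an element-wise multiply re-weights the encoder output, and the re-weighted tensor is fed into the tied decoder.

from keras.layers import Dense, multiply

# Feature-wise soft attention applied right after the encoder output
# (the 128-dimensional 'drop2' tensor in the model above).
attention_probs = Dense(128, activation='softmax',
                        name='enc_attention_vec')(encoded)       # (None, 128)
encoded_attended = multiply([encoded, attention_probs],
                            name='enc_attention_mul')            # (None, 128)

decoded = DenseTied(256, tie_to=encoded2, transpose=True,
                    activation='relu', name='decoded2')(encoded_attended)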
