
How to pass 2D attention mask to HuggingFace BertModel?

I would like to pass a directional attention mask to BertModel.forward, so that I can control which surrounding tokens each token can see during self-attention. This matrix would have to be 2D.

Here is an example with three input ids, where the first two tokens cannot attend to the last one. But the last one can attend to all tokens.

import torch

# Intended reading (from the description above): the first two tokens
# cannot attend to the last one; the last token can attend to all tokens.
torch.tensor([
    [1, 1, 1],
    [1, 1, 1],
    [0, 0, 1],
])

Unfortunately, the documentation does not mention anything about supporting 2D attention masks (or rather 3D, with a batch dimension). It's possible to pass a 3D attention mask, but in my experiments the model's performance did not change much, regardless of what the mask looked like.

Is this possible, and if so, how?
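
For concreteness, here is a minimal sketch of the intended call. It assumes the behavior of transformers' get_extended_attention_mask, which also accepts a 3D mask of shape (batch_size, seq_len, seq_len) where mask[b, i, j] = 1 means query token i may attend to key token j; under that row-as-query convention, the example matrix above is written transposed. The model name and toy input are illustrative.

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# Three input ids, no special tokens, to mirror the 3x3 example.
inputs = tokenizer("a b c", return_tensors="pt", add_special_tokens=False)

# Row = attending (query) token, column = attended (key) token:
# tokens 0 and 1 cannot attend to token 2; token 2 attends to all.
mask = torch.tensor([[
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 1],
]])  # shape: (batch_size=1, seq_len=3, seq_len=3)

outputs = model(input_ids=inputs["input_ids"], attention_mask=mask)
print(outputs.last_hidden_state.shape)  # torch.Size([1, 3, 768])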

If you can provide more details it will be clearer. Anyway, this is my initial answer: to keep things simple, track where the model uses the mask in the implementation, for example in this line. If you follow the expand function from there, you will find that your case is handled in this line. At that point you can decide what you need to do next.
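
To illustrate the tracing suggested above, here is a hedged sketch (exact internals vary by transformers version): the user-supplied mask is expanded in ModuleUtilsMixin.get_extended_attention_mask, which broadcasts a 3D mask to (batch, 1, seq_len, seq_len) and converts it into an additive bias on the raw attention scores before the softmax.

import torch
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")

# 3D mask: batch of one; the first two tokens may not attend to the last.
mask_3d = torch.tensor([[
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 1],
]])

# Maps 1 -> 0.0 (keep) and 0 -> a very large negative value (block),
# which the self-attention layers add to the attention scores.
extended = model.get_extended_attention_mask(mask_3d, mask_3d.shape[:2])
print(extended.shape)  # torch.Size([1, 1, 3, 3])
print(extended[0, 0])  # 0.0 where attending is allowed, large negative where blocked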
