
How to do avg pool on the output of a BERT model for each sentence?

For classification, we usually use [CLS] to predict labels. But now I have another requirement: to do avg-pooling on the output of each sentence in the BERT model. It seems a little hard for me. Sentences are split by [SEP], but the length of each sentence in each sample of a batch is not equal, so tf.split does not fit this problem.

An example follows (batch_size=2). How do I get the avg-pooling of each sentence?

[CLS] w1 w2 w3 [sep] w4 w5 [sep]

[CLS] x1 x2 [sep] x3 x4 x5 [sep]

You can get the averages by masking.

If you call encode_plus on the tokenizer and set return_token_type_ids to True, you will get a dictionary that contains:

  • 'input_ids': the token indices that you pass into your model
  • 'token_type_ids': a list of 0s and 1s that says which token belongs to which input sentence (a short sketch of this follows the list).
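For instance, a minimal sketch of what encode_plus returns for a sentence pair. The HuggingFace transformers tokenizer and the bert-base-uncased checkpoint are illustrative assumptions here, not something fixed by the question:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")  # any BERT checkpoint behaves the same way

encoded = tokenizer.encode_plus(
    "w1 w2 w3",   # first sentence
    "w4 w5",      # second sentence
    return_token_type_ids=True,
)

print(encoded["input_ids"])       # token ids for "[CLS] ... [SEP] ... [SEP]"
print(encoded["token_type_ids"])  # 0 for every token of the first segment, 1 for the second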

Assuming you batched the token_type_ids, such that 0s are the first sentence, 1s are the second sentence and padding is something else (like -1), into a tensor in variable mask with shape batch × length, and you have the BERT output in a tensor in variable output of shape batch × length × 768, you can do:

first_sent_mask = tf.cast(tf.equal(mask, 0), tf.float32)                  # 1.0 for tokens of the first sentence, 0.0 elsewhere
first_sent_lens = tf.reduce_sum(first_sent_mask, axis=1, keepdims=True)   # batch × 1, token count per example
first_sent_mean = (
    tf.reduce_sum(output * tf.expand_dims(first_sent_mask, 2), axis=1) /  # sum over the length axis → batch × 768
    first_sent_lens)
second_sent_mask = tf.cast(tf.equal(mask, 1), tf.float32)
...
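To make the pattern concrete, here is a small self-contained sketch that repeats the same masking for both sentences. The toy mask and output tensors and the masked_mean helper are hypothetical stand-ins for the real batched token_type_ids (padded with -1) and the BERT hidden states:

import tensorflow as tf

# Toy stand-ins: mask follows the convention above (0 = first sentence, 1 = second, -1 = padding),
# output plays the role of the BERT hidden states (batch × length × 768).
mask = tf.constant([[0, 0, 0, 0, 0, 1, 1, 1],
                    [0, 0, 0, 0, 1, 1, 1, -1]])
output = tf.random.normal((2, 8, 768))

def masked_mean(output, mask, sentence_id):
    # Average the hidden states of the tokens whose mask value equals sentence_id.
    sent_mask = tf.cast(tf.equal(mask, sentence_id), tf.float32)            # batch × length
    sent_lens = tf.reduce_sum(sent_mask, axis=1, keepdims=True)             # batch × 1
    summed = tf.reduce_sum(output * tf.expand_dims(sent_mask, 2), axis=1)   # batch × 768
    return summed / sent_lens

first_sent_mean = masked_mean(output, mask, 0)    # batch × 768
second_sent_mean = masked_mean(output, mask, 1)   # batch × 768
print(first_sent_mean.shape, second_sent_mean.shape)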
