
Removing SEP token in Bert for text classification

Given a sentiment classification dataset, I want to fine-tune BERT.

As you know, BERT was designed to predict the next sentence given the current sentence. Thus, to make the network aware of this, they insert a [CLS] token at the beginning of the first sentence, then add a [SEP] token to separate the first sentence from the second, and finally another [SEP] at the end of the second sentence (it's not clear to me why they append another token at the end).

Anyway, for text classification, what I noticed in some of the examples online (see BERT in Keras with Tensorflow hub) is that they add a [CLS] token, then the sentence, and at the end another [SEP] token.
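For concreteness, below is a minimal sketch of that single-sentence layout. I use the Hugging Face transformers tokenizer here only as a stand-in (the linked example uses TensorFlow Hub, but the resulting token layout is the same):

    # Minimal sketch: single-sentence input format, shown with the
    # Hugging Face BertTokenizer (assumed here purely for illustration).
    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

    encoding = tokenizer("the movie was great")
    print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
    # ['[CLS]', 'the', 'movie', 'was', 'great', '[SEP]']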

Whereas in other research works (e.g. Enriching Pre-trained Language Model with Entity Information for Relation Classification) they remove the last [SEP] token.

Why is it (or isn't it) beneficial to add the [SEP] token at the end of the input text when my task uses only a single sentence?

I'm not quite sure why BERT needs the separation token [SEP] at the end for single-sentence tasks, but my guess is that BERT is an autoencoding model that, as mentioned, was originally designed for language modelling and next sentence prediction. So BERT was trained to always expect the [SEP] token, which means that the token is part of the underlying knowledge that BERT built up during training.

Downstream tasks that came later, such as single-sentence use cases (e.g. text classification), turned out to work with BERT as well; however, the [SEP] token was left in as a relic needed for BERT to work properly, and is therefore required even for these tasks.

BERT might learn faster if [SEP] is appended at the end of a single sentence, because it encodes some knowledge in that token, namely that it marks the end of the input. Without it, BERT would still know where the sentence ends (due to the padding tokens), which explains why the aforementioned research leaves the token out; but this might slow down training slightly, since BERT may be able to learn faster with an appended [SEP] token, especially if there are no padding tokens in a truncated input.
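If you want to reproduce the variant without the trailing [SEP], one simple way is to build the ids yourself instead of letting the tokenizer add both special tokens. This is only an illustrative sketch (again assuming the Hugging Face tokenizer), not the exact pipeline used in that paper:

    # Hypothetical sketch: keep [CLS] but drop the trailing [SEP].
    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

    text = "the movie was great"
    tokens = ["[CLS]"] + tokenizer.tokenize(text)        # no final [SEP]
    input_ids = tokenizer.convert_tokens_to_ids(tokens)

    # Default behaviour for comparison: [CLS] ... [SEP] is added for you.
    default_ids = tokenizer(text)["input_ids"]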

As mentioned in BERT's paper, BERT is pre-trained using two novel unsupervised prediction tasks: Masked Language Model and Next Sentence Prediction. In the Next Sentence Prediction task, the model takes a pair of sentences as input and learns to predict whether the second sentence is the next sequence in the original document or not.
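To make that pair format concrete, here is a minimal sketch (again assuming the Hugging Face tokenizer) of how the two sentences are packed into one input: they are separated by [SEP] and distinguished by the segment (token type) ids:

    # Minimal sketch: sentence-pair (NSP-style) input format.
    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

    enc = tokenizer("the movie was great", "i would watch it again")
    print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))
    # ['[CLS]', 'the', 'movie', 'was', 'great', '[SEP]',
    #  'i', 'would', 'watch', 'it', 'again', '[SEP]']
    print(enc["token_type_ids"])
    # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]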

Accordingly, I think the BERT model uses the relationship between two text sentences in the text classification task as well as in other tasks. This relationship can be used to predict whether these two sentences belong to the same class or not. Therefore, the [SEP] token is needed to merge these two sentences and determine the relationship between them.
