How do I train a pretrained BERT on the MLM task in TensorFlow? (Using pretrained models in TensorFlow only)

Question

Deshwal is looking for a canonical answer:

I found no official documentation that explains how to train an MLM and then fine-tune it as a classification model in TensorFlow. One way could be to train the MLM as a PyTorch model and then load it as a TensorFlow model. Any help on the topic is appreciated, whether it is training in PyTorch and loading in TensorFlow or doing everything in TensorFlow.

Before anyone suggests PyTorch or other things: I am looking specifically for TensorFlow + pretrained + MLM task only. I know there are lots of blogs about PyTorch and lots of blogs about fine-tuning (classification) in TensorFlow.

Coming to the problem: I have a language model for English + LaTeX, where the text data can come from Physics, Chemistry, Maths and Biology. A typical example looks something like this (link to OCR image):

"Find the value of function x in the equation: \n \\( f(x)=\\left\\{\\begin{array}{ll}x^{2} & \\text { if } x<0 \\\\ 2 x & \\text { if } x \\geq 0\\end{array}\\right. \\)"

So my language model needs to understand tokens such as \geq, \begin, array, \end, \left and \right in addition to the English language, which is why I need to first train an MLM on pretrained BERT or SciBERT to have both. So I went digging on the internet and found some tutorials.
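One common first step for a mixed English + LaTeX corpus is to find out which LaTeX commands actually occur, so they can be considered as extra vocabulary entries before MLM training. The sketch below is only an illustration under assumptions: the function name `collect_latex_commands` and the regular expression are hypothetical helpers, not something from the question, and the sample string is a shortened, un-escaped version of the example above.

```python
import re
from collections import Counter

# Matches a LaTeX control word such as \geq or \begin. One or more leading
# backslashes are accepted so that JSON-escaped text ("\\geq") also matches.
# Caveat: in escaped text, a newline written as "\n" would match as well,
# so such strings should be unescaped or filtered first.
LATEX_COMMAND = re.compile(r"\\+([A-Za-z]+)")


def collect_latex_commands(texts, min_count=1):
    """Return the LaTeX commands seen at least min_count times, sorted."""
    counts = Counter()
    for text in texts:
        # Normalize every match to a single leading backslash.
        counts.update("\\" + name for name in LATEX_COMMAND.findall(text))
    return sorted(cmd for cmd, n in counts.items() if n >= min_count)


sample = r"\( f(x)=\left\{\begin{array}{ll}x^{2} & \text { if } x \geq 0\end{array}\right. \)"
new_tokens = collect_latex_commands([sample])
print(new_tokens)
```

With a HuggingFace tokenizer and model, such a list could then be passed to `tokenizer.add_tokens(new_tokens)` followed by `model.resize_token_embeddings(len(tokenizer))`. Whether adding tokens helps more than relying on the existing WordPiece splits is something to check on your own data.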

I already have a fine-tuning classification model. Some of the code is as follows:

import numpy as np
import tensorflow as tf
import transformers

# maxlen, classes, X_train and y_train are not defined in this excerpt.
tokenizer = transformers.BertTokenizer.from_pretrained('bert-large-uncased')

def regular_encode(texts, tokenizer, maxlen=maxlen):
    # Tokenize, then pad or truncate every text to exactly maxlen token ids.
    enc_di = tokenizer.batch_encode_plus(list(texts), return_token_type_ids=False,
                                         padding='max_length', max_length=maxlen, truncation=True)
    return np.array(enc_di['input_ids'])

Xtrain_encoded = regular_encode(X_train.astype('str'), tokenizer, maxlen=maxlen)
ytrain_encoded = tf.keras.utils.to_categorical(y_train, num_classes=classes, dtype='int32')

# Note: the loss argument is accepted but not used in this excerpt.
def build_model(transformer, loss='categorical_crossentropy', max_len=maxlen, dense=512, drop1=0.3, drop2=0.3):
    input_word_ids = tf.keras.layers.Input(shape=(max_len,), dtype=tf.int32, name="input_word_ids")
    sequence_output = transformer(input_word_ids)[0]
    # Use the hidden state of the first ([CLS]) token as the sentence representation.
    cls_token = sequence_output[:, 0, :]

    # Fine-tuning head: dropout -> dense -> dropout -> softmax over the classes
    x = tf.keras.layers.Dropout(drop1)(cls_token)
    x = tf.keras.layers.Dense(dense, activation='relu')(x)
    x = tf.keras.layers.Dropout(drop2)(x)
    out = tf.keras.layers.Dense(classes, activation='softmax')(x)
    model = tf.keras.Model(inputs=input_word_ids, outputs=out)
    return model

The only useful thing I could find was this, from HuggingFace:

With the tight interoperability between TensorFlow and PyTorch models, you can even save the model and then reload it as a PyTorch model (or vice-versa).

from transformers import AutoModelForSequenceClassification

model.save_pretrained("my_imdb_model")
pytorch_model = AutoModelForSequenceClassification.from_pretrained("my_imdb_model", from_tf=True)

So maybe I could train a PyTorch MLM and then load it as a TensorFlow fine-tuned classification model? Is there any other way?

Answer 1

I also encountered this problem and did some research. A possible solution is to use tensorflow_hub models that support MLM.

For example, this model, https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-128_A-2/2, has a relatively complete tutorial on how to create an MLM task. But there is a problem if you are designing multitask training, because:

import tensorflow_hub as hub

encoder = hub.load(model_path)
mlm = hub.KerasLayer(encoder.mlm, trainable=True)

I'm not sure this helps with multitask training, because I don't know whether the encoder is updated. I hope this is useful, and maybe someone can help me out too. Thanks.

Update: I have checked the official documentation for tfhub, and I think I misunderstood what the encoder means in this model. The weights are updated in the encoder, so this can actually serve multitask pretraining. I will update if anything goes wrong in my experiment.

Answer 2

After some research, I think I found something. I don't know if it will work, but it uses transformers and TensorFlow with XLM. I think it will work for BERT too.

import tensorflow as tf
import transformers
from transformers import TFAutoModelWithLMHead, AutoTokenizer

# strategy, LR and global_batch_size are not defined in this excerpt.
PRETRAINED_MODEL = 'jplu/tf-xlm-roberta-large'

def create_mlm_model_and_optimizer():
    with strategy.scope():
        model = TFAutoModelWithLMHead.from_pretrained(PRETRAINED_MODEL)
        optimizer = tf.keras.optimizers.Adam(learning_rate=LR)
    return model, optimizer


mlm_model, optimizer = create_mlm_model_and_optimizer()




def define_mlm_loss_and_metrics():
    with strategy.scope():
        mlm_loss_object = masked_sparse_categorical_crossentropy

        def compute_mlm_loss(labels, predictions):
            per_example_loss = mlm_loss_object(labels, predictions)
            loss = tf.nn.compute_average_loss(
                per_example_loss, global_batch_size = global_batch_size)
            return loss

        train_mlm_loss_metric = tf.keras.metrics.Mean()
        
    return compute_mlm_loss, train_mlm_loss_metric


def masked_sparse_categorical_crossentropy(y_true, y_pred):
    y_true_masked = tf.boolean_mask(y_true, tf.not_equal(y_true, -1))
    y_pred_masked = tf.boolean_mask(y_pred, tf.not_equal(y_true, -1))
    loss = tf.keras.losses.sparse_categorical_crossentropy(y_true_masked,
                                                          y_pred_masked,
                                                          from_logits=True)
    return loss

            
            
def train_mlm(train_dist_dataset, total_steps=2000, evaluate_every=200):
    step = 0
    ### Training loop ###
    for tensor in train_dist_dataset:
        distributed_mlm_train_step(tensor) 
        step+=1

        if (step % evaluate_every == 0):   
            ### Print train metrics ###  
            train_metric = train_mlm_loss_metric.result().numpy()
            print("Step %d, train loss: %.2f" % (step, train_metric))     

            ### Reset  metrics ###
            train_mlm_loss_metric.reset_states()
            
        if step  == total_steps:
            break


@tf.function
def distributed_mlm_train_step(data):
    # strategy.run is the current name of experimental_run_v2 (renamed in TF 2.2).
    strategy.run(mlm_train_step, args=(data,))


@tf.function
def mlm_train_step(inputs):
    features, labels = inputs

    with tf.GradientTape() as tape:
        predictions = mlm_model(features, training=True)[0]
        loss = compute_mlm_loss(labels, predictions)

    gradients = tape.gradient(loss, mlm_model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, mlm_model.trainable_variables))

    train_mlm_loss_metric.update_state(loss)
    

compute_mlm_loss, train_mlm_loss_metric = define_mlm_loss_and_metrics()

Now train it with train_mlm(train_dist_dataset, TOTAL_STEPS, EVALUATE_EVERY).

The code above is from this notebook; you need to follow all the necessary steps given there exactly.
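The training loop expects each batch to be a (features, labels) pair in which every position that should not contribute to the loss carries the label -1, since the loss filters on tf.not_equal(y_true, -1). The data preparation itself is not shown in this answer, so the following is only a minimal pure-Python sketch of the standard BERT masking recipe (select about 15% of tokens; of those, 80% become [MASK], 10% a random token, 10% stay unchanged). The function name `mask_tokens` and its parameters are assumptions made for illustration, not the notebook's code.

```python
import random

IGNORE_LABEL = -1  # same sentinel value that the masked loss above filters out


def mask_tokens(input_ids, mask_id, vocab_size, special_ids=(), mlm_prob=0.15, rng=None):
    """Return (features, labels) for one sequence of token ids."""
    rng = rng or random.Random()
    features = list(input_ids)
    labels = [IGNORE_LABEL] * len(input_ids)
    for i, tok in enumerate(input_ids):
        # Never select special tokens; select other tokens with mlm_prob.
        if tok in special_ids or rng.random() >= mlm_prob:
            continue
        labels[i] = tok  # the model must predict the original token here
        roll = rng.random()
        if roll < 0.8:
            features[i] = mask_id  # 80%: replace with [MASK]
        elif roll < 0.9:
            # 10%: replace with a random id (simplification: may hit a special id)
            features[i] = rng.randrange(vocab_size)
        # remaining 10%: keep the original token
    return features, labels


# Illustrative ids only: 101/102/103/0 are [CLS]/[SEP]/[MASK]/[PAD] in the
# bert-*-uncased vocabularies (30522 entries); the other ids are arbitrary.
ids = [101, 2424, 1996, 3643, 1997, 3853, 102, 0, 0]
features, labels = mask_tokens(ids, mask_id=103, vocab_size=30522,
                               special_ids={0, 101, 102}, rng=random.Random(0))
print(features)
print(labels)
```

In a real pipeline the special ids would be read from the tokenizer (for example `tokenizer.mask_token_id`) rather than hard-coded, and the masking would normally be applied to whole batches.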

The author says at the end:

This fine-tuned model can be loaded just like the original to build a classification model from it.
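For reference, the masked loss used in the training code of this answer can be checked by hand on a tiny example. The sketch below re-implements the same idea in plain Python: drop every position labeled -1, then compute softmax cross-entropy from logits on the rest. It is an illustration of the computation, not a replacement for the TensorFlow version, and the toy labels and logits are made up.

```python
import math


def masked_sparse_categorical_crossentropy(labels, logits, ignore_label=-1):
    """Cross-entropy per position, skipping positions equal to ignore_label."""
    losses = []
    for label, row in zip(labels, logits):
        if label == ignore_label:
            continue  # unmasked position: contributes nothing to the loss
        # Numerically stable log-sum-exp over the vocabulary dimension.
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        losses.append(log_sum_exp - row[label])
    return losses


# Four positions, vocabulary of size 3; only positions 1 and 3 were masked.
labels = [-1, 2, -1, 0]
logits = [[0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0],
          [5.0, 1.0, 1.0],
          [2.0, 0.0, 0.0]]
losses = masked_sparse_categorical_crossentropy(labels, logits)
print(losses)
```

Position 1 has uniform logits, so its loss is log(3); position 3 favors the correct class, so its loss is smaller. In the answer's code these per-token losses are then summed and divided by global_batch_size through tf.nn.compute_average_loss.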

2 answers

Answer 1
0 2022-01-26 05:41:21

Answer 2
0 2022-01-31 17:02:51
