How to train TensorFlow's pre-trained BERT on an MLM task? (Use pre-trained model only in TensorFlow)
Before anyone suggests PyTorch and other things: I am looking specifically for TensorFlow + pre-trained + MLM task only. I know there are lots of blogs for PyTorch and lots of blogs for fine-tuning (classification) on TensorFlow.
Coming to the problem: my language model has to cover English + LaTeX, where the text data can be any text from Physics, Chemistry, Maths and Biology. A typical example looks something like this (link to OCR image):
"Find the value of function x in the equation: \n \\( f(x)=\\left\\{\\begin{array}{ll}x^{2} & \\text { if } x<0 \\\\ 2 x & \\text { if } x \\geq 0\\end{array}\\right. \\)"
So my language model needs to understand LaTeX tokens such as \geq, \begin{array}, \end{array}, \left and \right in addition to the English language, and that is why I need to train an MLM first on pre-trained BERT or SciBERT to have both. So I went digging on the internet and found some tutorials:
1. MLM training on TensorFlow, BUT from scratch; I need pre-trained
2. MLM on pre-trained, but in PyTorch; I need TensorFlow
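A stock BERT WordPiece vocabulary will most likely split LaTeX commands such as \geq into several pieces, so it can help to first list which commands the corpus actually uses. The snippet below is a minimal sketch in plain Python; the helper name count_latex_commands and the regular expression are illustrative assumptions, and the sample string is the example from above with the doubled backslashes unescaped.

```python
import re
from collections import Counter

# Example text from the question, with the JSON-style "\\" unescaped to "\".
SAMPLE = r"Find the value of function x in the equation: \( f(x)=\left\{\begin{array}{ll}x^{2} & \text { if } x<0 \\ 2 x & \text { if } x \geq 0\end{array}\right. \)"

# Assumption: a "LaTeX command" is a backslash followed by one or more letters.
LATEX_COMMAND = re.compile(r"\\[A-Za-z]+")


def count_latex_commands(texts):
    """Count how often each LaTeX command occurs in an iterable of strings."""
    counts = Counter()
    for text in texts:
        counts.update(LATEX_COMMAND.findall(text))
    return counts


counts = count_latex_commands([SAMPLE])
candidate_tokens = sorted(counts)  # unique commands, alphabetical order
print(candidate_tokens)
```

With HuggingFace transformers, such a list could then be passed to tokenizer.add_tokens(...), followed by model.resize_token_embeddings(len(tokenizer)), if whole-command tokens are wanted. Whether that helps on this particular data is untested here.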
I already have a fine-tuning classification model. Some of the code is as follows:
import numpy as np
import tensorflow as tf
import transformers

# maxlen, classes, X_train and y_train are defined elsewhere in the original script.
tokenizer = transformers.BertTokenizer.from_pretrained('bert-large-uncased')

def regular_encode(texts, tokenizer, maxlen=maxlen):
    # Tokenize, then pad or truncate every text to maxlen; only the input IDs are returned.
    enc_di = tokenizer.batch_encode_plus(texts, return_token_type_ids=False, padding='max_length', max_length=maxlen, truncation=True)
    return np.array(enc_di['input_ids'])

Xtrain_encoded = regular_encode(X_train.astype('str'), tokenizer, maxlen=maxlen)
ytrain_encoded = tf.keras.utils.to_categorical(y_train, num_classes=classes, dtype='int32')

def build_model(transformer, loss='categorical_crossentropy', max_len=maxlen, dense=512, drop1=0.3, drop2=0.3):
    input_word_ids = tf.keras.layers.Input(shape=(max_len,), dtype=tf.int32, name="input_word_ids")
    sequence_output = transformer(input_word_ids)[0]
    cls_token = sequence_output[:, 0, :]  # hidden state of the first ([CLS]) token
    # Fine-tuning head starts here
    x = tf.keras.layers.Dropout(drop1)(cls_token)
    x = tf.keras.layers.Dense(dense, activation='relu')(x)
    x = tf.keras.layers.Dropout(drop2)(x)
    out = tf.keras.layers.Dense(classes, activation='softmax')(x)
    model = tf.keras.Model(inputs=input_word_ids, outputs=out)
    return model
The only useful thing I could find was this, in the HuggingFace documentation:
With the tight interoperability between TensorFlow and PyTorch models, you can even save the model and then reload it as a PyTorch model (or vice-versa)
from transformers import AutoModelForSequenceClassification
model.save_pretrained("my_imdb_model")
pytorch_model = AutoModelForSequenceClassification.from_pretrained("my_imdb_model", from_tf=True)
So maybe I could train a PyTorch MLM and then load it as a TensorFlow fine-tuned classification model? Is there any other way?
I also encountered this problem and did some research. A possible solution is to use tensorflow_hub models that support MLM. For example, the model at https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-10_H-128_A-2/2 has a relatively complete tutorial on how to set up an MLM task. But there is a problem if you are designing multi-task training, because:
import tensorflow_hub as hub
encoder = hub.load(model_path)
mlm = hub.KerasLayer(encoder.mlm, trainable=True)
I'm not sure this helps with multi-task training, because I don't know whether the encoder is updated. I hope this is useful, and maybe someone can help me out as well. Thanks.
Some updates: I have checked the official documentation for tfhub, and I think I misunderstood what the encoder means in this model. The weights are updated in the encoder, so this can actually serve multi-task pretraining. I will update this answer if anything goes wrong in my experiment.
After some research, I think I found something. I don't know if it will work, but it uses transformers and TensorFlow with XLM. I think it will work for BERT too.
import tensorflow as tf
import transformers
from transformers import TFAutoModelWithLMHead, AutoTokenizer

# strategy, LR and global_batch_size are defined elsewhere in the notebook.
PRETRAINED_MODEL = 'jplu/tf-xlm-roberta-large'

def create_mlm_model_and_optimizer():
    with strategy.scope():
        # Newer transformers releases deprecate TFAutoModelWithLMHead in favour of TFAutoModelForMaskedLM.
        model = TFAutoModelWithLMHead.from_pretrained(PRETRAINED_MODEL)
        optimizer = tf.keras.optimizers.Adam(learning_rate=LR)
    return model, optimizer

mlm_model, optimizer = create_mlm_model_and_optimizer()

def define_mlm_loss_and_metrics():
    with strategy.scope():
        mlm_loss_object = masked_sparse_categorical_crossentropy

        def compute_mlm_loss(labels, predictions):
            per_example_loss = mlm_loss_object(labels, predictions)
            loss = tf.nn.compute_average_loss(
                per_example_loss, global_batch_size=global_batch_size)
            return loss

        train_mlm_loss_metric = tf.keras.metrics.Mean()
    return compute_mlm_loss, train_mlm_loss_metric

def masked_sparse_categorical_crossentropy(y_true, y_pred):
    # Positions labelled -1 were not masked and are excluded from the loss.
    y_true_masked = tf.boolean_mask(y_true, tf.not_equal(y_true, -1))
    y_pred_masked = tf.boolean_mask(y_pred, tf.not_equal(y_true, -1))
    loss = tf.keras.losses.sparse_categorical_crossentropy(y_true_masked,
                                                           y_pred_masked,
                                                           from_logits=True)
    return loss

def train_mlm(train_dist_dataset, total_steps=2000, evaluate_every=200):
    step = 0
    ### Training loop ###
    for tensor in train_dist_dataset:
        distributed_mlm_train_step(tensor)
        step += 1
        if step % evaluate_every == 0:
            ### Print train metrics ###
            train_metric = train_mlm_loss_metric.result().numpy()
            print("Step %d, train loss: %.2f" % (step, train_metric))
            ### Reset metrics ###
            train_mlm_loss_metric.reset_states()
        if step == total_steps:
            break

@tf.function
def distributed_mlm_train_step(data):
    # The notebook called strategy.experimental_run_v2, which was renamed to strategy.run in TF 2.2.
    strategy.run(mlm_train_step, args=(data,))

@tf.function
def mlm_train_step(inputs):
    features, labels = inputs
    with tf.GradientTape() as tape:
        predictions = mlm_model(features, training=True)[0]
        loss = compute_mlm_loss(labels, predictions)
    gradients = tape.gradient(loss, mlm_model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, mlm_model.trainable_variables))
    train_mlm_loss_metric.update_state(loss)

compute_mlm_loss, train_mlm_loss_metric = define_mlm_loss_and_metrics()
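To make the role of the -1 labels concrete, here is a NumPy re-statement of what masked_sparse_categorical_crossentropy computes: positions labelled -1 are dropped, and a softmax cross-entropy is taken over the remaining ones. It is an illustrative sketch, not part of the notebook; the name masked_sparse_ce and the toy shapes are assumptions.

```python
import numpy as np


def masked_sparse_ce(y_true, logits, ignore_label=-1):
    """Per-token cross-entropy from logits, skipping positions equal to ignore_label.

    y_true: int array of shape (batch, seq_len)
    logits: float array of shape (batch, seq_len, vocab_size)
    Returns a 1-D array with one loss value per kept (masked) token.
    """
    y_true = np.asarray(y_true)
    logits = np.asarray(logits, dtype=np.float64)
    keep = y_true != ignore_label
    kept_labels = y_true[keep]   # shape (n,)
    kept_logits = logits[keep]   # shape (n, vocab_size)
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = kept_logits - kept_logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Negative log-probability assigned to the true token at each kept position.
    return -log_probs[np.arange(kept_labels.size), kept_labels]


# Toy batch: one sequence of 4 tokens, vocabulary of 5, two masked positions.
toy_labels = np.array([[-1, 2, -1, 0]])
toy_logits = np.zeros((1, 4, 5))  # uniform predictions
toy_loss = masked_sparse_ce(toy_labels, toy_logits)
print(toy_loss)
```

Note that the result has one entry per masked token, so tf.nn.compute_average_loss in the code above divides the sum of these token losses by global_batch_size rather than by the number of masked tokens.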
Now train it with train_mlm(train_dist_dataset, TOTAL_STEPS, EVALUATE_EVERY)
The above code is from this notebook, and you need to follow all the necessary steps exactly as given there.
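The training step expects each batch as a (features, labels) pair in which every position that was not masked carries the label -1, since that is what masked_sparse_categorical_crossentropy filters on. A minimal NumPy sketch of BERT-style masking that produces such pairs is given below. The function name mask_tokens, the token IDs (those of the bert-base-uncased vocabulary) and the 15% selection rate with an 80/10/10 split are assumptions for illustration, not code taken from the notebook; a different tokenizer, such as the XLM-R one, has different special-token IDs.

```python
import numpy as np


def mask_tokens(input_ids, mask_id, vocab_size, special_ids,
                mlm_prob=0.15, ignore_label=-1, seed=0):
    """Return (features, labels) for masked language modelling.

    labels holds the original token ID at selected positions and
    ignore_label everywhere else.
    """
    rng = np.random.default_rng(seed)
    input_ids = np.asarray(input_ids)
    features = input_ids.copy()
    labels = np.full_like(input_ids, ignore_label)

    # Special tokens (padding, [CLS], [SEP]) are never selected.
    can_mask = ~np.isin(input_ids, list(special_ids))
    selected = (rng.random(input_ids.shape) < mlm_prob) & can_mask
    labels[selected] = input_ids[selected]

    # Of the selected positions: 80% become the mask token,
    # 10% become a random token, 10% are left unchanged.
    roll = rng.random(input_ids.shape)
    to_mask = selected & (roll < 0.8)
    to_random = selected & (roll >= 0.8) & (roll < 0.9)
    features[to_mask] = mask_id
    # Random replacements are drawn from the whole vocabulary (may hit special IDs).
    features[to_random] = rng.integers(0, vocab_size, size=int(to_random.sum()))
    return features, labels


# Assumed IDs from bert-base-uncased: [PAD]=0, [CLS]=101, [SEP]=102, [MASK]=103.
batch = np.array([[101, 2000, 2001, 2002, 2003, 102, 0, 0]])
features, labels = mask_tokens(batch, mask_id=103, vocab_size=30522,
                               special_ids={0, 101, 102}, mlm_prob=0.5)
print(features)
print(labels)
```

Pairs built this way could be wrapped with tf.data.Dataset.from_tensor_slices((features, labels)) and batched before being distributed, which matches the features, labels = inputs unpacking in mlm_train_step.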
The author says at the end:
This fine tuned model can be loaded just as the original to build a classification model from it