
How to get the accuracy per epoch or step for the huggingface.transformers Trainer?

I'm using the huggingface Trainer with a BertForSequenceClassification.from_pretrained("bert-base-uncased") model.

Simplified, it looks like this:

from transformers import BertForSequenceClassification, BertTokenizer, Trainer, TrainingArguments

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

training_args = TrainingArguments(
        output_dir="bert_results",
        num_train_epochs=3,
        per_device_train_batch_size=8,
        per_device_eval_batch_size=32,
        warmup_steps=500,
        weight_decay=0.01,
        logging_dir="bert_results/logs",
        logging_steps=10
        )

trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=val_dataset,
        compute_metrics=compute_metrics
        )

The logs contain the loss for every 10 steps, but I can't seem to find the training accuracy. Does anybody know how to get the accuracy, for example by changing the verbosity of the logger? I can't seem to find anything about it online.

Thanks, CptBaas

You can control how often the Trainer evaluates with the evaluation_strategy training argument. It currently accepts 3 values (a configuration example follows the list):

"no": No evaluation is done during training.

"steps": Evaluation is done (and logged) every eval_steps.

"epoch": Evaluation is done at the end of each epoch.
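
For example, a minimal sketch of the questioner's setup with step-based evaluation enabled (the eval_steps value here is an illustrative choice, not from the original post):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="bert_results",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=32,
    logging_dir="bert_results/logs",
    logging_steps=10,
    evaluation_strategy="steps",  # evaluate (and log metrics) every eval_steps
    eval_steps=50,                # illustrative interval
)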

You can load the accuracy metric and make it work with your compute_metrics function. For example, it would look like this:

import numpy as np
from datasets import load_metric

metric = load_metric("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    # logits -> predicted class ids
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)

This compute_metrics function example is based on Hugging Face's text classification tutorial. It worked in my tests.
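
Note that in recent versions of datasets, load_metric is deprecated in favor of the standalone evaluate library; assuming you have evaluate installed, an equivalent sketch would be:

import numpy as np
import evaluate

metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return metric.compute(predictions=predictions, references=labels)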

I faced the same issue and solved it by adding a custom callback that calls the evaluate() method with the train_dataset at the end of each epoch.

from copy import deepcopy
from transformers import TrainerCallback

class CustomCallback(TrainerCallback):

    def __init__(self, trainer) -> None:
        super().__init__()
        self._trainer = trainer

    def on_epoch_end(self, args, state, control, **kwargs):
        if control.should_evaluate:
            # evaluate() resets control.should_evaluate, so keep a copy
            # and return it to preserve the original control flow
            control_copy = deepcopy(control)
            self._trainer.evaluate(eval_dataset=self._trainer.train_dataset, metric_key_prefix="train")
            return control_copy

trainer = Trainer(
    model=model,                         # the instantiated Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=train_dataset,         # training dataset
    eval_dataset=valid_dataset,          # evaluation dataset
    compute_metrics=compute_metrics,     # the callback that computes metrics of interest
    tokenizer=tokenizer
)
trainer.add_callback(CustomCallback(trainer)) 
train = trainer.train()

This gives training metrics like the following:

{'train_loss': 0.7159061431884766, 'train_accuracy': 0.4, 'train_f1': 0.5714285714285715, 'train_runtime': 6.2973, 'train_samples_per_second': 2.382, 'train_steps_per_second': 0.159, 'epoch': 1.0}
{'eval_loss': 0.8529007434844971, 'eval_accuracy': 0.0, 'eval_f1': 0.0, 'eval_runtime': 2.0739, 'eval_samples_per_second': 0.964, 'eval_steps_per_second': 0.482, 'epoch': 1.0}
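
If you want to collect these per-epoch values programmatically after training, the Trainer keeps everything it logged in trainer.state.log_history; a minimal sketch (the key names match the log output above):

# after trainer.train()
train_acc = [log["train_accuracy"] for log in trainer.state.log_history
             if "train_accuracy" in log]
eval_acc = [log["eval_accuracy"] for log in trainer.state.log_history
            if "eval_accuracy" in log]
print(train_acc, eval_acc)  # one entry per epoch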

Another way to get the training accuracy is to extend the base Trainer class and override the compute_loss() method, like this:

import torch
from sklearn.metrics import accuracy_score
from transformers import Trainer

class CustomTrainer(Trainer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def compute_loss(self, model, inputs, return_outputs=False):
        """
        How the loss is computed by Trainer. By default, all models return the loss in the first element.
        Subclass and override for custom behavior.
        """
        if self.label_smoother is not None and "labels" in inputs:
            labels = inputs.pop("labels")
        else:
            labels = None
        outputs = model(**inputs)

        # code for calculating accuracy
        if "labels" in inputs:
            # move to CPU so sklearn can consume the tensors
            preds = outputs.logits.detach().cpu()
            labels_flat = inputs["labels"].detach().cpu().reshape(-1)
            acc1 = accuracy_score(labels_flat, preds.argmax(axis=1))
            self.log({"accuracy_score": acc1})
            acc = (
                (preds.argmax(axis=-1) == labels_flat)
                .type(torch.float)
                .mean()
                .item()
            )
            self.log({"train_accuracy": acc})
        # end code for calculating accuracy

        # Save past state if it exists
        # TODO: this needs to be fixed and made cleaner later.
        if self.args.past_index >= 0:
            self._past = outputs[self.args.past_index]

        if labels is not None:
            loss = self.label_smoother(outputs, labels)
        else:
            # We don't use .loss here since the model may return tuples instead of ModelOutput.
            loss = outputs["loss"] if isinstance(outputs, dict) else outputs[0]

        return (loss, outputs) if return_outputs else loss

Then use CustomTrainer instead of Trainer like this:

trainer = CustomTrainer(
    model=model,                         # the instantiated Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=train_dataset,         # training dataset
    eval_dataset=valid_dataset,          # evaluation dataset
    compute_metrics=compute_metrics,     # the callback that computes metrics of interest
    tokenizer=tokenizer
)

A function that returns the desired metrics is needed. Here's the one I wrote; it returns a whole set of metrics (the more the merrier, right?):

import numpy as np
from datasets import load_metric

def compute_metrics(eval_pred):
    metrics = ["accuracy", "recall", "precision", "f1"]  # list of metrics to return
    metric = {}
    for met in metrics:
        metric[met] = load_metric(met)
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    metric_res = {}
    for met in metrics:
        metric_res[met] = metric[met].compute(predictions=predictions, references=labels)[met]
    return metric_res
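
As a quick sanity check, you can call compute_metrics directly on a dummy (logits, labels) pair, which is the shape of the EvalPrediction the Trainer passes in (the values below are made up for illustration):

import numpy as np

dummy_logits = np.array([[2.0, 0.1], [0.2, 1.5], [1.0, 0.3], [0.1, 2.2]])
dummy_labels = np.array([0, 1, 0, 0])
print(compute_metrics((dummy_logits, dummy_labels)))
# e.g. {'accuracy': 0.75, 'recall': 1.0, 'precision': 0.5, 'f1': 0.666...}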

Also, if the metrics need to be calculated per epoch, that needs to be defined in the training arguments:

training_args = TrainingArguments(
    ...,
    evaluation_strategy="epoch",  # to calculate metrics per epoch
    logging_strategy="epoch",     # extra: also log training-data stats (loss) per epoch
)

The last step is to add it to the Trainer:

trainer = Trainer(
    ...,
    compute_metrics=compute_metrics,
)
