ClearML multiple tasks in single script changes logged value names

I trained multiple models with different configurations for a custom hyperparameter search. I use pytorch_lightning and its logging (TensorBoardLogger). When I run my training script after Task.init(), ClearML auto-creates a Task and connects the logger output to the server.

For each training stage (train, val and test) I log the following scalars at each epoch: loss, acc and iou.

When I have multiple configurations, e.g. networkA and networkB, the first training run logs its values to loss, acc and iou, but the second logs to networkB:loss, networkB:acc and networkB:iou. This makes the values incomparable.

My training loop with Task initialization looks like this:

from clearml import Task

names = ['networkA', 'networkB']
for name in names:
    # each configuration gets its own, fully separate Task
    task = Task.init(project_name="NetworkProject", task_name=name)
    pl_train(name)
    task.close()

The method pl_train is a wrapper around the whole training with PyTorch Lightning; no ClearML code is inside this method.
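For context, here is a minimal sketch of what such a wrapper looks like; the model, data and dimensions below are simplified placeholders, not the real training code:

import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl
from pytorch_lightning.loggers import TensorBoardLogger

class ToyModel(pl.LightningModule):
    # placeholder network; the real models (networkA/networkB) differ in configuration
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.layer(x), y)
        # scalars logged here go to the TensorBoardLogger and are auto-captured by ClearML
        self.log("loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)

def pl_train(name):
    # a fresh TensorBoardLogger per run; no ClearML calls inside this function
    data = TensorDataset(torch.randn(64, 8), torch.randn(64, 1))
    trainer = pl.Trainer(max_epochs=2,
                         logger=TensorBoardLogger(save_dir="lightning_logs", name=name))
    trainer.fit(ToyModel(), DataLoader(data, batch_size=16))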

Do you have any hints on how to properly run such a loop in a single script with completely separated tasks?


Edit: The ClearML version was 0.17.4. The issue is fixed in the main branch.

Disclaimer: I'm part of the ClearML (formerly Trains) team.

pytorch_lightning creates a new TensorBoard logger for each experiment. When ClearML logs the TB scalars and captures the same scalar being re-sent, it adds a prefix, so reporting the same metric will not overwrite the previous one. A good example would be reporting a loss scalar in the training phase vs. the validation phase (producing "loss" and "validation:loss"). It might be that the task.close() call does not clear the previous logs, so it "thinks" this is the same experiment and therefore adds the prefix networkB to loss. As long as you are closing the Task after training is completed, all experiments should log with the same metric/variant (title/series). I suggest opening a GitHub issue; this should probably be considered a bug.
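To make the metric/variant (title/series) naming concrete, here is a small illustrative sketch using ClearML's explicit scalar reporting (not the asker's code); the auto-captured TensorBoard scalars end up in the same title/series structure:

from clearml import Task

task = Task.init(project_name="NetworkProject", task_name="networkA")
logger = task.get_logger()

for epoch in range(10):
    # "loss" is the metric (title); "train" / "val" are the variants (series),
    # so values from different stages stay grouped under one comparable scalar name
    logger.report_scalar(title="loss", series="train", value=0.9 / (epoch + 1), iteration=epoch)
    logger.report_scalar(title="loss", series="val", value=1.0 / (epoch + 1), iteration=epoch)

# closing the Task once the run is done is what lets the next experiment reuse the plain names
task.close()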
