
ClearML multiple tasks in single script changes logged value names

I trained multiple models with different configurations for a custom hyperparameter search. I use pytorch_lightning and its logging (TensorBoardLogger). When I run my training script, ClearML auto-creates a Task after Task.init() and connects the logger output to the server.

For each training stage (train, val and test) I log the following scalars at each epoch: loss, acc and iou.

When I have multiple configurations, e.g. networkA and networkB, the first training run logs its values to loss, acc and iou, but the second logs to networkB:loss, networkB:acc and networkB:iou. This makes the values impossible to compare.

My training loop with Task initialization looks like this:

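from clearml import Task

names = ['networkA', 'networkB']
for name in names:
    # One ClearML task per configuration
    task = Task.init(project_name="NetworkProject", task_name=name)
    pl_train(name)  # wrapper around the whole PyTorch Lightning training
    task.close()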

The pl_train method is a wrapper for the whole training with PyTorch Lightning. It contains no ClearML code.

Do you have any hint on how to properly use a loop in a script so that the tasks are completely separate?


Edit: ClearML version was 0.17.4. Issue is fixed in main branch.
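
For anyone stuck on an affected version, one possible workaround is to run each configuration in its own Python process, so that no logging state can carry over from one task to the next. The following is only a sketch under assumptions: train_one.py is a hypothetical script that would call Task.init(project_name="NetworkProject", task_name=name), then pl_train(name), then task.close(); the --name flag and the helper names build_command and run_all are likewise made up for illustration. I have not verified this against 0.17.4.

```python
import subprocess
import sys

NAMES = ["networkA", "networkB"]


def build_command(name, script="train_one.py"):
    """Build the command that trains one configuration in a fresh interpreter.

    `script` is hypothetical: it is assumed to initialise the ClearML task,
    run the training for `name`, and close the task before exiting.
    """
    return [sys.executable, script, "--name", name]


def run_all(names=NAMES, runner=subprocess.run):
    """Run the configurations one after another, each in its own process."""
    for name in names:
        # check=True stops the loop if one training process fails
        runner(build_command(name), check=True)
```

Calling run_all() would then replace the original for loop. The runner parameter exists only so the loop can be exercised without launching real training processes.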

Disclaimer: I'm part of the ClearML (formerly Trains) team.

pytorch_lightning creates a new TensorBoard for each experiment. When ClearML logs the TB scalars and captures the same scalar being sent again, it adds a prefix, so that reporting the same metric does not overwrite the previous one. A good example is reporting a loss scalar in the training phase vs. the validation phase (producing "loss" and "validation:loss").

It might be that the task.close() call does not clear the previous logs, so ClearML "thinks" this is still the same experiment and therefore adds the prefix networkB to loss.

As long as you close the Task after training is completed, all experiments should log with the same metric/variant (title/series). I suggest opening a GitHub issue; this should probably be considered a bug.
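
To illustrate the prefixing behaviour described above, here is a toy model in plain Python. It is not ClearML's actual implementation; the class ToyScalarCollector and its methods are hypothetical and only mimic the idea that a scalar name already claimed by one source gets a prefix when another source reports it, unless the remembered state is cleared in between.

```python
class ToyScalarCollector:
    """Toy model only; this is NOT how ClearML is implemented internally."""

    def __init__(self):
        # scalar name -> source (e.g. a TensorBoard writer) that reported it first
        self._owner = {}

    def resolve_name(self, source, scalar):
        """Return the name under which `scalar` from `source` would be logged."""
        owner = self._owner.setdefault(scalar, source)
        if owner == source:
            return scalar
        # The name is already taken by another source: prefix instead of overwriting
        return f"{source}:{scalar}"

    def reset(self):
        """Forget all previously seen scalars (what closing a task should imply)."""
        self._owner.clear()


collector = ToyScalarCollector()
first = collector.resolve_name("networkA", "loss")
stale = collector.resolve_name("networkB", "loss")  # state from networkA still present
collector.reset()
fresh = collector.resolve_name("networkB", "loss")  # state cleared between runs
print(first, stale, fresh)  # prints: loss networkB:loss loss
```

In this model, the behaviour from the question corresponds to the missing reset() between the two runs: without it the second run's loss is renamed, with it both runs log under the same name.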
