How resilient is reporting to Trains server?

Original post: 2020-08-04 15:20:31, tags: trains, clearml

How would Trains go about sending any missing data to the server in the following scenarios?

Internet connection breaks temporarily while running an experiment
Internet connection breaks and doesn't come back before the experiment ends (any manual way to send all the data that was missed?)
The machine running Trains server resets in the middle of an experiment

1 answer

Disclaimer: I'm part of the allegro.ai Trains team

Trains automatically retries sending logs, essentially forever. Logs and metrics are sent from a background thread, so the retries should not interfere with execution. To control the retry frequency, adjust the backoff parameter sdk.network.iteration.retry_backoff_factor_sec in your ~/trains.conf file.
When the experiment ends, it tries to flush all metrics to the backend, i.e. the process waits at exit (at_exit) until all metrics are sent. If the connection has dropped, it keeps retrying until the connection is back up. If the experiment was aborted manually, there is no way to capture or resend the lost metric reports. That said, version 0.16 introduced offline mode, which lets you run the entire experiment offline and report all logs, metrics, and artifacts later.
The Trains-Server machine is fully stateless (the state itself is stored in the databases on the machine). From the experiment's perspective, the connection simply dropped for a few minutes and then became available again. To answer your question: a Trains-Server restart is transparent to all experiments; they continue as usual and no reports are lost.
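
To make the behaviour described above more concrete, here is a minimal sketch of the general pattern: a background thread that queues reports, retries failed sends with a capped backoff, and can be flushed before the process exits. This is not the Trains implementation. The names BackgroundReporter, FlakyServer, backoff_factor_sec and max_backoff_sec are hypothetical and chosen only for illustration, and the "server" is simulated in-process.

```python
import queue
import threading
import time


class BackgroundReporter:
    """Toy reporter (hypothetical, not the Trains client): queues events and
    sends them from a background thread, retrying until each send succeeds."""

    def __init__(self, send, backoff_factor_sec=0.01, max_backoff_sec=0.1):
        self._send = send  # transport callable; raises ConnectionError on failure
        self._backoff = backoff_factor_sec
        self._max_backoff = max_backoff_sec
        self._queue = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def report(self, event):
        """Enqueue an event and return immediately, so training is not blocked."""
        self._queue.put(event)

    def _run(self):
        while True:
            event = self._queue.get()
            attempt = 0
            while True:  # retry "basically forever"
                try:
                    self._send(event)
                    break
                except ConnectionError:
                    # Exponential backoff, capped so the delay stays bounded.
                    exponent = min(attempt, 16)
                    delay = min(self._backoff * (2 ** exponent), self._max_backoff)
                    attempt += 1
                    time.sleep(delay)
            self._queue.task_done()

    def flush(self):
        """Block until every queued event has been delivered."""
        self._queue.join()


class FlakyServer:
    """Simulated backend that is unreachable for the first few calls."""

    def __init__(self, failures):
        self.failures_left = failures
        self.received = []

    def __call__(self, event):
        if self.failures_left > 0:
            self.failures_left -= 1
            raise ConnectionError("server unreachable")
        self.received.append(event)


server = FlakyServer(failures=3)
reporter = BackgroundReporter(server)
for step in range(5):
    reporter.report({"step": step, "loss": 1.0 / (step + 1)})
reporter.flush()  # analogous to waiting at exit until all metrics are sent
print(len(server.received))
```

In this sketch the simulated outage delays delivery but loses nothing, because events stay queued until the send succeeds. A real client could also register flush with Python's atexit module so the wait happens automatically at shutdown. A manually killed process never reaches that flush, which matches the caveat about aborted experiments.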

Related questions:

How should Trains be used with hyper-param optimization tools like RayTune?

pip install trains fails

How to Backup/Restore TRAINS-server when moving from AMI to local machine

Can ClearML (formerly Trains) work a local server?

trains with grid search

Trains: reusing previous task id

How to manually register a sci-kit model with TRAINS python auto-magical experiment manager?

Parallel Coordinates Plot in TRAINS

Will Trains automagically log Tensorboard HParams?

Tracking separate train/test processes with Trains

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

Answer 1 (accepted), 2020-08-10 11:25:46