
How to make deep reinforcement learning training faster

As you know, Deep Reinforcement Learning (DRL) training can take more than 10 days on a single CPU. With GPU acceleration (such as CUDA), the training time drops to around 1 day, depending on the CPU and GPU. But even when using CUDA, GPU utilization is only around 10% and training still takes too long. This is frustrating for developers who want to check results frequently while developing their code. What do you recommend to reduce the training time as much as possible, in terms of coding tips, model building, settings, GPU hardware, etc.?

From the docs:

By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process

So you shouldn't have to change any settings to allow more GPU memory usage. The quickest thing to check, therefore, is whether the batch size is large enough - you might simply not be using the available memory to its fullest extent. Try increasing the batch size until you get an OOM (out-of-memory) error, then scale it back a bit so training runs.
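A minimal sketch of that probe, using a toy Keras model and random data (the layer sizes, batch sizes and the tf.errors.ResourceExhaustedError handling here are illustrative assumptions; substitute your own network and dataset):

import numpy as np
import tensorflow as tf

# Dummy data purely for probing memory headroom; replace with your real dataset.
x = np.random.rand(4096, 128).astype("float32")
y = np.random.rand(4096, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Double the batch size until the GPU runs out of memory,
# then keep the last size that worked.
for batch_size in (256, 512, 1024, 2048):
    try:
        model.fit(x, y, batch_size=batch_size, epochs=1, verbose=0)
        print(f"batch_size={batch_size} fits in GPU memory")
    except tf.errors.ResourceExhaustedError:
        print(f"batch_size={batch_size} is too large; fall back to the previous size")
        break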

If you have access to multiple GPUs, you can use distributed strategies in TensorFlow to make sure all GPUs are being used:

import tensorflow as tf

mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    # Build and compile the model inside the strategy scope
    <your model training code>

See the tf.distribute documentation for details.

MirroredStrategy is used for synchronous distributed training across multiple GPUs on a single machine. There's also a more intuitive explanation in this blog.
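For reference, here is a minimal end-to-end sketch with a toy Keras model and random data (the layer sizes and data shapes are placeholders): anything that creates variables, i.e. building and compiling the model, has to happen inside the strategy scope, while fit can be called outside it and Keras splits each batch across the available GPUs.

import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
print(f"Devices in sync: {strategy.num_replicas_in_sync}")

with strategy.scope():
    # Anything that creates variables (model, optimizer) belongs inside the scope.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# fit can run outside the scope; each global batch is split across the GPUs.
x = np.random.rand(1024, 32).astype("float32")
y = np.random.rand(1024, 1).astype("float32")
model.fit(x, y, batch_size=256, epochs=2)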

Finally, for more efficient computation you can lower the precision used for the model's internal computations by using mixed precision.
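With Keras this is a one-line global policy switch, sketched below (assuming TensorFlow 2.4+ and a GPU with good float16 support, e.g. NVIDIA Volta or newer): computations run in float16 while variables are kept in float32, and Keras applies loss scaling automatically when you compile the model.

import tensorflow as tf

# Run layer computations in float16, keep variables in float32 for numerical stability.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu"),
    # Keep the final output (and hence the loss) in float32.
    tf.keras.layers.Dense(1, dtype="float32"),
])
# The optimizer is wrapped with loss scaling automatically under this policy.
model.compile(optimizer="adam", loss="mse")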
