How to run many TensorFlow instances in parallel on different GPUs on the same machine?
Let's pretend that I launch the following commands in parallel to train many TensorFlow models at once on the same machine:
python3 launch_training.py --gpu 0
python3 launch_training.py --gpu 1
python3 launch_training.py --gpu 2
python3 launch_training.py --gpu 3
python3 launch_training.py --gpu 4
python3 launch_training.py --gpu 5
python3 launch_training.py --gpu 6
python3 launch_training.py --gpu 7
Let's pretend that inside launch_training.py a TensorFlow graph and session are created within the following context: with tf.device('/gpu:0'): (where the 0 is replaced by the proper --gpu index argument).
Will this work? If not, which steps would I have to take to make this work? I'd like to know this before renting GPUs.
You have to specify a GPU device using a with tf.device('gpu:N'): block, where N is the device index. Read https://www.tensorflow.org/programmers_guide/using_gpu and https://github.com/carla-simulator/carla/issues/116 first.
I think you've confused running the same script multiple times on different GPUs with running one script using multiple GPUs. For the former, read the "Using a single GPU on a multi-GPU system" section of the TensorFlow guide; for the latter, read "Using multiple GPUs".
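For the "same script, one GPU per process" case, a common approach is to hide the other GPUs from each process with the CUDA_VISIBLE_DEVICES environment variable. As far as I know, TensorFlow by default reserves memory on every GPU the process can see, so tf.device alone pins the ops but may not keep the processes out of each other's way. Below is a minimal sketch of how launch_training.py might do this. The helper name select_gpu is hypothetical, and the --gpu default of 0 is an assumption.

```python
import argparse
import os


def select_gpu(argv=None):
    """Restrict this process to the GPU given by --gpu and return the device name to use.

    select_gpu is a hypothetical helper, not part of TensorFlow.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--gpu", type=int, default=0, help="physical GPU index")
    # parse_known_args ignores any other flags the training script may accept.
    args, _ = parser.parse_known_args(argv)

    # This must be set before TensorFlow initializes CUDA, so call select_gpu()
    # before importing tensorflow.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(args.gpu)

    # The single visible GPU is renumbered to index 0 inside this process,
    # whatever its physical index was.
    return "/gpu:0"
```

In launch_training.py you would call device = select_gpu() at the very top, import tensorflow only after that, and then build the graph inside with tf.device(device):. Note that with this approach the device string is always '/gpu:0', since each process only sees its own GPU.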