TensorFlow: choosing which GPU to use when several are available

Question

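I'm new to TensorFlow and have installed CUDA-7.5 and cudnn-v4 following the instructions on the TensorFlow website. After adjusting the TensorFlow configuration file, I tried running the following example from the website: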

python -m tensorflow.models.image.mnist.convolutional

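I'm fairly sure TensorFlow is using one of the GPUs and not the other, but I would like it to use the faster one. I wonder whether this example code simply defaults to the first GPU it finds. If so, how can I choose which GPU to use from my TensorFlow code in Python?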

The messages I get when running the example code are:

ldt-tesla:~$ python -m tensorflow.models.image.mnist.convolutional
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: Tesla K20c
major: 3 minor: 5 memoryClockRate (GHz) 0.7055
pciBusID 0000:03:00.0
Total memory: 4.63GiB
Free memory: 4.57GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:572] creating context when one is currently active; existing: 0x2f27390
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 1 with properties:
name: Quadro K2200
major: 5 minor: 0 memoryClockRate (GHz) 1.124
pciBusID 0000:02:00.0
Total memory: 3.95GiB
Free memory: 3.62GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y N
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 1:   N Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:806] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K20c, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:793] Ignoring gpu device (device: 1, name: Quadro K2200, pci bus id: 0000:02:00.0) with Cuda multiprocessor count: 5. The minimum required count is 8. You can adjust this requirement with the env var TF_MIN_GPU_MULTIPROCESSOR_COUNT.
Initialized!

Answer 1

You can set the CUDA_VISIBLE_DEVICES environment variable to expose only the devices you want. Quoting this example on masking GPUs:

CUDA_VISIBLE_DEVICES=1  Only device 1 will be seen
CUDA_VISIBLE_DEVICES=0,1    Devices 0 and 1 will be visible
CUDA_VISIBLE_DEVICES="0,1"  Same as above, quotation marks are optional
CUDA_VISIBLE_DEVICES=0,2,3  Devices 0, 2, 3 will be visible; device 1 is masked
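The same masking can also be done from inside a Python script, as long as the variable is set before TensorFlow (or any other CUDA library) initializes. The following is a minimal sketch, not part of the original answer; the helper name select_gpus is hypothetical.

```python
import os


def select_gpus(indices):
    """Set CUDA_VISIBLE_DEVICES so that only the given device indices are visible.

    select_gpus is a hypothetical helper for illustration. It returns the
    string that was written to the environment variable.
    """
    value = ",".join(str(i) for i in indices)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Call this before importing TensorFlow; once CUDA has been initialized
# in the process, changing the variable has no effect.
select_gpus([1])

# import tensorflow as tf  # import only after the variable has been set
```

The visible devices are renumbered starting from 0 inside the process, so after select_gpus([1]) the selected card shows up as /gpu:0. In the log from the question, TensorFlow ignores device 1 because of its multiprocessor count, so picking that card would presumably also require adjusting TF_MIN_GPU_MULTIPROCESSOR_COUNT, as the log message suggests.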

Answer 2

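You can choose which GPU the program runs on at launch time instead of hard-coding it into your script. This avoids problems when the script is run on a machine that has only one GPU, or fewer GPUs than the machine it was written on.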

Say you want to run on GPU #3; you can do it like this:

CUDA_VISIBLE_DEVICES=3 python -m tensorflow.models.image.mnist.convolutional
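If the program is started from another Python process (a job launcher, for example), the same per-run selection can be sketched by passing a modified environment to the child process. This is only an illustration under that assumption; gpu_env and run_on_gpu are hypothetical names, not TensorFlow APIs.

```python
import os
import subprocess
import sys


def gpu_env(gpu_index):
    """Return a copy of the current environment with one GPU made visible."""
    return dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_index))


def run_on_gpu(gpu_index, argv, **kwargs):
    """Run argv as a child process that can only see the given GPU.

    The parent's own environment is left untouched. Extra keyword
    arguments are forwarded to subprocess.run.
    """
    return subprocess.run(argv, env=gpu_env(gpu_index), check=True, **kwargs)


# Example call, equivalent to the shell command above (left commented out
# because it needs TensorFlow and a machine with at least four GPUs):
# run_on_gpu(3, [sys.executable, "-m", "tensorflow.models.image.mnist.convolutional"])
```

Because the GPU index is an argument rather than a constant in the training script, the script itself stays portable across machines with different numbers of GPUs.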

Answer 1: 6, accepted, 2016-08-17 17:14:12
Answer 2: 1, 2020-05-27 17:41:09